Commit Graph

22 Commits

Author SHA1 Message Date
Wilco Dijkstra
3dc426b642 AArch64: Improve generic strlen
Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates.  Change final
size computation to avoid increasing latency.  On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-08-07 14:58:46 +01:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
Wilco Dijkstra
03c8ce5000 AArch64: Optimize strlen
Optimize strlen by unrolling the main loop.  Large strings are 64% faster on
modern CPUs.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-01-17 15:09:18 +00:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights 2023-01-06 21:14:39 +00:00
Danila Kutenin
3c99806989 aarch64: Optimize string functions with shrn instruction
We found that string functions were using AND+ADDP
to find the nibble/syndrome mask but there is an easier
opportunity through `SHRN dst.8b, src.8h, 4` (shift
right every 2 bytes by 4 and narrow to 1 byte) and has
same latency on all SIMD ARMv8 targets as ADDP. There
are also possible gaps for memcmp but that's for
another patch.

We see 10-20% savings for small-mid size cases (<=128)
which are primary cases for general workloads.
2022-07-06 09:26:20 +01:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Szabolcs Nagy
45b1e17e91 aarch64: use PTR_ARG and SIZE_ARG instead of DELOUSE
DELOUSE was added to asm code to make them compatible with non-LP64
ABIs, but it is an unfortunate name and the code was not compatible
with ABIs where pointer and size_t are different. Glibc currently
only supports the LP64 ABI so these macros are not really needed or
tested, but for now the name is changed to be more meaningful instead
of removing them completely.

Some DELOUSE macros were dropped: clone, strlen and strnlen used it
unnecessarily.

The out of tree ILP32 patches are currently not maintained and will
likely need a rework to rebase them on top of the time64 changes.
2020-12-31 16:50:58 +00:00
Andrea Corallo
a365ac45b7 aarch64: MTE compatible strlen
Introduce an Arm MTE compatible strlen implementation.

The existing implementation assumes that any access to the pages in
which the string resides is safe.  This assumption is not true when
MTE is enabled.  This patch updates the algorithm to ensure that
accesses remain within the bounds of an MTE tag (16-byte chunks) and
improves overall performance on modern cores. On cores with less
efficient Advanced SIMD implementation such as Cortex-A53 it can
be slower.

Benchmarked on Cortex-A72, Cortex-A53, Neoverse N1.

Co-authored-by: Wilco Dijkstra <wilco.dijkstra@arm.com>
2020-06-09 09:21:11 +01:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Siddhesh Poyarekar
436e4d5b96 [aarch64] Add an ASIMD variant of strlen for falkor
This variant of strlen uses vector loads and operations to reduce the
size of the code and also eliminate the non-ascii fallback.  This
works very well for falkor because of its two vector units and
efficient vector ops.  In the best case it reduces latency of cases in
bench-strlen by 48%, with gains throughout the benchmark.
strlen-walk also sees uniform gains in the 5%-15% range.

Overall the routine appears to work better than the stock one for falkor
regardless of the benchmark, length of string or cache state.

The same cannot be said of a53 and a72 though.  a53 performance was
greatly reduced and for a72 it was a bit of a mixed bag, slightly on the
negative side but I reckon it might be fast in some situations.

	* sysdeps/aarch64/strlen.S (__strlen): Rename to STRLEN.
	[!STRLEN](STRLEN): Set to __strlen.
	* sysdeps/aarch64/multiarch/strlen.c: New file.
	* sysdeps/aarch64/multiarch/strlen_generic.S: Likewise.
	* sysdeps/aarch64/multiarch/strlen_asimd.S: Likewise.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add strlen.
	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
	strlen_generic and strlen_asimd.

Reviewed-By: szabolcs.nagy@arm.com
CC: pinskia@gmail.com
2018-08-15 23:01:33 +05:30
Siddhesh Poyarekar
be64b1946b [aarch64] Fix value of MIN_PAGE_SIZE for testing
MIN_PAGE_SIZE is normally set to 4096 but for testing it can be set to
16 so that it exercises the page crossing code for every misaligned
access.  The value was set to 15, which is obviously wrong, so fixed
as obvious and tested.

	* sysdeps/aarch64/strlen.S [TEST_PAGE_CROSS](MIN_PAGE_SIZE):
	Fix value.
2018-08-08 22:47:17 +05:30
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Joseph Myers
bfff8b1bec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Steve Ellcey
389d1f1b23 Partial ILP32 support for aarch64.
* sysdeps/aarch64/crti.S: Add include of sysdep.h.
	(call_weak_fn): Use PTR_REG to get correct reg name in ILP32.
	* sysdeps/aarch64/dl-irel.h: Add include of sysdep.h.
	(elf_irela): Use AARCH64_R macro to get correct relocation in ILP32.
	* sysdeps/aarch64/dl-machine.h: Add include of sysdep.h.
	(elf_machine_load_address, RTLD_START, RTLD_START_1, RTLD_START,
	elf_machine_type_class, ELF_MACHINE_JMP_SLOT, elf_machine_rela,
	elf_machine_lazy_rel): Add ifdef's for ILP32 support.
	* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return,
	_dl_tlsdesc_return_lazy, _dl_tlsdesc_dynamic,
	_dl_tlsdesc_resolve_hold): Extend pointers in ILP32, use PTR_REG
	to get correct reg name for ILP32.
	* sysdeps/aarch64/dl-trampoline.S (ip01): New Macro.
	(RELA_SIZE): New Macro.
	(_dl_runtime_resolve, _dl_runtime_profile): Use new macros and PTR_REG
	to support ILP32.
	* sysdeps/aarch64/jmpbuf-unwind.h (_JMPBUF_CFA_UNWINDS_ADJ): Add
	cast for ILP32 mode.
	* sysdeps/aarch64/memcmp.S (memcmp): Extend arg pointers for ILP32 mode.
	* sysdeps/aarch64/memcpy.S (memmove, memcpy): Ditto.
	* sysdeps/aarch64/memset.S (__memset): Ditto.
	* sysdeps/aarch64/strchr.S (strchr): Ditto.
	* sysdeps/aarch64/strchrnul.S (__strchrnul): Ditto.
	* sysdeps/aarch64/strcmp.S (strcmp): Ditto.
	* sysdeps/aarch64/strcpy.S (strcpy): Ditto.
	* sysdeps/aarch64/strlen.S (__strlen): Ditto.
	* sysdeps/aarch64/strncmp.S (strncmp): Ditto.
	* sysdeps/aarch64/strnlen.S (strnlen): Ditto.
	* sysdeps/aarch64/strrchr.S (strrchr): Ditto.
	* sysdeps/unix/sysv/linux/aarch64/clone.S: Ditto.
	* sysdeps/unix/sysv/linux/aarch64/setcontext.S (__setcontext): Ditto.
	* sysdeps/unix/sysv/linux/aarch64/swapcontext.S (__swapcontext): Ditto.
	* sysdeps/aarch64/__longjmp.S (__longjmp): Extend pointers in ILP32,
	change PTR_MANGLE call to use register numbers instead of names.
	* sysdeps/unix/sysv/linux/aarch64/getcontext.S (__getcontext): Ditto.
	* sysdeps/aarch64/setjmp.S (__sigsetjmp): Extend arg pointers for
	ILP32 mode, change PTR_MANGLE calls to use register numbers.
	* sysdeps/aarch64/start.S (_start): Ditto.
	* sysdeps/aarch64/nptl/bits/pthreadtypes.h
	(__PTHREAD_RWLOCK_INT_FLAGS_SHARED): New define.
	(__SIZEOF_PTHREAD_ATTR_T, __SIZEOF_PTHREAD_MUTEX_T,
	__SIZEOF_PTHREAD_MUTEXATTR_T, __SIZEOF_PTHREAD_COND_T,
	__SIZEOF_PTHREAD_COND_COMPAT_T, __SIZEOF_PTHREAD_CONDATTR_T,
	__SIZEOF_PTHREAD_RWLOCK_T, __SIZEOF_PTHREAD_RWLOCKATTR_T,
	__SIZEOF_PTHREAD_BARRIER_T, __SIZEOF_PTHREAD_BARRIERATTR_T):
	Make defined values dependent on __ILP32__.
	* sysdeps/aarch64/nptl/bits/semaphore.h (__SIZEOF_SEM_T): Change define.
	(sem_t): Change __align type.
	* sysdeps/aarch64/sysdep.h (AARCH64_R, PTR_REG, PTR_LOG_SIZE, DELOUSE,
	PTR_SIZE): New Macros.
	(LDST_PCREL, LDST_GLOBAL) Update to use PTR_REG.
	* sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h (O_LARGEFILE):
	Set when in ILP32 mode.
	(F_GETLK64, F_SETLK64, F_SETLKW64): Only set in LP64 mode.
	* sysdeps/unix/sysv/linux/aarch64/dl-cache.h (DL_CACHE_DEFAULT_ID):
	Set elf flags for ILP32.
	(add_system_dir): Set ILP32 library directories.
	* sysdeps/unix/sysv/linux/aarch64/init-first.c
	(_libc_vdso_platform_setup): Set minimum kernel version for ILP32.
	* sysdeps/unix/sysv/linux/aarch64/ldconfig.h
	(SYSDEP_KNOWN_INTERPRETER_NAMES): Add ILP32 names.
	* sysdeps/unix/sysv/linux/aarch64/sigcontextinfo.h (GET_PC, SET_PC):
	New Macros.
	* sysdeps/unix/sysv/linux/aarch64/sysdep.h: Handle ILP32 pointers.
2016-11-28 09:01:23 -08:00
Wilco Dijkstra
58ec4fb881 Add a simple rawmemchr implementation. Use strlen for rawmemchr(s, '\0') as it
is the fastest way to search for '\0'.  Otherwise use memchr with an infinite
size.  This is 3x faster on benchtests for large sizes.  Passes GLIBC tests.

	* sysdeps/aarch64/rawmemchr.S (__rawmemchr): New file.
	* sysdeps/aarch64/strlen.S (__strlen): Change to __strlen to avoid PLT.
2016-06-20 17:48:20 +01:00
Joseph Myers
f7a9f785e5 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
Wilco Dijkstra
c435989f52 Optimize the strlen implementation by using a page cross check and a fast check
for nul bytes which reverts to separate loop when a non-ASCII char is encountered.
Speedup on test-strlen is ~10%, long ASCII strings are processed ~60% faster,
and on random tests it is ~80% better.
2015-07-13 12:38:12 +01:00
Joseph Myers
b168057aaa Update copyright dates with scripts/update-copyrights. 2015-01-02 16:29:47 +00:00
Marcus Shawcroft
75eff3fe90 Relocate AArch64 from ports to libc.
This patch moves the AArch64 port to the main sysdeps hierarchy.  The
move is essentially:

  git mv ports/sysdeps/aarch64 sysdeps/aarch64
  git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64

The README is updated and I've updated ChangeLog.aarch64 along the
lines of the ARM move.  The AArch64 build has been tested to confirm
that there were no changes in objdump -dr output or the shared
objects.
2014-02-11 11:36:00 +00:00