Wilco Dijkstra
8ecb477ea1
AArch64: Remove memset-reg.h
Remove memset-reg.h by moving register definitions into the memset
implementations.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-09-10 14:18:03 +01:00
Wilco Dijkstra
cec3aef324
AArch64: Optimize memset
Improve small memsets by avoiding branches and using overlapping stores.
Use DC ZVA for zero memsets larger than 128 bytes. Remove unnecessary code for
ZVA sizes other than 64 and 128. Performance of the random memset benchmark
improves by 24% on Neoverse N1.
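The overlapping-store idea can be illustrated in C. This is a minimal sketch, not the glibc assembly. `small_memset`, `store4` and `store8` are hypothetical names, and the sketch assumes lengths of at most 16 bytes. Instead of a byte loop, each size class issues two stores: one anchored at the start and one ending exactly at the last byte. Any length in the class is therefore covered by a fixed number of stores behind only a few size-class branches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Unaligned fixed-size stores expressed portably via memcpy.  */
static inline void store8 (unsigned char *p, uint64_t v) { memcpy (p, &v, 8); }
static inline void store4 (unsigned char *p, uint32_t v) { memcpy (p, &v, 4); }

/* Hypothetical sketch: set n bytes (n <= 16) using at most two
   possibly overlapping stores per size class.  */
static void
small_memset (void *dst, int c, size_t n)
{
  assert (n <= 16);
  unsigned char *d = dst;
  /* Replicate the fill byte into all 8 bytes of a 64-bit word.  */
  uint64_t v = (unsigned char) c * 0x0101010101010101ULL;

  if (n >= 8)
    {
      /* 8..16 bytes: first 8 and last 8 bytes; they overlap when n < 16.  */
      store8 (d, v);
      store8 (d + n - 8, v);
    }
  else if (n >= 4)
    {
      /* 4..7 bytes: first 4 and last 4 bytes.  */
      store4 (d, (uint32_t) v);
      store4 (d + n - 4, (uint32_t) v);
    }
  else if (n > 0)
    {
      /* 1..3 bytes: first, middle and last byte (some may coincide).  */
      d[0] = (unsigned char) c;
      d[n / 2] = (unsigned char) c;
      d[n - 1] = (unsigned char) c;
    }
}
```

A fixed-size `memcpy` is the usual portable way to write an unaligned store in C, and compilers typically lower it to a single store instruction.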
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-09-09 15:30:00 +01:00
Paul Eggert
dff8da6b3e
Update copyright dates with scripts/update-copyrights
2024-01-01 10:53:40 -08:00
Wilco Dijkstra
3d7090f14b
AArch64: Add memset_zva64
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.
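A dedicated ZVA-64 variant moves the block-size check out of the hot path. The size is known once, so the implementation can be chosen once. Below is a simplified C sketch of that selection, assuming a cached `zva_size` value. `select_memset`, `memset_zva64_sketch` and `memset_generic_sketch` are hypothetical names, and both variants simply forward to `memset` here. The real glibc selector may weigh other CPU features as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef void *(*memset_fn) (void *, int, size_t);

/* Hypothetical stand-ins: a real build would point at the assembly
   variants; here both forward to the C library memset.  */
static void *memset_zva64_sketch (void *d, int c, size_t n) { return memset (d, c, n); }
static void *memset_generic_sketch (void *d, int c, size_t n) { return memset (d, c, n); }

/* Pick an implementation once from a ZVA size that was read a single
   time (0 meaning DC ZVA is unavailable, by this sketch's convention).
   The chosen variant never needs to consult the ZVA size again.  */
static memset_fn
select_memset (unsigned zva_size)
{
  /* ZVA block sizes are powers of two (or 0 here).  */
  assert ((zva_size & (zva_size - 1)) == 0);
  return zva_size == 64 ? memset_zva64_sketch : memset_generic_sketch;
}
```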
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2023-11-13 16:50:44 +00:00
Wilco Dijkstra
9fd3409842
AArch64: Cleanup ifuncs
Clean up ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename
strlen_mte to strlen_generic. Remove rtld-memset.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-11-01 13:41:59 +00:00
Joseph Myers
6d7e8eda9b
Update copyright dates with scripts/update-copyrights
2023-01-06 21:14:39 +00:00
Paul Eggert
581c785bf3
Update copyright dates with scripts/update-copyrights
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Paul Eggert
2b778ceb40
Update copyright dates with scripts/update-copyrights
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Szabolcs Nagy
45b1e17e91
aarch64: use PTR_ARG and SIZE_ARG instead of DELOUSE
DELOUSE was added to asm code to make it compatible with non-LP64
ABIs, but it is an unfortunate name and the code was not compatible
with ABIs where pointer and size_t are different. Glibc currently
only supports the LP64 ABI so these macros are not really needed or
tested, but for now the name is changed to be more meaningful instead
of removing them completely.
Some DELOUSE macros were dropped: clone, strlen and strnlen used it
unnecessarily.
The out of tree ILP32 patches are currently not maintained and will
likely need a rework to rebase them on top of the time64 changes.
2020-12-31 16:50:58 +00:00
Joseph Myers
d614a75396
Update copyright dates with scripts/update-copyrights.
2020-01-01 00:14:33 +00:00
Paul Eggert
5a82c74822
Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:
sed -ri '
s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
$(find $(git ls-files) -prune -type f \
! -name '*.po' \
! -name 'ChangeLog*' \
! -path COPYING ! -path COPYING.LIB \
! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
! -path manual/texinfo.tex ! -path scripts/config.guess \
! -path scripts/config.sub ! -path scripts/install-sh \
! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
! -path INSTALL ! -path locale/programs/charmap-kw.h \
! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
! '(' -name configure \
-execdir test -f configure.ac -o -f configure.in ';' ')' \
! '(' -name preconfigure \
-execdir test -f preconfigure.ac ';' ')' \
-print)
and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to clean up:
chmod a+x sysdeps/unix/sysv/linux/riscv/configure
# Omit irrelevant whitespace and comment-only changes,
# perhaps from a slightly-different Autoconf version.
git checkout -f \
sysdeps/csky/configure \
sysdeps/hppa/configure \
sysdeps/riscv/configure \
sysdeps/unix/sysv/linux/csky/configure
# Omit changes that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
git checkout -f \
sysdeps/powerpc/powerpc64/ppc-mcount.S \
sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
# Omit change that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Joseph Myers
04277e02d7
Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
using scripts/update-copyrights.
* locale/programs/charmap-kw.h: Regenerated.
* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Wilco Dijkstra
5770c0ad1e
[AArch64] Adjust writeback in non-zero memset
This fixes an inefficiency in the non-zero memset. Delaying the writeback
until the end of the loop is slightly faster on some cores; this shows a
~5% performance gain on Cortex-A53 when doing large non-zero memsets.
* sysdeps/aarch64/memset.S (MEMSET): Improve non-zero memset loop.
2018-11-20 12:37:00 +00:00
Joseph Myers
688903eb3e
Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
using scripts/update-copyrights.
* locale/programs/charmap-kw.h: Regenerated.
* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Adhemerval Zanella
4e00196912
aarch64: fix memset with --disable-multi-arch
* sysdeps/aarch64/memset.S (MEMSET): Define.
2017-12-20 12:05:32 +00:00
Siddhesh Poyarekar
5a67c4fa01
aarch64: Optimized memset for falkor
The generic memset reads dczid_el0 on every memset. This has a
significant impact on falkor for a range of sizes because reading
dczid_el0 is slow.
The DZP bit in the dczid_el0 register does not change dynamically, so
it is safe to read once during program startup. With this patch
dczid_el0 is read once during startup and zva_size is cached. This is
used to invoke the falkor-specific memset; the generic memset routine
remains unchanged.
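Reading the ZVA size once amounts to decoding DCZID_EL0 at startup and storing the result. The following is a minimal C sketch, assuming the architectural layout of the register. Bits [3:0] (BS) hold log2 of the block size in 4-byte words, with 9 (2 KiB) as the maximum. Bit 4 (DZP) is set when DC ZVA is prohibited. The mask names match those listed in the ChangeLog below, but their values here and `zva_size_from_dczid` belong to this sketch. Reporting a prohibited ZVA as size 0 is also an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define DCZID_BS_MASK  0xfu        /* log2 of block size in 4-byte words */
#define DCZID_DZP_MASK (1u << 4)   /* set: DC ZVA is prohibited */

/* Decode a DCZID_EL0 value into a DC ZVA block size in bytes;
   0 means DC ZVA must not be used.  */
static unsigned
zva_size_from_dczid (uint64_t dczid)
{
  if (dczid & DCZID_DZP_MASK)
    return 0;
  /* The architecture caps BS at 9, i.e. 2 KiB blocks.  */
  assert ((dczid & DCZID_BS_MASK) <= 9);
  return 4u << (dczid & DCZID_BS_MASK);
}

#ifdef __aarch64__
/* DCZID_EL0 is readable from EL0; read it once, e.g. during startup,
   and cache zva_size_from_dczid (read_dczid ()).  */
static __attribute__ ((unused)) uint64_t
read_dczid (void)
{
  uint64_t v;
  __asm__ volatile ("mrs %0, dczid_el0" : "=r" (v));
  return v;
}
#endif
```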
The gains due to this are significant for falkor, with run time
reductions as high as 48%. Here's a sample from the falkor tests:
Function: memset
Variant: walk
                     simple_memset       __memset_falkor   __memset_generic
===========================================================================
length=256, char=0:  139.96 (-698.28%)    9.07 ( 48.26%)   17.53
length=257, char=0:  140.50 (-699.03%)    9.53 ( 45.80%)   17.58
length=258, char=0:  140.96 (-703.95%)    9.58 ( 45.36%)   17.53
length=259, char=0:  141.56 (-705.16%)    9.53 ( 45.79%)   17.58
length=260, char=0:  142.15 (-710.76%)    9.57 ( 45.39%)   17.53
length=261, char=0:  142.50 (-710.39%)    9.53 ( 45.78%)   17.58
length=262, char=0:  142.97 (-715.09%)    9.57 ( 45.42%)   17.54
length=263, char=0:  143.51 (-716.18%)    9.53 ( 45.80%)   17.58
length=264, char=0:  143.93 (-720.55%)    9.58 ( 45.39%)   17.54
length=265, char=0:  144.56 (-722.07%)    9.53 ( 45.80%)   17.59
length=266, char=0:  144.98 (-726.42%)    9.58 ( 45.42%)   17.54
length=267, char=0:  145.53 (-727.53%)    9.53 ( 45.80%)   17.59
length=268, char=0:  146.25 (-731.81%)    9.53 ( 45.79%)   17.58
length=269, char=0:  146.52 (-735.39%)    9.53 ( 45.66%)   17.54
length=270, char=0:  146.97 (-735.81%)    9.53 ( 45.80%)   17.58
length=271, char=0:  147.54 (-741.08%)    9.58 ( 45.38%)   17.54
length=512, char=0:  268.26 (-1307.85%)  12.06 ( 36.71%)   19.05
length=513, char=0:  268.73 (-1273.89%)  13.56 ( 30.68%)   19.56
length=514, char=0:  269.31 (-1276.89%)  13.56 ( 30.68%)   19.56
length=515, char=0:  269.73 (-1279.05%)  13.56 ( 30.68%)   19.56
length=516, char=0:  270.34 (-1282.24%)  13.56 ( 30.67%)   19.56
length=517, char=0:  270.83 (-1284.71%)  13.56 ( 30.66%)   19.56
length=518, char=0:  271.20 (-1286.54%)  13.56 ( 30.67%)   19.56
length=519, char=0:  271.67 (-1288.67%)  13.65 ( 30.24%)   19.56
length=520, char=0:  272.14 (-1291.04%)  13.65 ( 30.22%)   19.56
length=521, char=0:  272.66 (-1293.69%)  13.65 ( 30.23%)   19.56
length=522, char=0:  273.14 (-1296.13%)  13.65 ( 30.20%)   19.56
length=523, char=0:  273.64 (-1298.75%)  13.65 ( 30.23%)   19.56
length=524, char=0:  274.34 (-1302.16%)  13.66 ( 30.20%)   19.57
length=525, char=0:  274.64 (-1297.78%)  13.56 ( 30.99%)   19.65
length=526, char=0:  275.20 (-1300.04%)  13.56 ( 31.01%)   19.66
length=527, char=0:  275.66 (-1302.86%)  13.56 ( 30.99%)   19.65
length=1024, char=0: 524.46 (-2169.75%)  20.12 ( 12.92%)   23.11
length=1025, char=0: 525.14 (-2124.63%)  21.62 (  8.40%)   23.61
length=1026, char=0: 525.59 (-2125.36%)  21.88 (  7.37%)   23.62
length=1027, char=0: 525.98 (-2127.14%)  21.62 (  8.46%)   23.62
length=1028, char=0: 526.68 (-2131.10%)  21.62 (  8.42%)   23.61
length=1029, char=0: 527.10 (-2131.70%)  21.79 (  7.73%)   23.62
length=1030, char=0: 527.54 (-2118.51%)  21.62 (  9.10%)   23.78
length=1031, char=0: 527.98 (-2136.37%)  21.62 (  8.43%)   23.61
length=1032, char=0: 528.70 (-2139.38%)  21.62 (  8.43%)   23.61
length=1033, char=0: 529.25 (-2124.37%)  21.62 (  9.11%)   23.79
length=1034, char=0: 529.48 (-2142.95%)  21.62 (  8.43%)   23.61
length=1035, char=0: 530.11 (-2145.13%)  21.62 (  8.44%)   23.61
length=1036, char=0: 530.76 (-2147.10%)  21.79 (  7.73%)   23.62
length=1037, char=0: 531.03 (-2149.45%)  21.62 (  8.42%)   23.61
length=1038, char=0: 531.64 (-2151.87%)  21.62 (  8.42%)   23.61
length=1039, char=0: 531.99 (-2151.63%)  21.80 (  7.75%)   23.63
* sysdeps/aarch64/memset-reg.h: New file.
* sysdeps/aarch64/memset.S: Use it.
(__memset): Rename to MEMSET macro.
[ZVA_MACRO]: Use zva_macro.
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
Add memset_generic and memset_falkor.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add memset ifuncs.
* sysdeps/aarch64/multiarch/init-arch.h (INIT_ARCH): New
local variable zva_size.
* sysdeps/aarch64/multiarch/memset.c: New file.
* sysdeps/aarch64/multiarch/memset_generic.S: New file.
* sysdeps/aarch64/multiarch/memset_falkor.S: New file.
* sysdeps/aarch64/multiarch/rtld-memset.S: New file.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
(DCZID_DZP_MASK): New macro.
(DCZID_BS_MASK): Likewise.
(init_cpu_features): Read and set zva_size.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.h
(struct cpu_features): New member zva_size.
2017-11-20 18:25:04 +05:30
Joseph Myers
bfff8b1bec
Update copyright dates with scripts/update-copyrights.
2017-01-01 00:14:16 +00:00
Steve Ellcey
389d1f1b23
Partial ILP32 support for aarch64.
* sysdeps/aarch64/crti.S: Add include of sysdep.h.
(call_weak_fn): Use PTR_REG to get correct reg name in ILP32.
* sysdeps/aarch64/dl-irel.h: Add include of sysdep.h.
(elf_irela): Use AARCH64_R macro to get correct relocation in ILP32.
* sysdeps/aarch64/dl-machine.h: Add include of sysdep.h.
(elf_machine_load_address, RTLD_START, RTLD_START_1, RTLD_START,
elf_machine_type_class, ELF_MACHINE_JMP_SLOT, elf_machine_rela,
elf_machine_lazy_rel): Add ifdef's for ILP32 support.
* sysdeps/aarch64/dl-tlsdesc.S (_dl_tlsdesc_return,
_dl_tlsdesc_return_lazy, _dl_tlsdesc_dynamic,
_dl_tlsdesc_resolve_hold): Extend pointers in ILP32, use PTR_REG
to get correct reg name for ILP32.
* sysdeps/aarch64/dl-trampoline.S (ip01): New Macro.
(RELA_SIZE): New Macro.
(_dl_runtime_resolve, _dl_runtime_profile): Use new macros and PTR_REG
to support ILP32.
* sysdeps/aarch64/jmpbuf-unwind.h (_JMPBUF_CFA_UNWINDS_ADJ): Add
cast for ILP32 mode.
* sysdeps/aarch64/memcmp.S (memcmp): Extend arg pointers for ILP32 mode.
* sysdeps/aarch64/memcpy.S (memmove, memcpy): Ditto.
* sysdeps/aarch64/memset.S (__memset): Ditto.
* sysdeps/aarch64/strchr.S (strchr): Ditto.
* sysdeps/aarch64/strchrnul.S (__strchrnul): Ditto.
* sysdeps/aarch64/strcmp.S (strcmp): Ditto.
* sysdeps/aarch64/strcpy.S (strcpy): Ditto.
* sysdeps/aarch64/strlen.S (__strlen): Ditto.
* sysdeps/aarch64/strncmp.S (strncmp): Ditto.
* sysdeps/aarch64/strnlen.S (strnlen): Ditto.
* sysdeps/aarch64/strrchr.S (strrchr): Ditto.
* sysdeps/unix/sysv/linux/aarch64/clone.S: Ditto.
* sysdeps/unix/sysv/linux/aarch64/setcontext.S (__setcontext): Ditto.
* sysdeps/unix/sysv/linux/aarch64/swapcontext.S (__swapcontext): Ditto.
* sysdeps/aarch64/__longjmp.S (__longjmp): Extend pointers in ILP32,
change PTR_MANGLE call to use register numbers instead of names.
* sysdeps/unix/sysv/linux/aarch64/getcontext.S (__getcontext): Ditto.
* sysdeps/aarch64/setjmp.S (__sigsetjmp): Extend arg pointers for
ILP32 mode, change PTR_MANGLE calls to use register numbers.
* sysdeps/aarch64/start.S (_start): Ditto.
* sysdeps/aarch64/nptl/bits/pthreadtypes.h
(__PTHREAD_RWLOCK_INT_FLAGS_SHARED): New define.
(__SIZEOF_PTHREAD_ATTR_T, __SIZEOF_PTHREAD_MUTEX_T,
__SIZEOF_PTHREAD_MUTEXATTR_T, __SIZEOF_PTHREAD_COND_T,
__SIZEOF_PTHREAD_COND_COMPAT_T, __SIZEOF_PTHREAD_CONDATTR_T,
__SIZEOF_PTHREAD_RWLOCK_T, __SIZEOF_PTHREAD_RWLOCKATTR_T,
__SIZEOF_PTHREAD_BARRIER_T, __SIZEOF_PTHREAD_BARRIERATTR_T):
Make defined values dependent on __ILP32__.
* sysdeps/aarch64/nptl/bits/semaphore.h (__SIZEOF_SEM_T): Change define.
(sem_t): Change __align type.
* sysdeps/aarch64/sysdep.h (AARCH64_R, PTR_REG, PTR_LOG_SIZE, DELOUSE,
PTR_SIZE): New Macros.
(LDST_PCREL, LDST_GLOBAL): Update to use PTR_REG.
* sysdeps/unix/sysv/linux/aarch64/bits/fcntl.h (O_LARGEFILE):
Set when in ILP32 mode.
(F_GETLK64, F_SETLK64, F_SETLKW64): Only set in LP64 mode.
* sysdeps/unix/sysv/linux/aarch64/dl-cache.h (DL_CACHE_DEFAULT_ID):
Set elf flags for ILP32.
(add_system_dir): Set ILP32 library directories.
* sysdeps/unix/sysv/linux/aarch64/init-first.c
(_libc_vdso_platform_setup): Set minimum kernel version for ILP32.
* sysdeps/unix/sysv/linux/aarch64/ldconfig.h
(SYSDEP_KNOWN_INTERPRETER_NAMES): Add ILP32 names.
* sysdeps/unix/sysv/linux/aarch64/sigcontextinfo.h (GET_PC, SET_PC):
New Macros.
* sysdeps/unix/sysv/linux/aarch64/sysdep.h: Handle ILP32 pointers.
2016-11-28 09:01:23 -08:00
Wilco Dijkstra
a8c5a2a952
This is an optimized memset for AArch64. Memset is split into 4 main cases:
small sets of up to 16 bytes and medium sets of 16..96 bytes, which are fully
unrolled. Large memsets of more than 96 bytes align the destination and use an
unrolled loop processing 64 bytes per iteration. Zero memsets of more than 256
bytes use the DC ZVA instruction, and there are faster versions for the common
ZVA sizes of 64 and 128. STP of Q registers is used to reduce code size without
loss of performance.
The speedup on test-memset is 1% on Cortex-A57 and 8% on Cortex-A53.
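The four-way split can be summarized as a dispatch on size and fill value. The C sketch below only mirrors the thresholds stated in this message. `classify`, `large_set` and the `zva_ok` flag are hypothetical; `zva_ok` stands in for whatever check decides that DC ZVA is usable. The large path uses plain byte stores where the assembly uses STP of Q registers, and the DC ZVA path itself is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum memset_path { PATH_SMALL, PATH_MEDIUM, PATH_LARGE, PATH_ZVA };

/* Choose a path using the thresholds from the commit message.
   memset converts c to unsigned char, so only its low byte matters.  */
static enum memset_path
classify (int c, size_t n, int zva_ok)
{
  if (n <= 16)
    return PATH_SMALL;               /* up to 16 bytes */
  if (n <= 96)
    return PATH_MEDIUM;              /* 16..96 bytes, fully unrolled */
  if ((unsigned char) c == 0 && n > 256 && zva_ok)
    return PATH_ZVA;                 /* large zeroing: DC ZVA */
  return PATH_LARGE;                 /* aligned 64-byte loop */
}

/* Shape of the large path: align dst to 16 bytes, set 64 bytes per
   iteration, then finish the tail.  */
static void
large_set (unsigned char *d, int c, size_t n)
{
  unsigned char v = (unsigned char) c;
  while (((uintptr_t) d & 15) != 0 && n > 0)
    {
      *d++ = v;
      n--;
    }
  for (; n >= 64; n -= 64, d += 64)
    for (size_t i = 0; i < 64; i++)
      d[i] = v;
  while (n > 0)
    {
      *d++ = v;
      n--;
    }
}
```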
* sysdeps/aarch64/memset.S (__memset):
Rewrite of optimized memset.
2016-05-12 16:44:53 +01:00
Joseph Myers
f7a9f785e5
Update copyright dates with scripts/update-copyrights.
2016-01-04 16:05:18 +00:00
Joseph Myers
b168057aaa
Update copyright dates with scripts/update-copyrights.
2015-01-02 16:29:47 +00:00
Marcus Shawcroft
75eff3fe90
Relocate AArch64 from ports to libc.
This patch moves the AArch64 port to the main sysdeps hierarchy. The
move is essentially:
git mv ports/sysdeps/aarch64 sysdeps/aarch64
git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64
The README is updated and I've updated ChangeLog.aarch64 along the
lines of the ARM move. The AArch64 build has been tested to confirm
that there were no changes in objdump -dr output or the shared
objects.
2014-02-11 11:36:00 +00:00