This patch optimizes the performance of memcpy/memmove for A64FX [1]
which implements ARMv8-A SVE and has L1 64KB cache per core and L2 8MB
cache per NUMA node.
The performance optimization makes use of Scalable Vector Register
with several techniques such as loop unrolling, memory access
alignment, cache zero fill, and software pipelining.
SVE assembler code for memcpy/memmove is implemented as Vector Length
Agnostic code so theoretically it can be run on any SOC which supports
ARMv8-A SVE standard.
We confirmed that all testcases have been passed by running 'make
check' and 'make xcheck' not only on A64FX but also on ThunderX2.
And also we confirmed that the SVE 512 bit vector register performance
is roughly 4 times better than Advanced SIMD 128 bit register and 8
times better than scalar 64 bit register by running 'make bench'.
[1] https://github.com/fujitsu/A64FX
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
This patch is a test helper script to change Vector Length for child
process. This script can be used as test-wrapper for 'make check'.
Usage examples:
~/build$ make check subdirs=string \
test-wrapper='~/glibc/sysdeps/unix/sysv/linux/aarch64/vltest.py 16'
~/build$ ~/glibc/sysdeps/unix/sysv/linux/aarch64/vltest.py 16 \
make test t=string/test-memcpy
~/build$ ~/glibc/sysdeps/unix/sysv/linux/aarch64/vltest.py 32 \
./debugglibc.sh string/test-memmove
~/build$ ~/glibc/sysdeps/unix/sysv/linux/aarch64/vltest.py 64 \
./testrun.sh string/test-memset
Since the variable expands to nothing under Linux, it is no longer
necessary to clutter the makefiles with it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Only the placeholder compatibility symbols are left now.
The __errno_location symbol was removed (moved) using
scripts/move-symbol-to-libc.py.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbols were moved using scripts/move-symbol-to-libc.py.
The libpthread placeholder symbols need some changes because some
symbol versions have gone away completely. But
__errno_location@@GLIBC_2.0 still exists, so the GLIBC_2.0 version
is still there.
The internal __pthread_create symbol now points to the correct
function, so the sysdeps/nptl/thrd_create.c override is no longer
necessary.
There was an issue how the hidden alias of pthread_getattr_default_np
was defined, so this commit cleans up that aspects and removes the
GLIBC_PRIVATE export altogether.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The tst-timespec_getres (e5ac7bd679) triggers an issue on 32-bit
architecture on Linux older than 5.1, where the fallback syscall
is used.
Checked on powerpc-linux-gnu.
ISO C2X adds a timespec_getres function alongside the C11
timespec_get, with functionality similar to that of POSIX clock_getres
(including allowing a NULL pointer to be passed to the function).
Implement this function for glibc, similarly to the implementation of
timespec_get.
This includes a basic test like that of timespec_get, but no
documentation in the manual, given that TIME_UTC and timespec_get
aren't documented in the manual at all. The handling of 64-bit time
follows that in timespec_get; people maintaining patch series for
64-bit time will need to update them accordingly (to export
__timespec_getres64, redirect calls in time.h and run the test for
_TIME_BITS=64).
Tested for x86_64 and x86, and (previous version; only testcase
differs) with build-many-glibcs.py.
The symbol was moved using scripts/move-symbol-to-libc.py.
The GLIBC_2.11 version is now empty, so add a placeholder symbol.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol was moved using scripts/move-symbol-to-libc.py.
The GLIBC_2.3.4 version is now empty, so add a placeholder symbol.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol was moved using scripts/move-symbol-to-libc.py.
Add __libpthread_version_placeholder@@GLIBC_2.12 for the targets
that need it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol was moved using scripts/move-symbol-to-libc.py.
__libpthread_version_placeholder@@GLIBC_2.2 is needed by this change;
the Versions entry for GLIBC_2.2 in libpthread had leftover symbols
due to an error in a previous conflict resolution. The condition
for the placeholder symbol is complicated because some architectures
have earlier symbols at the GLIBC_2.2 symbol versions, so the
placeholder is not required there (yet).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol was moved using scripts/move-symbol-to-libc.py.
A new placeholder symbol __libpthread_version_placeholder@GLIBC_2.18
is needed to keep the GLIBC_2.18 symbol version in libpthread.
The __pthread_getattr_default_np@@GLIBC_PRIVATE export is used
from pthread_create.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The error paths of __check_native would leave the socket FD open on
return, resulting in an FD leak. Rework function exit paths so that
the fd is always closed on return.
The symbols were moved using scripts/move-symbol-to-libc.py,
in one commit due to their dependency on the internal
__concurrency_level variable.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbols were moved using scripts/move-symbol-to-libc.py.
Also clean up some unwinder linking leftover in the same spot
in nptl/pthreadP.h.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol was moved using scripts/move-symbol-to-libc.py.
It is necessary to arrange for a
__libpthread_version_placeholder@GLIBC_2.6 on some of the powerpc
targets.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbols pthread_clockjoin_np, pthread_join, pthread_timedjoin_np,
pthread_tryjoin_np, thrd_join were moved using
scripts/move-symbol-to-libc.py.
Moving the symbols at the same time avoids the need for temporary
exports.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol was moved using scripts/move-symbol-to-libc.py.
The export of __default_pthread_attr_freeres is temporary. There
is a minor regression in freeres coverage because in the dynamic case,
__default_pthread_attr_freeres is no longer called if libpthread is
not linked in.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The nptl version is used as default, since now with symbol always
present the single-thread optimization is tricky.
Hurd is not change, it is used it own lock scheme (which call
_cthreads_funlockfile).
Checked on x86_64-linux-gnu.
The nptl version is used as default, since now with symbol always
present the single-thread optimization is tricky.
Hurd is not change, it is used it own lock scheme (which call
_cthreads_ftrylockfile).
Checked on x86_64-linux-gnu.
The nptl version is used as default, since now with symbol always
present the single-thread optimization is tricky.
Hurd is not change, it is used it own lock scheme (which call
_cthreads_flockfile).
Checked on x86_64-linux-gnu.
Linux 5.12 adds the constants PTRACE_SYSEMU and
PTRACE_SYSEMU_SINGLESTEP for s390. Add these to glibc.
Tested with build-many-glibcs.py for s390-linux-gnu and
s390x-linux-gnu.
All the stack lists are now in _rtld_global, so it is possible
to change stack permissions directly from there, instead of
calling into libpthread to do the change.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Always use __libc_multiple_threads if beneficial, and do not assume
the the dynamic loader is single-threaded. This assumption could
become incorrect by accident once more code is moved from libpthread
into it. The previous commit introducing the
NO_SYSCALL_CANCEL_CHECKING macro enables this change.
Do not hint to the compiler that multi-threaded programs are unlikely
(which is not quite true anymore).
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This allows the elimination of the __libc_multiple_threads_ptr
variable in libpthread and its initialization procedure.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Big win in binary size and avoids duplicating the logic in multiple
places.
On x86_64, dropped from 1883206 to 1881790, a 1416 byte decrease.
Also changed logic to track if ttyname_buf has been allocated by
checking if it's NULL instead of tracking buflen as an additional
variable.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Simplifies the logic and makes intent clearer, while at the same time
decreasing binary size.
On x86_64, dropped from 1883270 to 1883206, a 64 byte decrease.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>