Improve performance by handling another 16 bytes before entering the loop.
Use ADDHN in the loop to avoid SHRN+FMOV when it terminates. Change final
size computation to avoid increasing latency. On Neoverse V1 performance
of the random strlen benchmark improves by 4.6%.
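As a rough, hedged illustration of the narrowing-mask idea that both the
SHRN and ADDHN variants build on (a C/NEON sketch, not the actual
assembly; the function name is made up), a 16-byte compare mask can be
narrowed to 64 bits so a single scalar test detects a NUL byte:

#include <arm_neon.h>
#include <stdint.h>

/* Return a non-zero 64-bit mask (4 bits per matching byte) if any of
   the 16 bytes at P is zero.  Illustrative only.  */
static inline uint64_t
has_nul_mask (const uint8_t *p)
{
  uint8x16_t chunk = vld1q_u8 (p);
  uint8x16_t cmp = vceqq_u8 (chunk, vdupq_n_u8 (0));
  /* Narrow the 128-bit byte mask to 64 bits with SHRN #4; the ADDHN
     form in the loop achieves the same kind of narrowing while folding
     in the combine step.  */
  uint8x8_t narrowed = vshrn_n_u16 (vreinterpretq_u16_u8 (cmp), 4);
  return vget_lane_u64 (vreinterpret_u64_u8 (narrowed), 0);
}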
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These functions are exp10m1, exp2m1, log10p1, log2p1.
Also regenerated ulps on x86_64.
For each format, there are 4 values, one for each rounding mode.
(For the intel96 format, there are 8 values, 4 for Intel hardware,
and 4 for AMD hardware. However, regen-ulps was only run on Intel.
It should be run in a separate patch on an AMD x86_64.)
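As a hedged illustration of what one of the new functions computes (not
the glibc implementation), exp10m1 (x) is 10^x - 1; a naive version can
go through expm1 to avoid the cancellation of computing pow (10, x) - 1
for tiny x:

#include <math.h>

/* Illustrative only: exp10m1 computes 10^x - 1.  The real
   implementation is far more careful about accuracy across the range
   and rounding modes.  */
static double
exp10m1_sketch (double x)
{
  return expm1 (x * 2.302585092994046 /* log (10) */);
}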
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The old code used l_init_called as an indicator for whether TLS
initialization was complete. However, it is possible that
TLS for an object is initialized, written to, and then dlopen
for this object is called again, and l_init_called is not true at
this point. Previously, this resulted in TLS being initialized
twice, discarding any interim writes (technically introducing a
use-after-free bug even).
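A hedged sketch of that scenario (module and symbol names are made up;
this is not the actual test):

#include <dlfcn.h>
#include <stdio.h>

int
main (void)
{
  /* "tlsmod.so" and "tls_counter" are hypothetical: a module with an
     initialized int in TLS.  Error checking is omitted.  */
  void *h1 = dlopen ("tlsmod.so", RTLD_NOW);
  int *counter = dlsym (h1, "tls_counter");
  *counter = 42;                              /* interim write to TLS  */

  void *h2 = dlopen ("tlsmod.so", RTLD_NOW);  /* same object again  */

  /* Previously the second dlopen could re-initialize the TLS block,
     discarding the write above; it must still read 42 here.  */
  printf ("%d\n", *counter);

  dlclose (h2);
  dlclose (h1);
  return 0;
}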
This commit introduces an explicit per-object flag, l_tls_in_slotinfo.
It indicates whether _dl_add_to_slotinfo has been called for this
object. This flag is used to avoid double-initialization of TLS.
In update_tls_slotinfo, the first_static_tls micro-optimization
is removed because preserving the initialization flag for subsequent
use by the second loop for static TLS is a bit complicated, and
another per-object flag does not seem to be worth it. Furthermore,
the l_init_called flag is dropped from the second loop (for static
TLS initialization) because l_need_tls_init on its own prevents
double-initialization.
The remaining l_init_called usage in resize_scopes and update_scopes
is just an optimization due to the use of scope_has_map, so it is
not changed in this commit.
The isupper check ensures that the TLS of libc.so.6 is not reverted.
Such a revert happens if l_need_tls_init is not cleared in
_dl_allocate_tls_init for the main_thread case, now that
l_init_called is not checked anymore in update_tls_slotinfo
in elf/dl-open.c.
Reported-by: Jonathon Anderson <janderson@rice.edu>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
5476f8cd2e ("htl: move pthread_self info libc.") and
9dfa256216 ("htl: move pthread_equal into libc") to
1dc0bc8f07 ("htl: move pthread_attr_setdetachstate into libc")
moved some pthread_ symbols from libpthread.so to libc.so, but missed
adding the compat version like 5476f8cd2e ("htl: move pthread_self
info libc.") did: libc already had these symbols as forwards,
but versioned GLIBC_2.21, while the symbols in libpthread.so were
versioned GLIBC_2.12.
To keep executables built before this change working, we thus have to add
the GLIBC_2.12 version; otherwise execution fails with e.g.
/usr/lib/i386-gnu/libglib-2.0.so: symbol lookup error: /usr/lib/i386-gnu/libglib-2.0.so: undefined symbol: pthread_attr_setinheritsched, version GLIBC_2.12
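A hedged sketch of the kind of change this implies (illustrative fragment
only, using pthread_attr_setinheritsched as the example; it builds only
inside the glibc tree): keep the GLIBC_2.21 default and re-add the old
libpthread version via <shlib-compat.h>:

#include <shlib-compat.h>

versioned_symbol (libc, __pthread_attr_setinheritsched,
                  pthread_attr_setinheritsched, GLIBC_2_21);
#if SHLIB_COMPAT (libc, GLIBC_2_12, GLIBC_2_21)
/* Old binaries were linked against the libpthread GLIBC_2.12 version.  */
compat_symbol (libc, __pthread_attr_setinheritsched,
               pthread_attr_setinheritsched, GLIBC_2_12);
#endif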
Add tests for MREMAP_MAYMOVE and MREMAP_FIXED. On Linux, also test
MREMAP_DONTUNMAP.
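A hedged sketch of the kind of check such a test performs (not the actual
test code):

#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>

int
main (void)
{
  size_t old_size = 4 * 65536;
  size_t new_size = 8 * 65536;
  char *p = mmap (NULL, old_size, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert (p != MAP_FAILED);
  memset (p, 0x5a, old_size);

  /* MREMAP_MAYMOVE lets the kernel relocate the mapping if it cannot
     be grown in place; the contents must survive the move.  */
  char *q = mremap (p, old_size, new_size, MREMAP_MAYMOVE);
  assert (q != MAP_FAILED);
  assert (q[0] == 0x5a && q[old_size - 1] == 0x5a);

  munmap (q, new_size);
  return 0;
}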
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Update the mremap C implementation to support the optional argument for
MREMAP_DONTUNMAP added in Linux 5.7 since it may not always be correct
to implement a variadic function as a non-variadic function on all Linux
targets. Return MAP_FAILED and set errno to EINVAL for unknown flag bits.
This fixes BZ #31968.
Note: A test must be added when a new flag bit is introduced.
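A hedged sketch of a variadic wrapper along those lines (not the actual
glibc code; the raw syscall invocation is simplified, and the installed
headers are assumed to define MREMAP_DONTUNMAP, i.e. Linux 5.7 or later):

#define _GNU_SOURCE
#include <errno.h>
#include <stdarg.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#define KNOWN_FLAGS (MREMAP_MAYMOVE | MREMAP_FIXED | MREMAP_DONTUNMAP)

void *
mremap_sketch (void *old_addr, size_t old_size, size_t new_size,
               int flags, ...)
{
  void *new_addr = NULL;

  /* Reject unknown flag bits with EINVAL, as described above.  */
  if (flags & ~KNOWN_FLAGS)
    {
      errno = EINVAL;
      return MAP_FAILED;
    }

  /* The optional fifth argument is only present for the flags that
     actually take it.  */
  if (flags & (MREMAP_FIXED | MREMAP_DONTUNMAP))
    {
      va_list ap;
      va_start (ap, flags);
      new_addr = va_arg (ap, void *);
      va_end (ap);
    }

  return (void *) syscall (SYS_mremap, old_addr, old_size, new_size,
                           flags, new_addr);
}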
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It was added by commit c62b758bae6af16 as a way for userspace to
check if two file descriptors refer to the same struct file.
Checked on aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py, and tst-pidfd-consts.py to 6.10.
There are no new constants covered by these tests in 6.10.
Tested with build-many-glibcs.py.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The syscall changes in Linux 6.10 are:
* mseal for all architectures.
* map_shadow_stack for x32.
* Replace sync_file_range with sync_file_range2 for csky (which
fixes a broken sync_file_range usage).
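For the first item, mseal is new in 6.10 and has no glibc wrapper at this
point, so a caller would go through syscall; a hedged sketch (the wrapper
name is made up, and __NR_mseal comes from the regenerated arch-syscall.h
or the installed kernel headers):

#include <sys/syscall.h>
#include <unistd.h>

#ifdef __NR_mseal
/* Seal the mapping at ADDR/LEN against further changes; the flags
   argument is currently required to be 0.  */
static long
mseal_sketch (void *addr, size_t len)
{
  return syscall (__NR_mseal, addr, len, 0UL);
}
#endif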
Update syscall-names.list and regenerate the arch-syscall.h headers
with build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Remove the local FAIL macro in favor of FAIL_EXIT1 from <support/check.h>,
which provides equivalent reporting and additionally includes the file name
and line number of the failure site. Remove FAIL_ERR altogether and include
": %m" explicitly in the format string supplied to FAIL_EXIT1, as there
seems to be little value in having a separate macro just for this.
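A hedged sketch of the resulting idiom (not a literal hunk from the
change; it builds against glibc's test support library):

#include <support/check.h>
#include <sys/mman.h>

static void *
map_page (void)
{
  void *p = mmap (NULL, 4096, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    /* ": %m" in the format string replaces what FAIL_ERR used to add.  */
    FAIL_EXIT1 ("mmap: %m");
  return p;
}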
Reviewed-by: DJ Delorie <dj@redhat.com>
Use RXX_LP in RTLD_START_ENABLE_X86_FEATURES. Support shadow stack during
startup for Linux 6.10:
commit 2883f01ec37dd8668e7222dfdb5980c86fdfe277
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Mar 15 07:04:33 2024 -0700
x86/shstk: Enable shadow stacks for x32
1. Add shadow stack support to x32 signal.
2. Use the 64-bit map_shadow_stack syscall for x32.
3. Set up shadow stack for x32.
Add the map_shadow_stack system call to <fixup-asm-unistd.h> and regenerate
arch-syscall.h. Tested on Intel Tiger Lake with CET-enabled x32. There
are no regressions with CET-enabled x86-64. There are no changes in
CET-enabled x86-64 _dl_start_user.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Remove sysdeps/x86_64/x32/dl-machine.h by folding x32 ARCH_LA_PLTENTER,
ARCH_LA_PLTEXIT and RTLD_START into sysdeps/x86_64/dl-machine.h. There
are no regressions on either x86-64 or x32. There are no changes in x86-64
_dl_start_user. On x32, the _dl_start_user changes are:
<_dl_start_user>:
mov %eax,%r12d
+ mov %esp,%r13d
mov (%rsp),%edx
mov %edx,%esi
- mov %esp,%r13d
and $0xfffffff0,%esp
mov 0x0(%rip),%edi # <_dl_start_user+0x14>
lea 0x8(%r13,%rdx,4),%ecx
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The powerpc pkey_get/pkey_set support was only added for 64-bit [1], and
tst-pkey only checks whether the support is present by calling pkey_alloc
(which does not fail on powerpc32, at least when running a 64-bit kernel).
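A hedged sketch of that kind of support probe (not the actual tst-pkey
code): pkey_alloc succeeding only shows that keys can be allocated, not
that pkey_get/pkey_set themselves work.

#define _GNU_SOURCE
#include <stdbool.h>
#include <sys/mman.h>

static bool
pkeys_allocatable (void)
{
  int key = pkey_alloc (0, 0);
  if (key < 0)
    return false;      /* ENOSYS, EINVAL or ENOSPC: no usable keys.  */
  pkey_free (key);
  return true;
}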
Checked on powerpc-linux-gnu.
[1] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a803367bab167f5ec4fde1f0d0ec447707c29520
Reviewed-By: Andreas K. Huettel <dilfridge@gentoo.org>
Xfail elf/tst-platform-1 on x32 since the kernel passes i686 in AT_PLATFORM.
See https://sourceware.org/bugzilla/show_bug.cgi?id=22363
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
0e75c4a463 ("hurd: Fix pthread_self() without libpthread") added a
declaration for ___pthread_init_thread instead of __pthread_init_thread,
and missed defining the external hidden symbol.
5476f8cd2e ("htl: move pthread_self info libc.") moved the htl
pthread_self() function from libpthread to libc, replacing the previous libc
stub that just returns 0. And 53da64d1cf ("htl: Initialize ___pthread_self
early") added initialization code which needs to run before pthread_self
can be called. That code, however, currently lives in libpthread, so it is
not run before programs can already call pthread_self from libc, which
then segfaults when accessing _pthread_self()->thread.
This commit moves the initialization to libc itself, in the form of
initialized variables, so that pthread_self can always be called safely.
In _dl_tlsdesc_dynamic, there are three 'addi.d sp, sp, -size'
instructions that allocate stack space for the Float/LSX/LASX registers.
Every 'addi.d sp, sp, -size' needs a cfi_adjust_cfa_offset because sp is
used to compute the CFA. However, only one of the three 'addi.d sp, sp,
-size' instructions is executed at run time, depending on the HWCAP value,
while all of the cfi_adjust_cfa_offset directives are applied during stack
unwinding, which results in an incorrect CFA.
Split _dl_tlsdesc_dynamic into three functions: _dl_tlsdesc_dynamic,
_dl_tlsdesc_dynamic_lsx and _dl_tlsdesc_dynamic_lasx. The conflicting CFI
directives are then distributed across the three functions, so each CFI
directive matches the stack adjustment instruction it describes.
The original commit enabling non-temporal memset on Skylake Server had
erroneous benchmarks (actually done on ICX).
Further benchmarks indicate non-temporal stores may in fact be a
regression on Skylake Server.
This commit may be over-cautious in some cases, but should avoid any
regressions for 2.40.
Tested using qemu on all x86_64 CPU archs supported by both qemu and
glibc.
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
We use thread_get_name and thread_set_name to get and set the thread
name, so nothing is stored in the thread structure since these functions
are supposed to be called sparingly.
One notable difference from Linux is that the thread name can be up to 32
characters, whereas Linux's limit is 16.
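A hedged user-level illustration of that difference (not part of the
change itself):

#define _GNU_SOURCE
#include <pthread.h>

int
main (void)
{
  /* A 20-character name: fits within the 32-character limit described
     above, but Linux rejects it with ERANGE (limit 16).  */
  return pthread_setname_np (pthread_self (), "worker-housekeeping1");
}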
Also added a mach_RPC_CHECK to check for the existence of gnumach RPCs.