When glibc is built with ISA level 3 or higher by default, the resulting
glibc binaries won't run on SSE or FMA4 processors. Exclude SSE, AVX and
FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
default.
When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
variant.
Fixes BZ 31335.
NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
doesn't support AVX512 instructions:
https://bugs.kde.org/show_bug.cgi?id=383010
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add APX registers to STATE_SAVE_MASK so that APX registers are saved in
ld.so trampoline. This fixes BZ #31371.
Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will
be used by i386 _dl_tlsdesc_dynamic.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The optimization is not faster than the generic algorithm,
using the bench-strstr the geometric mean running on a POWER10 machine
using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97
(which uses the generic implementation).
Also, there is no need to redirect the internal str*/mem* call
to optimized version, internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).
Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The FSR version field is read-only and might be non-zero.
This allows math/test-fpucw* to correctly pass when the version is
non-zero.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Commit ff026950e2 ("Add a C wrapper for
prctl [BZ #25896]") replaced the assembler wrapper with a C function.
However, on powerpc64le-linux-gnu, the C variadic function
implementation requires extra work in the caller to set up the
parameter save area. Calling a function that needs a parameter save
area without one (because the prototype used indicates the function is
not variadic) corrupts the caller's stack. The Linux manual pages
project documents prctl as a non-variadic function. This has resulted
in various projects over the years using non-variadic prototypes,
including the sanitizer libraries in LLVm and GCC (GCC PR 113728).
This commit switches back to the assembler implementation on most
targets and only keeps the C implementation for x86-64 x32.
Also add the __prctl_time64 alias from commit
b39ffab860 ("Linux: Add time64 alias for
prctl") to sysdeps/unix/sysv/linux/syscalls.list; it was not yet
present in commit ff026950e2.
This restores the old ABI on powerpc64le-linux-gnu, thus fixing
bug 29770.
Reviewed-By: Simon Chopin <simon.chopin@canonical.com>
Before this change, we incorrectly used the SSE2 variant in the
implementation, without checking that the system actually supports
SSE2.
Tested-by: Sam James <sam@gentoo.org>
__builtin_ffs{,ll} basically on __builtin_ctz{,ll} in MIPS GCC compiler.
The hardware ctz instructions were available after MIPS{32,64} Release1. By using builtin ctz. It can also reduce code size of ffs/ffsll.
Checked on mips o32. mips64.
Signed-off-by: Junxian Zhu <zhujunxian@oss.cipunited.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
For AMD Zen3+ architecture, the performance of the vectorized loop is
slightly better than ERMS.
Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The REP MOVSB usage on memcpy/memmove does not show much performance
improvement on Zen3/Zen4 cores compared to the vectorized loops. Also,
as from BZ 30994, if the source is aligned and the destination is not
the performance can be 20x slower.
The performance difference is noticeable with small buffer sizes, closer
to the lower bounds limits when memcpy/memmove starts to use ERMS. The
performance of REP MOVSB is similar to vectorized instruction on the
size limit (the L2 cache). Also, there is no drawback to multiple cores
sharing the cache.
Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Some shadow stack test scripts use the '==' operator with the 'test'
command to validate exit codes resulting in the following error:
sysdeps/x86_64/tst-shstk-legacy-1e.sh: 31: test: 139: unexpected operator
The '==' operator is invalid for the 'test' command, use '-eq' like the
previous call to 'test'.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Linux 6.7 adds a constant SOL_VSOCK (recall that various constants in
include/linux/socket.h are in fact part of the kernel-userspace API
despite that not being a uapi header). Add it to glibc's
bits/socket.h.
Tested for x86_64.
The commit 49d877a80b (arm: Remove
_dl_skip_args usage) removed the _SKIP_ARGS literal, which was
previously loader to r4 on loader _start. However, the cleanup did not
remove the following 'ldr r4, [sl, r4]' on _dl_start_user, used to check
to skip the arguments after ld self-relocations.
In my testing, the kernel initially set r4 to 0, which makes the
ldr instruction just read the _GLOBAL_OFFSET_TABLE_. However, since r4
is a callee-saved register; a different runtime might not zero
initialize it and thus trigger an invalid memory access.
Checked on arm-linux-gnu.
Reported-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
On LoongArch GCC compiles __builtin_ffs{,ll} to basically
`(x ? __builtin_ctz (x) : -1) + 1`. Since a hardware ctz instruction is
available, this is much better than the table-driven generic
implementation.
Tested on loongarch64.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
For o32 we need to setup a minimal stack frame to allow cprestore
on __thread_start_clone3 (which instruct the linker to save the
gp for PIC). Also, there is no guarantee by kABI that $8 will be
preserved after syscall execution, so we need to save it on the
provided stack.
Checked on mipsel-linux-gnu.
Reported-by: Khem Raj <raj.khem@gmail.com>
Tested-by: Khem Raj <raj.khem@gmail.com>
Complete the internal renaming from "C2X" and related names in GCC by
renaming *-c2x and *-gnu2x tests to *-c23 and *-gnu23.
Tested for x86_64, and with build-many-glibcs.py for powerpc64le.
When running the testsuite in parallel, for instance running make -j
$(nproc) check, occasionally tst-epoll fails with a timeout. It happens
because it sometimes takes a bit more than 10ms for the process to get
cloned and blocked by the syscall. In that case the signal is
sent to early, and the test fails with a timeout.
Checked on x86_64-linux-gnu.
WG14 decided to use the name C23 as the informal name of the next
revision of the C standard (notwithstanding the publication date in
2024). Update references to C2X in glibc to use the C23 name.
This is intended to update everything *except* where it involves
renaming files (the changes involving renaming tests are intended to
be done separately). In the case of the _ISOC2X_SOURCE feature test
macro - the only user-visible interface involved - support for that
macro is kept for backwards compatibility, while adding
_ISOC23_SOURCE.
Tested for x86_64.
Starting with commits
- 7ea510127e
string: Add libc_hidden_proto for strchrnul
- 22999b2f0f
string: Add libc_hidden_proto for memrchr
building glibc on s390x with --disable-multi-arch fails if only
the C-variant of strchrnul / memrchr is used. This is the case
if gcc uses -march < z13.
The build fails with:
../sysdeps/s390/strchrnul-c.c:28:49: error: ‘__strchrnul_c’ undeclared here (not in a function); did you mean ‘__strchrnul’?
28 | __hidden_ver1 (__strchrnul_c, __GI___strchrnul, __strchrnul_c);
With --disable-multi-arch, __strchrnul_c is not available as string/strchrnul.c
is just included without defining STRCHRNUL and thus we also don't have to create
the internal hidden symbol.
Tested-by: Andreas K. Hüttel <dilfridge@gentoo.org>
The small counts copy bytes comparsion should be unsigned (as the
memmove size argument). It fixes string/tst-memmove-overflow on
sparcv9, where the input size triggers an invalid code path.
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Similar to sparc32 fix, remove the unwind information on the signal
return stubs. This fixes the regressions:
FAIL: nptl/tst-cancel24-static
FAIL: nptl/tst-cond8-static
FAIL: nptl/tst-mutex8-static
FAIL: nptl/tst-mutexpi8-static
FAIL: nptl/tst-mutexpi9
On sparc64-linux-gnu.
The FPU used by LEON does not preserve NaN payload. This change allows
the math/test-*-canonicalize tests to pass on LEON.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Use the math_force_eval() macro to force the calculation to complete and
raise the exception.
With this change the math/test-fenv test pass.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Conversions from a float to a long long on SPARC v8 uses a libgcc function
that may not raise the correct exceptions on overflow. It also may raise
spurious "inexact" exceptions on non overflow cases. This patch fixes the
problem in the same way as for RV32.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The functions were previously written in C, but were not compiled
with unwind information. The ENTRY/END macros includes .cfi_startproc
and .cfi_endproc which adds unwind information. This caused the
tests cleanup-8 and cleanup-10 in the GCC testsuite to fail.
This patch adds a version of the ENTRY/END macros without the
CFI instructions that can be used instead.
sigaction registers a restorer address that is located two instructions
before the stub function. This patch adds a two instruction padding to
avoid that the unwinder accesses the unwind information from the function
that the linker has placed right before it in memory. This fixes an issue
with pthread_cancel that caused tst-mutex8-static (and other tests) to fail.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On LEON, if the stfsr instruction is immediately following a floating-point
operation instruction in a running program, with no other instruction in
between the two, the stfsr might behave as if the order was reversed
between the two instructions and the stfsr occurred before the
floating-point operation.
Add a nop instruction before the stfsr to prevent this from happening.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Macros for using inline assembly to access the fp state register exists
in both fenv_private.h and in fpu_control.h. Let fenv_private.h use the
macros from fpu_control.h
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.7. (There are no new
constants covered by these tests in 6.7 that need any other header
changes.)
Tested with build-many-glibcs.py.
Linux 6.7 adds the futex_requeue, futex_wait and futex_wake syscalls,
and enables map_shadow_stack for architectures previously missing it.
Update syscall-names.list and regenerate the arch-syscall.h headers
with build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py.
Systemd execution environment configuration may prohibit changing a memory
mapping to become executable:
MemoryDenyWriteExecute=
Takes a boolean argument. If set, attempts to create memory mappings
that are writable and executable at the same time, or to change existing
memory mappings to become executable, or mapping shared memory segments
as executable, are prohibited.
When it is set, systemd service stops working if PLT rewrite is enabled.
Check if mprotect works before rewriting PLT. This fixes BZ #31230.
This also works with SELinux when deny_execmem is on.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Ffsll function randomly regress by ~20%, depending on how code gets
aligned in memory. Ffsll function code size is 17 bytes. Since default
function alignment is 16 bytes, it can load on 16, 32, 48 or 64 bytes
aligned memory. When ffsll function load at 16, 32 or 64 bytes aligned
memory, entire code fits in single 64 bytes cache line. When ffsll
function load at 48 bytes aligned memory, it splits in two cache line,
hence random regression.
Ffsll function size reduction from 17 bytes to 12 bytes ensures that it
will always fit in single 64 bytes cache line.
This patch fixes ffsll function random performance regression.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch referents the commit 374cef3 to add static-pie support. And
because the dummy link map is used when relocating ourselves, so need
not to set __global_pointer$ at this time.
It will also check whether toolchain supports to build static-pie.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The 551101e824 change is incorrect for
alpha and sparc, since __NR_stat is defined by both kABI. Use
__NR_newfstat to check whether to fallback to __NR_fstat64 (similar
to what fstatat64 does).
Checked on sparc64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Remove the error handling wrapper from exp10. This is very similar to
the changes done to exp and exp2, except that we also need to handle
pow10 and pow10l.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The __getrandom_nocancel function returns errors as negative values
instead of errno. This is inconsistent with other _nocancel functions
and it breaks "TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0))" in
__arc4random_buf. Use INLINE_SYSCALL_CALL instead of
INTERNAL_SYSCALL_CALL to fix this issue.
But __getrandom_nocancel has been avoiding from touching errno for a
reason, see BZ 29624. So add a __getrandom_nocancel_nostatus function
and use it in tcache_key_initialize.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
CET feature bits in TCB, which are Linux specific, are used to check if
CET features are active. Move CET feature check to Linux/x86 directory.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk.
2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64
features-offsets.sym.
3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only
for x86-64 and also used for PLT rewrite.
4. Add x86-64 ldsodefs.h to include feature-control.h.
5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only.
6. Move x86 dl-procruntime.c to x86-64.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>