The aarch64 uses 'trad' for traditional tls and 'desc' for tls
descriptors, but unlike other targets it defaults to 'desc'. The
gnutls2 configure check does not set aarch64 as an ABI that uses
TLS descriptors, which then disables some tests.
Also rename the internal machinery from gnu2 to tls descriptors.
Checked on aarch64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
ARM _dl_tlsdesc_dynamic slow path has two issues:
* The ip/r12 is defined by AAPCS as a scratch register, and gcc uses it
to save the stack pointer before some function calls. So it
should also be saved/restored. It fixes the tst-gnu2-tls2.
* None of the possible VFP registers are saved/restored. ARM has the
additional complexity of having different VFP bank sizes (depending on
VFP support by the chip).
The tst-gnu2-tls2 test is extended to check for VFP registers, although
only for hardfp builds. Unlike setcontext, _dl_tlsdesc_dynamic
does not handle HWCAP_ARM_IWMMXT (I don't have a way to properly test
it and it is almost a decade since newer hardware was released).
With this patch there is no need to mark tst-gnu2-tls2 as XFAIL.
Checked on arm-linux-gnueabihf.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11. Define
TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
STATE_SAVE_OFFSET(%rsp).
+==================+<- stack frame start aligned at 8 or 16 bytes
|                  |<- RDI saved in the red zone
|                  |<- RSI saved in the red zone
|                  |<- RBX saved in the red zone
|                  |<- paddings for stack realignment of 64 bytes
|------------------|<- xsave buffer end aligned at 64 bytes
|                  |<-
|                  |<-
|                  |<-
|------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
|                  |<- 8-byte padding for 64-byte alignment
|                  |<- 8-byte padding for 64-byte alignment
|                  |<- R11
|                  |<- R10
|                  |<- R9
|                  |<- R8
|                  |<- RDX
|                  |<- RCX
+==================+<- RSP aligned at 64 bytes
Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
and RBX are saved onto stack without adjusting stack pointer first, using
the red-zone. This fixes BZ #31501.
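The size arithmetic can be sketched as follows. This is only an illustration: the value 64 for the offset is read off the diagram above (six 8-byte registers plus two 8-byte paddings), and the *_SKETCH names are hypothetical, not the macros in the tree.

```c
/* Assumption taken from the diagram: RCX, RDX, R8, R9, R10 and R11 plus
   two 8-byte paddings sit below the xsave buffer, so the buffer starts
   8 * 8 = 64 bytes above the realigned RSP.  */
enum { STATE_SAVE_OFFSET_SKETCH = 8 * 8 };

/* RDI, RSI and RBX are stored in the red zone before the stack pointer
   is adjusted, so the save area must cover 3 * 8 = 24 more bytes.  */
enum { TLSDESC_REGISTER_SAVE_AREA_SKETCH = STATE_SAVE_OFFSET_SKETCH + 24 };

/* Hypothetical helper: number of 8-byte integer slots in the save area.  */
static int
register_save_slots_sketch (void)
{
  return TLSDESC_REGISTER_SAVE_AREA_SKETCH / 8;
}
```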
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e ("nptl:
Unconditionally use a 32-byte rseq area"). After that, it was
no longer ensured that the RSEQ_SIG macro was defined while compiling
sched_getcpu.c, even on architectures that provide a definition. This
commit always checks the rseq area for CPU number information before
using the other approaches.
This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call. Most architectures that have vDSO
acceleration for getcpu also have rseq support.
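A minimal sketch of the resulting control flow, not the actual sched_getcpu.c code. The struct, the function names and the convention that a negative cpu_id means "rseq not usable" are assumptions made for the example.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-thread rseq area; only the field
   used by this sketch is modelled.  */
struct rseq_area_sketch
{
  int32_t cpu_id;
};

/* Stand-in for the slower vDSO or system call path.  */
static int
slow_path_sketch (void)
{
  return 7;
}

/* Always look at the rseq area first; fall back only when it carries no
   valid CPU number (assumed here to be signalled by a negative value).  */
static int
sched_getcpu_sketch (const struct rseq_area_sketch *area,
                     int (*fallback) (void))
{
  int cpu = area->cpu_id;
  if (cpu >= 0)
    return cpu;
  return fallback ();
}
```

The extra branch is the one mentioned above: on targets without rseq the field stays negative and the fallback is always taken.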
Fixes: 2c6b4b272e
Fixes: 1d350aa060
Reviewed-by: Arjun Shankar <arjun@redhat.com>
Due to GCC bug 110901 -mcpu can override the -march setting when compiling
asm code and thus a compiler targeting a specific CPU can fail the
configure check even when binutils gas supports SVE.
The workaround is an explicit .arch directive, which overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses, the
configure check should use that too even if the GCC issue is fixed
independently.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.8. (There are no new
constants covered by these tests in 6.8 that need any other header
changes.)
Tested with build-many-glibcs.py.
Linux 6.8 adds five new syscalls. Update syscall-names.list and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.
Tested with build-many-glibcs.py.
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation. The geomean
on bench-strcasestr shows:
           __strcasestr_power8  __strcasestr_ppc
  power10                 1159              1120
  power9                  1640              1469
  power8                  1787              1904
The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadratic behavior, which only adds overhead for inputs
that trigger quadratic behavior and it is really a hack.
Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The memcpy optimization (commit 587a1290a1) has a series
of mistakes:
- The implementation is wrong: the chunk size calculation is wrong
leading to invalid memory access.
- It adds ifunc support by default, so --disable-multi-arch does
not work as expected for riscv.
- It mixes Linux files (memcpy ifunc selection which requires the
vDSO/syscall mechanism) with generic support (the memcpy
optimization itself).
- There is no __libc_ifunc_impl_list, which makes testing only
check the selected implementation instead of all supported
by the system.
This patch also simplifies the required bits to enable ifunc: there
is no need for memcopy.h, nor to add Linux-specific files.
The __memcpy_noalignment tail handling now uses a branchless strategy
similar to aarch64 (overlapping 32-bit copies for sizes 4..7 and byte
copies for sizes 1..3).
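The tail strategy can be illustrated in C. This is a sketch of the general overlapping-copy technique, not the assembly in this patch. The function names are hypothetical, and the first/middle/last byte pattern for sizes 1..3 is an assumption borrowed from the aarch64 approach. One branch selects the size class; within a class there is no per-byte loop or branch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy N bytes with 1 <= N <= 7.  */
static void
copy_tail_sketch (unsigned char *dst, const unsigned char *src, size_t n)
{
  if (n >= 4)
    {
      /* Two 32-bit copies that overlap when N < 8: one anchored at the
         start, one anchored at the end.  All loads happen before the
         stores.  */
      uint32_t head, tail;
      memcpy (&head, src, 4);
      memcpy (&tail, src + n - 4, 4);
      memcpy (dst, &head, 4);
      memcpy (dst + n - 4, &tail, 4);
      return;
    }
  /* Sizes 1..3: first, middle and last byte.  The indexes coincide for
     the smaller sizes, so no further size check is needed.  */
  unsigned char first = src[0];
  unsigned char mid = src[n >> 1];
  unsigned char last = src[n - 1];
  dst[0] = first;
  dst[n >> 1] = mid;
  dst[n - 1] = last;
}

/* Example helper: copy N bytes into a zeroed buffer and check that
   exactly N bytes were written.  */
static int
tail_copy_ok_sketch (size_t n)
{
  const unsigned char src[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  unsigned char dst[8] = { 0 };
  copy_tail_sketch (dst, src, n);
  return memcmp (dst, src, n) == 0 && dst[n] == 0;
}
```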
Checked on riscv64 and riscv32 by explicitly enabling the function
on __libc_ifunc_impl_list on qemu-system.
Changes from v1:
* Implement the memcpy in assembly to correctly handle RISCV
strict-alignment.
Reviewed-by: Evan Green <evan@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Each mask in the sigset array is an unsigned long, so fix __sigisemptyset
to use that instead of int. The __sigword function returns a simple array
index, so it can return int instead of unsigned long.
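A small sketch of why the accumulator type matters. The type, the word count and the function names are hypothetical. On a target where unsigned long is wider than int, an int accumulator would lose the upper bits of each mask word.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical word count; the real value depends on the target.  */
#define SIGSET_NWORDS_SKETCH 4

typedef struct
{
  unsigned long val[SIGSET_NWORDS_SKETCH];
} sigset_sketch_t;

/* OR all words together in an accumulator as wide as a mask word.  */
static int
sigisemptyset_sketch (const sigset_sketch_t *set)
{
  unsigned long acc = set->val[0];
  for (size_t i = 1; i < SIGSET_NWORDS_SKETCH; i++)
    acc |= set->val[i];
  return acc == 0;
}

/* Example helper: a set with only the top bit of word 0 set, which is
   the kind of bit a narrower accumulator could drop.  */
static sigset_sketch_t
high_bit_set_sketch (void)
{
  sigset_sketch_t s = { { 0 } };
  s.val[0] = 1UL << (sizeof (unsigned long) * CHAR_BIT - 1);
  return s;
}
```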
Replace minimum ISA check ifdef conditional with if. Since
MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants,
compiler will perform constant folding optimization, getting same
results.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add a little helper method so it's easier to fetch a single value from
the hwprobe function when used within an ifunc selector.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISC-V is apparently the first architecture to pass more than one
argument to ifunc resolvers. The helper macros in libc-symbols.h,
__ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible
with this. These macros have an "arg" (non-final) parameter that
represents the parameter signature of the ifunc resolver. The result is
an inability to pass the required comma through in a single preprocessor
argument.
Rearrange the __ifunc_resolver() macro to be variadic, and pass the
types as those variable parameters. Move the guts of __ifunc() and
__ifunc_hidden() into new macros, __ifunc_args(), and
__ifunc_args_hidden(), that pass the variable arguments down through to
__ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which
are used in a bunch of places, to simply shuffle the arguments down into
__ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which
provides convenience macros to those looking to write ifunc selectors
that use both arguments.
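The preprocessor problem and the variadic fix can be sketched with a much simpler macro. DEFINE_RESOLVER_SKETCH and the resolver below are hypothetical and are not the libc-symbols.h macros. The point is only that a parameter list containing a comma cannot travel as one macro argument, but can travel as the variable arguments.

```c
#include <stddef.h>

/* The parameter list is passed as the variable arguments, so the comma
   between the two parameters needs no special treatment.  */
#define DEFINE_RESOLVER_SKETCH(name, expr, ...) \
  static int name (__VA_ARGS__) { return (expr); }

/* A resolver-like function taking two parameters.  */
DEFINE_RESOLVER_SKETCH (two_arg_resolver_sketch,
                        hwcap != 0 && probe != NULL,
                        unsigned long hwcap, const void *probe)
```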
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The new __riscv_hwprobe() function is designed to be used by ifunc
selector functions. This presents a challenge for applications and
libraries, as ifunc selectors are invoked before all relocations have
been performed, so an external call to __riscv_hwprobe() from an ifunc
selector won't work. To address this, pass a pointer to the
__riscv_hwprobe() function into ifunc selectors as the second
argument (alongside dl_hwcap, which was already being passed).
Include a typedef as well for convenience, so that ifunc users don't
have to go through contortions to call this routine. Users will need to
remember to check the second argument for NULL, to account for older
glibcs that don't pass the function.
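A selector following this advice might look like the sketch below. Everything here is a stand-in: the function pointer type is deliberately simplified and is not the real __riscv_hwprobe() signature, and the key and value constants are made up for the example.

```c
#include <stddef.h>

/* Simplified, hypothetical probe signature.  */
typedef int (*hwprobe_fn_sketch) (unsigned long key, unsigned long *value);

/* Made-up key and value for the example.  */
#define KEY_MISALIGNED_PERF_SKETCH 1UL
#define PERF_FAST_SKETCH 3UL

enum impl_sketch { IMPL_GENERIC, IMPL_FAST_UNALIGNED };

static enum impl_sketch
select_memcpy_sketch (unsigned long hwcap, hwprobe_fn_sketch probe)
{
  unsigned long value = 0;
  (void) hwcap;
  /* Older glibcs do not pass the function: stay on the safe default.  */
  if (probe == NULL)
    return IMPL_GENERIC;
  /* A failing probe is treated like a missing one.  */
  if (probe (KEY_MISALIGNED_PERF_SKETCH, &value) != 0)
    return IMPL_GENERIC;
  return value == PERF_FAST_SKETCH ? IMPL_FAST_UNALIGNED : IMPL_GENERIC;
}

/* Example probe that reports fast misaligned accesses.  */
static int
fake_probe_fast_sketch (unsigned long key, unsigned long *value)
{
  (void) key;
  *value = PERF_FAST_SKETCH;
  return 0;
}
```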
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
a regular syscall, but without setting errno. Instead, the return value
is plumbed straight out of the macro.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
_dl_tlsdesc_dynamic should also preserve AMX registers which are
caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID
to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size
and save it in xsave_state_full_size which is only used by
_dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes
the AMX part of BZ #31372. Tested on AMX processor.
AMX test is enabled only for compilers with the fix for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098
GCC 14 and GCC 11/12/13 branches have the bug fix.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
When strcmp-avx2.S is used as the default, elf/tst-valgrind-smoke fails
with
==1272761== Conditional jump or move depends on uninitialised value(s)
==1272761== at 0x4022C98: strcmp (strcmp-avx2.S:462)
==1272761== by 0x400B05B: _dl_name_match_p (dl-misc.c:75)
==1272761== by 0x40085F3: _dl_map_object (dl-load.c:1966)
==1272761== by 0x401AEA4: map_doit (rtld.c:644)
==1272761== by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==1272761== by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==1272761== by 0x401B38F: do_preload (rtld.c:816)
==1272761== by 0x401C116: handle_preload_list (rtld.c:892)
==1272761== by 0x401EDF5: dl_main (rtld.c:1842)
==1272761== by 0x401A79E: _dl_sysdep_start (dl-sysdep.c:140)
==1272761== by 0x401BEEE: _dl_start_final (rtld.c:494)
==1272761== by 0x401BEEE: _dl_start (rtld.c:581)
==1272761== by 0x401AD87: ??? (in */elf/ld.so)
The assembly code is:
0x0000000004022c80 <+144>: vmovdqu 0x20(%rdi),%ymm0
0x0000000004022c85 <+149>: vpcmpeqb 0x20(%rsi),%ymm0,%ymm1
0x0000000004022c8a <+154>: vpcmpeqb %ymm0,%ymm15,%ymm2
0x0000000004022c8e <+158>: vpandn %ymm1,%ymm2,%ymm1
0x0000000004022c92 <+162>: vpmovmskb %ymm1,%ecx
0x0000000004022c96 <+166>: inc %ecx
=> 0x0000000004022c98 <+168>: jne 0x4022c32 <strcmp+66>
strcmp-avx2.S has 32-byte vector loads of strings which are shorter than
32 bytes:
(gdb) p (char *) ($rdi + 0x20)
$6 = 0x1ffeffea20 "memcheck-amd64-linux.so"
(gdb) p (char *) ($rsi + 0x20)
$7 = 0x4832640 "core-amd64-linux.so"
(gdb) call (int) strlen ((char *) ($rsi + 0x20))
$8 = 19
(gdb) call (int) strlen ((char *) ($rdi + 0x20))
$9 = 23
(gdb)
It triggers the valgrind error. The above code is safe since the loads
don't cross the page boundary. Update tst-valgrind-smoke.sh to accept
an optional suppression file and pass a suppression file to valgrind when
strcmp-avx2.S is the default implementation of strcmp.
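The page-boundary argument can be checked with a few lines of C. This is a sketch, assuming 4096-byte pages and the 32-byte vector size mentioned above; the names are hypothetical and this is not how strcmp-avx2.S performs its check.

```c
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096u
#define VEC_SIZE_SKETCH 32u

/* Nonzero if a VEC_SIZE_SKETCH-byte load starting at ADDR ends within
   the page it starts in.  */
static int
vec_load_stays_in_page_sketch (uint64_t addr)
{
  return (addr & (PAGE_SIZE_SKETCH - 1))
         <= PAGE_SIZE_SKETCH - VEC_SIZE_SKETCH;
}
```

Both addresses from the gdb session above (0x1ffeffea20 and 0x4832640) pass this check, so the loads read past the end of the strings but not into another page.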
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
When glibc is built with ISA level 3 or above enabled, SSE resolvers
aren't available and glibc fails to build:
ld: .../elf/librtld.os: in function `init_cpu_features':
.../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave'
ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/local/bin/ld: final link failed: bad value
For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor
_dl_tlsdesc_dynamic_fxsave.
This fixes BZ #31429.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:
leaq tls_var@TLSDESC(%rip), %rax
call *tls_var@TLSCALL(%rax)
or
leal tls_var@TLSDESC(%ebx), %eax
call *tls_var@TLSCALL(%eax)
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL. When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path. __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers. This
fixes BZ #31372.
Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Instead of relying on the linker name and version, check for the
required support:
* whether it does not generate dynamic TLS relocations in PIE
(binutils PR ld/22263);
* if it accepts --no-dynamic-linker (by using -static-pie);
* and if it adds a DT_JMPREL pointing to .rela.iplt with static pie.
The patch also trims the comments: for binutils, one of the tests should
already cover it. For the kernel ones, it is not clear which version should
have the backport, nor is it something that glibc can do much about.
Finally, the glibc one is somewhat confusing, since it refers
to commits not related to s390x.
Checked with a build for s390x-linux-gnu.
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
This includes a fix for big-endian in AdvSIMD log, some cosmetic
changes, and numerous small optimisations mainly around inlining and
using indexed variants of MLA intrinsics.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Starting with commit e57d8fc97b
"S390: Always use svc 0"
clone clobbers the call-saved register r7 in error case:
function or stack is NULL.
This patch restores the saved registers also in the error case.
Furthermore the existing test misc/tst-clone is extended to check
all error cases and that clone does not clobber registers in this
error case.
When glibc is built with ISA level 3 or higher by default, the resulting
glibc binaries won't run on SSE or FMA4 processors. Exclude SSE, AVX and
FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
default.
When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
variant.
Fixes BZ 31335.
NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
doesn't support AVX512 instructions:
https://bugs.kde.org/show_bug.cgi?id=383010
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add APX registers to STATE_SAVE_MASK so that APX registers are saved in
ld.so trampoline. This fixes BZ #31371.
Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will
be used by i386 _dl_tlsdesc_dynamic.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The optimization is not faster than the generic algorithm:
using bench-strstr, the geometric mean running on a POWER10 machine
using gcc 13.1.1 is 482.47, while the default __strstr_ppc is 340.97
(which uses the generic implementation).
Also, there is no need to redirect the internal str*/mem* calls
to the optimized version; internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).
Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The FSR version field is read-only and might be non-zero.
This allows math/test-fpucw* to correctly pass when the version is
non-zero.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Commit ff026950e2 ("Add a C wrapper for
prctl [BZ #25896]") replaced the assembler wrapper with a C function.
However, on powerpc64le-linux-gnu, the C variadic function
implementation requires extra work in the caller to set up the
parameter save area. Calling a function that needs a parameter save
area without one (because the prototype used indicates the function is
not variadic) corrupts the caller's stack. The Linux manual pages
project documents prctl as a non-variadic function. This has resulted
in various projects over the years using non-variadic prototypes,
including the sanitizer libraries in LLVM and GCC (GCC PR 113728).
This commit switches back to the assembler implementation on most
targets and only keeps the C implementation for x86-64 x32.
Also add the __prctl_time64 alias from commit
b39ffab860 ("Linux: Add time64 alias for
prctl") to sysdeps/unix/sysv/linux/syscalls.list; it was not yet
present in commit ff026950e2.
This restores the old ABI on powerpc64le-linux-gnu, thus fixing
bug 29770.
Reviewed-By: Simon Chopin <simon.chopin@canonical.com>
Before this change, we incorrectly used the SSE2 variant in the
implementation, without checking that the system actually supports
SSE2.
Tested-by: Sam James <sam@gentoo.org>
__builtin_ffs{,ll} is basically based on __builtin_ctz{,ll} in the MIPS GCC compiler.
The hardware ctz instructions are available since MIPS{32,64} Release 1. Using the builtin ctz also reduces the code size of ffs/ffsll.
Checked on mips o32 and mips64.
Signed-off-by: Junxian Zhu <zhujunxian@oss.cipunited.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Maxim Kuvyrkov <maxim.kuvyrkov@linaro.org>
For AMD Zen3+ architecture, the performance of the vectorized loop is
slightly better than ERMS.
Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The REP MOVSB usage on memcpy/memmove does not show much performance
improvement on Zen3/Zen4 cores compared to the vectorized loops. Also,
as reported in BZ 30994, if the source is aligned and the destination is not,
the performance can be 20x slower.
The performance difference is noticeable with small buffer sizes, close
to the lower bound at which memcpy/memmove starts to use ERMS. The
performance of REP MOVSB is similar to the vectorized instructions at the
size limit (the L2 cache). Also, there is no drawback to multiple cores
sharing the cache.
Checked on x86_64-linux-gnu on Zen3.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Some shadow stack test scripts use the '==' operator with the 'test'
command to validate exit codes resulting in the following error:
sysdeps/x86_64/tst-shstk-legacy-1e.sh: 31: test: 139: unexpected operator
The '==' operator is invalid for the 'test' command, use '-eq' like the
previous call to 'test'.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Linux 6.7 adds a constant SOL_VSOCK (recall that various constants in
include/linux/socket.h are in fact part of the kernel-userspace API
despite that not being a uapi header). Add it to glibc's
bits/socket.h.
Tested for x86_64.
The commit 49d877a80b (arm: Remove
_dl_skip_args usage) removed the _SKIP_ARGS literal, which was
previously loaded to r4 in the loader _start. However, the cleanup did not
remove the following 'ldr r4, [sl, r4]' in _dl_start_user, used to check
whether to skip the arguments after ld self-relocations.
In my testing, the kernel initially set r4 to 0, which makes the
ldr instruction just read the _GLOBAL_OFFSET_TABLE_. However, since r4
is a callee-saved register, a different runtime might not zero
initialize it and thus trigger an invalid memory access.
Checked on arm-linux-gnu.
Reported-by: Adrian Ratiu <adrian.ratiu@collabora.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
On LoongArch GCC compiles __builtin_ffs{,ll} to basically
`(x ? __builtin_ctz (x) : -1) + 1`. Since a hardware ctz instruction is
available, this is much better than the table-driven generic
implementation.
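For reference, the expression above written out as C functions. The function names are hypothetical; __builtin_ctz and __builtin_ctzll are the GCC builtins, whose result is undefined for a zero argument, hence the explicit test.

```c
/* ffs: 1-based index of the least significant set bit, 0 if none.  */
static int
ffs_sketch (int x)
{
  return (x ? __builtin_ctz ((unsigned int) x) : -1) + 1;
}

static int
ffsll_sketch (long long x)
{
  return (x ? __builtin_ctzll ((unsigned long long) x) : -1) + 1;
}
```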
Tested on loongarch64.
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
For o32 we need to set up a minimal stack frame to allow cprestore
on __thread_start_clone3 (which instructs the linker to save the
gp for PIC). Also, there is no guarantee by kABI that $8 will be
preserved after syscall execution, so we need to save it on the
provided stack.
Checked on mipsel-linux-gnu.
Reported-by: Khem Raj <raj.khem@gmail.com>
Tested-by: Khem Raj <raj.khem@gmail.com>
Complete the internal renaming from "C2X" and related names in GCC by
renaming *-c2x and *-gnu2x tests to *-c23 and *-gnu23.
Tested for x86_64, and with build-many-glibcs.py for powerpc64le.
When running the testsuite in parallel, for instance running make -j
$(nproc) check, occasionally tst-epoll fails with a timeout. It happens
because it sometimes takes a bit more than 10ms for the process to get
cloned and blocked by the syscall. In that case the signal is
sent too early, and the test fails with a timeout.
Checked on x86_64-linux-gnu.
WG14 decided to use the name C23 as the informal name of the next
revision of the C standard (notwithstanding the publication date in
2024). Update references to C2X in glibc to use the C23 name.
This is intended to update everything *except* where it involves
renaming files (the changes involving renaming tests are intended to
be done separately). In the case of the _ISOC2X_SOURCE feature test
macro - the only user-visible interface involved - support for that
macro is kept for backwards compatibility, while adding
_ISOC23_SOURCE.
Tested for x86_64.
Starting with commits
- 7ea510127e
string: Add libc_hidden_proto for strchrnul
- 22999b2f0f
string: Add libc_hidden_proto for memrchr
building glibc on s390x with --disable-multi-arch fails if only
the C-variant of strchrnul / memrchr is used. This is the case
if gcc uses -march < z13.
The build fails with:
../sysdeps/s390/strchrnul-c.c:28:49: error: ‘__strchrnul_c’ undeclared here (not in a function); did you mean ‘__strchrnul’?
28 | __hidden_ver1 (__strchrnul_c, __GI___strchrnul, __strchrnul_c);
With --disable-multi-arch, __strchrnul_c is not available as string/strchrnul.c
is just included without defining STRCHRNUL and thus we also don't have to create
the internal hidden symbol.
Tested-by: Andreas K. Hüttel <dilfridge@gentoo.org>
The byte count comparison for small copies should be unsigned (like the
memmove size argument). It fixes string/tst-memmove-overflow on
sparcv9, where the input size triggers an invalid code path.
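The failure mode can be reproduced in C. This is a sketch with a hypothetical threshold and hypothetical names; the actual fix is in the sparc assembly. A very large size looks negative when compared as a signed value and so wrongly selects the small-count path.

```c
#include <stddef.h>

/* Hypothetical threshold for the small-count path.  */
#define SMALL_COUNT_LIMIT_SKETCH 15

/* Signed comparison: a size with the top bit set converts to a negative
   value (the conversion is implementation-defined; GCC wraps) and
   passes the check.  */
static int
small_path_signed_sketch (size_t n)
{
  return (ptrdiff_t) n <= SMALL_COUNT_LIMIT_SKETCH;
}

/* Unsigned comparison: only genuinely small sizes pass.  */
static int
small_path_unsigned_sketch (size_t n)
{
  return n <= (size_t) SMALL_COUNT_LIMIT_SKETCH;
}
```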
Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
Similar to sparc32 fix, remove the unwind information on the signal
return stubs. This fixes the regressions:
FAIL: nptl/tst-cancel24-static
FAIL: nptl/tst-cond8-static
FAIL: nptl/tst-mutex8-static
FAIL: nptl/tst-mutexpi8-static
FAIL: nptl/tst-mutexpi9
On sparc64-linux-gnu.
The FPU used by LEON does not preserve NaN payload. This change allows
the math/test-*-canonicalize tests to pass on LEON.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Use the math_force_eval() macro to force the calculation to complete and
raise the exception.
With this change the math/test-fenv test passes.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Conversions from a float to a long long on SPARC v8 use a libgcc function
that may not raise the correct exceptions on overflow. It also may raise
spurious "inexact" exceptions on non-overflow cases. This patch fixes the
problem in the same way as for RV32.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The functions were previously written in C, but were not compiled
with unwind information. The ENTRY/END macros include .cfi_startproc
and .cfi_endproc, which add unwind information. This caused the
tests cleanup-8 and cleanup-10 in the GCC testsuite to fail.
This patch adds a version of the ENTRY/END macros without the
CFI instructions that can be used instead.
sigaction registers a restorer address that is located two instructions
before the stub function. This patch adds a two instruction padding to
prevent the unwinder from accessing the unwind information of the function
that the linker has placed right before it in memory. This fixes an issue
with pthread_cancel that caused tst-mutex8-static (and other tests) to fail.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On LEON, if the stfsr instruction is immediately following a floating-point
operation instruction in a running program, with no other instruction in
between the two, the stfsr might behave as if the order was reversed
between the two instructions and the stfsr occurred before the
floating-point operation.
Add a nop instruction before the stfsr to prevent this from happening.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Macros for using inline assembly to access the fp state register exist
in both fenv_private.h and in fpu_control.h. Let fenv_private.h use the
macros from fpu_control.h.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.7. (There are no new
constants covered by these tests in 6.7 that need any other header
changes.)
Tested with build-many-glibcs.py.
Linux 6.7 adds the futex_requeue, futex_wait and futex_wake syscalls,
and enables map_shadow_stack for architectures previously missing it.
Update syscall-names.list and regenerate the arch-syscall.h headers
with build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py.
Systemd execution environment configuration may prohibit changing a memory
mapping to become executable:
MemoryDenyWriteExecute=
Takes a boolean argument. If set, attempts to create memory mappings
that are writable and executable at the same time, or to change existing
memory mappings to become executable, or mapping shared memory segments
as executable, are prohibited.
When it is set, systemd service stops working if PLT rewrite is enabled.
Check if mprotect works before rewriting PLT. This fixes BZ #31230.
This also works with SELinux when deny_execmem is on.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The ffsll function randomly regresses by ~20%, depending on how the code gets
aligned in memory. The ffsll code size is 17 bytes. Since the default
function alignment is 16 bytes, it can be loaded at 16-, 32-, 48- or 64-byte
aligned memory. When ffsll is loaded at 16-, 32- or 64-byte aligned
memory, the entire code fits in a single 64-byte cache line. When ffsll
is loaded at 48-byte aligned memory, it is split across two cache lines,
hence the random regression.
Reducing the ffsll size from 17 bytes to 12 bytes ensures that it
will always fit in a single 64-byte cache line.
This patch fixes the random ffsll performance regression.
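The cache line arithmetic can be verified with a short sketch (hypothetical names, 64-byte lines as stated above):

```c
#include <stddef.h>

#define CACHE_LINE_SKETCH 64u

/* Nonzero if SIZE bytes starting at offset START touch two cache
   lines.  */
static int
spans_two_lines_sketch (size_t start, size_t size)
{
  return start / CACHE_LINE_SKETCH
         != (start + size - 1) / CACHE_LINE_SKETCH;
}
```

With 17 bytes, only the start offset 48 crosses a line boundary (48 + 17 = 65 > 64). With 12 bytes, none of the 16-byte aligned start offsets does.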
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch follows commit 374cef3 to add static-pie support. Because
the dummy link map is used when relocating ourselves, there is no need
to set __global_pointer$ at this time.
It also checks whether the toolchain supports building static-pie.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The 551101e824 change is incorrect for
alpha and sparc, since __NR_stat is defined by both kABIs. Use
__NR_newfstat to check whether to fall back to __NR_fstat64 (similar
to what fstatat64 does).
Checked on sparc64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Remove the error handling wrapper from exp10. This is very similar to
the changes done to exp and exp2, except that we also need to handle
pow10 and pow10l.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The __getrandom_nocancel function returns errors as negative values
instead of errno. This is inconsistent with other _nocancel functions
and it breaks "TEMP_FAILURE_RETRY (__getrandom_nocancel (p, n, 0))" in
__arc4random_buf. Use INLINE_SYSCALL_CALL instead of
INTERNAL_SYSCALL_CALL to fix this issue.
But __getrandom_nocancel has avoided touching errno for a
reason, see BZ 29624. So add a __getrandom_nocancel_nostatus function
and use it in tcache_key_initialize.
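Why the negative-value convention defeats the retry loop can be shown with a small model. All names below are hypothetical; the loop has the same shape as TEMP_FAILURE_RETRY (repeat while the result is -1 and errno is EINTR), and the call model is interrupted twice before it succeeds.

```c
#include <errno.h>

static int calls_sketch;

/* Kernel-style convention: the error is returned as a negative errno
   value.  Interrupted on the first two calls, then returns 16.  */
static long
raw_call_sketch (void)
{
  return ++calls_sketch <= 2 ? -EINTR : 16;
}

/* libc-style convention: return -1 and store the error in errno.  */
static long
errno_call_sketch (void)
{
  long r = raw_call_sketch ();
  if (r < 0)
    {
      errno = (int) -r;
      return -1;
    }
  return r;
}

/* Same loop shape as TEMP_FAILURE_RETRY.  */
static long
retry_on_eintr_sketch (long (*fn) (void))
{
  long r;
  calls_sketch = 0;
  do
    r = fn ();
  while (r == -1L && errno == EINTR);
  return r;
}
```

With the raw convention the loop exits after the first interrupted call and hands -EINTR back to the caller. With the errno convention it retries until the call succeeds.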
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
CET feature bits in TCB, which are Linux specific, are used to check if
CET features are active. Move CET feature check to Linux/x86 directory.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
1. Remove _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk.
2. Move CET offsets from x86 cpu-features-offsets.sym to x86-64
features-offsets.sym.
3. Rename x86 cet-control.h to x86-64 feature-control.h since it is only
for x86-64 and also used for PLT rewrite.
4. Add x86-64 ldsodefs.h to include feature-control.h.
5. Change TUNABLE_CALLBACK (set_plt_rewrite) to x86-64 only.
6. Move x86 dl-procruntime.c to x86-64.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Move sysdeps/x86/libc-start.h to sysdeps/x86_64/libc-start.h and use
sysdeps/generic/libc-start.h for i386.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
CET is only supported on x86_64; this patch reverts:
- faaee1f07e x86: Support shadow stack pointer in setjmp/longjmp.
- be9ccd27c0 i386: Add _CET_ENDBR to indirect jump targets in
add_n.S/sub_n.S
- c02695d776 x86/CET: Update vfork to prevent child return
- 5d844e1b72 i386: Enable CET support in ucontext functions
- 124bcde683 x86: Add _CET_ENDBR to functions in crti.S
- 562837c002 x86: Add _CET_ENDBR to functions in dl-tlsdesc.S
- f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598]
- 825b58f3fb i386-mcount.S: Add _CET_ENDBR to _mcount and __fentry__
- 7e119cd582 i386: Use _CET_NOTRACK in i686/memcmp.S
- 177824e232 i386: Use _CET_NOTRACK in memcmp-sse4.S
- 0a899af097 i386: Use _CET_NOTRACK in memcpy-ssse3-rep.S
- 7fb613361c i386: Use _CET_NOTRACK in memcpy-ssse3.S
- 77a8ae0948 i386: Use _CET_NOTRACK in memset-sse2-rep.S
- 00e7b76a8f i386: Use _CET_NOTRACK in memset-sse2.S
- 90d15dc577 i386: Use _CET_NOTRACK in strcat-sse2.S
- f1574581c7 i386: Use _CET_NOTRACK in strcpy-sse2.S
- 4031d7484a i386/sub_n.S: Add a missing _CET_ENDBR to indirect jump
target
Checked on i686-linux-gnu.
The CET is only supported for x86_64 and there is no plan to add
kernel support for i386. Move the Makefile rules and files from the
generic x86 folder to x86_64 one.
Checked on x86_64-linux-gnu and i686-linux-gnu.
PLT rewrite calculated displacement with
ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;
On x32, displacement from 0xf7fbe060 to 0x401030 was calculated as
unsigned int disp = 0x401030 - 0xf7fbe060 - 5;
with disp == 0x8442fcb and caused displacement overflow. The PLT entry
was changed to:
0xf7fbe060 <+0>: e9 cb 2f 44 08 jmp 0x401030
0xf7fbe065 <+5>: cc int3
0xf7fbe066 <+6>: cc int3
0xf7fbe067 <+7>: cc int3
0xf7fbe068 <+8>: cc int3
0xf7fbe069 <+9>: cc int3
0xf7fbe06a <+10>: cc int3
0xf7fbe06b <+11>: cc int3
0xf7fbe06c <+12>: cc int3
0xf7fbe06d <+13>: cc int3
0xf7fbe06e <+14>: cc int3
0xf7fbe06f <+15>: cc int3
x32 has a 32-bit address range, but addresses do not wrap around at 4GB.
The JMP target became 0x100401030 (0xf7fbe060LL + 0x8442fcbLL + 5),
which is above 4GB.
Always use uint64_t to calculate displacement. This fixes BZ #31218.
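The arithmetic can be reproduced in isolation. The sketch below is an
illustration only, not the patched glibc code: the helper names are
hypothetical, and uint32_t stands in for the 32-bit ElfW(Addr) used on
x32. It computes the displacement in 32-bit and in 64-bit arithmetic and
checks whether the result fits the signed 32-bit operand of a JMP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define JMP32_INSN_SIZE 5

/* Old behaviour on x32: every operand is 32 bits wide, so the
   subtraction silently wraps modulo 2^32.  */
static uint32_t
disp_wrapped (uint32_t value, uint32_t branch_start)
{
  return value - branch_start - JMP32_INSN_SIZE;
}

/* Same computation carried out in uint64_t, then viewed as signed.  */
static int64_t
disp_wide (uint64_t value, uint64_t branch_start)
{
  return (int64_t) (value - branch_start - JMP32_INSN_SIZE);
}

/* A JMP rel32 can only encode a signed 32-bit displacement.  */
static bool
fits_rel32 (int64_t disp)
{
  return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With the addresses quoted above, disp_wrapped returns 0x8442fcb, which
looks like a valid forward jump, while disp_wide returns a value below
INT32_MIN that fits_rel32 rejects.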
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
relocation.
For x86-64, add
#define DT_X86_64_PLT (DT_LOPROC + 0)
#define DT_X86_64_PLTSZ (DT_LOPROC + 1)
#define DT_X86_64_PLTENT (DT_LOPROC + 3)
1. DT_X86_64_PLT: The address of the procedure linkage table.
2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
table.
3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
entry.
The r_addend field of the R_X86_64_JUMP_SLOT relocation is set to the
memory offset of the indirect branch instruction.
Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
with direct branches after relocation when lazy binding is disabled.
PLT rewrite is disabled by default since SELinux may disallow modifying
code pages and ld.so can't detect it in all cases. Use
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1
to enable PLT rewrite with 32-bit direct jump at run-time or
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2
to enable PLT rewrite with 32-bit direct jump and on APX processors with
64-bit absolute jump at run-time.
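The 32-bit direct jump form can be illustrated with a small encoder.
This is a simplified sketch rather than the code in ld.so:
rewrite_plt_entry is a hypothetical helper, the branch is assumed to sit
at the start of the entry (the real code locates it through r_addend),
and making the code page writable is left out. The entry is overwritten
with a jmp rel32 followed by int3 padding, and the helper refuses targets
that a signed 32-bit displacement cannot reach.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP32_INSN_SIZE 5

/* Overwrite the PLT entry at ENTRY (ENTRY_SIZE bytes, mapped at run-time
   address ENTRY_ADDR) with a direct jump to TARGET.  Return false and
   leave the entry untouched if TARGET is out of rel32 range.  */
static bool
rewrite_plt_entry (uint8_t *entry, size_t entry_size,
                   uint64_t entry_addr, uint64_t target)
{
  assert (entry_size >= JMP32_INSN_SIZE);

  /* The displacement is relative to the end of the 5-byte instruction.  */
  int64_t disp = (int64_t) (target - entry_addr - JMP32_INSN_SIZE);
  if (disp < INT32_MIN || disp > INT32_MAX)
    return false;

  memset (entry, 0xcc, entry_size);     /* int3 padding.  */
  entry[0] = 0xe9;                      /* jmp rel32 opcode.  */
  uint32_t d = (uint32_t) (int32_t) disp;
  for (int i = 0; i < 4; i++)           /* Little-endian operand.  */
    entry[1 + i] = (uint8_t) (d >> (8 * i));
  return true;
}
```

Refusing out-of-range targets instead of truncating the displacement
keeps the original indirect branch in place for entries that cannot be
reached with a 32-bit jump.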
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
These describe generic AArch64 CPU features, and are not tied to a
kernel-specific way of determining them. We can share them between
the Linux and Hurd AArch64 ports.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240103171502.1358371-13-bugaevc@gmail.com>
We fetch __vm_page_size as the very first RPC that we do, inside
__mach_init (). Propagate that to _dl_pagesize ASAP after that,
before any other initialization.
In dynamic builds, this is already done immediately after
__mach_init (), inside _dl_sysdep_start ().
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240103171502.1358371-12-bugaevc@gmail.com>
This is the case on both x86 architectures, but not on AArch64.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240103171502.1358371-11-bugaevc@gmail.com>
We already have the RETURN_TO macro for this exact use case, and it's already
used in the non-static code path. Use it here too.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240103171502.1358371-9-bugaevc@gmail.com>
Instead of relying on the stack frame layout to figure out where the stack
pointer was prior to the _hurd_stack_setup () call, just pass the pointer
as an argument explicitly. This is less brittle and much more portable.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240103171502.1358371-8-bugaevc@gmail.com>
setcontext and swapcontext put a restore token on the old shadow stack
which is used to restore the target shadow stack when switching user
contexts. When longjmp is called from a user context, the target shadow stack
can be different from the current shadow stack and INCSSP can't be
used to restore the shadow stack pointer to the target shadow stack.
Update longjmp to search for a restore token. If found, use the token
to restore the shadow stack pointer before using INCSSP to pop the
shadow stack. Stop the token search and use INCSSP if the shadow stack
entry value is the same as the current shadow stack pointer.
It is a user error if there is a shadow stack switch without leaving a
restore token on the old shadow stack.
The only difference between __longjmp.S and __longjmp_chk.S is that
__longjmp_chk.S has a check for invalid longjmp usages. Merge
__longjmp.S and __longjmp_chk.S by adding the CHECK_INVALID_LONGJMP
macro.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Since shadow stack is only supported for x86-64, ignore --enable-cet for
i386. Always set $(enable-cet) for i386 to "no" to support
ifneq ($(enable-cet),no)
in x86 Makefiles. We can't use
ifeq ($(enable-cet),yes)
since $(enable-cet) can be "yes", "no" or "permissive".
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
C23 adds a header <stdbit.h> with various functions and type-generic
macros for bit-manipulation of unsigned integers (plus macro defines
related to endianness). Implement this header for glibc.
The functions have both inline definitions in the header (referenced
by macros defined in the header) and copies with external linkage in
the library (which are implemented in terms of those macros to avoid
duplication). They are documented in the glibc manual. Tests, as
well as verifying results for various inputs (of both the macros and
the out-of-line functions), verify the types of those results (which
showed up a bug in an earlier version with the type-generic macro
stdc_has_single_bit wrongly returning a promoted type), that the
macros can be used at top level in a source file (so don't use ({})),
that they evaluate their arguments exactly once, and that the macros
for the type-specific functions have the expected implicit conversions
to the relevant argument type.
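To make the last requirements concrete, here is one way a type-generic
macro can avoid ({}) and still evaluate its argument exactly once. It is
a sketch with hypothetical demo_ names, not the macros shipped in
<stdbit.h>, and it assumes unsigned long long is 64 bits wide. The
argument is passed once to a single widest-type function, and sizeof,
which does not evaluate its operand, corrects for the narrower type.

```c
#include <assert.h>
#include <limits.h>

static_assert (sizeof (unsigned long long) * CHAR_BIT == 64,
               "sketch assumes a 64-bit unsigned long long");

/* Count leading zero bits of X as a 64-bit value; 64 for X == 0.  */
static inline unsigned int
demo_leading_zeros_ull (unsigned long long x)
{
  unsigned int n = 0;
  for (unsigned long long bit = 1ULL << 63; bit != 0 && (x & bit) == 0;
       bit >>= 1)
    n++;
  return n;
}

/* Plain expression, no statement expression: X is evaluated once, by
   the function call only.  */
#define demo_leading_zeros(x)                                   \
  (demo_leading_zeros_ull (x)                                   \
   - (unsigned int) (CHAR_BIT * (sizeof (0ULL) - sizeof (x))))

/* Usable at top level, and the result type is unsigned int.  */
static_assert (sizeof (demo_leading_zeros (0u)) == sizeof (unsigned int),
               "result must not be a wider type");
```

The file-scope static_assert is the kind of use that a macro built on
({}) would not allow, since statement expressions are only accepted
inside functions.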
Jakub previously referred to -Wconversion warnings in type-generic
macros, so I've included a test with -Wconversion (but the only
warnings I saw and fixed from that test were actually in inline
functions in the <stdbit.h> header - not anything coming from use of
the type-generic macros themselves).
This implementation of the type-generic macros does not handle
unsigned __int128, or unsigned _BitInt types with a width other than
that of a standard integer type (and C23 doesn't require the header to
handle such types either). Support for those types, using the new
type-generic built-in functions Jakub's added for GCC 14, can
reasonably be added in a followup (along of course with associated
tests).
This implementation doesn't do anything special to handle C++, or have
any tests of functionality in C++ beyond the existing tests that all
headers can be compiled in C++ code; it's not clear exactly what form
this header should take in C++, but probably not one using macros.
DIS ballot comment AT-107 asks for the word "count" to be added to the
names of the stdc_leading_zeros, stdc_leading_ones,
stdc_trailing_zeros and stdc_trailing_ones functions and macros. I
don't think it's likely to be accepted (accepting any technical
comments would mean having an FDIS ballot), but if it is accepted at
the WG14 meeting (22-26 January in Strasbourg, starting with DIS
ballot comment handling) then there would still be time to update
glibc for the renaming before the 2.39 release.
The new functions and header are placed in the stdlib/ directory in
glibc, rather than creating a new toplevel stdbit/ or putting them in
string/ alongside ffs.
Tested for x86_64 and x86.
Includes test for setcontext too.
The test directly checks after longjmp if ZA got disabled and the
ZA contents got saved following the lazy saving scheme. It does not
use ACLE code to verify that gcc can interoperate with glibc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
For the ZA lazy saving scheme to work, setcontext has to call
__libc_arm_za_disable.
Also fixes swapcontext which uses setcontext internally.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
For the ZA lazy saving scheme to work, longjmp has to call
__libc_arm_za_disable.
In ld.so we assume ZA is not used so longjmp does not need
special support there.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The runtime support routines for the call ABI of the Scalable Matrix
Extension (SME) are mostly in libgcc. Since libc.so cannot depend on
libgcc_s.so, provide an implementation of __arm_za_disable in libc for
libc internal use in longjmp and similar APIs.
__libc_arm_za_disable follows the same PCS rules as __arm_za_disable,
but it's a hidden symbol so it does not need variant PCS marking.
Use __libc_fatal instead of abort because it can print a message and
works in ld.so too. But for now we don't need SME routines in ld.so.
To check the SME HWCAP in asm, we need the _dl_hwcap2 member offset in
_rtld_global_ro in the shared libc.so, while in libc.a the _dl_hwcap2
object is accessed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
When shadow stack is enabled, some CET tests failed when compiled with
GCC 14:
FAIL: elf/tst-cet-legacy-4
FAIL: elf/tst-cet-legacy-5a
FAIL: elf/tst-cet-legacy-6a
which are caused by
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113039
These tests use -fcf-protection -fcf-protection=branch and assume that
-fcf-protection=branch will override -fcf-protection. But this GCC 14
commit:
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1c6231c05bdcca
changed the -fcf-protection behavior such that
-fcf-protection -fcf-protection=branch
is treated the same as
-fcf-protection
Use
-fcf-protection -fcf-protection=none -fcf-protection=branch
as the workaround. This fixes BZ #31187.
Tested with GCC 13 and GCC 14 on Intel Tiger Lake.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Not all CET enabled applications and libraries have been properly tested
in CET enabled environments. Some CET enabled applications or libraries
will crash or misbehave when CET is enabled. Don't set CET active by
default so that all applications and libraries will run normally regardless
of whether CET is active or not. Shadow stack can be enabled by
$ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK
at run-time if shadow stack can be enabled by kernel.
NB: This commit can be reverted if it is OK to enable CET by default for
all applications and libraries.
Initially, IBT and SHSTK are marked as active when CPU supports them
and CET is enabled in glibc. They can be disabled early by tunables
before relocation. Since after relocation, GLRO(dl_x86_cpu_features)
becomes read-only, we can't update GLRO(dl_x86_cpu_features) to mark
IBT and SHSTK as inactive. Instead, check the feature_1 field in TCB
to decide if IBT and SHSTK are active.
Previously, CET was enabled by kernel before passing control to user
space and the startup code must disable CET if applications or shared
libraries aren't CET enabled. Since the current kernel only supports
shadow stack and won't enable shadow stack before passing control to
user space, we need to enable shadow stack during startup if the
application and all shared libraries are shadow stack enabled. There
is no need to disable shadow stack at startup. Shadow stack can only
be enabled in a function which will never return. Otherwise, shadow
stack will underflow at the function return.
1. GL(dl_x86_feature_1) is set to the CET features which are supported
by the processor and are not disabled by the tunable. Only non-zero
features in GL(dl_x86_feature_1) should be enabled. After enabling
shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check
if shadow stack is really enabled.
2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is
safe since RTLD_START never returns.
3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static
executable. Since the start function using ARCH_SETUP_TLS never returns,
it is safe to enable shadow stack in ARCH_SETUP_TLS.
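For reference, the status query from item 1 can also be issued from
ordinary user code, since unlike ARCH_SHSTK_ENABLE it is safe in a
function that returns. The sketch below is not glibc code:
demo_shstk_is_enabled is a hypothetical helper, the two constants are
copied from the Linux 6.6 uapi header <asm/prctl.h> in case the installed
headers predate them, and on anything other than x86-64 Linux the helper
simply reports that the state is unknown.

```c
#ifndef _GNU_SOURCE
# define _GNU_SOURCE            /* For syscall ().  */
#endif
#include <stdint.h>
#if defined __linux__ && defined __x86_64__
# include <sys/syscall.h>
# include <unistd.h>
#endif

#ifndef ARCH_SHSTK_STATUS
# define ARCH_SHSTK_STATUS 0x5005
#endif
#ifndef ARCH_SHSTK_SHSTK
# define ARCH_SHSTK_SHSTK (1ULL << 0)
#endif

/* Return 1 if shadow stack is enabled for the calling thread, 0 if it
   is not, and -1 if the kernel or the platform cannot tell us.  */
static int
demo_shstk_is_enabled (void)
{
#if defined __linux__ && defined __x86_64__
  uint64_t features = 0;
  if (syscall (SYS_arch_prctl, ARCH_SHSTK_STATUS, &features) != 0)
    return -1;                  /* E.g. a kernel without shadow stack.  */
  return (features & ARCH_SHSTK_SHSTK) != 0;
#else
  return -1;
#endif
}
```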
Sync with Linux kernel 6.6 shadow stack interface. Since only x86-64 is
supported, i386 shadow stack code is unchanged and CET shouldn't be
enabled for i386.
1. When the shadow stack base in TCB is unset, the default shadow stack
is in use. Use the current shadow stack pointer as the marker for the
default shadow stack. It is used to identify if the current shadow stack
is the same as the target shadow stack when switching ucontexts. If yes,
INCSSP will be used to unwind shadow stack. Otherwise, shadow stack
restore token will be used.
2. Allocate shadow stack with the map_shadow_stack syscall. Since there
is no function to explicitly release ucontext, there is no place to
release shadow stack allocated by map_shadow_stack in ucontext functions.
Such shadow stacks will be leaked.
3. Rename arch_prctl CET commands to ARCH_SHSTK_XXX.
4. Rewrite the CET control functions with the current kernel shadow stack
interface.
Since CET is no longer enabled by kernel, a separate patch will enable
shadow stack during startup.
Code is mostly inspired from the LoongArch one, which has a similar ABI,
with minor changes to support riscv32 and register differences.
This fixes elf/tst-sprof-basic. This also fixes elf/tst-audit1,
elf/tst-audit2 and elf/tst-audit8 with recent binutils snapshots when
--enable-bind-now is used.
Resolves: BZ #31151
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
_dl_tlsdesc_undefweak and _dl_tlsdesc_dynamic access the thread pointer
via the tcb field in TCB:
_dl_tlsdesc_undefweak:
_CET_ENDBR
movq 8(%rax), %rax
subq %fs:0, %rax
ret
_dl_tlsdesc_dynamic:
...
subq %fs:0, %rax
movq -8(%rsp), %rdi
ret
Since the tcb field in TCB is a pointer, %fs:0 is a 32-bit location on
x32, not 64-bit. It should use "sub %fs:0, %RAX_LP" instead. Since
_dl_tlsdesc_undefweak returns ptrdiff_t and _dl_make_tlsdesc_dynamic
returns void *, RAX_LP is appropriate here for x32 and x86-64. This
fixes BZ #31185.
On x32, I got
FAIL: elf/tst-tlsgap
$ gdb elf/tst-tlsgap
...
open tst-tlsgap-mod1.so
Thread 2 "tst-tlsgap" received signal SIGSEGV, Segmentation fault.
[Switching to LWP 2268754]
_dl_tlsdesc_dynamic () at ../sysdeps/x86_64/dl-tlsdesc.S:108
108 movq (%rsi), %rax
(gdb) p/x $rsi
$4 = 0xf7dbf9005655fb18
(gdb)
This is caused by
_dl_tlsdesc_dynamic:
_CET_ENDBR
/* Preserve call-clobbered registers that we modify.
We need two scratch regs anyway. */
movq %rsi, -16(%rsp)
movq %fs:DTV_OFFSET, %rsi
Since the dtv field in TCB is a pointer, %fs:DTV_OFFSET is a 32-bit
location on x32, not 64-bit. Load the dtv field into RSI_LP instead of rsi.
This fixes BZ #31184.
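The effect of the 64-bit load can be modelled in portable C. The struct
below is a stand-in, not the real x32 TCB layout: it only assumes that
the 4-byte dtv pointer is followed directly by another 4-byte field, and
the field values are chosen so that the wide load reproduces the RSI
value from the gdb session above. Little-endian byte order is assumed,
as on x86.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Two adjacent 4-byte slots, as pointer-sized fields are on x32.  */
struct demo_tcb
{
  uint32_t dtv;         /* The field the code meant to read.  */
  uint32_t next_field;  /* Whatever happens to follow it.  */
};

static_assert (offsetof (struct demo_tcb, dtv) == 0
               && offsetof (struct demo_tcb, next_field) == 4
               && sizeof (struct demo_tcb) == 8,
               "fields must be adjacent");

/* Model of "movq %fs:DTV_OFFSET, %rsi": reads 8 bytes.  */
static uint64_t
demo_load_movq (const struct demo_tcb *tcb)
{
  uint64_t v;
  memcpy (&v, tcb, sizeof v);
  return v;
}

/* Model of loading into RSI_LP on x32: reads 4 bytes, zero-extended.  */
static uint64_t
demo_load_lp (const struct demo_tcb *tcb)
{
  return tcb->dtv;
}
```

With dtv set to 0x5655fb18 and the following field set to 0xf7dbf900,
demo_load_movq yields 0xf7dbf9005655fb18, the bogus pointer seen in RSI,
while demo_load_lp yields the intended 0x5655fb18.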
In permissive mode, don't disable IBT or SHSTK when dlopening a legacy
shared library if not single threaded since IBT and SHSTK may still be
enabled in other threads. Other threads with IBT or SHSTK enabled will
crash when calling functions in the legacy shared library. Instead, an
error will be issued.
Improve readability and make maintenance easier for dl-feature.c by
modularizing sysdeps/x86/dl-cet.c:
1. Support processors with:
a. Only IBT. Or
b. Only SHSTK. Or
c. Both IBT and SHSTK.
2. Lock CET features only if IBT or SHSTK are enabled and are not
enabled permissively.
Added annotations for autovec by GCC and GFortran - this enables GCC
>= 9 to autovectorise math calls at -Ofast.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Compilers may emit calls to 'half-width' routines (two-lane
single-precision variants). These have been added in the form of
wrappers around the full-width versions, where the low half of the
vector is simply duplicated. This will perform poorly when one lane
triggers the special-case handler, as there will be a redundant call
to the scalar version; however, this is expected to be rare at -Ofast.
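The cost described above can be seen in a scalar model. Nothing here is
the libmvec code: plain arrays stand in for vector registers, demo_scalar
stands in for the scalar routine, and "special" is arbitrarily defined as
an input above 100. The two-lane wrapper duplicates its input into both
halves of a four-lane vector, so a single special input reaches the
scalar fallback twice.

```c
static int scalar_calls;        /* How often the fallback ran.  */

/* Stand-in for the scalar routine used by the special-case handler.  */
static float
demo_scalar (float x)
{
  scalar_calls++;
  return x * 2.0f;
}

/* Four-lane routine: fast path per lane, scalar fallback for any lane
   flagged as special.  */
static void
demo_v4 (const float in[4], float out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = in[i] > 100.0f ? demo_scalar (in[i]) : in[i] * 2.0f;
}

/* Two-lane wrapper: duplicate the input, keep the low half of the
   result.  */
static void
demo_v2 (const float in[2], float out[2])
{
  float wide_in[4] = { in[0], in[1], in[0], in[1] };
  float wide_out[4];
  demo_v4 (wide_in, wide_out);
  out[0] = wide_out[0];
  out[1] = wide_out[1];
}
```

Calling demo_v2 with the inputs 1 and 200 gives the expected two results
but leaves scalar_calls at 2 rather than 1.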
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The expression
(excepts & FE_ALL_EXCEPT) << 27
produces a signed integer overflow when 'excepts' is specified as
FE_INVALID (= 0x10), because
- excepts is of type 'int',
- FE_ALL_EXCEPT is of type 'int',
- thus (excepts & FE_ALL_EXCEPT) is (int) 0x10,
- 'int' is 32 bits wide.
The patched code produces the same instruction sequence as
previously.
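The overflow and a conventional way to avoid it can be shown in a few
lines. The constants below are stand-ins with a DEMO_ prefix (only the
value 0x10 for the invalid flag comes from the text above; the 0x1f mask
is an assumption), and the unsigned cast is a common idiom, not
necessarily the exact change made by the patch.

```c
#include <stdint.h>

#define DEMO_FE_INVALID     0x10
#define DEMO_FE_ALL_EXCEPT  0x1f   /* Assumed: five flags in bits 0-4.  */

/* (excepts & DEMO_FE_ALL_EXCEPT) << 27 computed in plain int would
   shift a 1 into the sign bit for DEMO_FE_INVALID, which is undefined
   behaviour.  Doing the shift in uint32_t is well defined.  */
static uint32_t
demo_shift_excepts (int excepts)
{
  return (uint32_t) (excepts & DEMO_FE_ALL_EXCEPT) << 27;
}
```

Since only the type of the shift operand changes, a compiler can emit
the same shift instruction as before.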
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It clears some exception flags that are outside the EXCEPTS argument.
It fixes math/test-fexcept on qemu-user.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
libc_feupdateenv_riscv should check for FE_DFL_ENV, similar to
libc_fesetenv_riscv.
Also extend test-fenv.c to test feupdateenv.
Checked on riscv under qemu-system.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
According to ISO C23 (7.6.4.4), fesetexcept is supposed to set
floating-point exception flags without raising a trap (unlike
feraiseexcept, which is supposed to raise a trap if feenableexcept
was called with the appropriate argument).
The flags can be set in the 387 unit or in the SSE unit. When we need
to clear a flag, we need to do so in both units, due to the way
fetestexcept is implemented.
When we need to set a flag, it is sufficient to do it in the SSE unit,
because that is guaranteed to not trap. However, on i386 CPUs that have
only a 387 unit, set the flags in the 387, as long as this cannot trap.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
According to ISO C23 (7.6.4.4), fesetexcept is supposed to set
floating-point exception flags without raising a trap (unlike
feraiseexcept, which is supposed to raise a trap if feenableexcept
was called with the appropriate argument).
The flags can be set in the 387 unit or in the SSE unit. To set
a flag, it is sufficient to do it in the SSE unit, because that is
guaranteed to not trap. However, on i386 CPUs that have only a
387 unit, set the flags in the 387, as long as this cannot trap.
Checked on i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
According to ISO C23 (7.6.4.4), fesetexcept is supposed to set
floating-point exception flags without raising a trap (unlike
feraiseexcept, which is supposed to raise a trap if feenableexcept was
called with the appropriate argument).
This is a side-effect of how we implement the GNU extension
feenableexcept, where feenableexcept/fesetenv/fesetmode/feupdateenv
might issue prctl (PR_SET_FPEXC, PR_FP_EXC_PRECISE) depending on the
argument. And on PR_FP_EXC_PRECISE, setting a floating-point exception
flag triggers a trap.
To make both functions follow C23, fesetexcept and
fesetexceptflag now fail if the argument may trigger a trap.
The math tests now check for a value different from 0, instead
of bailing out as unsupported for EXCEPTION_SET_FORCES_TRAP.
Checked on powerpc64le-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The tunable parsing duplicates the tunable environment variable so that
it can null-terminate each tunable, which simplifies the later parsing.
This has the drawback of adding another point of failure
(__minimal_malloc failing), and the memory copy requires tuning the
compiler to avoid calls to mem operations.
The parsing now tracks the tunable start and its size. The
dl-tunable-parse.h adds helper functions to help parsing, like a strcmp
that also checks for size and an iterator for suboptions that are
comma-separated (used on hwcap parsing by x86, powerpc, and s390x).
Since the environment variable is allocated on the stack by the kernel,
it is safe to keep the references to the suboptions for later parsing
of string tunables (as done by set_hwcaps by multiple architectures).
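The start-plus-size approach can be sketched as follows. The names are
hypothetical and do not match the helpers in dl-tunable-parse.h. The
sketch shows a size-aware comparison and a comma iterator that hand out
pointers into the original string without copying or modifying it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A piece of the original string: start pointer plus length.  */
struct demo_span
{
  const char *str;
  size_t len;
};

/* strcmp-like test that also checks the size, since the span is not
   null-terminated.  */
static bool
demo_span_equals (struct demo_span s, const char *name)
{
  size_t n = strlen (name);
  return s.len == n && memcmp (s.str, name, n) == 0;
}

/* Store the next comma-separated suboption of REST in OUT and advance
   REST past it.  Return false once REST is exhausted.  */
static bool
demo_next_suboption (struct demo_span *rest, struct demo_span *out)
{
  assert (rest != NULL && out != NULL);
  if (rest->str == NULL)
    return false;

  const char *comma = memchr (rest->str, ',', rest->len);
  out->str = rest->str;
  if (comma == NULL)
    {
      out->len = rest->len;
      rest->str = NULL;         /* Nothing left after this one.  */
      rest->len = 0;
    }
  else
    {
      out->len = (size_t) (comma - rest->str);
      rest->len -= out->len + 1;
      rest->str = comma + 1;
    }
  return true;
}
```

Because each span points into the caller's string, the results stay
valid for as long as that string does, which is the property the
paragraph above relies on for the environment block.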
Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, and
aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Since GCC commit f31a019d1161ec78846473da743aedf49cca8c27 "Emit
funcall external declarations only if actually used.", the glibc
testsuite has failed to build for 32-bit SPARC with GCC mainline.
/scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/../../../../sparc64-glibc-linux-gnu/bin/ld: /scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/32/libgcc.a(_divsi3.o): in function `.div':
/scratch/jmyers/glibc-bot/src/gcc/libgcc/config/sparc/lb1spc.S:138: multiple definition of `.div'; /scratch/jmyers/glibc-bot/build/glibcs/sparcv9-linux-gnu/glibc/libc.a(sdiv.o):/scratch/jmyers/glibc-bot/src/glibc/gnulib/../sysdeps/sparc/sparc32/sparcv9/sdiv.S:13: first defined here
/scratch/jmyers/glibc-bot/install/compilers/sparc64-linux-gnu/lib/gcc/sparc64-glibc-linux-gnu/14.0.0/../../../../sparc64-glibc-linux-gnu/bin/ld: disabling relaxation; it will not work with multiple definitions
collect2: error: ld returned 1 exit status
make[3]: *** [../Rules:298: /scratch/jmyers/glibc-bot/build/glibcs/sparcv9-linux-gnu/glibc/nptl/tst-cancel24-static] Error 1
https://sourceware.org/pipermail/libc-testresults/2023q4/012154.html
I'm not sure of the exact sequence of undefined references that cause
first the glibc object file defining .div and then the libgcc object
file defining both .div and .udiv to be pulled in (which must have
been perturbed by that GCC change in a way that introduced the build
failure), but I think the failure illustrates that it's inherently
fragile for glibc to define symbols in separate object files that
libgcc defines in the same object file - and indeed for glibc to
redefine libgcc symbols at all, since the division into object files
shouldn't really be part of the interface between libgcc and libc.
These symbols appear to be in libc only for compatibility, maybe one
of the cases where they were accidentally exported from shared libc in
glibc 2.0 before the introduction of symbol versioning and so programs
started expecting shared libc to provide them. Thus, there is no need
to have them in static libc. Add this set of libgcc functions to
shared-only-routines so they are no longer provided in static libc.
(No change is made regarding .mul - dotmul source file - since unlike
the other symbols in this grouping, it doesn't actually appear to be a
libgcc symbol, at least in current GCC.)
Tested with build-many-glibcs.py for sparcv9-linux-gnu with GCC
mainline.
Verify that legacy shadow stack code in .init_array section in application
and shared library, which are marked as shadow stack enabled, will trigger
segfault.
So far if the ucontext structure was obtained by getcontext and co,
the return address was stored in general purpose register 14 as
it is defined as return address in the ABI.
In contrast, the context passed to a signal handler contains the address
in psw.addr field.
If somebody e.g. wants to dump the address of the context, the origin
needs to be known.
Now this patch adjusts getcontext and friends and stores the return address
also in psw.addr field.
Note that setcontext isn't adjusted and it is not supported to pass a
ucontext structure from signal-handler to setcontext. We are not able to
restore all registers and branch to psw.addr without clobbering one
register.
This commit uses a common implementation 'strlen-evex-base.S' for both
'strlen-evex' and 'strlen-evex512'.
The motivation is to reduce the number of implementations to maintain.
This incidentally gives a small performance improvement.
All tests pass on x86.
Benchmarks were taken on SKX.
https://www.intel.com/content/www/us/en/products/sku/123613/intel-core-i97900x-xseries-processor-13-75m-cache-up-to-4-30-ghz/specifications.html
Geometric mean for strlen-evex512 over all benchmarks (N=10) was (new/old) 0.939
Geometric mean for wcslen-evex512 over all benchmarks (N=10) was (new/old) 0.965
Code Size Changes:
strlen-evex512.S : +24 bytes
wcslen-evex512.S : +54 bytes
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Since shadow stack (SHSTK) is enabled in the Linux kernel without
enabling indirect branch tracking (IBT), don't assume that SHSTK
implies IBT. Use "CPU_FEATURE_ACTIVE (IBT)" to check if IBT is active
and "CPU_FEATURE_ACTIVE (SHSTK)" to check if SHSTK is active.
This patch reserves space for HWCAP3/HWCAP4 in the TCB of powerpc.
These hardware capabilities bits will be used by future Power
architectures.
Versioned symbol '__parse_hwcap_3_4_and_convert_at_platform' advertises
the availability of the new HWCAP3/HWCAP4 data in the TCB.
This is an ABI change for GLIBC 2.39.
Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The current implementation of strcmp for power10 has
performance regressions for several combinations of
small sizes and alignments.
Most of these performance issues are fixed by this
patch. The compare loop is unrolled and page crossings
in the unrolled loop are handled.
Thanks to Paul E. Murphy for helping in fixing the
performance issues.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Co-Authored-By: Paul E. Murphy <murphyp@linux.ibm.com>
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
Optimized memchr for POWER10 based on existing rawmemchr and strlen.
Reordering instructions and loop unrolling helped in getting better performance.
Reviewed-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
The PT_GNU_PROPERTY segment is scanned before PT_NOTE. For binaries
with the PT_GNU_PROPERTY segment, we can check it to avoid scanning
the PT_NOTE segment.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
This patch is based on __strcmp_power9 and __strlen_power10.
Improvements from __strcmp_power9:
1. Uses new POWER10 instructions
- This code uses lxvp to decrease contention on load
by loading 32 bytes per instruction.
2. Performance implication
- This version has around 30% better performance on average.
- Performance regression is seen for a specific combination
of sizes and alignments. Some of these are also observed without
the changes, while the rest may be induced by the patch.
Signed-off-by: Amrita H S <amritahs@linux.vnet.ibm.com>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
This prevents any environment variable from changing the semantics
of setuid binaries.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The loader already ignores LD_DEBUG, LD_DEBUG_OUTPUT, and
LD_TRACE_LOADED_OBJECTS. Both LD_WARN and LD_VERBOSE are similar to
LD_DEBUG, in the sense they enable additional checks and debug
information, so it makes sense to disable them.
Also add both LD_VERBOSE and LD_WARN to the filtered environment
variables for setuid binaries.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>