Adhemerval noticed that the gettimeofday() and 32-bit clock_gettime()
vDSO calls won't be used by glibc on hppa, so there is no need to
declare them. Both syscalls will be emulated by utilizing return values
of the 64-bit clock_gettime() vDSO instead.
Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
MIPSr6 has MADDF.s/MADDF.d instructions, which are fused.
In MIPS ISA, double support can be subsetted. Only FMAF is enabled
for this case.
* sysdeps/mips/fpu/math-use-builtins-fma.h
Signed-off-by: YunQiang Su <syq@gcc.gnu.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It turns out that quite a few applications use bundled mallocs that
have been built to use global-dynamic TLS (instead of the recommended
initial-exec TLS). The previous workaround from
commit afe42e935b ("elf: Avoid some
free (NULL) calls in _dl_update_slotinfo") does not fix all
encountered cases unfortunatelly.
This change avoids the TLS generation update for recursive use
of TLS from a malloc that was called during a TLS update. This
is possible because an interposed malloc has a fixed module ID and
TLS slot. (It cannot be unloaded.) If an initially-loaded module ID
is encountered in __tls_get_addr and the dynamic linker is already
in the middle of a TLS update, use the outdated DTV, thus avoiding
another call into malloc. It's still necessary to update the
DTV to the most recent generation, to get out of the slow path,
which is why the check for recursion is needed.
The bookkeeping is done using a global counter instead of per-thread
flag because TLS access in the dynamic linker is tricky.
All this will go away once the dynamic linker stops using malloc
for TLS, likely as part of a change that pre-allocates all TLS
during pthread_create/dlopen.
Fixes commit d2123d6827 ("elf: Fix slow
tls access after dlopen [BZ #19924]").
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Current 'non_temporal_threshold' set to 'non_temporal_threshold_lowbound'
on Zhaoxin processors without ERMS. The default
'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000
Zhaoxin processors, this patch updates the value to
'shared / cachesize_non_temporal_divisor'.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
This patch optimizes large size copy using normal store when src > dst
and overlap. Make it the same as the logic in memmove-vec-unaligned-erms.S.
Current memmove-ssse3 use '__x86_shared_cache_size_half' as the non-
temporal threshold, this patch updates that value to
'__x86_shared_non_temporal_threshold'. Currently, the
__x86_shared_non_temporal_threshold is cpu-specific, and different CPUs
will have different values based on the related nt-benchmark results.
However, in memmove-ssse3, the nontemporal threshold uses
'__x86_shared_cache_size_half', which sounds unreasonable.
The performance is not changed drastically although shows overall
improvements without any major regressions or gains.
Results on Zhaoxin KX-7000:
bench-memcpy geometric_mean(N=20) New / Original: 0.999
bench-memcpy-random geometric_mean(N=20) New / Original: 0.999
bench-memcpy-large geometric_mean(N=20) New / Original: 0.978
bench-memmove geometric_mean(N=20) New / Original: 1.000
bench-memmmove-large geometric_mean(N=20) New / Original: 0.962
Results on Intel Core i5-6600K:
bench-memcpy geometric_mean(N=20) New / Original: 1.001
bench-memcpy-random geometric_mean(N=20) New / Original: 0.999
bench-memcpy-large geometric_mean(N=20) New / Original: 1.001
bench-memmove geometric_mean(N=20) New / Original: 0.995
bench-memmmove-large geometric_mean(N=20) New / Original: 0.936
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Fix code formatting under the Zhaoxin branch and add comments for
different Zhaoxin models.
Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
the AVX_Fast_Unaligned_Load.
Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
use sse2_unaligned version of memset,strcpy and strcat.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Qualcom's new core, oryon-1, has a different characteristics for
memset than the current versions of memset. For non-zero, larger
sizes, using GPRs rather than the SIMD stores is ~30% faster.
For even larger sizes, using the nontemporal stores is needed
not to polute the L1/L2 caches.
For zero values, using `dc zva` should be used. Since we
know the size will always be 64 bytes, we don't need to figure
out the size there.
I started with the emag memset and added back the `dc zva` code.
Changes since v1:
* v3: Fix comment formating
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Qualcomm's new core (oryon-1) has a different performance characteristic
than other cores. For memcpy, it is faster to use the GPRs to
do the copy for large sizes (2x faster). For even larger sizes,
it is better to use the nontemporal load/store instructions so
we don't pollute the L1/L2 caches.
For smaller sizes, the characteristic are very similar to
other cores.
I used the thunderx memcpy as a starting point and expanded from there.
Changes since v1:
* v2: Fix ordering in Makefile.
* v3: Fix comment grammar about the ldnp/stnp instructions.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This recently came up during a cleanup to remove misaligned accesses
from the RISC-V port.
Link: https://sourceware.org/pipermail/libc-alpha/2022-June/139961.html
Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Fangrui Song <maskray@google.com>
asm volatile ("movfcsr2gr $t0, $fcsr0" ::: "$t0");
asm volatile ("st.d $t0, %0" :"=m"(restore_fcsr));
generate to the following instructions with -Og flag:
movfcsr2gr $t0, $zero
addi.d $t0, $sp, 2047(0x7ff)
addi.d $t0, $t0, 77(0x4d)
st.w $t0, $t0, 0
fcsr0 register and restore_fcsr variable are both stored in t0 register.
Change to:
asm volatile ("movfcsr2gr %0, $fcsr0" :"=r"(restore_fcsr));
to avoid restore_fcsr address in t0.
Comparing float value using memcmp because float value cannot be
directly compared for equality.
Put LOAD_REGISTER_FCSR and SAVE_REGISTER_FCC after LOAD_REGISTER_FLOAT.
Some float instructions may change fcsr register.
If the pidfd_spawn/pidfd_spawnp helper process succeeds, but evecve
fails for some reason (either with an invalid/non-existent, memory
allocation, etc.) the resulting pidfd is never closed, nor returned
to caller (so it can call close).
Since the process creation failed, it should be up to posix_spawn to
also, close the file descriptor in this case (similar to what it
does to reap the process).
This patch also changes the waitpid with waitid (P_PIDFD) for pidfd
case, to avoid a possible pid re-use.
Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The atomic_spin_nop() macro can be used to run arch-specific
code in the body of a spin loop to potentially improve efficiency.
RISC-V's Zihintpause extension includes a PAUSE instruction for
this use-case, which is encoded as a HINT, which means that it
behaves like a NOP on systems that don't implement Zihintpause.
Binutils supports Zihintpause since 2.36, so this patch uses
the ".insn" directive to keep the code compatible with older
toolchains.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
MIPSr6 has MADDF.s/MADDF.d instructions, which are fused.
In MIPS ISA, double support can be subsetted. Only FMAF is enabled
for this case.
* sysdeps/mips/fpu/math-use-builtins-fma.h
Signed-off-by: YunQiang Su <syq@gcc.gnu.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
The upcoming parisc (hppa) v6.11 Linux kernel will include vDSO
support for gettimeofday(), clock_gettime() and clock_gettime64()
syscalls for 32- and 64-bit userspace.
The patch below adds the necessary glue code for glibc.
Signed-off-by: Helge Deller <deller@gmx.de>
Changes in v2:
- add vsyscalls for 64-bit too
The test is not a run-time check, so update the description.
Also use readelf -W for a more stable output format and fix
an LC_ALL typo.
This avoids garbled configure messages:
checking for s390-specific static PIE requirements (runtime check)... 0x0000000000000017 (JMPREL) 0x280
yes
Based on a -march=x86-64-v4 -mfpmath=sse build, with and without
--disable-multi-arch, running on a Zen 4 CPU. Also used different
-march=x8i6-64-v… settings.
HWCAP value is overwritten at the first comparison of the LASX case.
The second comparison at LSX get incorrect result.
Change to use t0 to save HWCAP value, and use t1 to save comparison
result.
The _dl_sysdep_parse_arguments function contains initalization
of a large on-stack variable:
dl_parse_auxv_t auxv_values = { 0, };
This uses a non-inline version of memset on powerpc64le-linux-gnu,
so it must use the baseline memset.
A desired hugetlb page size can be encoded in the flags parameter of
system calls such as mmap() and shmget(). The Linux UAPI headers have
included explicit definitions for these encodings since v4.14.
This patch adds these definitions that are used along with MAP_HUGETLB
and SHM_HUGETLB flags as specified in the corresponding man pages. This
relieves programs from having to duplicate and/or compute the encodings
manually.
Additionally, the filter on these definitions in tst-mman-consts.py is
removed, as suggested by Florian. I then ran this tests successfully,
confirming the alignment with the kernel headers.
PASS: misc/tst-mman-consts
original exit status 0
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Remove the definitions of HWCAP_IMPORTANT after removal of
LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask. There HWCAP_IMPORTANT
was used as default value.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the environment variable LD_HWCAP_MASK and the tunable
glibc.cpu.hwcap_mask as those are not used anymore in common-code
after removal in elf/dl-cache.c:search_cache().
The only remaining user is sparc32 where it is used in
elf_machine_matches_host(). If sparc32 does not need it anymore,
we can get rid of it at all. Otherwise we could also move
LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask to be sparc32 specific.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the definitions of _DL_PLATFORMS_COUNT as those are not used
anymore after removal in elf/dl-cache.c:search_cache().
Note: On x86, we can also get rid of the definitions
HWCAP_PLATFORMS_START and HWCAP_PLATFORMS_COUNT.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the definitions of _DL_FIRST_PLATFORM as those were only used
in the _DL_HWCAP_PLATFORM definitions and in _dl_string_platform().
Both were removed.
Note: Removed on every architecture despite of powerpc, where
_dl_string_platform() is still used.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the definitions of _DL_HWCAP_PLATFORM as those are not used
anymore after removal in elf/dl-cache.c:search_cache().
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the platform strings in dl-procinfo.c where also
the implementation of _dl_string_platform() was removed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Despite of powerpc where the returned integer is stored in tcb,
and the diagnostics output, there is no user anymore.
Thus this patch removes the diagnostics output and
_dl_string_platform for all other platforms.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Both defines are not used anymore. Those were only used for
_dl_string_hwcap(), which itself was removed with commit
ab40f20364
"elf: Remove _dl_string_hwcap"
Just clean up.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
As discussed at the patch review meeting
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).
As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future. Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.
exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode. Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.
The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.
The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the log10p1 functions (log10(1+x): like log1p, but for
base-10 logarithms).
This is directly analogous to the log2p1 implementation (except that
whereas log2p1 has a smaller underflow range than log1p, log10p1 has a
larger underflow range). The test inputs are copied from those for
log1p and log2p1, plus a few more inputs in that wider underflow
range.
Tested for x86_64 and x86, and with build-many-glibcs.py.
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p). As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.
Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.
The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either). It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately. For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).
Tested for x86_64 and x86, and with build-many-glibcs.py.
When we don't want to use non-temporal stores for memset, we set
`x86_memset_non_temporal_threshold` to SIZE_MAX.
The current code, however, we using `maximum_non_temporal_threshold`
as the upper bound which is `SIZE_MAX >> 4` so we ended up with a
value of `0`.
Fix is to just use `SIZE_MAX` as the upper bound for when setting the
tunable.
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Linux pinacolada 6.6.32-gentoo #1 SMP PREEMPT Sun Jun 9 14:18:17 CEST 2024 x86_64 Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz GenuineIntel GNU/Linux
32bit build for multilib environment
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
"ADDI sp, sp, 24" and "ADDI sp, sp, SZFCSREG" (SZFCSREG = 4) are
misaligning the stack: the ABI mandates a 16-byte alignment. Fix it
by changing the first one to "ADDI sp, sp, 32", and reuse the spare 4th
slot for saving fcsr.
Reported-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
Properly set libc_cv_have_x86_isa_level in shell for MINIMUM_X86_ISA_LEVEL
defined as
(__X86_ISA_V1 + __X86_ISA_V2 + __X86_ISA_V3 + __X86_ISA_V4)
Also set __X86_ISA_V2 to 1 for i386 if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
is defined. There are no changes in config.h nor in config.make on x86-64.
On i386, -march=x86-64-v2 with GCC generates
#define MINIMUM_X86_ISA_LEVEL 2
in config.h and
have-x86-isa-level = 2
in config.make. This fixes BZ #31883.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The __stack_prot is used by Linux to make the stack executable if
a modules requires it. It is also marked as RELRO, which requires
to change the segment permission to RW to update it.
Also, there is no need to keep track of the flags: either the stack
will have the default permission of the ABI or should be change to
PROT_READ | PROT_WRITE | PROT_EXEC. The only additional flag,
PROT_GROWSDOWN or PROT_GROWSUP, is Linux only and can be deducted
from _STACK_GROWS_DOWN/_STACK_GROWS_UP.
Also, the check_consistency function was already removed some time
ago.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
On i386, set the default minimum ISA level to 0, not 1 (baseline which
includes SSE2). There are no changes in config.h nor in config.make on
x86-64. This fixes BZ #31867.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Tested-by: Ian Jordan <immoloism@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
In commit 46b5e98ef6 ("x86: Add seperate non-temporal tunable for
memset") a tunable threshold for enabling non-temporal memset was added,
but only for Intel hardware.
Since that commit, new benchmark results suggest that non-temporal
memset is beneficial on AMD, as well, so allow this tunable to be set
for AMD.
See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
which has been updated to include data using different stategies for
large memset on AMD Zen2, Zen3, and Zen4.
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
As of Linux kernel 6.9, some ioctls and a parameters structure have been
introduced which allow user programs to control whether a particular
epoll context will busy poll.
Update the headers to include these for the convenience of user apps.
The ioctls were added in Linux kernel 6.9 commit 18e2bf0edf4dd
("eventpoll: Add epoll ioctl for epoll_params") [1] to
include/uapi/linux/eventpoll.h.
[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?h=v6.9&id=18e2bf0edf4dd
Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Left shift of ki is undefined when ki<0, copy the logic from exp,
which uses unsigned arithmetics, to fix it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The tuning for non-temporal stores for memset vs memcpy is not always
the same. This includes both the exact value and whether non-temporal
stores are profitable at all for a given arch.
This patch add `x86_memset_non_temporal_threshold`. Currently we
disable non-temporal stores for non Intel vendors as the only
benchmarks showing its benefit have been on Intel hardware.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Previously we use `rep stosb` for all medium/large memsets. This is
notably worse than non-temporal stores for large (above a
few MBs) memsets.
See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
For data using different stategies for large memset on ICX and SKX.
Using non-temporal stores can be up to 3x faster on ICX and 2x faster
on SKX. Historically, these numbers would not have been so good
because of the zero-over-zero writeback optimization that `rep stosb`
is able to do. But, the zero-over-zero writeback optimization has been
removed as a potential side-channel attack, so there is no longer any
good reason to only rely on `rep stosb` for large memsets. On the flip
size, non-temporal writes can avoid data in their RFO requests saving
memory bandwidth.
All of the other changes to the file are to re-organize the
code-blocks to maintain "good" alignment given the new code added in
the `L(stosb_local)` case.
The results from running the GLIBC memset benchmarks on TGL-client for
N=20 runs:
Geometric Mean across the suite New / Old EXEX256: 0.979
Geometric Mean across the suite New / Old EXEX512: 0.979
Geometric Mean across the suite New / Old AVX2 : 0.986
Geometric Mean across the suite New / Old SSE2 : 0.979
Most of the cases are essentially unchanged, this is mostly to show
that adding the non-temporal case didn't add any regressions to the
other cases.
The results on the memset-large benchmark suite on TGL-client for N=20
runs:
Geometric Mean across the suite New / Old EXEX256: 0.926
Geometric Mean across the suite New / Old EXEX512: 0.925
Geometric Mean across the suite New / Old AVX2 : 0.928
Geometric Mean across the suite New / Old SSE2 : 0.924
So roughly a 7.5% speedup. This is lower than what we see on servers
(likely because clients typically have faster single-core bandwidth so
saving bandwidth on RFOs is less impactful), but still advantageous.
Full test-suite passes on x86_64 w/ and w/o multiarch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
A space is added before the left bracket of the x86_64 elf_machine_rela
function, in order to harmonize with the rest of the implementation of
the function and to make it easier to retrieve the function. The lines
where the function definition is located has been re-indented, as well
as its left curly bracket placed in the correct position.
Signed-off-by: Xin Wang <yw987194828@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
pidfd_getpid.c has
/* Ignore invalid large values. */
if (INT_MULTIPLY_WRAPV (10, n, &n)
|| INT_ADD_WRAPV (n, *l++ - '0', &n))
return -1;
For GCC older than GCC 7, INT_ADD_WRAPV(a, b, r) is defined as
_GL_INT_OP_WRAPV (a, b, r, +, _GL_INT_ADD_RANGE_OVERFLOW)
and *l++ - '0' is evaluated twice. Fix BZ #31798 by moving "l++" out of
the if statement. Tested with GCC 6.4 and GCC 14.1.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch updates the kernel version in the tests tst-mman-consts.py
and tst-mount-consts.py to 6.9. (There are no new constants covered
by these tests in 6.9 that need any other header changes;
tst-pidfd-consts.py was updated separately along with adding new
constants relevant to that test.)
Tested with build-many-glibcs.py.
The libc.a for alpha, s390, and sparcv9 does not provide
copysignf64x, copysignf128, frexpf64x, frexpf128, modff64x, and
modff128.
Checked with a static build for the affected ABIs.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Both the generic and POWER6 versions provide definitions of the
symbol, which are already provided by the ifunc resolver.
Checked on powerpc-linux-gnu-power4.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
For powerpc64 the generic version provides a weak definition of
strchrnul, which are already provided by the ifunc resolver. The
powerpc32 version is slight different, where for static case there
is no iFUNC support.
The strncasecmp_l is provided ifunc resolver.
Checked on powerpc-linux-gnu-power4 and powerpc64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The generic version provides weak definitions of memchr/strlen,
which are already provided by the ifunc resolvers.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Linux 6.9 adds some more PIDFD_* constants. Add them to glibc's
sys/pidfd.h, including updating comments that said FLAGS was reserved
and must be 0, along with updating tst-pidfd-consts.py.
Tested with build-many-glibcs.py.
Don't provide __nexttowardf128_do_not_use, nexttowardf128_do_not_use,
finitef128_do_not_use, isinff128_do_not_use and isnanf128_do_not_use.
This fixes BZ #31757.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Some static implementation of float128 routines might call __isnanf128,
which is not provided by the static object.
Checked on x86_64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The commit 08ddd26814 removed the static exp10 on i386 and m68k with an
empty w_exp10.c (required for the ABIs that uses the newly
implementation). This patch fixes by adding the required symbols on the
arch-specific w_exp{f}_compat.c implementation.
Checked on i686-linux-gnu and with a build for m68k-linux-gnu.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
The commit 16439f419b removed the static fmod/fmodf on i386 and m68k
with and empty w_fmod.c (required for the ABIs that uses the newly
implementation). This patch fixes by adding the required symbols on
the arch-specific w_fmod{f}_compat.c implementation.
To statically build fmod fails on some ABI (alpha, s390, sparc) because
it does not export the ldexpf128, this is also fixed by this patch.
Checked on i686-linux-gnu and with a build for m68k-linux-gnu.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
clone3 isn't exported from glibc and is hidden in libc.so. Fix BZ #31770
by removing clone3 alias.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Plus a small amount of moving includes around in order to be able to
remove duplicate definition of asuint64.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Fix BZ #31768 by not defining stpncpy alias when used in IFUNC.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
C23 adds various <math.h> function families originally defined in TS
18661-4. Add the log2p1 functions (log2(1+x): like log1p, but for
base-2 logarithms).
This illustrates the intended structure of implementations of all
these function families: define them initially with a type-generic
template implementation. If someone wishes to add type-specific
implementations, it is likely such implementations can be both faster
and more accurate than the type-generic one and can then override it
for types for which they are implemented (adding benchmarks would be
desirable in such cases to demonstrate that a new implementation is
indeed faster).
The test inputs are copied from those for log1p. Note that these
changes make gen-auto-libm-tests depend on MPFR 4.2 (or later).
The bulk of the changes are fairly generic for any such new function.
(sysdeps/powerpc/nofpu/Makefile only needs changing for those
type-generic templates that use fabs.)
Tested for x86_64 and x86, and with build-many-glibcs.py.
Linux 6.9 has no new syscalls. Update the version number in
syscall-names.list to reflect that it is still current for 6.9.
Tested with build-many-glibcs.py.
Fix BZ #31755 by renaming the internal function procutils_read_file to
__libc_procutils_read_file.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Fix BZ #31759 by not defining nearbyint aliases when used in IFUNC.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
This supports common coding patterns. The GCC C front end before
version 7 rejects the may_alias attribute on a struct definition
if it was not present in a previous forward declaration, so this
attribute can only be conditionally applied.
This implements the spirit of the change in Austin Group issue 1641.
Suggested-by: Marek Polacek <polacek@redhat.com>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch ensures that $libc_cv_cc_submachine, which is set from
"--with-cpu", overrides $CFLAGS for configure time tests.
Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
This is mostly based on AArch64 and RISC-V implementation.
Add R_LARCH_TLS_DESC32 and R_LARCH_TLS_DESC64 relocations.
For _dl_tlsdesc_dynamic function slow path, temporarily save and restore
all vector registers.
Previously many routines used * to load from vector types stored
in the data table. This is emitted as ldr, which byte-swaps the
entire vector register, and causes bugs for big-endian when not
all lanes contain the same value. When a vector is to be used
this way, it has been replaced with an array and the load with an
explicit ld1 intrinsic, which byte-swaps only within lanes.
As well, many routines previously used non-standard GCC syntax
for vector operations such as indexing into vectors types with []
and assembling vectors using {}. This syntax should not be mixed
with ACLE, as the former does not respect endianness whereas the
latter does. Such examples have been replaced with, for instance,
vcombine_* and vgetq_lane* intrinsics. Helpers which only use the
GCC syntax, such as the v_call helpers, do not need changing as
they do not use intrinsics.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The e68b1151f7 commit changed the
__fesetround_inline_nocheck implementation to use mffscrni
(through __fe_mffscrn) instead of mtfsfi. For generic powerpc
ceil/floor/trunc, the function is supposed to disable the
floating-point inexact exception enable bit, however mffscrni
does not change any exception enable bits.
This patch fixes by reverting the optimization for the
__fesetround_inline_nocheck.
Checked on powerpc-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
This code expects the WCSCAT preprocessor macro to be predefined in case
the evex implementation of the function should be defined with a name
different from __wcsncat_evex. However, when glibc is built for
x86-64-v4 without multiarch support, sysdeps/x86_64/wcsncat.S defines
WCSNCAT variable instead of WCSCAT to build it as wcsncat. Rename the
variable to WCSNCAT, as it is actually a better naming choice for the
variable in this case.
Reported-by: Kenton Groombridge
Link: https://bugs.gentoo.org/921945
Fixes: 64b8b6516b ("x86: Add evex optimized functions for the wchar_t strcpy family")
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
The 680c597e9c commit made loader reject ill-formatted strings by
first tracking all set tunables and then applying them. However, it does
not take into consideration if the same tunable is set multiple times,
where parse_tunables_string appends the found tunable without checking
if it was already in the list. It leads to a stack-based buffer overflow
if the tunable is specified more than the total number of tunables. For
instance:
GLIBC_TUNABLES=glibc.malloc.check=2:... (repeat over the number of
total support for different tunable).
Instead, use the index of the tunable list to get the expected tunable
entry. Since now the initial list is zero-initialized, the compiler
might emit an extra memset and this requires some minor adjustment
on some ports.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reported-by: Yuto Maeda <maeda@cyberdefense.jp>
Reported-by: Yutaro Shimizu <shimizu@cyberdefense.jp>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Starting from glibc 2.1, crt1.o contains _IO_stdin_used which is checked
by _IO_check_libio to provide binary compatibility for glibc 2.0. Add
crt1-2.0.o for tests against glibc 2.0. Define tests-2.0 for glibc 2.0
compatibility tests. Add and update glibc 2.0 compatibility tests for
stderr, matherr and pthread_kill.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch is based on __strcmp_power10.
Improvements from __strncmp_power9:
1. Uses new POWER10 instructions
- This code uses lxvp to decrease contention on load
by loading 32 bytes per instruction.
2. Performance implication
- This version has around 38% better performance on average.
- Minor performance regression is seen for few small sizes
and specific combination of alignments.
Signed-off-by: Amrita H S <amritahs@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
This patch adds hardware floating point support to OpenRISC. Hardware
floating point toolchain builds are enabled by passing the machine
specific argument -mhard-float to gcc via CFLAGS. With this enabled GCC
generates floating point instructions for single-precision operations
and exports __or1k_hard_float__.
There are 2 main parts to this patch.
- Implement fenv functions to update the FPCSR flags keeping it in sync
with sfp (software floating point).
- Update machine context functions to store and restore the FPCSR
state.
*On mcontext_t ABI*
This patch adds __fpcsr to mcontext_t. This is an ABI change, but also
an ABI fix. The Linux kernel has always defined padding in mcontext_t
that space was missing from the glibc ABI. In Linux this unused space
has now been re-purposed for storing the FPCSR. This patch brings
OpenRISC glibc in line with the Linux kernel and other libc
implementation (musl).
Compatibility getcontext, setcontext, etc symbols have been added to
allow for binaries expecting the old ABI to continue to work.
*Hard float ABI*
The calling conventions and types do not change with OpenRISC hard-float
so glibc hard-float builds continue to use dynamic linker
/lib/ld-linux-or1k.so.1.
*Testing*
I have tested this patch both with hard-float and soft-float builds and
the test results look fine to me. Results are as follows:
Hard Float
# failures
FAIL: elf/tst-sprof-basic (Haven't figured out yet, not related to hard-float)
FAIL: gmon/tst-gmon-pie (PIE bug in or1k toolchain)
FAIL: gmon/tst-gmon-pie-gprof (PIE bug in or1k toolchain)
FAIL: iconvdata/iconv-test (timeout, passed when run manually)
FAIL: nptl/tst-cond24 (Timeout)
FAIL: nptl/tst-mutex10 (Timeout)
# summary
6 FAIL
4289 PASS
86 UNSUPPORTED
16 XFAIL
2 XPASS
# versions
Toolchain: or1k-smhfpu-linux-gnu
Compiler: gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC)
Binutils: GNU assembler version 2.42.0 (or1k-smhfpu-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324
Linux: Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux
Tester: shorne
Glibc: 2024-04-25 b62928f907 Florian Weimer x86: In ld.so, diagnose missing APX support in APX-only builds (origin/master, origin/HEAD)
Soft Float
# failures
FAIL: elf/tst-sprof-basic
FAIL: gmon/tst-gmon-pie
FAIL: gmon/tst-gmon-pie-gprof
FAIL: nptl/tst-cond24
FAIL: nptl/tst-mutex10
# summary
5 FAIL
4295 PASS
81 UNSUPPORTED
16 XFAIL
2 XPASS
# versions
Toolchain: or1k-smh-linux-gnu
Compiler: gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC)
Binutils: GNU assembler version 2.42.0 (or1k-smh-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324
Linux: Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux
Tester: shorne
Glibc: 2024-04-25 b62928f907 Florian Weimer x86: In ld.so, diagnose missing APX support in APX-only builds (origin/master, origin/HEAD)
Documentation: https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.4-rev0.pdf
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch adds the ulps test file to prepare for the upcoming
hard float patch. This is separated out to make the hard float patch
smaller.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Commit c73c96a4a1 updated memcpy.S and
mempcpy.S, but omitted memmove.S and memset.S. As a result, the static
library built as PIC, whether with or without multiarch support,
contains two definitions for each of the __memmove_chk and __memset_chk
symbols.
/usr/lib/gcc/i686-pc-linux-gnu/14/../../../../i686-pc-linux-gnu/bin/ld: /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset-ia32.o): in function `__memset_chk':
/var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/string/../sysdeps/i386/i686/memset.S:32: multiple definition of `__memset_chk'; /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset_chk.o):/var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/debug/../sysdeps/i386/i686/multiarch/memset_chk.c:24: first defined here
After this change, regardless of PIC options, the static library, built
for i686 with multiarch contains implementations of these functions
respectively from debug/memmove_chk.c and debug/memset_chk.c, and
without multiarch contains implementations of these functions
respectively from sysdeps/i386/memmove_chk.S and
sysdeps/i386/memset_chk.S. This ensures that memmove and memset won't
pull in __chk_fail and the routines it calls.
Reported-by: Sam James <sam@gentoo.org>
Tested-by: Sam James <sam@gentoo.org>
Fixes: c73c96a4a1 ("i686: Fix build with --disable-multiarch")
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
/home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy_chk.o): in function `__memcpy_chk':
/home/bmg/src/glibc/debug/../sysdeps/i386/memcpy_chk.S:29: multiple definition of `__memcpy_chk';/home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here /home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy_chk.o): in function `__mempcpy_chk': /home/bmg/src/glibc/debug/../sysdeps/i386/mempcpy_chk.S:28: multiple definition of `__mempcpy_chk'; /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here
After this change, the static library built for i586, regardless of PIC
options, contains implementations of these functions respectively from
sysdeps/i386/memcpy_chk.S and sysdeps/i386/mempcpy_chk.S. This ensures
that memcpy and mempcpy won't pull in __chk_fail and the routines it
calls.
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
The FSF's Licensing and Compliance Lab noted a discrepancy in the
licensing of several files in the glibc package.
When timespect_get.c was impelemented the license did not include
the standard ", or (at your option) any later version." text.
Change the license in timespec_get.c and all copied files to match
the expected license.
This change was previously approved in principle by the FSF in
RT ticket #1316403. And a similar instance was fixed in
commit 46703efa02.
At this point, this is mainly a tool for testing the early ld.so
CPU compatibility diagnostics: GCC uses the new instructions in most
functions, so it's easy to spot if some of the early code is not
built correctly.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Define MINIMUM_X86_ISA_LEVEL at configure time to avoid
/usr/bin/ld: …/build/elf/librtld.os: in function `init_cpu_features':
…/git/elf/../sysdeps/x86/cpu-features.c:1202: undefined reference to `_dl_runtime_resolve_fxsave'
/usr/bin/ld: …/build/elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status
when glibc is built with -march=x86-64-v3 and configured with
--with-rtld-early-cflags=-march=x86-64, which is used to allow ld.so to
print an error message on unsupported CPUs:
Fatal glibc error: CPU does not support x86-64-v3
This fixes BZ #31676.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
The current IFUNC selection is always using the most recent
features which are available via AT_HWCAP. But in
some scenarios it is useful to adjust this selection.
The environment variable:
GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,zzz,....
can be used to enable HWCAP feature yyy, disable HWCAP feature xxx,
where the feature name is case-sensitive and has to match the ones
used in sysdeps/loongarch/cpu-tunables.c.
Signed-off-by: caiyinyu <caiyinyu@loongson.cn>
Fall back to ppoll if ppoll_time64 fails with ENOSYS.
Fixes commit 370da8a121 ("nptl: Fix
tst-cancel30 on sparc64").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These fields store timestamps when the system was running. No Linux
systems existed before 1970, so these values are unused. Switching
to unsigned types allows continued use of the existing struct layouts
beyond the year 2038.
The intent is to give distributions more time to switch to improved
interfaces that also avoid locking/data corruption issues.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS. This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This seems to have stopped working with some GCC 14 versions,
which clobber r2. With other compilers, the kernel-provided
r2 value is still available at this point.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
This reverts commit a1735e0aa8.
The test failure is a real valgrind bug that needs to be fixed before
valgrind is usable with a glibc that has been built with
CC="gcc -march=x86-64-v3". The proposed valgrind patch teaches
valgrind to replace ld.so strcmp with an unoptimized scalar
implementation, thus avoiding any AVX2-related problems.
Valgrind bug: <https://bugs.kde.org/show_bug.cgi?id=485487>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The gnulib version contains an important change (9ce573cde), which
fixes some problems with multithreading, entropy loss, and ASLR leak
nfo. It also fixes an issue where getrandom is not being used
on some new files generation (only for __GT_NOCREATE on first try).
The 044bf893ac removed __path_search, which is now moved to another
gnulib shared files (stdio-common/tmpdir.{c,h}). Tthis patch
also fixes direxists to use __stat64_time64 instead of __xstat64,
and move the include of pathmax.h for !_LIBC (since it is not used
by glibc). The license is also changed from GPL 3.0 to 2.1, with
permission from the authors (Bruno Haible and Paul Eggert).
The sync also removed the clock fallback, since clock_gettime
with CLOCK_REALTIME is expected to always succeed.
It syncs with gnulib commit 323834962817af7b115187e8c9a833437f8d20ec.
Checked on x86_64-linux-gnu.
Co-authored-by: Bruno Haible <bruno@clisp.org>
Co-authored-by: Paul Eggert <eggert@cs.ucla.edu>
Reviewed-by: Bruno Haible <bruno@clisp.org>
This prints some information from struct cpu_features, and the midr_el1
and dczid_el0 system register contents on every CPU.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
This is surprisingly difficult to implement if the goal is to produce
reasonably sized output. With the current approaches to output
compression (suppressing zeros and repeated results between CPUs,
folding ranges of identical subleaves, dealing with the %ecx
reflection issue), the output is less than 600 KiB even for systems
with 256 logical CPUs.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
When -mapxf is used to build glibc, the resulting glibc will never run
on FMA4 machines. Exclude FMA4 IFUNC functions when -mapxf is used.
This requires GCC which defines __APX_F__ for -mapxf with commit:
1df56719bd8 x86: Define __APX_F__ for -mapxf
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
The a4ed0471d7 removed the generic version which is included by
features.h and used by Hurd.
Checked by building i686-gnu and x86_64-gnu with build-many-glibc.py.
The implementations of trunc functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled. Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.
The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The implementations of floor functions using x87 floating point (i386 and
86_64 long double only) traps when FE_INEXACT is enabled. Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.
The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The implementations of ceil functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled. Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.
The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
In Linux 6.9 a new flag is added to allow for Per-io operations to
disable append mode even if a file was opened with the flag O_APPEND.
This is done with the new RWF_NOAPPEND flag.
This caused two test failures as these tests expected the flag 0x00000020
to be unused. Adding the flag definition now fixes these tests on Linux
6.9 (v6.9-rc1).
FAIL: misc/tst-preadvwritev2
FAIL: misc/tst-preadvwritev64v2
This patch adds the flag, adjusts the test and adds details to
documentation.
Link: https://lore.kernel.org/all/20200831153207.GO3265@brightrain.aerifal.cx/
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The ifunc variants now uses the powerpc implementation which in turn
uses the compiler builtin. Without the proper -mcpu switch the builtin
does not generate the expected optimization.
Checked on powerpc-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
It was raised on libc-help [1] that some Linux kernel interfaces expect
the libc to define __USE_TIME_BITS64 to indicate the time_t size for the
kABI. Different than defined by the initial y2038 design document [2],
the __USE_TIME_BITS64 is only defined for ABIs that support more than
one time_t size (by defining the _TIME_BITS for each module).
The 64 bit time_t redirects are now enabled using a different internal
define (__USE_TIME64_REDIRECTS). There is no expected change in semantic
or code generation.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and
arm-linux-gnueabi
[1] https://sourceware.org/pipermail/libc-help/2024-January/006557.html
[2] https://sourceware.org/glibc/wiki/Y2038ProofnessDesign
Reviewed-by: DJ Delorie <dj@redhat.com>
As indicated in a recent thread, this it is a simple brute-force
algorithm that checks the whole needle at a matching character pair
(and does so 1 byte at a time after the first 64 bytes of a needle).
Also it never skips ahead and thus can match at every haystack
position after trying to match all of the needle, which generic
implementation avoids.
As indicated by Wilco, a 4x larger needle and 16x larger haystack gives
a clear 65x slowdown both basic_strstr and __strstr_avx512:
"ifuncs": ["basic_strstr", "twoway_strstr", "__strstr_avx512",
"__strstr_sse2_unaligned", "__strstr_generic"],
{
"len_haystack": 65536,
"len_needle": 1024,
"align_haystack": 0,
"align_needle": 0,
"fail": 1,
"desc": "Difficult bruteforce needle",
"timings": [4.0948e+07, 15094.5, 3.20818e+07, 108558, 10839.2]
},
{
"len_haystack": 1048576,
"len_needle": 4096,
"align_haystack": 0,
"align_needle": 0,
"fail": 1,
"desc": "Difficult bruteforce needle",
"timings": [2.69767e+09, 100797, 2.08535e+09, 495706, 82666.9]
}
PS: I don't have an AVX512 capable machine to verify this issues, but
skimming through the code it does seems to follow what Wilco has
described.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The value of l_scope is only valid post relocation, so this original
check was triggering undefined behavior. Instead just directly check to
see if the object has been relocated, at which point using l_scope is
safe.
Reported-by: Andreas Schwab <schwab@suse.de>
Closes: BZ #31317
Fixes: e0590f41fe ("RISC-V: Enable static-pie.")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Previously, HTL would always allocate non-executable stacks. This has
never been noticed, since GNU Mach on x86 ignores VM_PROT_EXECUTE and
makes all pages implicitly executable. Since GNU Mach on AArch64
supports non-executable pages, HTL forgetting to pass VM_PROT_EXECUTE
immediately breaks any code that (unfortunately, still) relies on
executable stacks.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-7-bugaevc@gmail.com>
While we could support it on any architecture, the tunable is currently
only defined on x86_64.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-5-bugaevc@gmail.com>
Move _hurd_self_sigstate (), _hurd_critical_section_lock (), and
_hurd_critical_section_unlock () inline implementations (that were
already guarded by #if defined _LIBC) to the internal version of the
header. While at it, add <tls.h> to the includes, and use
__LIBC_NO_TLS () unconditionally.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-2-bugaevc@gmail.com>
On OpenRISC variadic functions and regular functions have different
calling conventions so this wrapper is needed to translate. This
wrapper is copied from x86_64/x32. I don't know the build system enough
to find a cleaner way to share the code between x86_64/x32 and or1k
(maybe Implies?), so I went with the straight copy.
This fixes test failures:
misc/tst-prctl
nptl/tst-setgetname
This test failure:
math/test-fenv
If rounding mode and exception macros are defined then the fenv tests
run and always fail. This patch adds an ifdef using the
__or1k_hard_float__ macro provided by gcc to avoid defining these fenv
macros when they cnnot be used. This is similar to what is done in csky.
Note, I will post the or1k hard-float support soon. So, I prefer to
leave the hard-float bits here for now.
Old Linux kernels disable SVE after every system call. Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0,
except for 5.14.0 which was patched. Avoid this by checking the kernel
version and selecting the SVE ifunc on modern kernels.
Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions. If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.
Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The following three changes have been added to provide initial Power11 support.
1. Add the directories to hold Power11 files.
2. Add support to select Power11 libraries based on AT_PLATFORM.
3. Let submachine=power11 be set automatically.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
This patch adds a new feature for powerpc. In order to get faster
access to the HWCAP3/HWCAP4 masks, similar to HWCAP/HWCAP2 (i.e. for
implementing __builtin_cpu_supports() in GCC) without the overhead of
reading them from the auxiliary vector, we now reserve space for them
in the TCB.
This is an ABI change for GLIBC 2.39.
Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The aarch64 uses 'trad' for traditional tls and 'desc' for tls
descriptors, but unlike other targets it defaults to 'desc'. The
gnutls2 configure check does not set aarch64 as an ABI that uses
TLS descriptors, which then disable somes stests.
Also rename the internal machinery fron gnu2 to tls descriptors.
Checked on aarch64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
ARM _dl_tlsdesc_dynamic slow path has two issues:
* The ip/r12 is defined by AAPCS as a scratch register, and gcc is
used to save the stack pointer before on some function calls. So it
should also be saved/restored as well. It fixes the tst-gnu2-tls2.
* None of the possible VFP registers are saved/restored. ARM has the
additional complexity to have different VFP bank sizes (depending of
VFP support by the chip).
The tst-gnu2-tls2 test is extended to check for VFP registers, although
only for hardfp builds. Different than setcontext, _dl_tlsdesc_dynamic
does not have HWCAP_ARM_IWMMXT (I don't have a way to properly test
it and it is almost a decade since newer hardware was released).
With this patch there is no need to mark tst-gnu2-tls2 as XFAIL.
Checked on arm-linux-gnueabihf.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11. Define
TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
STATE_SAVE_OFFSET(%rsp).
+==================+<- stack frame start aligned at 8 or 16 bytes
| |<- RDI saved in the red zone
| |<- RSI saved in the red zone
| |<- RBX saved in the red zone
| |<- paddings for stack realignment of 64 bytes
|------------------|<- xsave buffer end aligned at 64 bytes
| |<-
| |<-
| |<-
|------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
| |<- 8-byte padding for 64-byte alignment
| |<- 8-byte padding for 64-byte alignment
| |<- R11
| |<- R10
| |<- R9
| |<- R8
| |<- RDX
| |<- RCX
+==================+<- RSP aligned at 64 bytes
Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
and RBX are saved onto stack without adjusting stack pointer first, using
the red-zone. This fixes BZ #31501.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e ("nptl:
Unconditionally use a 32-byte rseq area"). After that, it was
not ensured that the RSEQ_SIG macro was defined during sched_getcpu.c
compilation that provided a definition. This commit always checks
the rseq area for CPU number information before using the other
approaches.
This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call. Most architectures that have vDSO
acceleration for getcpu also have rseq support.
Fixes: 2c6b4b272e
Fixes: 1d350aa060
Reviewed-by: Arjun Shankar <arjun@redhat.com>
Due to GCC bug 110901 -mcpu can override -march setting when compiling
asm code and thus a compiler targetting a specific cpu can fail the
configure check even when binutils gas supports SVE.
The workaround is that explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses the
configure check should use that too even if the GCC issue is fixed
independently.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.8. (There are no new
constants covered by these tests in 6.8 that need any other header
changes.)
Tested with build-many-glibcs.py.
Linux 6.8 adds five new syscalls. Update syscall-names.list and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.
Tested with build-many-glibcs.py.
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation. The geomean
on bench-strcasestr shows:
__strcasestr_power8 __strcasestr_ppc
power10 1159 1120
power9 1640 1469
power8 1787 1904
The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadradic behavior, which only adds overheads for input
that trigger quadradic behavior and it is really a hack.
Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The memcpy optimization (commit 587a1290a1) has a series
of mistakes:
- The implementation is wrong: the chunk size calculation is wrong
leading to invalid memory access.
- It adds ifunc supports as default, so --disable-multi-arch does
not work as expected for riscv.
- It mixes Linux files (memcpy ifunc selection which requires the
vDSO/syscall mechanism) with generic support (the memcpy
optimization itself).
- There is no __libc_ifunc_impl_list, which makes testing only
check the selected implementation instead of all supported
by the system.
This patch also simplifies the required bits to enable ifunc: there
is no need to memcopy.h; nor to add Linux-specific files.
The __memcpy_noalignment tail handling now uses a branchless strategy
similar to aarch64 (overlap 32-bits copies for sizes 4..7 and byte
copies for size 1..3).
Checked on riscv64 and riscv32 by explicitly enabling the function
on __libc_ifunc_impl_list on qemu-system.
Changes from v1:
* Implement the memcpy in assembly to correctly handle RISCV
strict-alignment.
Reviewed-by: Evan Green <evan@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Each mask in the sigset array is an unsigned long, so fix __sigisemptyset
to use that instead of int. The __sigword function returns a simple array
index, so it can return int instead of unsigned long.
Replace minimum ISA check ifdef conditional with if. Since
MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants,
compiler will perform constant folding optimization, getting same
results.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add a little helper method so it's easier to fetch a single value from
the hwprobe function when used within an ifunc selector.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISC-V is apparently the first architecture to pass more than one
argument to ifunc resolvers. The helper macros in libc-symbols.h,
__ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible
with this. These macros have an "arg" (non-final) parameter that
represents the parameter signature of the ifunc resolver. The result is
an inability to pass the required comma through in a single preprocessor
argument.
Rearrange the __ifunc_resolver() macro to be variadic, and pass the
types as those variable parameters. Move the guts of __ifunc() and
__ifunc_hidden() into new macros, __ifunc_args(), and
__ifunc_args_hidden(), that pass the variable arguments down through to
__ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which
are used in a bunch of places, to simply shuffle the arguments down into
__ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which
provides convenience macros to those looking to write ifunc selectors
that use both arguments.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The new __riscv_hwprobe() function is designed to be used by ifunc
selector functions. This presents a challenge for applications and
libraries, as ifunc selectors are invoked before all relocations have
been performed, so an external call to __riscv_hwprobe() from an ifunc
selector won't work. To address this, pass a pointer to the
__riscv_hwprobe() function into ifunc selectors as the second
argument (alongside dl_hwcap, which was already being passed).
Include a typedef as well for convenience, so that ifunc users don't
have to go through contortions to call this routine. Users will need to
remember to check the second argument for NULL, to account for older
glibcs that don't pass the function.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
a regular syscall, but without setting errno. Instead, the return value
is plumbed straight out of the macro.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
_dl_tlsdesc_dynamic should also preserve AMX registers which are
caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID
to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size
and save it in xsave_state_full_size which is only used by
_dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes
the AMX part of BZ #31372. Tested on AMX processor.
AMX test is enabled only for compilers with the fix for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098
GCC 14 and GCC 11/12/13 branches have the bug fix.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
When strcmp-avx2.S is used as the default, elf/tst-valgrind-smoke fails
with
==1272761== Conditional jump or move depends on uninitialised value(s)
==1272761== at 0x4022C98: strcmp (strcmp-avx2.S:462)
==1272761== by 0x400B05B: _dl_name_match_p (dl-misc.c:75)
==1272761== by 0x40085F3: _dl_map_object (dl-load.c:1966)
==1272761== by 0x401AEA4: map_doit (rtld.c:644)
==1272761== by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==1272761== by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==1272761== by 0x401B38F: do_preload (rtld.c:816)
==1272761== by 0x401C116: handle_preload_list (rtld.c:892)
==1272761== by 0x401EDF5: dl_main (rtld.c:1842)
==1272761== by 0x401A79E: _dl_sysdep_start (dl-sysdep.c:140)
==1272761== by 0x401BEEE: _dl_start_final (rtld.c:494)
==1272761== by 0x401BEEE: _dl_start (rtld.c:581)
==1272761== by 0x401AD87: ??? (in */elf/ld.so)
The assembly codes are:
0x0000000004022c80 <+144>: vmovdqu 0x20(%rdi),%ymm0
0x0000000004022c85 <+149>: vpcmpeqb 0x20(%rsi),%ymm0,%ymm1
0x0000000004022c8a <+154>: vpcmpeqb %ymm0,%ymm15,%ymm2
0x0000000004022c8e <+158>: vpandn %ymm1,%ymm2,%ymm1
0x0000000004022c92 <+162>: vpmovmskb %ymm1,%ecx
0x0000000004022c96 <+166>: inc %ecx
=> 0x0000000004022c98 <+168>: jne 0x4022c32 <strcmp+66>
strcmp-avx2.S has 32-byte vector loads of strings which are shorter than
32 bytes:
(gdb) p (char *) ($rdi + 0x20)
$6 = 0x1ffeffea20 "memcheck-amd64-linux.so"
(gdb) p (char *) ($rsi + 0x20)
$7 = 0x4832640 "core-amd64-linux.so"
(gdb) call (int) strlen ((char *) ($rsi + 0x20))
$8 = 19
(gdb) call (int) strlen ((char *) ($rdi + 0x20))
$9 = 23
(gdb)
It triggers the valgrind error. The above code is safe since the loads
don't cross the page boundary. Update tst-valgrind-smoke.sh to accept
an optional suppression file and pass a suppression file to valgrind when
strcmp-avx2.S is the default implementation of strcmp.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
When glibc is built with ISA level 3 or above enabled, SSE resolvers
aren't available and glibc fails to build:
ld: .../elf/librtld.os: in function `init_cpu_features':
.../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave'
ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/local/bin/ld: final link failed: bad value
For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor
_dl_tlsdesc_dynamic_fxsave.
This fixes BZ #31429.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:
leaq tls_var@TLSDESC(%rip), %rax
call *tls_var@TLSCALL(%rax)
or
leal tls_var@TLSDESC(%ebx), %eax
call *tls_var@TLSCALL(%eax)
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL. When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path. __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers. This
fixes BZ #31372.
Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Instead of tying based on the linker name and version, check for the
required support:
* whether it does not generate dynamic TLS relocations in PIE
(binutils PR ld/22263);
* if it accepts --no-dynamic-linker (by using -static-pie);
* and if it adds a DT_JMPREL pointing to .rela.iplt with static pie.
The patch also trims the comments, for binutils one of the tests should
already cover it. The kernel ones are not clear which version should
have the backport, nor it is something that glibc can do much about
it. Finally, the glibc is somewhat confusing, since it refers
to commits not related to s390x.
Checked with a build for s390x-linux-gnu.
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
This includes a fix for big-endian in AdvSIMD log, some cosmetic
changes, and numerous small optimisations mainly around inlining and
using indexed variants of MLA intrinsics.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Starting with commit e57d8fc97b
"S390: Always use svc 0"
clone clobbers the call-saved register r7 in error case:
function or stack is NULL.
This patch restores the saved registers also in the error case.
Furthermore the existing test misc/tst-clone is extended to check
all error cases and that clone does not clobber registers in this
error case.
When glibc is built with ISA level 3 or higher by default, the resulting
glibc binaries won't run on SSE or FMA4 processors. Exclude SSE, AVX and
FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
default.
When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
variant.
Fixes BZ 31335.
NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
doesn't support AVX512 instructions:
https://bugs.kde.org/show_bug.cgi?id=383010
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add APX registers to STATE_SAVE_MASK so that APX registers are saved in
ld.so trampoline. This fixes BZ #31371.
Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will
be used by i386 _dl_tlsdesc_dynamic.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
The optimization is not faster than the generic algorithm,
using the bench-strstr the geometric mean running on a POWER10 machine
using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97
(which uses the generic implementation).
Also, there is no need to redirect the internal str*/mem* call
to optimized version, internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).
Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
The FSR version field is read-only and might be non-zero.
This allows math/test-fpucw* to correctly pass when the version is
non-zero.
Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Commit ff026950e2 ("Add a C wrapper for
prctl [BZ #25896]") replaced the assembler wrapper with a C function.
However, on powerpc64le-linux-gnu, the C variadic function
implementation requires extra work in the caller to set up the
parameter save area. Calling a function that needs a parameter save
area without one (because the prototype used indicates the function is
not variadic) corrupts the caller's stack. The Linux manual pages
project documents prctl as a non-variadic function. This has resulted
in various projects over the years using non-variadic prototypes,
including the sanitizer libraries in LLVm and GCC (GCC PR 113728).
This commit switches back to the assembler implementation on most
targets and only keeps the C implementation for x86-64 x32.
Also add the __prctl_time64 alias from commit
b39ffab860 ("Linux: Add time64 alias for
prctl") to sysdeps/unix/sysv/linux/syscalls.list; it was not yet
present in commit ff026950e2.
This restores the old ABI on powerpc64le-linux-gnu, thus fixing
bug 29770.
Reviewed-By: Simon Chopin <simon.chopin@canonical.com>
Before this change, we incorrectly used the SSE2 variant in the
implementation, without checking that the system actually supports
SSE2.
Tested-by: Sam James <sam@gentoo.org>