Commit Graph

16417 Commits

Author SHA1 Message Date
John David Anglin
4737e6a7a3 hppa/vdso: Provide 64-bit clock_gettime() vDSO only
Adhemerval noticed that the gettimeofday() and 32-bit clock_gettime()
vDSO calls won't be used by glibc on hppa, so there is no need to
declare them.  Both syscalls will be emulated by utilizing return values
of the 64-bit clock_gettime() vDSO instead.

Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
2024-07-02 16:26:32 -04:00
YunQiang Su
9d0e9c8a13 MIPSr6/math: Use builtin fma and fmaf
MIPSr6 has MADDF.s/MADDF.d instructions, which are fused.

In MIPS ISA, double support can be subsetted.  Only FMAF is enabled
for this case.

	* sysdeps/mips/fpu/math-use-builtins-fma.h

Signed-off-by: YunQiang Su <syq@gcc.gnu.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-07-01 14:52:30 -03:00
Florian Weimer
018f0fc3b8 elf: Support recursive use of dynamic TLS in interposed malloc
It turns out that quite a few applications use bundled mallocs that
have been built to use global-dynamic TLS (instead of the recommended
initial-exec TLS).  The previous workaround from
commit afe42e935b ("elf: Avoid some
free (NULL) calls in _dl_update_slotinfo") does not fix all
encountered cases unfortunatelly.

This change avoids the TLS generation update for recursive use
of TLS from a malloc that was called during a TLS update.  This
is possible because an interposed malloc has a fixed module ID and
TLS slot.  (It cannot be unloaded.)  If an initially-loaded module ID
is encountered in __tls_get_addr and the dynamic linker is already
in the middle of a TLS update, use the outdated DTV, thus avoiding
another call into malloc.  It's still necessary to update the
DTV to the most recent generation, to get out of the slow path,
which is why the check for recursion is needed.

The bookkeeping is done using a global counter instead of per-thread
flag because TLS access in the dynamic linker is tricky.

All this will go away once the dynamic linker stops using malloc
for TLS, likely as part of a change that pre-allocates all TLS
during pthread_create/dlopen.

Fixes commit d2123d6827 ("elf: Fix slow
tls access after dlopen [BZ #19924]").

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-07-01 19:02:11 +02:00
MayShao-oc
9dc645cb56 x86: Set default non_temporal_threshold for Zhaoxin processors
Current 'non_temporal_threshold' set to 'non_temporal_threshold_lowbound'
on Zhaoxin processors without ERMS. The default
'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000
Zhaoxin processors, this patch updates the value to
'shared / cachesize_non_temporal_divisor'.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-30 06:26:43 -07:00
MayShao-oc
c19457aec6 x86_64: Optimize large size copy in memmove-ssse3
This patch optimizes large size copy using normal store when src > dst
and overlap.  Make it the same as the logic in memmove-vec-unaligned-erms.S.

Current memmove-ssse3 use '__x86_shared_cache_size_half' as the non-
temporal threshold, this patch updates that value to
'__x86_shared_non_temporal_threshold'.  Currently, the
__x86_shared_non_temporal_threshold is cpu-specific, and different CPUs
will have different values based on the related nt-benchmark results.
However, in memmove-ssse3, the nontemporal threshold uses
'__x86_shared_cache_size_half', which sounds unreasonable.

The performance is not changed drastically although shows overall
improvements without any major regressions or gains.

Results on Zhaoxin KX-7000:
bench-memcpy geometric_mean(N=20) New / Original: 0.999

bench-memcpy-random geometric_mean(N=20) New / Original: 0.999

bench-memcpy-large geometric_mean(N=20) New / Original: 0.978

bench-memmove geometric_mean(N=20) New / Original: 1.000

bench-memmmove-large geometric_mean(N=20) New / Original: 0.962

Results on Intel Core i5-6600K:
bench-memcpy geometric_mean(N=20) New / Original: 1.001

bench-memcpy-random geometric_mean(N=20) New / Original: 0.999

bench-memcpy-large geometric_mean(N=20) New / Original: 1.001

bench-memmove geometric_mean(N=20) New / Original: 0.995

bench-memmmove-large geometric_mean(N=20) New / Original: 0.936
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-30 06:26:43 -07:00
MayShao-oc
44d757eb9f x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors
Fix code formatting under the Zhaoxin branch and add comments for
different Zhaoxin models.

Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
the AVX_Fast_Unaligned_Load.

Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
use sse2_unaligned version of memset,strcpy and strcat.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-30 06:26:43 -07:00
Andrew Pinski
2f1f7a5f8a
Aarch64: Add new memset for Qualcomm's oryon-1 core
Qualcom's new core, oryon-1, has a different characteristics for
memset than the current versions of memset. For non-zero, larger
sizes, using GPRs rather than the SIMD stores is ~30% faster.
For even larger sizes, using the nontemporal stores is needed
not to polute the L1/L2 caches.

For zero values, using `dc zva` should be used. Since we
know the size will always be 64 bytes, we don't need to figure
out the size there.

I started with the emag memset and added back the `dc zva` code.

Changes since v1:
* v3: Fix comment formating

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-30 13:47:17 +02:00
Andrew Pinski
4dc83cac78
Aarch64: Add memcpy for qualcomm's oryon-1 core
Qualcomm's new core (oryon-1) has a different performance characteristic
than other cores. For memcpy, it is faster to use the GPRs to
do the copy for large sizes (2x faster). For even larger sizes,
it is better to use the nontemporal load/store instructions so
we don't pollute the L1/L2 caches.

For smaller sizes, the characteristic are very similar to
other cores.
I used the thunderx memcpy as a starting point and expanded from there.

Changes since v1:
* v2: Fix ordering in Makefile.
* v3: Fix comment grammar about the ldnp/stnp instructions.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-30 13:46:33 +02:00
Palmer Dabbelt
07fe71f59b
arm: Avoid UB in elf_machine_rel()
This recently came up during a cleanup to remove misaligned accesses
from the RISC-V port.

Link: https://sourceware.org/pipermail/libc-alpha/2022-June/139961.html
Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Fangrui Song <maskray@google.com>
2024-06-26 12:45:43 +02:00
mengqinggang
a10b6ad471 LoongArch: Fix tst-gnu2-tls2 test case
asm volatile ("movfcsr2gr $t0, $fcsr0" ::: "$t0");
asm volatile ("st.d $t0, %0" :"=m"(restore_fcsr));

generate to the following instructions with -Og flag:

movfcsr2gr      $t0, $zero
addi.d          $t0, $sp, 2047(0x7ff)
addi.d          $t0, $t0, 77(0x4d)
st.w            $t0, $t0, 0

fcsr0 register and restore_fcsr variable are both stored in t0 register.

Change to:

asm volatile ("movfcsr2gr %0, $fcsr0" :"=r"(restore_fcsr));

to avoid restore_fcsr address in t0.

Comparing float value using memcmp because float value cannot be
directly compared for equality.

Put LOAD_REGISTER_FCSR and SAVE_REGISTER_FCC after LOAD_REGISTER_FLOAT.
Some float instructions may change fcsr register.
2024-06-26 12:02:07 +08:00
Adhemerval Zanella
c90cfce849 posix: Fix pidfd_spawn/pidfd_spawnp leak if execve fails (BZ 31695)
If the pidfd_spawn/pidfd_spawnp helper process succeeds, but evecve
fails for some reason (either with an invalid/non-existent, memory
allocation, etc.) the resulting pidfd is never closed, nor returned
to caller (so it can call close).

Since the process creation failed, it should be up to posix_spawn to
also, close the file descriptor in this case (similar to what it
does to reap the process).

This patch also changes the waitpid with waitid (P_PIDFD) for pidfd
case, to avoid a possible pid re-use.

Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-06-25 12:11:48 -03:00
Andreas K. Hüttel
d32c342425
Revert "MIPSr6/math: Use builtin fma and fmaf"
Apologies, I mistakenly interpreted this to be already accepted.
Reverting until v6 or later is reviewed and approved.

This reverts commit 9e06e4a43b.
2024-06-25 01:02:58 +02:00
Christoph Müllner
81c7f6193c
RISC-V: Execute a PAUSE hint in spin loops
The atomic_spin_nop() macro can be used to run arch-specific
code in the body of a spin loop to potentially improve efficiency.
RISC-V's Zihintpause extension includes a PAUSE instruction for
this use-case, which is encoded as a HINT, which means that it
behaves like a NOP on systems that don't implement Zihintpause.

Binutils supports Zihintpause since 2.36, so this patch uses
the ".insn" directive to keep the code compatible with older
toolchains.

Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-06-24 21:36:49 +02:00
YunQiang Su
9e06e4a43b
MIPSr6/math: Use builtin fma and fmaf
MIPSr6 has MADDF.s/MADDF.d instructions, which are fused.

In MIPS ISA, double support can be subsetted.  Only FMAF is enabled
for this case.

	* sysdeps/mips/fpu/math-use-builtins-fma.h

Signed-off-by: YunQiang Su <syq@gcc.gnu.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
2024-06-24 19:43:57 +02:00
John David Anglin
aecde502e9 hppa/vdso: Add wrappers for vDSO functions
The upcoming parisc (hppa) v6.11 Linux kernel will include vDSO
support for gettimeofday(), clock_gettime() and clock_gettime64()
syscalls for 32- and 64-bit userspace.
The patch below adds the necessary glue code for glibc.

Signed-off-by: Helge Deller <deller@gmx.de>

Changes in v2:
- add vsyscalls for 64-bit too
2024-06-23 19:39:28 -04:00
John David Anglin
9dddb26954 Update hppa libm-test-ulps 2024-06-23 13:51:25 -04:00
John David Anglin
da61ba3f89 Update hppa libm-test-ulps 2024-06-20 19:44:04 -04:00
Julian Zhu
9f2bf0e23a
RISC-V: Update ulps
For the exp10m1, exp2m1, log10p1 and log2p1 implementations.

Signed-off-by: Julian Zhu <jz531210@gmail.com>
2024-06-20 23:46:32 +02:00
Julian Zhu
cb20e7c7cc
MIPS: Update ulps
Update mips32/mips64 ulps for the exp10m1, exp2m1, and log10p1 implementations.

Signed-off-by: Julian Zhu <jz531210@gmail.com>
2024-06-20 23:45:24 +02:00
Florian Weimer
b375e597da i386: Update ulps
This is from a -march=i686 -mtune=generic build with
--disable-multi-arch, running on a Cascade Lake CPU.
2024-06-20 19:00:48 +02:00
Florian Weimer
362588f7cc s390x: Capture grep output in static PIE check
The test is not a run-time check, so update the description.
Also use readelf -W for a more stable output format and fix
an LC_ALL typo.

This avoids garbled configure messages:

checking for s390-specific static PIE requirements (runtime check)...  0x0000000000000017 (JMPREL)             0x280
yes
2024-06-20 14:34:06 +02:00
Florian Weimer
71dafdf5f1 powerpc: Update ulps
Results based on POWER8 and POWER9 machines running
powerpc64-linux-gnu, with and without --disable-multi-arch.
2024-06-20 12:15:31 +02:00
Florian Weimer
3cb77b7d1e i386: Update ulps
Based on a -march=x86-64-v4 -mfpmath=sse build, with and without
--disable-multi-arch, running on a Zen 4 CPU.  Also used different
-march=x8i6-64-v… settings.
2024-06-20 12:15:09 +02:00
Xi Ruoyao
9405d54c62
LoongArch: Update ulps
Add ulps for recently added C23 exp10m1, exp2m1, and log10p1 functions.

Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2024-06-19 21:17:19 +02:00
Andreas K. Hüttel
4f1cf0c0e1
sparc: Regenerate ULPs
Linux catbus 5.15.110-gentoo-r1 #1 SMP Fri Jun 9 17:53:23 PDT 2023 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-06-19 14:58:32 +02:00
Stefan Liebler
19f6d6a480 s390x: Regenerate ULPs.
Needed due to:
- "Implement C23 log10p1"
  commit ID 55eb99e9a9
- "Implement C23 exp2m1, exp10m1"
  commit ID 7ec903e028
2024-06-19 08:42:30 +02:00
mengqinggang
9a675d998e LoongArch: Fix _dl_tlsdesc_dynamic in LSX case
HWCAP value is overwritten at the first comparison of the LASX case.
The second comparison at LSX get incorrect result.
Change to use t0 to save HWCAP value, and use t1 to save comparison
result.
2024-06-19 10:06:41 +08:00
Adhemerval Zanella
92341e3150 arm: Update ulps
For the exp10m1, exp2m1, and log10p1 implementations.
2024-06-18 17:31:10 -03:00
Adhemerval Zanella
45f5f51b85 aarch64: Update ulps
For the exp10m1, exp2m1, and log10p1 implementations.
2024-06-18 17:31:10 -03:00
Adhemerval Zanella
52b397bafa powerpc: Update ulps
For the exp10m1, exp2m1, and log10p1 implementations.
2024-06-18 17:31:10 -03:00
Florian Weimer
f6ea5d1291 Linux: Include <dl-symbol-redir-ifunc.h> in dl-sysdep.c
The _dl_sysdep_parse_arguments function contains initalization
of a large on-stack variable:

  dl_parse_auxv_t auxv_values = { 0, };

This uses a non-inline version of memset on powerpc64le-linux-gnu,
so it must use the baseline memset.
2024-06-18 10:56:34 +02:00
Carlos Llamas
176671f604 linux: add definitions for hugetlb page size encodings
A desired hugetlb page size can be encoded in the flags parameter of
system calls such as mmap() and shmget(). The Linux UAPI headers have
included explicit definitions for these encodings since v4.14.

This patch adds these definitions that are used along with MAP_HUGETLB
and SHM_HUGETLB flags as specified in the corresponding man pages. This
relieves programs from having to duplicate and/or compute the encodings
manually.

Additionally, the filter on these definitions in tst-mman-consts.py is
removed, as suggested by Florian. I then ran this tests successfully,
confirming the alignment with the kernel headers.

  PASS: misc/tst-mman-consts
  original exit status 0

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-18 10:56:34 +02:00
Stefan Liebler
e260ceb4aa elf: Remove HWCAP_IMPORTANT
Remove the definitions of HWCAP_IMPORTANT after removal of
LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask.  There HWCAP_IMPORTANT
was used as default value.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
ad0aa1f549 elf: Remove LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask
Remove the environment variable LD_HWCAP_MASK and the tunable
glibc.cpu.hwcap_mask as those are not used anymore in common-code
after removal in elf/dl-cache.c:search_cache().

The only remaining user is sparc32 where it is used in
elf_machine_matches_host().  If sparc32 does not need it anymore,
we can get rid of it at all.  Otherwise we could also move
LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask to be sparc32 specific.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
343439a31e elf: Remove _DL_PLATFORMS_COUNT
Remove the definitions of _DL_PLATFORMS_COUNT as those are not used
anymore after removal in elf/dl-cache.c:search_cache().

Note: On x86, we can also get rid of the definitions
HWCAP_PLATFORMS_START and HWCAP_PLATFORMS_COUNT.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
43c7c5e62d elf: Remove _DL_FIRST_PLATFORM
Remove the definitions of _DL_FIRST_PLATFORM as those were only used
in the _DL_HWCAP_PLATFORM definitions and in _dl_string_platform().
Both were removed.

Note: Removed on every architecture despite of powerpc, where
_dl_string_platform() is still used.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
ed23449dac elf: Remove _DL_HWCAP_PLATFORM
Remove the definitions of _DL_HWCAP_PLATFORM as those are not used
anymore after removal in elf/dl-cache.c:search_cache().
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
374c8b4483 elf: Remove platform strings in dl-procinfo.c
Remove the platform strings in dl-procinfo.c where also
the implementation of _dl_string_platform() was removed.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
8faada8302 elf: Remove _dl_string_platform
Despite of powerpc where the returned integer is stored in tcb,
and the diagnostics output, there is no user anymore.

Thus this patch removes the diagnostics output and
_dl_string_platform for all other platforms.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
Stefan Liebler
f14b6dfc87 x86: Remove HWCAP_START and HWCAP_COUNT
Both defines are not used anymore.  Those were only used for
_dl_string_hwcap(), which itself was removed with commit
ab40f20364
"elf: Remove _dl_string_hwcap"

Just clean up.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-18 10:45:36 +02:00
YunQiang Su
eaf4fc516a
math: Update mips32/mips64 ulps for log2p1 2024-06-17 21:45:53 +02:00
Andreas K. Hüttel
98ffc1bfeb
Convert to autoconf 2.72 (vanilla release, no distribution patches)
As discussed at the patch review meeting

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Reviewed-by: Simon Chopin <simon.chopin@canonical.com>
2024-06-17 21:15:28 +02:00
Joseph Myers
7ec903e028 Implement C23 exp2m1, exp10m1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the exp2m1 and exp10m1 functions (exp2(x)-1 and
exp10(x)-1, like expm1).

As with other such functions, these use type-generic templates that
could be replaced with faster and more accurate type-specific
implementations in future.  Test inputs are copied from those for
expm1, plus some additions close to the overflow threshold (copied
from exp2 and exp10) and also some near the underflow threshold.

exp2m1 has the unusual property of having an input (M_MAX_EXP) where
whether the function overflows (under IEEE semantics) depends on the
rounding mode.  Although these could reasonably be XFAILed in the
testsuite (as we do in some cases for arguments very close to a
function's overflow threshold when an error of a few ulps in the
implementation can result in the implementation not agreeing with an
ideal one on whether overflow takes place - the testsuite isn't smart
enough to handle this automatically), since these functions aren't
required to be correctly rounding, I made the implementation check for
and handle this case specially.

The Makefile ordering expected by lint-makefiles for the new functions
is a bit peculiar, but I implemented it in this patch so that the test
passes; I don't know why log2 also needed moving in one Makefile
variable setting when it didn't in my previous patches, but the
failure showed a different place was expected for that function as
well.

The powerpc64le IFUNC setup seems not to be as self-contained as one
might hope; it shouldn't be necessary to add IFUNCs for new functions
such as these simply to get them building, but without setting up
IFUNCs for the new functions, there were undefined references to
__GI___expm1f128 (that IFUNC machinery results in no such function
being defined, but doesn't stop include/math.h from doing the
redirection resulting in the exp2m1f128 and exp10m1f128
implementations expecting to call it).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 16:31:49 +00:00
Joseph Myers
55eb99e9a9 Implement C23 log10p1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the log10p1 functions (log10(1+x): like log1p, but for
base-10 logarithms).

This is directly analogous to the log2p1 implementation (except that
whereas log2p1 has a smaller underflow range than log1p, log10p1 has a
larger underflow range).  The test inputs are copied from those for
log1p and log2p1, plus a few more inputs in that wider underflow
range.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:48:13 +00:00
Joseph Myers
bb014f50c4 Implement C23 logp1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the logp1 functions (aliases for log1p functions - the
name is intended to be more consistent with the new log2p1 and
log10p1, where clearly it would have been very confusing to name those
functions log21p and log101p).  As aliases rather than new functions,
the content of this patch is somewhat different from those actually
adding new functions.

Tests are shared with log1p, so this patch *does* mechanically update
all affected libm-test-ulps files to expect the same errors for both
functions.

The vector versions of log1p on aarch64 and x86_64 are *not* updated
to have logp1 aliases (and thus there are no corresponding header,
tests, abilist or ulps changes for vector functions either).  It would
be reasonable for such vector aliases and corresponding changes to
other files to be made separately.  For now, the log1p tests instead
avoid testing logp1 in the vector case (a Makefile change is needed to
avoid problems with grep, used in generating the .c files for vector
function tests, matching more than one ALL_RM_TEST line in a file
testing multiple functions with the same inputs, when it assumes that
the .inc file only has a single such line).

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-06-17 13:47:09 +00:00
Noah Goldstein
5b54a33435 x86: Fix value for x86_memset_non_temporal_threshold when it is undesirable
When we don't want to use non-temporal stores for memset, we set
`x86_memset_non_temporal_threshold` to SIZE_MAX.

The current code, however, we using `maximum_non_temporal_threshold`
as the upper bound which is `SIZE_MAX >> 4` so we ended up with a
value of `0`.

Fix is to just use `SIZE_MAX` as the upper bound for when setting the
tunable.
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-06-14 17:25:05 -05:00
Andreas K. Hüttel
3953b5b88f
i686: Regenerate ulps
Linux pinacolada 6.6.32-gentoo #1 SMP PREEMPT Sun Jun  9 14:18:17 CEST 2024 x86_64 Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz GenuineIntel GNU/Linux
32bit build for multilib environment

Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-06-14 21:24:24 +02:00
Xi Ruoyao
97aa7b7346 LoongArch: Ensure sp 16-byte aligned for tlsdesc
"ADDI sp, sp, 24" and "ADDI sp, sp, SZFCSREG" (SZFCSREG = 4) are
misaligning the stack: the ABI mandates a 16-byte alignment.  Fix it
by changing the first one to "ADDI sp, sp, 32", and reuse the spare 4th
slot for saving fcsr.

Reported-by: Jinyang He <hejinyang@loongson.cn>
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2024-06-14 10:14:54 +08:00
H.J. Lu
29807a271e x86: Properly set x86 minimum ISA level [BZ #31883]
Properly set libc_cv_have_x86_isa_level in shell for MINIMUM_X86_ISA_LEVEL
defined as

(__X86_ISA_V1 + __X86_ISA_V2 + __X86_ISA_V3 + __X86_ISA_V4)

Also set __X86_ISA_V2 to 1 for i386 if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8
is defined.  There are no changes in config.h nor in config.make on x86-64.
On i386, -march=x86-64-v2 with GCC generates

 #define MINIMUM_X86_ISA_LEVEL 2

in config.h and

have-x86-isa-level = 2

in config.make.  This fixes BZ #31883.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-12 14:27:54 -07:00
Adhemerval Zanella
7edd3814b0 linux: Remove __stack_prot
The __stack_prot is used by Linux to make the stack executable if
a modules requires it.  It is also marked as RELRO, which requires
to change the segment permission to RW to update it.

Also, there is no need to keep track of the flags: either the stack
will have the default permission of the ABI or should be change to
PROT_READ | PROT_WRITE | PROT_EXEC.  The only additional flag,
PROT_GROWSDOWN or PROT_GROWSUP, is Linux only and can be deducted
from _STACK_GROWS_DOWN/_STACK_GROWS_UP.

Also, the check_consistency function was already removed some time
ago.

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-12 15:25:54 -03:00
H.J. Lu
09bc68b0ac x86: Properly set MINIMUM_X86_ISA_LEVEL for i386 [BZ #31867]
On i386, set the default minimum ISA level to 0, not 1 (baseline which
includes SSE2).  There are no changes in config.h nor in config.make on
x86-64.  This fixes BZ #31867.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Tested-by: Ian Jordan <immoloism@gmail.com>
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-06-11 00:10:08 -07:00
Joe Damato
bef2a827a5 x86: Enable non-temporal memset tunable for AMD
In commit 46b5e98ef6 ("x86: Add seperate non-temporal tunable for
memset") a tunable threshold for enabling non-temporal memset was added,
but only for Intel hardware.

Since that commit, new benchmark results suggest that non-temporal
memset is beneficial on AMD, as well, so allow this tunable to be set
for AMD.

See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
which has been updated to include data using different stategies for
large memset on AMD Zen2, Zen3, and Zen4.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-06-10 16:18:18 -05:00
Samuel Thibault
74f9ee3b91 hurd: Fix lsetxattr return value
The manpage says that lsetxattr returns 0 on success, like setxattr.
2024-06-10 21:56:13 +02:00
Joe Damato
92c270d32c Linux: Add epoll ioctls
As of Linux kernel 6.9, some ioctls and a parameters structure have been
introduced which allow user programs to control whether a particular
epoll context will busy poll.

Update the headers to include these for the convenience of user apps.

The ioctls were added in Linux kernel 6.9 commit 18e2bf0edf4dd
("eventpoll: Add epoll ioctl for epoll_params") [1] to
include/uapi/linux/eventpoll.h.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?h=v6.9&id=18e2bf0edf4dd

Signed-off-by: Joe Damato <jdamato@fastly.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-04 12:09:15 -05:00
Szabolcs Nagy
2a9943b4a0 math: Fix exp10 undefined left shift
Left shift of ki is undefined when ki<0, copy the logic from exp,
which uses unsigned arithmetics, to fix it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-04 15:33:26 +01:00
Joseph Myers
1d441791cb Add new AArch64 HWCAP2 definitions from Linux 6.9 to bits/hwcap.h
Linux 6.9 adds 15 new HWCAP2_* values for AArch64; add them to
bits/hwcap.h in glibc.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2024-06-04 12:25:05 +00:00
Noah Goldstein
46b5e98ef6 x86: Add seperate non-temporal tunable for memset
The tuning for non-temporal stores for memset vs memcpy is not always
the same. This includes both the exact value and whether non-temporal
stores are profitable at all for a given arch.

This patch add `x86_memset_non_temporal_threshold`. Currently we
disable non-temporal stores for non Intel vendors as the only
benchmarks showing its benefit have been on Intel hardware.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-30 12:36:09 -05:00
Noah Goldstein
5bf0ab8057 x86: Improve large memset perf with non-temporal stores [RHEL-29312]
Previously we use `rep stosb` for all medium/large memsets. This is
notably worse than non-temporal stores for large (above a
few MBs) memsets.
See:
https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing
For data using different stategies for large memset on ICX and SKX.

Using non-temporal stores can be up to 3x faster on ICX and 2x faster
on SKX. Historically, these numbers would not have been so good
because of the zero-over-zero writeback optimization that `rep stosb`
is able to do. But, the zero-over-zero writeback optimization has been
removed as a potential side-channel attack, so there is no longer any
good reason to only rely on `rep stosb` for large memsets. On the flip
size, non-temporal writes can avoid data in their RFO requests saving
memory bandwidth.

All of the other changes to the file are to re-organize the
code-blocks to maintain "good" alignment given the new code added in
the `L(stosb_local)` case.

The results from running the GLIBC memset benchmarks on TGL-client for
N=20 runs:

Geometric Mean across the suite New / Old EXEX256: 0.979
Geometric Mean across the suite New / Old EXEX512: 0.979
Geometric Mean across the suite New / Old AVX2   : 0.986
Geometric Mean across the suite New / Old SSE2   : 0.979

Most of the cases are essentially unchanged, this is mostly to show
that adding the non-temporal case didn't add any regressions to the
other cases.

The results on the memset-large benchmark suite on TGL-client for N=20
runs:

Geometric Mean across the suite New / Old EXEX256: 0.926
Geometric Mean across the suite New / Old EXEX512: 0.925
Geometric Mean across the suite New / Old AVX2   : 0.928
Geometric Mean across the suite New / Old SSE2   : 0.924

So roughly a 7.5% speedup. This is lower than what we see on servers
(likely because clients typically have faster single-core bandwidth so
saving bandwidth on RFOs is less impactful), but still advantageous.

Full test-suite passes on x86_64 w/ and w/o multiarch.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-30 12:36:09 -05:00
Xi Ruoyao
0c1d2c277a LoongArch: Use "$fcsr0" instead of "$r0" in _FPU_{GET,SET}CW
Clang inline-asm parser does not allow using "$r0" in
movfcsr2gr/movgr2fcsr, so everything using _FPU_{GET,SET}CW is now
failing to build with Clang on LoongArch.  As we now requires Binutils
>= 2.41 which supports using "$fcsr0" here, use it instead of "$r0" to
fix the issue.

Link: https://github.com/loongson-community/discussions/issues/53#issuecomment-2081507390
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=4142b2368353
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2024-05-28 09:17:05 +08:00
Xin Wang
e0f7f1808f x86_64: Reformat elf_machine_rela
A space is added before the left bracket of the x86_64 elf_machine_rela
function, in order to harmonize with the rest of the implementation of
the function and to make it easier to retrieve the function. The lines
where the function definition is located has been re-indented, as well
as its left curly bracket placed in the correct position.

Signed-off-by: Xin Wang <yw987194828@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-27 13:46:45 -07:00
Sunil K Pandey
1b713c9a53 i386: Disable Intel Xeon Phi tests for GCC 15 and above (BZ 31782)
This patch disables Intel Xeon Phi tests for GCC 15 and above.

GCC 15 removed Intel Xeon Phi ISA support.
commit e1a7e2c54d52d0ba374735e285b617af44841ace
Author: Haochen Jiang <haochen.jiang@intel.com>
Date:   Mon May 20 10:43:44 2024 +0800

    i386: Remove Xeon Phi ISA support

Fixes BZ 31782.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-27 12:28:13 -07:00
H.J. Lu
f981bf6b9d parse_fdinfo: Don't advance pointer twice [BZ #31798]
pidfd_getpid.c has

      /* Ignore invalid large values.  */
      if (INT_MULTIPLY_WRAPV (10, n, &n)
          || INT_ADD_WRAPV (n, *l++ - '0', &n))
        return -1;

For GCC older than GCC 7, INT_ADD_WRAPV(a, b, r) is defined as

   _GL_INT_OP_WRAPV (a, b, r, +, _GL_INT_ADD_RANGE_OVERFLOW)

and *l++ - '0' is evaluated twice.  Fix BZ #31798 by moving "l++" out of
the if statement.  Tested with GCC 6.4 and GCC 14.1.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-27 06:52:45 -07:00
H.J. Lu
23c60af6dc sysdeps/ieee754/ldbl-opt/Makefile: Split and sort libnldbl-calls
Put each item on a separate line and sort libnldbl-calls.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-24 10:25:40 -07:00
H.J. Lu
639c143db3 sysdeps/ieee754/ldbl-opt/Makefile: Remove test-nldbl-redirect-static
Remove $(objpfx)test-nldbl-redirect-static checked in by accident.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-24 06:36:18 -07:00
H.J. Lu
acfb169b3c sysdeps/ieee754/ldbl-opt/Makefile: Split and sort tests
Put each test on a separate line and sort tests.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-24 06:31:49 -07:00
Stefan Liebler
4af49c60a1 s390x: Regenerate ULPs.
Needed due to:
"Implement C23 log2p1"
commit ID 79c52daf47
2024-05-24 09:53:49 +02:00
Joseph Myers
84d2762922 Update kernel version to 6.9 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py
and tst-mount-consts.py to 6.9.  (There are no new constants covered
by these tests in 6.9 that need any other header changes;
tst-pidfd-consts.py was updated separately along with adding new
constants relevant to that test.)

Tested with build-many-glibcs.py.
2024-05-23 14:04:48 +00:00
Adhemerval Zanella
eaa8113bf0 math: Provide missing math symbols on libc.a (BZ 31781)
The libc.a for alpha, s390, and sparcv9 does not provide
copysignf64x, copysignf128, frexpf64x, frexpf128, modff64x, and
modff128.

Checked with a static build for the affected ABIs.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Adhemerval Zanella
1664bbf238 s390: Make utmp32, utmpx32, and login32 shared only (BZ 31790)
The function that work with 'struct utmp32' and 'struct utmpx32'
are only for compat symbols.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Adhemerval Zanella
18dbe27847 microblaze: Remove cacheflush from libc.a (BZ 31788)
microblaze does not export it in libc.so nor the kernel provides the
cacheflush syscall.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Adhemerval Zanella
d8ebde14fb powerpc: Remove duplicated llrintf and llrintf32 from libm.a (BZ 31787)
Both the generic and POWER6 versions provide definitions of the
symbol, which are already provided by the ifunc resolver.

Checked on powerpc-linux-gnu-power4.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Adhemerval Zanella
5fededd825 powerpc: Remove duplicate strchrnul and strncasecmp_l libc.a (BZ 31786)
For powerpc64 the generic version provides a weak definition of
strchrnul, which are already provided by the ifunc resolver.  The
powerpc32 version is slight different, where for static case there
is no iFUNC support.

The strncasecmp_l is provided ifunc resolver.

Checked on powerpc-linux-gnu-power4 and powerpc64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Adhemerval Zanella
62eaa46739 loongarch: Remove duplicate strnlen in libc.a (BZ 31785)
The generic version provides weak definitions of strnlen,
which are already provided by the ifunc resolver.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Adhemerval Zanella
ef9596352b aarch64: Remove duplicate memchr/strlen in libc.a (BZ 31777)
The generic version provides weak definitions of memchr/strlen,
which are already provided by the ifunc resolvers.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-23 09:36:08 -03:00
Joseph Myers
e9a37242f9 Update PIDFD_* constants for Linux 6.9
Linux 6.9 adds some more PIDFD_* constants.  Add them to glibc's
sys/pidfd.h, including updating comments that said FLAGS was reserved
and must be 0, along with updating tst-pidfd-consts.py.

Tested with build-many-glibcs.py.
2024-05-23 12:22:40 +00:00
H.J. Lu
43d41ae6d7 Don't provide XXXf128_do_not_use aliases [BZ #31757]
Don't provide __nexttowardf128_do_not_use, nexttowardf128_do_not_use,
finitef128_do_not_use, isinff128_do_not_use and isnanf128_do_not_use.
This fixes BZ #31757.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-22 06:12:17 -07:00
Adhemerval Zanella
5d4999e519 math: Fix isnanf128 static build (BZ 31774)
Some static implementation of float128 routines might call __isnanf128,
which is not provided by the static object.

Checked on x86_64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-05-21 16:53:27 -03:00
Adhemerval Zanella
1f09aae36a math: Fix i386 and m68k exp10 on static build (BZ 31775)
The commit 08ddd26814 removed the static exp10 on i386 and m68k with an
empty w_exp10.c (required for the ABIs that uses the newly
implementation).  This patch fixes by adding the required symbols on the
arch-specific w_exp{f}_compat.c implementation.

Checked on i686-linux-gnu and with a build for m68k-linux-gnu.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
2024-05-21 13:44:22 -03:00
Adhemerval Zanella
0b716305df math: Fix i386 and m68k fmod/fmodf on static build (BZ 31488)
The commit 16439f419b removed the static fmod/fmodf on i386 and m68k
with and empty w_fmod.c (required for the ABIs that uses the newly
implementation).  This patch fixes by adding the required symbols on
the arch-specific w_fmod{f}_compat.c implementation.

To statically build fmod fails on some ABI (alpha, s390, sparc) because
it does not export the ldexpf128, this is also fixed by this patch.

Checked on i686-linux-gnu and with a build for m68k-linux-gnu.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
2024-05-21 13:43:39 -03:00
H.J. Lu
437c94e04b Remove the clone3 symbol from libc.a [BZ #31770]
clone3 isn't exported from glibc and is hidden in libc.so.  Fix BZ #31770
by removing clone3 alias.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-21 07:05:08 -07:00
Joe Ramsay
0fed0b250f aarch64/fpu: Add vector variants of pow
Plus a small amount of moving includes around in order to be able to
remove duplicate definition of asuint64.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-05-21 14:38:49 +01:00
caiyinyu
3c1e22372d LoongArch: Update ulps
For the log2p1 implementation.
2024-05-21 12:08:25 +08:00
mengqinggang
16d47c1594 LoongArch: Fix tst-gnu2-tls2 compiler error
Add -mno-lsx to tst-gnu2-tlsmod*.c if gcc support -mno-lsx.
Add escape character '\' in vector support test function.
2024-05-21 11:23:03 +08:00
H.J. Lu
8428278b5f i386: Don't define stpncpy alias when used in IFUNC [BZ #31768]
Fix BZ #31768 by not defining stpncpy alias when used in IFUNC.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-05-20 19:35:00 -07:00
Adhemerval Zanella
f83e461f10 powerpc: Update ulps
For the log2p1 implementation.
2024-05-20 13:12:23 -03:00
Adhemerval Zanella
32b2aa59da arm: Update ulps
For the log2p1 implementation.
2024-05-20 13:12:23 -03:00
Adhemerval Zanella
241338bd6f aarch64: Update ulps
For the log2p1 implementation.
2024-05-20 13:12:23 -03:00
Joseph Myers
79c52daf47 Implement C23 log2p1
C23 adds various <math.h> function families originally defined in TS
18661-4.  Add the log2p1 functions (log2(1+x): like log1p, but for
base-2 logarithms).

This illustrates the intended structure of implementations of all
these function families: define them initially with a type-generic
template implementation.  If someone wishes to add type-specific
implementations, it is likely such implementations can be both faster
and more accurate than the type-generic one and can then override it
for types for which they are implemented (adding benchmarks would be
desirable in such cases to demonstrate that a new implementation is
indeed faster).

The test inputs are copied from those for log1p.  Note that these
changes make gen-auto-libm-tests depend on MPFR 4.2 (or later).

The bulk of the changes are fairly generic for any such new function.
(sysdeps/powerpc/nofpu/Makefile only needs changing for those
type-generic templates that use fabs.)

Tested for x86_64 and x86, and with build-many-glibcs.py.
2024-05-20 13:41:39 +00:00
Joseph Myers
cf0ca8d52e Update syscall lists for Linux 6.9
Linux 6.9 has no new syscalls.  Update the version number in
syscall-names.list to reflect that it is still current for 6.9.

Tested with build-many-glibcs.py.
2024-05-20 13:10:31 +00:00
H.J. Lu
7935e7a537 Rename procutils_read_file to __libc_procutils_read_file [BZ #31755]
Fix BZ #31755 by renaming the internal function procutils_read_file to
__libc_procutils_read_file.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-20 05:22:43 -07:00
H.J. Lu
4e21cb95e2 nearbyint: Don't define alias when used in IFUNC [BZ #31759]
Fix BZ #31759 by not defining nearbyint aliases when used in IFUNC.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-05-20 05:21:41 -07:00
Florian Weimer
8d7b6b4cb2 socket: Use may_alias on sockaddr structs (bug 19622)
This supports common coding patterns.  The GCC C front end before
version 7 rejects the may_alias attribute on a struct definition
if it was not present in a previous forward declaration, so this
attribute can only be conditionally applied.

This implements the spirit of the change in Austin Group issue 1641.

Suggested-by: Marek Polacek <polacek@redhat.com>
Suggested-by: Jakub Jelinek <jakub@redhat.com>
Reviewed-by: Sam James <sam@gentoo.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-05-18 09:33:19 +02:00
Manjunath Matti
a81cdde1cb powerpc64: Fix by using the configure value $libc_cv_cc_submachine [BZ #31629]
This patch ensures that $libc_cv_cc_submachine, which is set from
"--with-cpu", overrides $CFLAGS for configure time tests.

Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-05-16 17:31:45 -05:00
Joe Ramsay
75207bde68 aarch64/fpu: Add vector variants of cbrt
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-05-16 14:35:06 +01:00
Joe Ramsay
157f89fa3d aarch64/fpu: Add vector variants of hypot
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-05-16 14:34:43 +01:00
mengqinggang
1dbf2bef79 LoongArch: Add support for TLS Descriptors
This is mostly based on AArch64 and RISC-V implementation.

Add R_LARCH_TLS_DESC32 and R_LARCH_TLS_DESC64 relocations.

For _dl_tlsdesc_dynamic function slow path, temporarily save and restore
all vector registers.
2024-05-15 10:31:53 +08:00
Joe Ramsay
90a6ca8b28 aarch64: Fix AdvSIMD libmvec routines for big-endian
Previously many routines used * to load from vector types stored
in the data table. This is emitted as ldr, which byte-swaps the
entire vector register, and causes bugs for big-endian when not
all lanes contain the same value. When a vector is to be used
this way, it has been replaced with an array and the load with an
explicit ld1 intrinsic, which byte-swaps only within lanes.

As well, many routines previously used non-standard GCC syntax
for vector operations such as indexing into vectors types with []
and assembling vectors using {}. This syntax should not be mixed
with ACLE, as the former does not respect endianness whereas the
latter does. Such examples have been replaced with, for instance,
vcombine_* and vgetq_lane* intrinsics. Helpers which only use the
GCC syntax, such as the v_call helpers, do not need changing as
they do not use intrinsics.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-05-14 13:10:33 +01:00
Adhemerval Zanella
ae515ba530 powerpc: Fix __fesetround_inline_nocheck on POWER9+ (BZ 31682)
The e68b1151f7 commit changed the
__fesetround_inline_nocheck implementation to use mffscrni
(through __fe_mffscrn) instead of mtfsfi.  For generic powerpc
ceil/floor/trunc, the function is supposed to disable the
floating-point inexact exception enable bit, however mffscrni
does not change any exception enable bits.

This patch fixes by reverting the optimization for the
__fesetround_inline_nocheck.

Checked on powerpc-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2024-05-09 08:59:30 -03:00
Gabi Falk
dd5f891c1a x86_64: Fix missing wcsncat function definition without multiarch (x86-64-v4)
This code expects the WCSCAT preprocessor macro to be predefined in case
the evex implementation of the function should be defined with a name
different from __wcsncat_evex.  However, when glibc is built for
x86-64-v4 without multiarch support, sysdeps/x86_64/wcsncat.S defines
WCSNCAT variable instead of WCSCAT to build it as wcsncat.  Rename the
variable to WCSNCAT, as it is actually a better naming choice for the
variable in this case.

Reported-by: Kenton Groombridge
Link: https://bugs.gentoo.org/921945
Fixes: 64b8b6516b ("x86: Add evex optimized functions for the wchar_t strcpy family")
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-05-08 07:37:59 -07:00
Adhemerval Zanella
1e1ad714ee support: Add envp argument to support_capture_subprogram
So tests can specify a list of environment variables.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2024-05-07 12:16:36 -03:00
Adhemerval Zanella
bcae44ea85 elf: Only process multiple tunable once (BZ 31686)
The 680c597e9c commit made loader reject ill-formatted strings by
first tracking all set tunables and then applying them. However, it does
not take into consideration if the same tunable is set multiple times,
where parse_tunables_string appends the found tunable without checking
if it was already in the list. It leads to a stack-based buffer overflow
if the tunable is specified more than the total number of tunables.  For
instance:

  GLIBC_TUNABLES=glibc.malloc.check=2:... (repeat over the number of
  total support for different tunable).

Instead, use the index of the tunable list to get the expected tunable
entry.  Since now the initial list is zero-initialized, the compiler
might emit an extra memset and this requires some minor adjustment
on some ports.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reported-by: Yuto Maeda <maeda@cyberdefense.jp>
Reported-by: Yutaro Shimizu <shimizu@cyberdefense.jp>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2024-05-07 12:16:36 -03:00
H.J. Lu
5f245f3bfb Add crt1-2.0.o for glibc 2.0 compatibility tests
Starting from glibc 2.1, crt1.o contains _IO_stdin_used which is checked
by _IO_check_libio to provide binary compatibility for glibc 2.0.  Add
crt1-2.0.o for tests against glibc 2.0.  Define tests-2.0 for glibc 2.0
compatibility tests.  Add and update glibc 2.0 compatibility tests for
stderr, matherr and pthread_kill.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-05-06 07:49:40 -07:00
Amrita H S
23f0d81608 powerpc: Optimized strncmp for power10
This patch is based on __strcmp_power10.

Improvements from __strncmp_power9:

    1. Uses new POWER10 instructions
       - This code uses lxvp to decrease contention on load
	 by loading 32 bytes per instruction.

    2. Performance implication
       - This version has around 38% better performance on average.
       - Minor performance regression is seen for few small sizes
	 and specific combination of alignments.

Signed-off-by: Amrita H S <amritahs@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-05-06 09:01:29 -05:00
Stafford Horne
643d9d38d5 or1k: Add hard float support
This patch adds hardware floating point support to OpenRISC.  Hardware
floating point toolchain builds are enabled by passing the machine
specific argument -mhard-float to gcc via CFLAGS.  With this enabled GCC
generates floating point instructions for single-precision operations
and exports __or1k_hard_float__.

There are 2 main parts to this patch.

 - Implement fenv functions to update the FPCSR flags keeping it in sync
   with sfp (software floating point).
 - Update machine context functions to store and restore the FPCSR
   state.

*On mcontext_t ABI*

This patch adds __fpcsr to mcontext_t.  This is an ABI change, but also
an ABI fix.  The Linux kernel has always defined padding in mcontext_t
that space was missing from the glibc ABI.  In Linux this unused space
has now been re-purposed for storing the FPCSR.  This patch brings
OpenRISC glibc in line with the Linux kernel and other libc
implementation (musl).

Compatibility getcontext, setcontext, etc symbols have been added to
allow for binaries expecting the old ABI to continue to work.

*Hard float ABI*

The calling conventions and types do not change with OpenRISC hard-float
so glibc hard-float builds continue to use dynamic linker
/lib/ld-linux-or1k.so.1.

*Testing*

I have tested this patch both with hard-float and soft-float builds and
the test results look fine to me.  Results are as follows:

Hard Float

    # failures
    FAIL: elf/tst-sprof-basic		(Haven't figured out yet, not related to hard-float)
    FAIL: gmon/tst-gmon-pie		(PIE bug in or1k toolchain)
    FAIL: gmon/tst-gmon-pie-gprof	(PIE bug in or1k toolchain)
    FAIL: iconvdata/iconv-test		(timeout, passed when run manually)
    FAIL: nptl/tst-cond24		(Timeout)
    FAIL: nptl/tst-mutex10		(Timeout)

    # summary
	  6 FAIL
       4289 PASS
	 86 UNSUPPORTED
	 16 XFAIL
	  2 XPASS

    # versions
    Toolchain: or1k-smhfpu-linux-gnu
    Compiler:  gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC)
    Binutils:  GNU assembler version 2.42.0 (or1k-smhfpu-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324
    Linux:     Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux
    Tester:    shorne
    Glibc:     2024-04-25 b62928f907 Florian Weimer   x86: In ld.so, diagnose missing APX support in APX-only builds  (origin/master, origin/HEAD)

Soft Float

    # failures
    FAIL: elf/tst-sprof-basic
    FAIL: gmon/tst-gmon-pie
    FAIL: gmon/tst-gmon-pie-gprof
    FAIL: nptl/tst-cond24
    FAIL: nptl/tst-mutex10

    # summary
	  5 FAIL
       4295 PASS
	 81 UNSUPPORTED
	 16 XFAIL
	  2 XPASS

    # versions
    Toolchain: or1k-smh-linux-gnu
    Compiler:  gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC)
    Binutils:  GNU assembler version 2.42.0 (or1k-smh-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324
    Linux:     Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux
    Tester:    shorne
    Glibc:     2024-04-25 b62928f907 Florian Weimer   x86: In ld.so, diagnose missing APX support in APX-only builds  (origin/master, origin/HEAD)

Documentation: https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.4-rev0.pdf
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-03 18:28:18 +01:00
Stafford Horne
b57adfa49b or1k: Add hard float libm-test-ulps
This patch adds the ulps test file to prepare for the upcoming
hard float patch.  This is separated out to make the hard float patch
smaller.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-05-03 18:28:18 +01:00
Gabi Falk
5a2cf833f5
i686: Fix multiple definitions of __memmove_chk and __memset_chk
Commit c73c96a4a1 updated memcpy.S and
mempcpy.S, but omitted memmove.S and memset.S.  As a result, the static
library built as PIC, whether with or without multiarch support,
contains two definitions for each of the __memmove_chk and __memset_chk
symbols.

/usr/lib/gcc/i686-pc-linux-gnu/14/../../../../i686-pc-linux-gnu/bin/ld: /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset-ia32.o): in function `__memset_chk':
/var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/string/../sysdeps/i386/i686/memset.S:32: multiple definition of `__memset_chk'; /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset_chk.o):/var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/debug/../sysdeps/i386/i686/multiarch/memset_chk.c:24: first defined here

After this change, regardless of PIC options, the static library, built
for i686 with multiarch contains implementations of these functions
respectively from debug/memmove_chk.c and debug/memset_chk.c, and
without multiarch contains implementations of these functions
respectively from sysdeps/i386/memmove_chk.S and
sysdeps/i386/memset_chk.S.  This ensures that memmove and memset won't
pull in __chk_fail and the routines it calls.

Reported-by: Sam James <sam@gentoo.org>
Tested-by: Sam James <sam@gentoo.org>
Fixes: c73c96a4a1 ("i686: Fix build with --disable-multiarch")
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
2024-05-02 11:51:10 +01:00
Gabi Falk
0fdf4ba48c
i586: Fix multiple definitions of __memcpy_chk and __mempcpy_chk
/home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy_chk.o): in function `__memcpy_chk':
/home/bmg/src/glibc/debug/../sysdeps/i386/memcpy_chk.S:29: multiple definition of `__memcpy_chk';/home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here /home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy_chk.o): in function `__mempcpy_chk': /home/bmg/src/glibc/debug/../sysdeps/i386/mempcpy_chk.S:28: multiple definition of `__mempcpy_chk'; /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here

After this change, the static library built for i586, regardless of PIC
options, contains implementations of these functions respectively from
sysdeps/i386/memcpy_chk.S and sysdeps/i386/mempcpy_chk.S.  This ensures
that memcpy and mempcpy won't pull in __chk_fail and the routines it
calls.

Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Gabi Falk <gabifalk@gmx.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
2024-05-02 11:50:21 +01:00
Carlos O'Donell
91695ee459 time: Allow later version licensing.
The FSF's Licensing and Compliance Lab noted a discrepancy in the
licensing of several files in the glibc package.

When timespect_get.c was impelemented the license did not include
the standard ", or (at your option) any later version." text.

Change the license in timespec_get.c and all copied files to match
the expected license.

This change was previously approved in principle by the FSF in
RT ticket #1316403. And a similar instance was fixed in
commit 46703efa02.
2024-05-01 09:03:26 -04:00
Wilco Dijkstra
6dae61567f AArch64: Remove unused defines of CPU names
Remove unused defines of CPU names in cpu-features.h.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-30 13:32:29 +01:00
Florian Weimer
b62928f907 x86: In ld.so, diagnose missing APX support in APX-only builds
At this point, this is mainly a tool for testing the early ld.so
CPU compatibility diagnostics: GCC uses the new instructions in most
functions, so it's easy to spot if some of the early code is not
built correctly.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-25 17:20:28 +02:00
Florian Weimer
3a3a449742 i386: ulp update for SSE2 --disable-multi-arch configurations 2024-04-25 12:56:48 +02:00
H.J. Lu
46c9997413 x86: Define MINIMUM_X86_ISA_LEVEL in config.h [BZ #31676]
Define MINIMUM_X86_ISA_LEVEL at configure time to avoid

/usr/bin/ld: …/build/elf/librtld.os: in function `init_cpu_features':
…/git/elf/../sysdeps/x86/cpu-features.c:1202: undefined reference to `_dl_runtime_resolve_fxsave'
/usr/bin/ld: …/build/elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

when glibc is built with -march=x86-64-v3 and configured with
--with-rtld-early-cflags=-march=x86-64, which is used to allow ld.so to
print an error message on unsupported CPUs:

Fatal glibc error: CPU does not support x86-64-v3

This fixes BZ #31676.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-04-24 04:50:56 -07:00
caiyinyu
095067efdf LoongArch: Add glibc.cpu.hwcap support.
The current IFUNC selection is always using the most recent
features which are available via AT_HWCAP.  But in
some scenarios it is useful to adjust this selection.

The environment variable:

GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,zzz,....

can be used to enable HWCAP feature yyy, disable HWCAP feature xxx,
where the feature name is case-sensitive and has to match the ones
used in sysdeps/loongarch/cpu-tunables.c.

Signed-off-by: caiyinyu <caiyinyu@loongson.cn>
2024-04-24 18:22:38 +08:00
Florian Weimer
f4724843ad nptl: Fix tst-cancel30 on kernels without ppoll_time64 support
Fall back to ppoll if ppoll_time64 fails with ENOSYS.
Fixes commit 370da8a121 ("nptl: Fix
tst-cancel30 on sparc64").

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-04-23 21:16:32 +02:00
Florian Weimer
5361ad3910 login: Use unsigned 32-bit types for seconds-since-epoch
These fields store timestamps when the system was running.  No Linux
systems existed before 1970, so these values are unused.  Switching
to unsigned types allows continued use of the existing struct layouts
beyond the year 2038.

The intent is to give distributions more time to switch to improved
interfaces that also avoid locking/data corruption issues.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer
9abdae94c7 login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701)
These structs describe file formats under /var/log, and should not
depend on the definition of _TIME_BITS.  This is achieved by
defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that
support 32-bit time_t values (where __time_t is 32 bits).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer
4d4da5aab9 login: Check default sizes of structs utmp, utmpx, lastlog
The default <utmp-size.h> is for ports with a 64-bit time_t.
Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1
need to override it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-19 14:38:17 +02:00
Florian Weimer
14e56bd4ce powerpc: Fix ld.so address determination for PCREL mode (bug 31640)
This seems to have stopped working with some GCC 14 versions,
which clobber r2.  With other compilers, the kernel-provided
r2 value is still available at this point.

Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-04-14 08:24:51 +02:00
Florian Weimer
aea52e3d2b Revert "x86_64: Suppress false positive valgrind error"
This reverts commit a1735e0aa8.

The test failure is a real valgrind bug that needs to be fixed before
valgrind is usable with a glibc that has been built with
CC="gcc -march=x86-64-v3".  The proposed valgrind patch teaches
valgrind to replace ld.so strcmp with an unoptimized scalar
implementation, thus avoiding any AVX2-related problems.

Valgrind bug: <https://bugs.kde.org/show_bug.cgi?id=485487>

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-13 17:42:13 +02:00
Adhemerval Zanella
686d542025 posix: Sync tempname with gnulib
The gnulib version contains an important change (9ce573cde), which
fixes some problems with multithreading, entropy loss, and ASLR leak
nfo.  It also fixes an issue where getrandom is not being used
on some new files generation (only for __GT_NOCREATE on first try).

The 044bf893ac removed __path_search, which is now moved to another
gnulib shared files (stdio-common/tmpdir.{c,h}).  Tthis patch
also fixes direxists to use __stat64_time64 instead of __xstat64,
and move the include of pathmax.h for !_LIBC (since it is not used
by glibc).  The license is also changed from GPL 3.0 to 2.1, with
permission from the authors (Bruno Haible and Paul Eggert).

The sync also removed the clock fallback, since clock_gettime
with CLOCK_REALTIME is expected to always succeed.

It syncs with gnulib commit 323834962817af7b115187e8c9a833437f8d20ec.

Checked on x86_64-linux-gnu.

Co-authored-by: Bruno Haible <bruno@clisp.org>
Co-authored-by: Paul Eggert <eggert@cs.ucla.edu>
Reviewed-by: Bruno Haible <bruno@clisp.org>
2024-04-10 14:53:39 -03:00
Florian Weimer
f8d8b1b1e6 aarch64: Enhanced CPU diagnostics for ld.so
This prints some information from struct cpu_features, and the midr_el1
and dczid_el0 system register contents on every CPU.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-08 16:48:55 +02:00
Florian Weimer
7a430f40c4 x86: Add generic CPUID data dumper to ld.so --list-diagnostics
This is surprisingly difficult to implement if the goal is to produce
reasonably sized output.  With the current approaches to output
compression (suppressing zeros and repeated results between CPUs,
folding ranges of identical subleaves, dealing with the %ecx
reflection issue), the output is less than 600 KiB even for systems
with 256 logical CPUs.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-08 16:48:55 +02:00
Florian Weimer
5653ccd847 elf: Add CPU iteration support for future use in ld.so diagnostics
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-08 16:48:55 +02:00
H.J. Lu
9e1f4aef86 x86-64: Exclude FMA4 IFUNC functions for -mapxf
When -mapxf is used to build glibc, the resulting glibc will never run
on FMA4 machines.  Exclude FMA4 IFUNC functions when -mapxf is used.
This requires GCC which defines __APX_F__ for -mapxf with commit:

1df56719bd8 x86: Define __APX_F__ for -mapxf

Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-04-06 05:03:55 -07:00
Adhemerval Zanella
c27f8763cf Reinstate generic features-time64.h
The a4ed0471d7 removed the generic version which is included by
features.h and used by Hurd.

Checked by building i686-gnu and x86_64-gnu with build-many-glibc.py.
2024-04-05 09:02:36 -03:00
Adhemerval Zanella
460d9e2dfe Cleanup __tls_get_addr on alpha/microblaze localplt.data
They are not required.

Checked with a make check for both ABIs.
2024-04-04 17:20:33 -03:00
Adhemerval Zanella
95700e7998 arm: Remove ld.so __tls_get_addr plt usage
Use the hidden alias instead.

Checked on arm-linux-gnueabihf.
2024-04-04 17:03:32 -03:00
Adhemerval Zanella
50c2be2390 aarch64: Remove ld.so __tls_get_addr plt usage
Use the hidden alias instead.

Checked on aarch64-linux-gnu.
2024-04-04 17:02:32 -03:00
Adhemerval Zanella
44ccc2465c math: x86 trunc traps when FE_INEXACT is enabled (BZ 31603)
The implementations of trunc functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella
932544efa4 math: x86 floor traps when FE_INEXACT is enabled (BZ 31601)
The implementations of floor functions using x87 floating point (i386 and
86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Adhemerval Zanella
637bfc392f math: x86 ceill traps when FE_INEXACT is enabled (BZ 31600)
The implementations of ceil functions using x87 floating point (i386 and
x86_64 long double only) traps when FE_INEXACT is enabled.  Although
this is a GNU extension outside the scope of the C standard, other
architectures that also support traps do not show this behavior.

The fix moves the implementation to a common one that holds any
exceptions with a 'fnclex' (libc_feholdexcept_setround_387).

Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-04-04 14:29:28 -03:00
Joe Ramsay
87cb1dfcd6 aarch64/fpu: Add vector variants of erfc
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:24 +01:00
Joe Ramsay
3d3a4fb8e4 aarch64/fpu: Add vector variants of tanh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:20 +01:00
Joe Ramsay
eedbbca0bf aarch64/fpu: Add vector variants of sinh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:16 +01:00
Joe Ramsay
8b67920528 aarch64/fpu: Add vector variants of atanh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:12 +01:00
Joe Ramsay
81406ea3c5 aarch64/fpu: Add vector variants of asinh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:33:02 +01:00
Joe Ramsay
b09fee1d21 aarch64/fpu: Add vector variants of acosh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:32:58 +01:00
Joe Ramsay
bdb5705b7b aarch64/fpu: Add vector variants of cosh
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:32:52 +01:00
Joe Ramsay
cb5d84f1f8 aarch64/fpu: Add vector variants of erf
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-04-04 10:32:48 +01:00
Stafford Horne
3db9d208dd misc: Add support for Linux uio.h RWF_NOAPPEND flag
In Linux 6.9 a new flag is added to allow for Per-io operations to
disable append mode even if a file was opened with the flag O_APPEND.
This is done with the new RWF_NOAPPEND flag.

This caused two test failures as these tests expected the flag 0x00000020
to be unused.  Adding the flag definition now fixes these tests on Linux
6.9 (v6.9-rc1).

  FAIL: misc/tst-preadvwritev2
  FAIL: misc/tst-preadvwritev64v2

This patch adds the flag, adjusts the test and adds details to
documentation.

Link: https://lore.kernel.org/all/20200831153207.GO3265@brightrain.aerifal.cx/
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-04-04 09:41:27 +01:00
Adhemerval Zanella
4dcd674b66 powerpc: Add missing arch flags on rounding ifunc variants
The ifunc variants now uses the powerpc implementation which in turn
uses the compiler builtin.  Without the proper -mcpu switch the builtin
does not generate the expected optimization.

Checked on powerpc-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-04-02 15:49:31 -03:00
Adhemerval Zanella
a4ed0471d7 Always define __USE_TIME_BITS64 when 64 bit time_t is used
It was raised on libc-help [1] that some Linux kernel interfaces expect
the libc to define __USE_TIME_BITS64 to indicate the time_t size for the
kABI.  Different than defined by the initial y2038 design document [2],
the __USE_TIME_BITS64 is only defined for ABIs that support more than
one time_t size (by defining the _TIME_BITS for each module).

The 64 bit time_t redirects are now enabled using a different internal
define (__USE_TIME64_REDIRECTS). There is no expected change in semantic
or code generation.

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and
arm-linux-gnueabi

[1] https://sourceware.org/pipermail/libc-help/2024-January/006557.html
[2] https://sourceware.org/glibc/wiki/Y2038ProofnessDesign

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-04-02 15:28:36 -03:00
Adhemerval Zanella
721314c980 x86_64: Remove avx512 strstr implementation
As indicated in a recent thread, this it is a simple brute-force
algorithm that checks the whole needle at a matching character pair
(and does so 1 byte at a time after the first 64 bytes of a needle).
Also it never skips ahead and thus can match at every haystack
position after trying to match all of the needle, which generic
implementation avoids.

As indicated by Wilco, a 4x larger needle and 16x larger haystack gives
a clear 65x slowdown both basic_strstr and __strstr_avx512:

  "ifuncs": ["basic_strstr", "twoway_strstr", "__strstr_avx512",
"__strstr_sse2_unaligned", "__strstr_generic"],

    {
     "len_haystack": 65536,
     "len_needle": 1024,
     "align_haystack": 0,
     "align_needle": 0,
     "fail": 1,
     "desc": "Difficult bruteforce needle",
     "timings": [4.0948e+07, 15094.5, 3.20818e+07, 108558, 10839.2]
    },
    {
     "len_haystack": 1048576,
     "len_needle": 4096,
     "align_haystack": 0,
     "align_needle": 0,
     "fail": 1,
     "desc": "Difficult bruteforce needle",
     "timings": [2.69767e+09, 100797, 2.08535e+09, 495706, 82666.9]
    }

PS: I don't have an AVX512 capable machine to verify this issues, but
    skimming through the code it does seems to follow what Wilco has
    described.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-03-27 13:48:16 -03:00
Adhemerval Zanella
2e53eb9234 signal: Avoid system signal disposition to interfere with tests
Both tst-sigset2 and tst-signal1 expectes that SIGINT disposition
is set to SIG_DFL.
2024-03-27 13:47:09 -03:00
Palmer Dabbelt
96d1b9ac23 RISC-V: Fix the static-PIE non-relocated object check
The value of l_scope is only valid post relocation, so this original
check was triggering undefined behavior.  Instead just directly check to
see if the object has been relocated, at which point using l_scope is
safe.

Reported-by: Andreas Schwab <schwab@suse.de>
Closes: BZ #31317
Fixes: e0590f41fe ("RISC-V: Enable static-pie.")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-25 15:17:13 +01:00
Sergey Bugaev
dc1a77269c htl: Implement some support for TLS_DTV_AT_TP
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-19-bugaevc@gmail.com>
2024-03-23 23:00:30 +01:00
Sergey Bugaev
a4273efa21 htl: Respect GL(dl_stack_flags) when allocating stacks
Previously, HTL would always allocate non-executable stacks.  This has
never been noticed, since GNU Mach on x86 ignores VM_PROT_EXECUTE and
makes all pages implicitly executable.  Since GNU Mach on AArch64
supports non-executable pages, HTL forgetting to pass VM_PROT_EXECUTE
immediately breaks any code that (unfortunately, still) relies on
executable stacks.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-7-bugaevc@gmail.com>
2024-03-23 22:48:44 +01:00
Sergey Bugaev
b467cfcaee hurd: Use the RETURN_ADDRESS macro
This gives us PAC stripping on AArch64.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-6-bugaevc@gmail.com>
2024-03-23 22:48:01 +01:00
Sergey Bugaev
6afeac1289 hurd: Disable Prefer_MAP_32BIT_EXEC on non-x86_64 for now
While we could support it on any architecture, the tunable is currently
only defined on x86_64.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-5-bugaevc@gmail.com>
2024-03-23 22:47:46 +01:00
Sergey Bugaev
7f02511e5b hurd: Move internal functions to internal header
Move _hurd_self_sigstate (), _hurd_critical_section_lock (), and
_hurd_critical_section_unlock () inline implementations (that were
already guarded by #if defined _LIBC) to the internal version of the
header.  While at it, add <tls.h> to the includes, and use
__LIBC_NO_TLS () unconditionally.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-ID: <20240323173301.151066-2-bugaevc@gmail.com>
2024-03-23 22:43:07 +01:00
Stafford Horne
ad05a42370 or1k: Add prctl wrapper to unwrap variadic args
On OpenRISC variadic functions and regular functions have different
calling conventions so this wrapper is needed to translate.  This
wrapper is copied from x86_64/x32.  I don't know the build system enough
to find a cleaner way to share the code between x86_64/x32 and or1k
(maybe Implies?), so I went with the straight copy.

This fixes test failures:

  misc/tst-prctl
  nptl/tst-setgetname
2024-03-22 15:43:34 +00:00
Stafford Horne
df7e29e2a4 or1k: Only define fpu rouding and exceptions with hard-float
This test failure:

  math/test-fenv

If rounding mode and exception macros are defined then the fenv tests
run and always fail.  This patch adds an ifdef using the
__or1k_hard_float__ macro provided by gcc to avoid defining these fenv
macros when they cnnot be used.  This is similar to what is done in csky.

Note, I will post the or1k hard-float support soon. So, I prefer to
leave the hard-float bits here for now.
2024-03-22 15:43:34 +00:00
Stafford Horne
2e982a3937 or1k: Update libm test ulps
To fix test failures:

    FAIL: math/test-float-hypot
    FAIL: math/test-float32-hypot
2024-03-22 15:43:34 +00:00
Wilco Dijkstra
2e94e2f5d2 AArch64: Check kernel version for SVE ifuncs
Old Linux kernels disable SVE after every system call.  Calling the
SVE-optimized memcpy afterwards will then cause a trap to reenable SVE.
As a result, applications with a high use of syscalls may run slower with
the SVE memcpy.  This is true for kernels between 4.15.0 and before 6.2.0,
except for 5.14.0 which was patched.  Avoid this by checking the kernel
version and selecting the SVE ifunc on modern kernels.

Parse the kernel version reported by uname() into a 24-bit kernel.major.minor
value without calling any library functions.  If uname() is not supported or
if the version format is not recognized, assume the kernel is modern.

Tested-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2024-03-21 16:50:51 +00:00
Amrita H S
1ea0511456 powerpc: Placeholder and infrastructure/build support to add Power11 related changes.
The following three changes have been added to provide initial Power11 support.
    1. Add the directories to hold Power11 files.
    2. Add support to select Power11 libraries based on AT_PLATFORM.
    3. Let submachine=power11 be set automatically.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-03-19 21:11:34 -05:00
Manjunath Matti
3ab9b88e2a powerpc: Add HWCAP3/HWCAP4 data to TCB for Power Architecture.
This patch adds a new feature for powerpc.  In order to get faster
access to the HWCAP3/HWCAP4 masks, similar to HWCAP/HWCAP2 (i.e. for
implementing __builtin_cpu_supports() in GCC) without the overhead of
reading them from the auxiliary vector, we now reserve space for them
in the TCB.

This is an ABI change for GLIBC 2.39.

Suggested-by: Peter Bergner <bergner@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-03-19 17:19:27 -05:00
Adhemerval Zanella
3d53d18fc7 elf: Enable TLS descriptor tests on aarch64
The aarch64 uses 'trad' for traditional tls and 'desc' for tls
descriptors, but unlike other targets it defaults to 'desc'.  The
gnutls2 configure check does not set aarch64 as an ABI that uses
TLS descriptors, which then disable somes stests.

Also rename the internal machinery fron gnu2 to tls descriptors.

Checked on aarch64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-03-19 14:53:30 -03:00
Adhemerval Zanella
64c7e34428 arm: Update _dl_tlsdesc_dynamic to preserve caller-saved registers (BZ 31372)
ARM _dl_tlsdesc_dynamic slow path has two issues:

  * The ip/r12 is defined by AAPCS as a scratch register, and gcc is
    used to save the stack pointer before on some function calls.  So it
    should also be saved/restored as well.  It fixes the tst-gnu2-tls2.

  * None of the possible VFP registers are saved/restored.  ARM has the
    additional complexity to have different VFP bank sizes (depending of
    VFP support by the chip).

The tst-gnu2-tls2 test is extended to check for VFP registers, although
only for hardfp builds.  Different than setcontext, _dl_tlsdesc_dynamic
does not have  HWCAP_ARM_IWMMXT (I don't have a way to properly test
it and it is almost a decade since newer hardware was released).

With this patch there is no need to mark tst-gnu2-tls2 as XFAIL.

Checked on arm-linux-gnueabihf.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-03-19 14:53:30 -03:00
Andreas Schwab
fd7ee2e6c5 Add tst-gnu2-tls2mod1 to test-internal-extras
That allows sysdeps/x86_64/tst-gnu2-tls2mod1.S to use internal headers.

Fixes: 717ebfa85c ("x86-64: Allocate state buffer space for RDI, RSI and RBX")
2024-03-19 14:28:28 +01:00
H.J. Lu
717ebfa85c x86-64: Allocate state buffer space for RDI, RSI and RBX
_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11.  Define
TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
STATE_SAVE_OFFSET(%rsp).

   +==================+<- stack frame start aligned at 8 or 16 bytes
   |                  |<- RDI saved in the red zone
   |                  |<- RSI saved in the red zone
   |                  |<- RBX saved in the red zone
   |                  |<- paddings for stack realignment of 64 bytes
   |------------------|<- xsave buffer end aligned at 64 bytes
   |                  |<-
   |                  |<-
   |                  |<-
   |------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
   |                  |<- 8-byte padding for 64-byte alignment
   |                  |<- 8-byte padding for 64-byte alignment
   |                  |<- R11
   |                  |<- R10
   |                  |<- R9
   |                  |<- R8
   |                  |<- RDX
   |                  |<- RCX
   +==================+<- RSP aligned at 64 bytes

Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
and RBX are saved onto stack without adjusting stack pointer first, using
the red-zone.  This fixes BZ #31501.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-03-18 19:45:13 -07:00
Darius Rad
f44f3aed31 riscv: Update nofpu libm test ulps
Fix two test failures.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-03-18 11:28:50 +01:00
Florian Weimer
7a76f21867 linux: Use rseq area unconditionally in sched_getcpu (bug 31479)
Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e ("nptl:
Unconditionally use a 32-byte rseq area").  After that, it was
not ensured that the RSEQ_SIG macro was defined during sched_getcpu.c
compilation that provided a definition.  This commit always checks
the rseq area for CPU number information before using the other
approaches.

This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call.  Most architectures that have vDSO
acceleration for getcpu also have rseq support.

Fixes: 2c6b4b272e
Fixes: 1d350aa060
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2024-03-15 19:08:24 +01:00
Szabolcs Nagy
73c26018ed aarch64: fix check for SVE support in assembler
Due to GCC bug 110901 -mcpu can override -march setting when compiling
asm code and thus a compiler targetting a specific cpu can fail the
configure check even when binutils gas supports SVE.

The workaround is that explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses the
configure check should use that too even if the GCC issue is fixed
independently.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-03-14 14:27:56 +00:00
Joseph Myers
2367bf468c Update kernel version to 6.8 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.8.  (There are no new
constants covered by these tests in 6.8 that need any other header
changes.)

Tested with build-many-glibcs.py.
2024-03-13 19:46:21 +00:00
Joseph Myers
3de2f8755c Update syscall lists for Linux 6.8
Linux 6.8 adds five new syscalls.  Update syscall-names.list and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.

Tested with build-many-glibcs.py.
2024-03-13 13:57:56 +00:00
Adhemerval Zanella
4a76fb1da8 powerpc: Remove power8 strcasestr optimization
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation.  The geomean
on bench-strcasestr shows:

            __strcasestr_power8  __strcasestr_ppc
  power10                  1159              1120
  power9                   1640              1469
  power8                   1787              1904

The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadradic behavior, which only adds overheads for input
that trigger quadradic behavior and it is really a hack.

Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-03-12 17:11:01 -03:00
Adhemerval Zanella
2149da3683 riscv: Fix alignment-ignorant memcpy implementation
The memcpy optimization (commit 587a1290a1) has a series
of mistakes:

  - The implementation is wrong: the chunk size calculation is wrong
    leading to invalid memory access.

  - It adds ifunc supports as default, so --disable-multi-arch does
    not work as expected for riscv.

  - It mixes Linux files (memcpy ifunc selection which requires the
    vDSO/syscall mechanism)  with generic support (the memcpy
    optimization itself).

  - There is no __libc_ifunc_impl_list, which makes testing only
    check the selected implementation instead of all supported
    by the system.

This patch also simplifies the required bits to enable ifunc: there
is no need to memcopy.h; nor to add Linux-specific files.

The __memcpy_noalignment tail handling now uses a branchless strategy
similar to aarch64 (overlap 32-bits copies for sizes 4..7 and byte
copies for size 1..3).

Checked on riscv64 and riscv32 by explicitly enabling the function
on __libc_ifunc_impl_list on qemu-system.

Changes from v1:
* Implement the memcpy in assembly to correctly handle RISCV
  strict-alignment.
Reviewed-by: Evan Green <evan@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-12 14:38:08 -03:00
Andreas Schwab
2173173d57 linux/sigsetops: fix type confusion (bug 31468)
Each mask in the sigset array is an unsigned long, so fix __sigisemptyset
to use that instead of int.  The __sigword function returns a simple array
index, so it can return int instead of unsigned long.
2024-03-12 10:00:22 +01:00
caiyinyu
aeee41f1cf LoongArch: Correct {__ieee754, _}_scalb -> {__ieee754, _}_scalbf 2024-03-12 14:07:27 +08:00
Sunil K Pandey
b6e3898194 x86-64: Simplify minimum ISA check ifdef conditional with if
Replace minimum ISA check ifdef conditional with if.  Since
MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants,
compiler will perform constant folding optimization, getting same
results.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-03-03 15:47:53 -08:00
Evan Green
587a1290a1
riscv: Add and use alignment-ignorant memcpy
For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:15:01 -08:00
Evan Green
a2b47f7d46
riscv: Add ifunc helper method to hwprobe.h
Add a little helper method so it's easier to fetch a single value from
the hwprobe function when used within an ifunc selector.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:15:00 -08:00
Evan Green
a29bb320a1
riscv: Enable multi-arg ifunc resolvers
RISC-V is apparently the first architecture to pass more than one
argument to ifunc resolvers. The helper macros in libc-symbols.h,
__ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible
with this. These macros have an "arg" (non-final) parameter that
represents the parameter signature of the ifunc resolver. The result is
an inability to pass the required comma through in a single preprocessor
argument.

Rearrange the __ifunc_resolver() macro to be variadic, and pass the
types as those variable parameters. Move the guts of __ifunc() and
__ifunc_hidden() into new macros, __ifunc_args(), and
__ifunc_args_hidden(), that pass the variable arguments down through to
__ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which
are used in a bunch of places, to simply shuffle the arguments down into
__ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which
provides convenience macros to those looking to write ifunc selectors
that use both arguments.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:59 -08:00
Evan Green
78308ce77a
riscv: Add __riscv_hwprobe pointer to ifunc calls
The new __riscv_hwprobe() function is designed to be used by ifunc
selector functions. This presents a challenge for applications and
libraries, as ifunc selectors are invoked before all relocations have
been performed, so an external call to __riscv_hwprobe() from an ifunc
selector won't work. To address this, pass a pointer to the
__riscv_hwprobe() function into ifunc selectors as the second
argument (alongside dl_hwcap, which was already being passed).

Include a typedef as well for convenience, so that ifunc users don't
have to go through contortions to call this routine. Users will need to
remember to check the second argument for NULL, to account for older
glibcs that don't pass the function.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:58 -08:00
Evan Green
e7919e0db2
riscv: Add hwprobe vdso call support
The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:57 -08:00
Evan Green
c6c33339b4
linux: Introduce INTERNAL_VSYSCALL
Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
a regular syscall, but without setting errno. Instead, the return value
is plumbed straight out of the macro.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:56 -08:00
Evan Green
426d0e1aa8
riscv: Add Linux hwprobe syscall support
Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.

Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-03-01 07:14:55 -08:00
H.J. Lu
9b7091415a x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers
_dl_tlsdesc_dynamic should also preserve AMX registers which are
caller-saved.  Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID
to x86-64 TLSDESC_CALL_STATE_SAVE_MASK.  Compute the AMX state size
and save it in xsave_state_full_size which is only used by
_dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec.  This fixes
the AMX part of BZ #31372.  Tested on AMX processor.

AMX test is enabled only for compilers with the fix for

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098

GCC 14 and GCC 11/12/13 branches have the bug fix.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-02-29 04:30:01 -08:00
H.J. Lu
a1735e0aa8 x86_64: Suppress false positive valgrind error
When strcmp-avx2.S is used as the default, elf/tst-valgrind-smoke fails
with

==1272761== Conditional jump or move depends on uninitialised value(s)
==1272761==    at 0x4022C98: strcmp (strcmp-avx2.S:462)
==1272761==    by 0x400B05B: _dl_name_match_p (dl-misc.c:75)
==1272761==    by 0x40085F3: _dl_map_object (dl-load.c:1966)
==1272761==    by 0x401AEA4: map_doit (rtld.c:644)
==1272761==    by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==1272761==    by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==1272761==    by 0x401B38F: do_preload (rtld.c:816)
==1272761==    by 0x401C116: handle_preload_list (rtld.c:892)
==1272761==    by 0x401EDF5: dl_main (rtld.c:1842)
==1272761==    by 0x401A79E: _dl_sysdep_start (dl-sysdep.c:140)
==1272761==    by 0x401BEEE: _dl_start_final (rtld.c:494)
==1272761==    by 0x401BEEE: _dl_start (rtld.c:581)
==1272761==    by 0x401AD87: ??? (in */elf/ld.so)

The assembly codes are:

   0x0000000004022c80 <+144>:	vmovdqu 0x20(%rdi),%ymm0
   0x0000000004022c85 <+149>:	vpcmpeqb 0x20(%rsi),%ymm0,%ymm1
   0x0000000004022c8a <+154>:	vpcmpeqb %ymm0,%ymm15,%ymm2
   0x0000000004022c8e <+158>:	vpandn %ymm1,%ymm2,%ymm1
   0x0000000004022c92 <+162>:	vpmovmskb %ymm1,%ecx
   0x0000000004022c96 <+166>:	inc    %ecx
=> 0x0000000004022c98 <+168>:	jne    0x4022c32 <strcmp+66>

strcmp-avx2.S has 32-byte vector loads of strings which are shorter than
32 bytes:

(gdb) p (char *) ($rdi + 0x20)
$6 = 0x1ffeffea20 "memcheck-amd64-linux.so"
(gdb) p (char *) ($rsi + 0x20)
$7 = 0x4832640 "core-amd64-linux.so"
(gdb) call (int) strlen ((char *) ($rsi + 0x20))
$8 = 19
(gdb) call (int) strlen ((char *) ($rdi + 0x20))
$9 = 23
(gdb)

It triggers the valgrind error.  The above code is safe since the loads
don't cross the page boundary.  Update tst-valgrind-smoke.sh to accept
an optional suppression file and pass a suppression file to valgrind when
strcmp-avx2.S is the default implementation of strcmp.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
2024-02-28 13:40:55 -08:00
H.J. Lu
8c7c188d62 x86: Don't check XFD against /proc/cpuinfo
Since /proc/cpuinfo doesn't report XFD, don't check it against
/proc/cpuinfo.
2024-02-28 11:50:38 -08:00
H.J. Lu
befe2d3c4d x86-64: Don't use SSE resolvers for ISA level 3 or above
When glibc is built with ISA level 3 or above enabled, SSE resolvers
aren't available and glibc fails to build:

ld: .../elf/librtld.os: in function `init_cpu_features':
.../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave'
ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/local/bin/ld: final link failed: bad value

For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor
_dl_tlsdesc_dynamic_fxsave.

This fixes BZ #31429.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-28 11:49:30 -08:00
H.J. Lu
0aac205a81 x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:

	leaq	tls_var@TLSDESC(%rip), %rax
	call	*tls_var@TLSCALL(%rax)

or

	leal	tls_var@TLSDESC(%ebx), %eax
	call	*tls_var@TLSCALL(%eax)

CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL.  When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path.  __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers.  _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers.  Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers.  This
fixes BZ #31372.

Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-28 09:02:56 -08:00
H.J. Lu
e6350be7e9 sysdeps/unix/sysv/linux/x86_64/Makefile: Add the end marker
Add the end marker to tests, tests-container and modules-names.
2024-02-28 05:48:27 -08:00
Adhemerval Zanella
b53e73ea80 s390: Improve static-pie configure tests
Instead of tying based on the linker name and version, check for the
required support:

  * whether it does not generate dynamic TLS relocations in PIE
    (binutils PR ld/22263);
  * if it accepts --no-dynamic-linker (by using -static-pie);
  * and if it adds a DT_JMPREL pointing to .rela.iplt with static pie.

The patch also trims the comments, for binutils one of the tests should
already cover it.  The kernel ones are not clear which version should
have the backport, nor it is something that glibc can do much about
it.  Finally, the glibc is somewhat confusing, since it refers
to commits not related to s390x.

Checked with a build for s390x-linux-gnu.

Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
2024-02-28 10:09:53 -03:00
H.J. Lu
24c8db87c9 x86: Change ENQCMD test to CHECK_FEATURE_PRESENT
Since ENQCMD is mainly used in kernel, change the ENQCMD test to
CHECK_FEATURE_PRESENT.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-27 11:50:52 -08:00
Joe Ramsay
e302e10213 aarch64/fpu: Sync libmvec routines from 2.39 and before with AOR
This includes a fix for big-endian in AdvSIMD log, some cosmetic
changes, and numerous small optimisations mainly around inlining and
using indexed variants of MLA intrinsics.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-02-26 09:45:50 -03:00
Stefan Liebler
02782fd128 S390: Do not clobber r7 in clone [BZ #31402]
Starting with commit e57d8fc97b
"S390: Always use svc 0"
clone clobbers the call-saved register r7 in error case:
function or stack is NULL.

This patch restores the saved registers also in the error case.
Furthermore the existing test misc/tst-clone is extended to check
all error cases and that clone does not clobber registers in this
error case.
2024-02-26 13:37:46 +01:00
Sunil K Pandey
9f78a7c1d0 x86_64: Exclude SSE, AVX and FMA4 variants in libm multiarch
When glibc is built with ISA level 3 or higher by default, the resulting
glibc binaries won't run on SSE or FMA4 processors.  Exclude SSE, AVX and
FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
default.

When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
variant.

Fixes BZ 31335.

NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
doesn't support AVX512 instructions:

https://bugs.kde.org/show_bug.cgi?id=383010

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-02-25 13:20:51 -08:00
H.J. Lu
dfb05f8e70 x86-64: Save APX registers in ld.so trampoline
Add APX registers to STATE_SAVE_MASK so that APX registers are saved in
ld.so trampoline.  This fixes BZ #31371.

Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will
be used by i386 _dl_tlsdesc_dynamic.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-02-25 09:22:15 -08:00
Simon Chopin
59e0441d4a tests: gracefully handle AppArmor userns containment
Recent AppArmor containment allows restricting unprivileged user
namespaces, which is enabled by default on recent Ubuntu systems.
When this happens, as is common with Linux Security Modules, the syscall
will fail with -EACCESS.

When that happens, the affected tests will now be considered unsupported
rather than simply failing.

Further information:

* https://gitlab.com/apparmor/apparmor/-/wikis/unprivileged_userns_restriction
* https://ubuntu.com/blog/ubuntu-23-10-restricted-unprivileged-user-namespaces
* https://manpages.ubuntu.com/manpages/jammy/man5/apparmor.d.5.html (for
  the return code)

V2:
* Fix duplicated line in check_unshare_hints
* Also handle similar failure in tst-pidfd_getpid

V3:
* Comment formatting
* Aded some more documentation on syscall return value

Signed-off-by: Simon Chopin <simon.chopin@canonical.com>
2024-02-23 08:50:00 -03:00
Adhemerval Zanella
1e9a550ba4 powerpc: Remove power7 strstr optimization
The optimization is not faster than the generic algorithm,
using the bench-strstr the geometric mean running on a POWER10 machine
using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97
(which uses the generic implementation).

Also, there is no need to redirect the internal str*/mem* call
to optimized version, internal ifunc is supported and enabled
for internal calls (meaning that the generic implementation
will use any asm optimization if available).

Checked on powerpc64le-linux-gnu.
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-02-23 08:50:00 -03:00
Adhemerval Zanella
f4c142bb9f arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405)
Instead of __dl_iterate_phdr. On ARM dlfo_eh_frame/dlfo_eh_count
maps to PT_ARM_EXIDX vaddr start / length.

On a Neoverse N1 machine with 160 cores, the following program:

  $ cat test.c
  #include <stdlib.h>
  #include <pthread.h>
  #include <assert.h>

  enum {
    niter = 1024,
    ntimes = 128,
  };

  static void *
  tf (void *arg)
  {
    int a = (int) arg;

    for (int i = 0; i < niter; i++)
      {
        void *p[ntimes];
        for (int j = 0; j < ntimes; j++)
  	p[j] = malloc (a * 128);
        for (int j = 0; j < ntimes; j++)
  	free (p[j]);
      }

    return NULL;
  }

  int main (int argc, char *argv[])
  {
    enum { nthreads = 16 };
    pthread_t t[nthreads];

    for (int i = 0; i < nthreads; i ++)
      assert (pthread_create (&t[i], NULL, tf, (void *) i) == 0);

    for (int i = 0; i < nthreads; i++)
      {
        void *r;
        assert (pthread_join (t[i], &r) == 0);
        assert (r == NULL);
      }

    return 0;
  }
  $ arm-linux-gnueabihf-gcc -fsanitize=address test.c -o test

Improves from ~15s to 0.5s.

Checked on arm-linux-gnueabihf.
2024-02-23 08:50:00 -03:00
Xi Ruoyao
e2a65ecc4b
math: Update mips64 ulps
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-02-22 21:28:25 +01:00
Daniel Cederman
aa4106db1d sparc: Treat the version field in the FPU control word as reserved
The FSR version field is read-only and might be non-zero.

This allows math/test-fpucw* to correctly pass when the version is
non-zero.

Signed-off-by: Daniel Cederman <cederman@gaisler.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-02-19 10:55:50 -03:00
Flavio Cruz
88b771ab5e Implement setcontext/getcontext/makecontext/swapcontext for Hurd x86_64
Tested with the tests provided by glibc plus some other toy examples.
Message-ID: <20240217202535.1860803-1-flaviocruz@gmail.com>
2024-02-17 21:45:35 +01:00
Flavio Cruz
e3da8f9bad Use proc_getchildren_rusage when available in getrusage and times.
Message-ID: <20240217164846.1837223-1-flaviocruz@gmail.com>
2024-02-17 21:14:39 +01:00
Florian Weimer
6a04404521 Linux: Switch back to assembly syscall wrapper for prctl (bug 29770)
Commit ff026950e2 ("Add a C wrapper for
prctl [BZ #25896]") replaced the assembler wrapper with a C function.
However, on powerpc64le-linux-gnu, the C variadic function
implementation requires extra work in the caller to set up the
parameter save area.  Calling a function that needs a parameter save
area without one (because the prototype used indicates the function is
not variadic) corrupts the caller's stack.   The Linux manual pages
project documents prctl as a non-variadic function.  This has resulted
in various projects over the years using non-variadic prototypes,
including the sanitizer libraries in LLVm and GCC (GCC PR 113728).

This commit switches back to the assembler implementation on most
targets and only keeps the C implementation for x86-64 x32.

Also add the __prctl_time64 alias from commit
b39ffab860 ("Linux: Add time64 alias for
prctl") to sysdeps/unix/sysv/linux/syscalls.list; it was not yet
present in commit ff026950e2.

This restores the old ABI on powerpc64le-linux-gnu, thus fixing
bug 29770.

Reviewed-By: Simon Chopin <simon.chopin@canonical.com>
2024-02-17 09:17:04 +01:00
Florian Weimer
0d9166c224 i386: Use generic memrchr in libc (bug 31316)
Before this change, we incorrectly used the SSE2 variant in the
implementation, without checking that the system actually supports
SSE2.

Tested-by: Sam James <sam@gentoo.org>
2024-02-16 07:41:04 +01:00
H.J. Lu
ef7f4b1fef Apply the Makefile sorting fix
Apply the Makefile sorting fix generated by sort-makefile-lines.py.
2024-02-15 11:19:56 -08:00
H.J. Lu
71d133c500 sysdeps/x86_64/Makefile (tests): Add the end marker 2024-02-15 11:12:13 -08:00