glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 04:50:07 +00:00

Author	SHA1	Message	Date
John David Anglin	4737e6a7a3	hppa/vdso: Provide 64-bit clock_gettime() vDSO only Adhemerval noticed that the gettimeofday() and 32-bit clock_gettime() vDSO calls won't be used by glibc on hppa, so there is no need to declare them. Both syscalls will be emulated by utilizing return values of the 64-bit clock_gettime() vDSO instead. Signed-off-by: Helge Deller <deller@gmx.de> Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>	2024-07-02 16:26:32 -04:00
YunQiang Su	9d0e9c8a13	MIPSr6/math: Use builtin fma and fmaf MIPSr6 has MADDF.s/MADDF.d instructions, which are fused. In MIPS ISA, double support can be subsetted. Only FMAF is enabled for this case. * sysdeps/mips/fpu/math-use-builtins-fma.h Signed-off-by: YunQiang Su <syq@gcc.gnu.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-07-01 14:52:30 -03:00
Florian Weimer	018f0fc3b8	elf: Support recursive use of dynamic TLS in interposed malloc It turns out that quite a few applications use bundled mallocs that have been built to use global-dynamic TLS (instead of the recommended initial-exec TLS). The previous workaround from commit `afe42e935b` ("elf: Avoid some free (NULL) calls in _dl_update_slotinfo") does not fix all encountered cases unfortunately. This change avoids the TLS generation update for recursive use of TLS from a malloc that was called during a TLS update. This is possible because an interposed malloc has a fixed module ID and TLS slot. (It cannot be unloaded.) If an initially-loaded module ID is encountered in __tls_get_addr and the dynamic linker is already in the middle of a TLS update, use the outdated DTV, thus avoiding another call into malloc. It's still necessary to update the DTV to the most recent generation, to get out of the slow path, which is why the check for recursion is needed. The bookkeeping is done using a global counter instead of per-thread flag because TLS access in the dynamic linker is tricky. All this will go away once the dynamic linker stops using malloc for TLS, likely as part of a change that pre-allocates all TLS during pthread_create/dlopen. Fixes commit `d2123d6827` ("elf: Fix slow tls access after dlopen [BZ #19924]"). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-07-01 19:02:11 +02:00
MayShao-oc	9dc645cb56	x86: Set default non_temporal_threshold for Zhaoxin processors Currently 'non_temporal_threshold' is set to 'non_temporal_threshold_lowbound' on Zhaoxin processors without ERMS. The default 'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000 Zhaoxin processors, so this patch updates the value to 'shared / cachesize_non_temporal_divisor'. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
MayShao-oc	c19457aec6	x86_64: Optimize large size copy in memmove-ssse3 This patch optimizes large size copy using normal store when src > dst and overlap. Make it the same as the logic in memmove-vec-unaligned-erms.S. Currently memmove-ssse3 uses '__x86_shared_cache_size_half' as the non-temporal threshold; this patch updates that value to '__x86_shared_non_temporal_threshold'. Currently, the __x86_shared_non_temporal_threshold is cpu-specific, and different CPUs will have different values based on the related nt-benchmark results. However, in memmove-ssse3, the nontemporal threshold uses '__x86_shared_cache_size_half', which sounds unreasonable. The performance does not change drastically, although it shows overall improvements without any major regressions or gains. Results on Zhaoxin KX-7000: bench-memcpy geometric_mean(N=20) New / Original: 0.999 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 0.978 bench-memmove geometric_mean(N=20) New / Original: 1.000 bench-memmove-large geometric_mean(N=20) New / Original: 0.962 Results on Intel Core i5-6600K: bench-memcpy geometric_mean(N=20) New / Original: 1.001 bench-memcpy-random geometric_mean(N=20) New / Original: 0.999 bench-memcpy-large geometric_mean(N=20) New / Original: 1.001 bench-memmove geometric_mean(N=20) New / Original: 0.995 bench-memmove-large geometric_mean(N=20) New / Original: 0.936 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
MayShao-oc	44d757eb9f	x86: Set preferred CPU features on the KH-40000 and KX-7000 Zhaoxin processors Fix code formatting under the Zhaoxin branch and add comments for different Zhaoxin models. Unaligned AVX loads are slower on KH-40000 and KX-7000, so disable the AVX_Fast_Unaligned_Load. Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to use sse2_unaligned version of memset, strcpy and strcat. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-30 06:26:43 -07:00
Andrew Pinski	2f1f7a5f8a	Aarch64: Add new memset for Qualcomm's oryon-1 core Qualcomm's new core, oryon-1, has different characteristics for memset than the current versions of memset. For non-zero, larger sizes, using GPRs rather than the SIMD stores is ~30% faster. For even larger sizes, using the nontemporal stores is needed not to pollute the L1/L2 caches. For zero values, `dc zva` should be used. Since we know the size will always be 64 bytes, we don't need to figure out the size there. I started with the emag memset and added back the `dc zva` code. Changes since v1: * v3: Fix comment formatting Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-30 13:47:17 +02:00
Andrew Pinski	4dc83cac78	Aarch64: Add memcpy for qualcomm's oryon-1 core Qualcomm's new core (oryon-1) has a different performance characteristic than other cores. For memcpy, it is faster to use the GPRs to do the copy for large sizes (2x faster). For even larger sizes, it is better to use the nontemporal load/store instructions so we don't pollute the L1/L2 caches. For smaller sizes, the characteristics are very similar to other cores. I used the thunderx memcpy as a starting point and expanded from there. Changes since v1: * v2: Fix ordering in Makefile. * v3: Fix comment grammar about the ldnp/stnp instructions. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-30 13:46:33 +02:00
Palmer Dabbelt	07fe71f59b	arm: Avoid UB in elf_machine_rel() This recently came up during a cleanup to remove misaligned accesses from the RISC-V port. Link: https://sourceware.org/pipermail/libc-alpha/2022-June/139961.html Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Reviewed-by: Fangrui Song <maskray@google.com>	2024-06-26 12:45:43 +02:00
mengqinggang	a10b6ad471	LoongArch: Fix tst-gnu2-tls2 test case asm volatile ("movfcsr2gr $t0, $fcsr0" ::: "$t0"); asm volatile ("st.d $t0, %0" :"=m"(restore_fcsr)); is compiled to the following instructions with the -Og flag: movfcsr2gr $t0, $zero addi.d $t0, $sp, 2047(0x7ff) addi.d $t0, $t0, 77(0x4d) st.w $t0, $t0, 0 The fcsr0 register value and the address of the restore_fcsr variable are both stored in the t0 register. Change to: asm volatile ("movfcsr2gr %0, $fcsr0" :"=r"(restore_fcsr)); to avoid placing the restore_fcsr address in t0. Compare float values using memcmp because float values cannot be directly compared for equality. Put LOAD_REGISTER_FCSR and SAVE_REGISTER_FCC after LOAD_REGISTER_FLOAT, because some float instructions may change the fcsr register.	2024-06-26 12:02:07 +08:00
Adhemerval Zanella	c90cfce849	posix: Fix pidfd_spawn/pidfd_spawnp leak if execve fails (BZ 31695) If the pidfd_spawn/pidfd_spawnp helper process succeeds, but execve fails for some reason (an invalid or non-existent file, a memory allocation failure, etc.), the resulting pidfd is never closed, nor returned to the caller (so that it can call close). Since the process creation failed, it should be up to posix_spawn to also close the file descriptor in this case (similar to what it does to reap the process). This patch also replaces waitpid with waitid (P_PIDFD) for the pidfd case, to avoid a possible pid re-use. Checked on x86_64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-06-25 12:11:48 -03:00
Andreas K. Hüttel	d32c342425	Revert "MIPSr6/math: Use builtin fma and fmaf" Apologies, I mistakenly interpreted this to be already accepted. Reverting until v6 or later is reviewed and approved. This reverts commit `9e06e4a43b`.	2024-06-25 01:02:58 +02:00
Christoph Müllner	81c7f6193c	RISC-V: Execute a PAUSE hint in spin loops The atomic_spin_nop() macro can be used to run arch-specific code in the body of a spin loop to potentially improve efficiency. RISC-V's Zihintpause extension includes a PAUSE instruction for this use-case, which is encoded as a HINT, which means that it behaves like a NOP on systems that don't implement Zihintpause. Binutils supports Zihintpause since 2.36, so this patch uses the ".insn" directive to keep the code compatible with older toolchains. Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-06-24 21:36:49 +02:00
YunQiang Su	9e06e4a43b	MIPSr6/math: Use builtin fma and fmaf MIPSr6 has MADDF.s/MADDF.d instructions, which are fused. In MIPS ISA, double support can be subsetted. Only FMAF is enabled for this case. * sysdeps/mips/fpu/math-use-builtins-fma.h Signed-off-by: YunQiang Su <syq@gcc.gnu.org> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>	2024-06-24 19:43:57 +02:00
John David Anglin	aecde502e9	hppa/vdso: Add wrappers for vDSO functions The upcoming parisc (hppa) v6.11 Linux kernel will include vDSO support for gettimeofday(), clock_gettime() and clock_gettime64() syscalls for 32- and 64-bit userspace. The patch below adds the necessary glue code for glibc. Signed-off-by: Helge Deller <deller@gmx.de> Changes in v2: - add vsyscalls for 64-bit too	2024-06-23 19:39:28 -04:00
John David Anglin	9dddb26954	Update hppa libm-test-ulps	2024-06-23 13:51:25 -04:00
John David Anglin	da61ba3f89	Update hppa libm-test-ulps	2024-06-20 19:44:04 -04:00
Julian Zhu	9f2bf0e23a	RISC-V: Update ulps For the exp10m1, exp2m1, log10p1 and log2p1 implementations. Signed-off-by: Julian Zhu <jz531210@gmail.com>	2024-06-20 23:46:32 +02:00
Julian Zhu	cb20e7c7cc	MIPS: Update ulps Update mips32/mips64 ulps for the exp10m1, exp2m1, and log10p1 implementations. Signed-off-by: Julian Zhu <jz531210@gmail.com>	2024-06-20 23:45:24 +02:00
Florian Weimer	b375e597da	i386: Update ulps This is from a -march=i686 -mtune=generic build with --disable-multi-arch, running on a Cascade Lake CPU.	2024-06-20 19:00:48 +02:00
Florian Weimer	362588f7cc	s390x: Capture grep output in static PIE check The test is not a run-time check, so update the description. Also use readelf -W for a more stable output format and fix an LC_ALL typo. This avoids garbled configure messages: checking for s390-specific static PIE requirements (runtime check)... 0x0000000000000017 (JMPREL) 0x280 yes	2024-06-20 14:34:06 +02:00
Florian Weimer	71dafdf5f1	powerpc: Update ulps Results based on POWER8 and POWER9 machines running powerpc64-linux-gnu, with and without --disable-multi-arch.	2024-06-20 12:15:31 +02:00
Florian Weimer	3cb77b7d1e	i386: Update ulps Based on a -march=x86-64-v4 -mfpmath=sse build, with and without --disable-multi-arch, running on a Zen 4 CPU. Also used different -march=x8i6-64-v… settings.	2024-06-20 12:15:09 +02:00
Xi Ruoyao	9405d54c62	LoongArch: Update ulps Add ulps for recently added C23 exp10m1, exp2m1, and log10p1 functions. Signed-off-by: Xi Ruoyao <xry111@xry111.site>	2024-06-19 21:17:19 +02:00
Andreas K. Hüttel	4f1cf0c0e1	sparc: Regenerate ULPs Linux catbus 5.15.110-gentoo-r1 #1 SMP Fri Jun 9 17:53:23 PDT 2023 sparc64 sun4v UltraSparc T5 (Niagara5) GNU/Linux Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-06-19 14:58:32 +02:00
Stefan Liebler	19f6d6a480	s390x: Regenerate ULPs. Needed due to: - "Implement C23 log10p1" commit ID `55eb99e9a9` - "Implement C23 exp2m1, exp10m1" commit ID `7ec903e028`	2024-06-19 08:42:30 +02:00
mengqinggang	9a675d998e	LoongArch: Fix _dl_tlsdesc_dynamic in LSX case The HWCAP value is overwritten at the first comparison in the LASX case, so the second comparison, for LSX, gets an incorrect result. Change to use t0 to save the HWCAP value, and use t1 to save the comparison result.	2024-06-19 10:06:41 +08:00
Adhemerval Zanella	92341e3150	arm: Update ulps For the exp10m1, exp2m1, and log10p1 implementations.	2024-06-18 17:31:10 -03:00
Adhemerval Zanella	45f5f51b85	aarch64: Update ulps For the exp10m1, exp2m1, and log10p1 implementations.	2024-06-18 17:31:10 -03:00
Adhemerval Zanella	52b397bafa	powerpc: Update ulps For the exp10m1, exp2m1, and log10p1 implementations.	2024-06-18 17:31:10 -03:00
Florian Weimer	f6ea5d1291	Linux: Include <dl-symbol-redir-ifunc.h> in dl-sysdep.c The _dl_sysdep_parse_arguments function contains initialization of a large on-stack variable: dl_parse_auxv_t auxv_values = { 0, }; This uses a non-inline version of memset on powerpc64le-linux-gnu, so it must use the baseline memset.	2024-06-18 10:56:34 +02:00
Carlos Llamas	176671f604	linux: add definitions for hugetlb page size encodings A desired hugetlb page size can be encoded in the flags parameter of system calls such as mmap() and shmget(). The Linux UAPI headers have included explicit definitions for these encodings since v4.14. This patch adds these definitions that are used along with MAP_HUGETLB and SHM_HUGETLB flags as specified in the corresponding man pages. This relieves programs from having to duplicate and/or compute the encodings manually. Additionally, the filter on these definitions in tst-mman-consts.py is removed, as suggested by Florian. I then ran this test successfully, confirming the alignment with the kernel headers. PASS: misc/tst-mman-consts original exit status 0 Signed-off-by: Carlos Llamas <cmllamas@google.com> Tested-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-06-18 10:56:34 +02:00
Stefan Liebler	e260ceb4aa	elf: Remove HWCAP_IMPORTANT Remove the definitions of HWCAP_IMPORTANT after removal of LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask. There HWCAP_IMPORTANT was used as default value. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
Stefan Liebler	ad0aa1f549	elf: Remove LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask Remove the environment variable LD_HWCAP_MASK and the tunable glibc.cpu.hwcap_mask as those are not used anymore in common code after removal in elf/dl-cache.c:search_cache(). The only remaining user is sparc32 where it is used in elf_machine_matches_host(). If sparc32 does not need it anymore, we can get rid of it entirely. Otherwise we could also move LD_HWCAP_MASK / tunable glibc.cpu.hwcap_mask to be sparc32 specific. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
Stefan Liebler	343439a31e	elf: Remove _DL_PLATFORMS_COUNT Remove the definitions of _DL_PLATFORMS_COUNT as those are not used anymore after removal in elf/dl-cache.c:search_cache(). Note: On x86, we can also get rid of the definitions HWCAP_PLATFORMS_START and HWCAP_PLATFORMS_COUNT. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
Stefan Liebler	43c7c5e62d	elf: Remove _DL_FIRST_PLATFORM Remove the definitions of _DL_FIRST_PLATFORM as those were only used in the _DL_HWCAP_PLATFORM definitions and in _dl_string_platform(). Both were removed. Note: Removed on every architecture except powerpc, where _dl_string_platform() is still used. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
Stefan Liebler	ed23449dac	elf: Remove _DL_HWCAP_PLATFORM Remove the definitions of _DL_HWCAP_PLATFORM as those are not used anymore after removal in elf/dl-cache.c:search_cache(). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
Stefan Liebler	374c8b4483	elf: Remove platform strings in dl-procinfo.c Remove the platform strings in dl-procinfo.c where also the implementation of _dl_string_platform() was removed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
Stefan Liebler	8faada8302	elf: Remove _dl_string_platform Apart from powerpc, where the returned integer is stored in the tcb, and the diagnostics output, there is no user anymore. Thus this patch removes the diagnostics output and _dl_string_platform for all other platforms. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
Stefan Liebler	f14b6dfc87	x86: Remove HWCAP_START and HWCAP_COUNT Both defines are not used anymore. Those were only used for _dl_string_hwcap(), which itself was removed with commit `ab40f20364` "elf: Remove _dl_string_hwcap" Just clean up. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-18 10:45:36 +02:00
YunQiang Su	eaf4fc516a	math: Update mips32/mips64 ulps for log2p1	2024-06-17 21:45:53 +02:00
Andreas K. Hüttel	98ffc1bfeb	Convert to autoconf 2.72 (vanilla release, no distribution patches) As discussed at the patch review meeting Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org> Reviewed-by: Simon Chopin <simon.chopin@canonical.com>	2024-06-17 21:15:28 +02:00
Joseph Myers	7ec903e028	Implement C23 exp2m1, exp10m1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the exp2m1 and exp10m1 functions (exp2(x)-1 and exp10(x)-1, like expm1). As with other such functions, these use type-generic templates that could be replaced with faster and more accurate type-specific implementations in future. Test inputs are copied from those for expm1, plus some additions close to the overflow threshold (copied from exp2 and exp10) and also some near the underflow threshold. exp2m1 has the unusual property of having an input (M_MAX_EXP) where whether the function overflows (under IEEE semantics) depends on the rounding mode. Although these could reasonably be XFAILed in the testsuite (as we do in some cases for arguments very close to a function's overflow threshold when an error of a few ulps in the implementation can result in the implementation not agreeing with an ideal one on whether overflow takes place - the testsuite isn't smart enough to handle this automatically), since these functions aren't required to be correctly rounding, I made the implementation check for and handle this case specially. The Makefile ordering expected by lint-makefiles for the new functions is a bit peculiar, but I implemented it in this patch so that the test passes; I don't know why log2 also needed moving in one Makefile variable setting when it didn't in my previous patches, but the failure showed a different place was expected for that function as well. The powerpc64le IFUNC setup seems not to be as self-contained as one might hope; it shouldn't be necessary to add IFUNCs for new functions such as these simply to get them building, but without setting up IFUNCs for the new functions, there were undefined references to __GI___expm1f128 (that IFUNC machinery results in no such function being defined, but doesn't stop include/math.h from doing the redirection resulting in the exp2m1f128 and exp10m1f128 implementations expecting to call it). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-06-17 16:31:49 +00:00
Joseph Myers	55eb99e9a9	Implement C23 log10p1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the log10p1 functions (log10(1+x): like log1p, but for base-10 logarithms). This is directly analogous to the log2p1 implementation (except that whereas log2p1 has a smaller underflow range than log1p, log10p1 has a larger underflow range). The test inputs are copied from those for log1p and log2p1, plus a few more inputs in that wider underflow range. Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-06-17 13:48:13 +00:00
Joseph Myers	bb014f50c4	Implement C23 logp1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the logp1 functions (aliases for log1p functions - the name is intended to be more consistent with the new log2p1 and log10p1, where clearly it would have been very confusing to name those functions log21p and log101p). As aliases rather than new functions, the content of this patch is somewhat different from those actually adding new functions. Tests are shared with log1p, so this patch does mechanically update all affected libm-test-ulps files to expect the same errors for both functions. The vector versions of log1p on aarch64 and x86_64 are not updated to have logp1 aliases (and thus there are no corresponding header, tests, abilist or ulps changes for vector functions either). It would be reasonable for such vector aliases and corresponding changes to other files to be made separately. For now, the log1p tests instead avoid testing logp1 in the vector case (a Makefile change is needed to avoid problems with grep, used in generating the .c files for vector function tests, matching more than one ALL_RM_TEST line in a file testing multiple functions with the same inputs, when it assumes that the .inc file only has a single such line). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-06-17 13:47:09 +00:00
Noah Goldstein	5b54a33435	x86: Fix value for `x86_memset_non_temporal_threshold` when it is undesirable When we don't want to use non-temporal stores for memset, we set `x86_memset_non_temporal_threshold` to SIZE_MAX. The current code, however, was using `maximum_non_temporal_threshold` as the upper bound, which is `SIZE_MAX >> 4`, so we ended up with a value of `0`. Fix is to just use `SIZE_MAX` as the upper bound when setting the tunable. Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-06-14 17:25:05 -05:00
Andreas K. Hüttel	3953b5b88f	i686: Regenerate ulps Linux pinacolada 6.6.32-gentoo #1 SMP PREEMPT Sun Jun 9 14:18:17 CEST 2024 x86_64 Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz GenuineIntel GNU/Linux 32bit build for multilib environment Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-06-14 21:24:24 +02:00
Xi Ruoyao	97aa7b7346	LoongArch: Ensure sp 16-byte aligned for tlsdesc "ADDI sp, sp, 24" and "ADDI sp, sp, SZFCSREG" (SZFCSREG = 4) are misaligning the stack: the ABI mandates a 16-byte alignment. Fix it by changing the first one to "ADDI sp, sp, 32", and reuse the spare 4th slot for saving fcsr. Reported-by: Jinyang He <hejinyang@loongson.cn> Signed-off-by: Xi Ruoyao <xry111@xry111.site>	2024-06-14 10:14:54 +08:00
H.J. Lu	29807a271e	x86: Properly set x86 minimum ISA level [BZ #31883 ] Properly set libc_cv_have_x86_isa_level in shell for MINIMUM_X86_ISA_LEVEL defined as (__X86_ISA_V1 + __X86_ISA_V2 + __X86_ISA_V3 + __X86_ISA_V4) Also set __X86_ISA_V2 to 1 for i386 if __GCC_HAVE_SYNC_COMPARE_AND_SWAP_8 is defined. There are no changes in config.h nor in config.make on x86-64. On i386, -march=x86-64-v2 with GCC generates #define MINIMUM_X86_ISA_LEVEL 2 in config.h and have-x86-isa-level = 2 in config.make. This fixes BZ #31883. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-12 14:27:54 -07:00
Adhemerval Zanella	7edd3814b0	linux: Remove __stack_prot The __stack_prot is used by Linux to make the stack executable if a module requires it. It is also marked as RELRO, which requires changing the segment permission to RW to update it. Also, there is no need to keep track of the flags: either the stack will have the default permission of the ABI or should be changed to PROT_READ | PROT_WRITE | PROT_EXEC. The only additional flag, PROT_GROWSDOWN or PROT_GROWSUP, is Linux only and can be deduced from _STACK_GROWS_DOWN/_STACK_GROWS_UP. Also, the check_consistency function was already removed some time ago. Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-06-12 15:25:54 -03:00
H.J. Lu	09bc68b0ac	x86: Properly set MINIMUM_X86_ISA_LEVEL for i386 [BZ #31867] On i386, set the default minimum ISA level to 0, not 1 (baseline which includes SSE2). There are no changes in config.h nor in config.make on x86-64. This fixes BZ #31867. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Tested-by: Ian Jordan <immoloism@gmail.com> Reviewed-by: Sam James <sam@gentoo.org> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-06-11 00:10:08 -07:00
Joe Damato	bef2a827a5	x86: Enable non-temporal memset tunable for AMD In commit `46b5e98ef6` ("x86: Add seperate non-temporal tunable for memset") a tunable threshold for enabling non-temporal memset was added, but only for Intel hardware. Since that commit, new benchmark results suggest that non-temporal memset is beneficial on AMD, as well, so allow this tunable to be set for AMD. See: https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing which has been updated to include data using different strategies for large memset on AMD Zen2, Zen3, and Zen4. Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-06-10 16:18:18 -05:00
Samuel Thibault	74f9ee3b91	hurd: Fix lsetxattr return value The manpage says that lsetxattr returns 0 on success, like setxattr.	2024-06-10 21:56:13 +02:00
Joe Damato	92c270d32c	Linux: Add epoll ioctls As of Linux kernel 6.9, some ioctls and a parameters structure have been introduced which allow user programs to control whether a particular epoll context will busy poll. Update the headers to include these for the convenience of user apps. The ioctls were added in Linux kernel 6.9 commit 18e2bf0edf4dd ("eventpoll: Add epoll ioctl for epoll_params") [1] to include/uapi/linux/eventpoll.h. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/diff/?h=v6.9&id=18e2bf0edf4dd Signed-off-by: Joe Damato <jdamato@fastly.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-04 12:09:15 -05:00
Szabolcs Nagy	2a9943b4a0	math: Fix exp10 undefined left shift Left shift of ki is undefined when ki<0; copy the logic from exp, which uses unsigned arithmetic, to fix it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-04 15:33:26 +01:00
Joseph Myers	1d441791cb	Add new AArch64 HWCAP2 definitions from Linux 6.9 to bits/hwcap.h Linux 6.9 adds 15 new HWCAP2_* values for AArch64; add them to bits/hwcap.h in glibc. Tested with build-many-glibcs.py for aarch64-linux-gnu.	2024-06-04 12:25:05 +00:00
Noah Goldstein	46b5e98ef6	x86: Add seperate non-temporal tunable for memset The tuning for non-temporal stores for memset vs memcpy is not always the same. This includes both the exact value and whether non-temporal stores are profitable at all for a given arch. This patch adds `x86_memset_non_temporal_threshold`. Currently we disable non-temporal stores for non-Intel vendors as the only benchmarks showing its benefit have been on Intel hardware. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-30 12:36:09 -05:00
Noah Goldstein	5bf0ab8057	x86: Improve large memset perf with non-temporal stores [RHEL-29312] Previously we used `rep stosb` for all medium/large memsets. This is notably worse than non-temporal stores for large (above a few MBs) memsets. See: https://docs.google.com/spreadsheets/d/1opzukzvum4n6-RUVHTGddV6RjAEil4P2uMjjQGLbLcU/edit?usp=sharing for data using different strategies for large memset on ICX and SKX. Using non-temporal stores can be up to 3x faster on ICX and 2x faster on SKX. Historically, these numbers would not have been so good because of the zero-over-zero writeback optimization that `rep stosb` is able to do. But, the zero-over-zero writeback optimization has been removed as a potential side-channel attack, so there is no longer any good reason to only rely on `rep stosb` for large memsets. On the flip side, non-temporal writes can avoid data in their RFO requests saving memory bandwidth. All of the other changes to the file are to re-organize the code-blocks to maintain "good" alignment given the new code added in the `L(stosb_local)` case. The results from running the GLIBC memset benchmarks on TGL-client for N=20 runs: Geometric Mean across the suite New / Old EVEX256: 0.979 Geometric Mean across the suite New / Old EVEX512: 0.979 Geometric Mean across the suite New / Old AVX2 : 0.986 Geometric Mean across the suite New / Old SSE2 : 0.979 Most of the cases are essentially unchanged, this is mostly to show that adding the non-temporal case didn't add any regressions to the other cases. The results on the memset-large benchmark suite on TGL-client for N=20 runs: Geometric Mean across the suite New / Old EVEX256: 0.926 Geometric Mean across the suite New / Old EVEX512: 0.925 Geometric Mean across the suite New / Old AVX2 : 0.928 Geometric Mean across the suite New / Old SSE2 : 0.924 So roughly a 7.5% speedup. This is lower than what we see on servers (likely because clients typically have faster single-core bandwidth so saving bandwidth on RFOs is less impactful), but still advantageous. Full test-suite passes on x86_64 w/ and w/o multiarch. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-30 12:36:09 -05:00
Xi Ruoyao	0c1d2c277a	LoongArch: Use "$fcsr0" instead of "$r0" in _FPU_{GET,SET}CW Clang inline-asm parser does not allow using "$r0" in movfcsr2gr/movgr2fcsr, so everything using _FPU_{GET,SET}CW is now failing to build with Clang on LoongArch. As we now require Binutils >= 2.41 which supports using "$fcsr0" here, use it instead of "$r0" to fix the issue. Link: https://github.com/loongson-community/discussions/issues/53#issuecomment-2081507390 Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=4142b2368353 Signed-off-by: Xi Ruoyao <xry111@xry111.site>	2024-05-28 09:17:05 +08:00
Xin Wang	e0f7f1808f	x86_64: Reformat elf_machine_rela A space is added before the left bracket of the x86_64 elf_machine_rela function, in order to harmonize with the rest of the implementation of the function and to make it easier to retrieve the function. The lines where the function definition is located have been re-indented, and its left curly bracket has been placed in the correct position. Signed-off-by: Xin Wang <yw987194828@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-27 13:46:45 -07:00
Sunil K Pandey	1b713c9a53	i386: Disable Intel Xeon Phi tests for GCC 15 and above (BZ 31782) This patch disables Intel Xeon Phi tests for GCC 15 and above. GCC 15 removed Intel Xeon Phi ISA support. commit e1a7e2c54d52d0ba374735e285b617af44841ace Author: Haochen Jiang <haochen.jiang@intel.com> Date: Mon May 20 10:43:44 2024 +0800 i386: Remove Xeon Phi ISA support Fixes BZ 31782. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-27 12:28:13 -07:00
H.J. Lu	f981bf6b9d	parse_fdinfo: Don't advance pointer twice [BZ #31798] pidfd_getpid.c has /* Ignore invalid large values. */ if (INT_MULTIPLY_WRAPV (10, n, &n) || INT_ADD_WRAPV (n, *l++ - '0', &n)) return -1; For GCC older than GCC 7, INT_ADD_WRAPV(a, b, r) is defined as _GL_INT_OP_WRAPV (a, b, r, +, _GL_INT_ADD_RANGE_OVERFLOW) and *l++ - '0' is evaluated twice. Fix BZ #31798 by moving "l++" out of the if statement. Tested with GCC 6.4 and GCC 14.1. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-05-27 06:52:45 -07:00
H.J. Lu	23c60af6dc	sysdeps/ieee754/ldbl-opt/Makefile: Split and sort libnldbl-calls Put each item on a separate line and sort libnldbl-calls. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-24 10:25:40 -07:00
H.J. Lu	639c143db3	sysdeps/ieee754/ldbl-opt/Makefile: Remove test-nldbl-redirect-static Remove $(objpfx)test-nldbl-redirect-static checked in by accident. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-24 06:36:18 -07:00
H.J. Lu	acfb169b3c	sysdeps/ieee754/ldbl-opt/Makefile: Split and sort tests Put each test on a separate line and sort tests. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-24 06:31:49 -07:00
Stefan Liebler	4af49c60a1	s390x: Regenerate ULPs. Needed due to: "Implement C23 log2p1" commit ID `79c52daf47`	2024-05-24 09:53:49 +02:00
Joseph Myers	84d2762922	Update kernel version to 6.9 in header constant tests This patch updates the kernel version in the tests tst-mman-consts.py and tst-mount-consts.py to 6.9. (There are no new constants covered by these tests in 6.9 that need any other header changes; tst-pidfd-consts.py was updated separately along with adding new constants relevant to that test.) Tested with build-many-glibcs.py.	2024-05-23 14:04:48 +00:00
Adhemerval Zanella	eaa8113bf0	math: Provide missing math symbols on libc.a (BZ 31781) The libc.a for alpha, s390, and sparcv9 does not provide copysignf64x, copysignf128, frexpf64x, frexpf128, modff64x, and modff128. Checked with a static build for the affected ABIs. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Adhemerval Zanella	1664bbf238	s390: Make utmp32, utmpx32, and login32 shared only (BZ 31790) The functions that work with 'struct utmp32' and 'struct utmpx32' are only for compat symbols. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Adhemerval Zanella	18dbe27847	microblaze: Remove cacheflush from libc.a (BZ 31788) microblaze does not export it in libc.so, nor does the kernel provide the cacheflush syscall. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Adhemerval Zanella	d8ebde14fb	powerpc: Remove duplicated llrintf and llrintf32 from libm.a (BZ 31787) Both the generic and POWER6 versions provide definitions of the symbol, which are already provided by the ifunc resolver. Checked on powerpc-linux-gnu-power4. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Adhemerval Zanella	5fededd825	powerpc: Remove duplicate strchrnul and strncasecmp_l libc.a (BZ 31786) For powerpc64 the generic version provides a weak definition of strchrnul, which is already provided by the ifunc resolver. The powerpc32 version is slightly different, where for the static case there is no iFUNC support. The strncasecmp_l is provided by the ifunc resolver. Checked on powerpc-linux-gnu-power4 and powerpc64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Adhemerval Zanella	62eaa46739	loongarch: Remove duplicate strnlen in libc.a (BZ 31785) The generic version provides weak definitions of strnlen, which are already provided by the ifunc resolver. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Adhemerval Zanella	ef9596352b	aarch64: Remove duplicate memchr/strlen in libc.a (BZ 31777) The generic version provides weak definitions of memchr/strlen, which are already provided by the ifunc resolvers. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
Joseph Myers	e9a37242f9	Update PIDFD_* constants for Linux 6.9 Linux 6.9 adds some more PIDFD_* constants. Add them to glibc's sys/pidfd.h, including updating comments that said FLAGS was reserved and must be 0, along with updating tst-pidfd-consts.py. Tested with build-many-glibcs.py.	2024-05-23 12:22:40 +00:00
H.J. Lu	43d41ae6d7	Don't provide XXXf128_do_not_use aliases [BZ #31757 ] Don't provide __nexttowardf128_do_not_use, nexttowardf128_do_not_use, finitef128_do_not_use, isinff128_do_not_use and isnanf128_do_not_use. This fixes BZ #31757. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-05-22 06:12:17 -07:00
Adhemerval Zanella	5d4999e519	math: Fix isnanf128 static build (BZ 31774) Some static implementation of float128 routines might call __isnanf128, which is not provided by the static object. Checked on x86_64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-21 16:53:27 -03:00
Adhemerval Zanella	1f09aae36a	math: Fix i386 and m68k exp10 on static build (BZ 31775) The commit `08ddd26814` removed the static exp10 on i386 and m68k with an empty w_exp10.c (required for the ABIs that use the new implementation). This patch fixes it by adding the required symbols on the arch-specific w_exp{f}_compat.c implementation. Checked on i686-linux-gnu and with a build for m68k-linux-gnu. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>	2024-05-21 13:44:22 -03:00
Adhemerval Zanella	0b716305df	math: Fix i386 and m68k fmod/fmodf on static build (BZ 31488) The commit `16439f419b` removed the static fmod/fmodf on i386 and m68k with an empty w_fmod.c (required for the ABIs that use the new implementation). This patch fixes it by adding the required symbols on the arch-specific w_fmod{f}_compat.c implementation. Statically building fmod fails on some ABIs (alpha, s390, sparc) because they do not export ldexpf128; this is also fixed by this patch. Checked on i686-linux-gnu and with a build for m68k-linux-gnu. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Tested-by: Aurelien Jarno <aurelien@aurel32.net>	2024-05-21 13:43:39 -03:00
H.J. Lu	437c94e04b	Remove the clone3 symbol from libc.a [BZ #31770 ] clone3 isn't exported from glibc and is hidden in libc.so. Fix BZ #31770 by removing clone3 alias. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-05-21 07:05:08 -07:00
Joe Ramsay	0fed0b250f	aarch64/fpu: Add vector variants of pow Plus a small amount of moving includes around in order to be able to remove duplicate definition of asuint64. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-21 14:38:49 +01:00
caiyinyu	3c1e22372d	LoongArch: Update ulps For the log2p1 implementation.	2024-05-21 12:08:25 +08:00
mengqinggang	16d47c1594	LoongArch: Fix tst-gnu2-tls2 compiler error Add -mno-lsx to tst-gnu2-tlsmod*.c if gcc supports -mno-lsx. Add escape character '\' in vector support test function.	2024-05-21 11:23:03 +08:00
H.J. Lu	8428278b5f	i386: Don't define stpncpy alias when used in IFUNC [BZ #31768 ] Fix BZ #31768 by not defining stpncpy alias when used in IFUNC. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-05-20 19:35:00 -07:00
Adhemerval Zanella	f83e461f10	powerpc: Update ulps For the log2p1 implementation.	2024-05-20 13:12:23 -03:00
Adhemerval Zanella	32b2aa59da	arm: Update ulps For the log2p1 implementation.	2024-05-20 13:12:23 -03:00
Adhemerval Zanella	241338bd6f	aarch64: Update ulps For the log2p1 implementation.	2024-05-20 13:12:23 -03:00
Joseph Myers	79c52daf47	Implement C23 log2p1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the log2p1 functions (log2(1+x): like log1p, but for base-2 logarithms). This illustrates the intended structure of implementations of all these function families: define them initially with a type-generic template implementation. If someone wishes to add type-specific implementations, it is likely such implementations can be both faster and more accurate than the type-generic one and can then override it for types for which they are implemented (adding benchmarks would be desirable in such cases to demonstrate that a new implementation is indeed faster). The test inputs are copied from those for log1p. Note that these changes make gen-auto-libm-tests depend on MPFR 4.2 (or later). The bulk of the changes are fairly generic for any such new function. (sysdeps/powerpc/nofpu/Makefile only needs changing for those type-generic templates that use fabs.) Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-05-20 13:41:39 +00:00
Joseph Myers	cf0ca8d52e	Update syscall lists for Linux 6.9 Linux 6.9 has no new syscalls. Update the version number in syscall-names.list to reflect that it is still current for 6.9. Tested with build-many-glibcs.py.	2024-05-20 13:10:31 +00:00
H.J. Lu	7935e7a537	Rename procutils_read_file to __libc_procutils_read_file [BZ #31755 ] Fix BZ #31755 by renaming the internal function procutils_read_file to __libc_procutils_read_file. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-05-20 05:22:43 -07:00
H.J. Lu	4e21cb95e2	nearbyint: Don't define alias when used in IFUNC [BZ #31759 ] Fix BZ #31759 by not defining nearbyint aliases when used in IFUNC. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-05-20 05:21:41 -07:00
Florian Weimer	8d7b6b4cb2	socket: Use may_alias on sockaddr structs (bug 19622) This supports common coding patterns. The GCC C front end before version 7 rejects the may_alias attribute on a struct definition if it was not present in a previous forward declaration, so this attribute can only be conditionally applied. This implements the spirit of the change in Austin Group issue 1641. Suggested-by: Marek Polacek <polacek@redhat.com> Suggested-by: Jakub Jelinek <jakub@redhat.com> Reviewed-by: Sam James <sam@gentoo.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-05-18 09:33:19 +02:00
Manjunath Matti	a81cdde1cb	powerpc64: Fix by using the configure value $libc_cv_cc_submachine [BZ #31629 ] This patch ensures that $libc_cv_cc_submachine, which is set from "--with-cpu", overrides $CFLAGS for configure time tests. Suggested-by: Peter Bergner <bergner@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-05-16 17:31:45 -05:00
Joe Ramsay	75207bde68	aarch64/fpu: Add vector variants of cbrt Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-16 14:35:06 +01:00
Joe Ramsay	157f89fa3d	aarch64/fpu: Add vector variants of hypot Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-16 14:34:43 +01:00
mengqinggang	1dbf2bef79	LoongArch: Add support for TLS Descriptors This is mostly based on AArch64 and RISC-V implementation. Add R_LARCH_TLS_DESC32 and R_LARCH_TLS_DESC64 relocations. For _dl_tlsdesc_dynamic function slow path, temporarily save and restore all vector registers.	2024-05-15 10:31:53 +08:00
Joe Ramsay	90a6ca8b28	aarch64: Fix AdvSIMD libmvec routines for big-endian Previously many routines used * to load from vector types stored in the data table. This is emitted as ldr, which byte-swaps the entire vector register, and causes bugs for big-endian when not all lanes contain the same value. When a vector is to be used this way, it has been replaced with an array and the load with an explicit ld1 intrinsic, which byte-swaps only within lanes. As well, many routines previously used non-standard GCC syntax for vector operations such as indexing into vectors types with [] and assembling vectors using {}. This syntax should not be mixed with ACLE, as the former does not respect endianness whereas the latter does. Such examples have been replaced with, for instance, vcombine_* and vgetq_lane* intrinsics. Helpers which only use the GCC syntax, such as the v_call helpers, do not need changing as they do not use intrinsics. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-05-14 13:10:33 +01:00
Adhemerval Zanella	ae515ba530	powerpc: Fix __fesetround_inline_nocheck on POWER9+ (BZ 31682) The `e68b1151f7` commit changed the __fesetround_inline_nocheck implementation to use mffscrni (through __fe_mffscrn) instead of mtfsfi. For generic powerpc ceil/floor/trunc, the function is supposed to disable the floating-point inexact exception enable bit; however, mffscrni does not change any exception enable bits. This patch fixes it by reverting the optimization for the __fesetround_inline_nocheck. Checked on powerpc-linux-gnu. Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>	2024-05-09 08:59:30 -03:00
Gabi Falk	dd5f891c1a	x86_64: Fix missing wcsncat function definition without multiarch (x86-64-v4) This code expects the WCSCAT preprocessor macro to be predefined in case the evex implementation of the function should be defined with a name different from __wcsncat_evex. However, when glibc is built for x86-64-v4 without multiarch support, sysdeps/x86_64/wcsncat.S defines WCSNCAT variable instead of WCSCAT to build it as wcsncat. Rename the variable to WCSNCAT, as it is actually a better naming choice for the variable in this case. Reported-by: Kenton Groombridge Link: https://bugs.gentoo.org/921945 Fixes: `64b8b6516b` ("x86: Add evex optimized functions for the wchar_t strcpy family") Signed-off-by: Gabi Falk <gabifalk@gmx.com> Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-05-08 07:37:59 -07:00
Adhemerval Zanella	1e1ad714ee	support: Add envp argument to support_capture_subprogram So tests can specify a list of environment variables. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2024-05-07 12:16:36 -03:00
Adhemerval Zanella	bcae44ea85	elf: Only process multiple tunable once (BZ 31686) The `680c597e9c` commit made loader reject ill-formatted strings by first tracking all set tunables and then applying them. However, it does not take into consideration if the same tunable is set multiple times, where parse_tunables_string appends the found tunable without checking if it was already in the list. It leads to a stack-based buffer overflow if the tunable is specified more than the total number of tunables. For instance: GLIBC_TUNABLES=glibc.malloc.check=2:... (repeat over the number of total support for different tunable). Instead, use the index of the tunable list to get the expected tunable entry. Since now the initial list is zero-initialized, the compiler might emit an extra memset and this requires some minor adjustment on some ports. Checked on x86_64-linux-gnu and aarch64-linux-gnu. Reported-by: Yuto Maeda <maeda@cyberdefense.jp> Reported-by: Yutaro Shimizu <shimizu@cyberdefense.jp> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2024-05-07 12:16:36 -03:00
H.J. Lu	5f245f3bfb	Add crt1-2.0.o for glibc 2.0 compatibility tests Starting from glibc 2.1, crt1.o contains _IO_stdin_used which is checked by _IO_check_libio to provide binary compatibility for glibc 2.0. Add crt1-2.0.o for tests against glibc 2.0. Define tests-2.0 for glibc 2.0 compatibility tests. Add and update glibc 2.0 compatibility tests for stderr, matherr and pthread_kill. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-05-06 07:49:40 -07:00
Amrita H S	23f0d81608	powerpc: Optimized strncmp for power10 This patch is based on __strcmp_power10. Improvements from __strncmp_power9: 1. Uses new POWER10 instructions - This code uses lxvp to decrease contention on load by loading 32 bytes per instruction. 2. Performance implication - This version has around 38% better performance on average. - Minor performance regression is seen for few small sizes and specific combination of alignments. Signed-off-by: Amrita H S <amritahs@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-05-06 09:01:29 -05:00
Stafford Horne	643d9d38d5	or1k: Add hard float support This patch adds hardware floating point support to OpenRISC. Hardware floating point toolchain builds are enabled by passing the machine specific argument -mhard-float to gcc via CFLAGS. With this enabled GCC generates floating point instructions for single-precision operations and exports __or1k_hard_float__. There are 2 main parts to this patch. - Implement fenv functions to update the FPCSR flags keeping it in sync with sfp (software floating point). - Update machine context functions to store and restore the FPCSR state. On mcontext_t ABI This patch adds __fpcsr to mcontext_t. This is an ABI change, but also an ABI fix. The Linux kernel has always defined padding in mcontext_t that space was missing from the glibc ABI. In Linux this unused space has now been re-purposed for storing the FPCSR. This patch brings OpenRISC glibc in line with the Linux kernel and other libc implementation (musl). Compatibility getcontext, setcontext, etc symbols have been added to allow for binaries expecting the old ABI to continue to work. Hard float ABI The calling conventions and types do not change with OpenRISC hard-float so glibc hard-float builds continue to use dynamic linker /lib/ld-linux-or1k.so.1. Testing I have tested this patch both with hard-float and soft-float builds and the test results look fine to me. 
Results are as follows: Hard Float # failures FAIL: elf/tst-sprof-basic (Haven't figured out yet, not related to hard-float) FAIL: gmon/tst-gmon-pie (PIE bug in or1k toolchain) FAIL: gmon/tst-gmon-pie-gprof (PIE bug in or1k toolchain) FAIL: iconvdata/iconv-test (timeout, passed when run manually) FAIL: nptl/tst-cond24 (Timeout) FAIL: nptl/tst-mutex10 (Timeout) # summary 6 FAIL 4289 PASS 86 UNSUPPORTED 16 XFAIL 2 XPASS # versions Toolchain: or1k-smhfpu-linux-gnu Compiler: gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC) Binutils: GNU assembler version 2.42.0 (or1k-smhfpu-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324 Linux: Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux Tester: shorne Glibc: 2024-04-25 `b62928f907` Florian Weimer x86: In ld.so, diagnose missing APX support in APX-only builds (origin/master, origin/HEAD) Soft Float # failures FAIL: elf/tst-sprof-basic FAIL: gmon/tst-gmon-pie FAIL: gmon/tst-gmon-pie-gprof FAIL: nptl/tst-cond24 FAIL: nptl/tst-mutex10 # summary 5 FAIL 4295 PASS 81 UNSUPPORTED 16 XFAIL 2 XPASS # versions Toolchain: or1k-smh-linux-gnu Compiler: gcc version 14.0.1 20240324 (experimental) [master r14-9649-gbb04a11418f] (GCC) Binutils: GNU assembler version 2.42.0 (or1k-smh-linux-gnu) using BFD version (GNU Binutils) 2.42.0.20240324 Linux: Linux buildroot 6.9.0-rc1-00008-g4dc70e1aadfa #112 SMP Sat Apr 27 06:43:11 BST 2024 openrisc GNU/Linux Tester: shorne Glibc: 2024-04-25 `b62928f907` Florian Weimer x86: In ld.so, diagnose missing APX support in APX-only builds (origin/master, origin/HEAD) Documentation: https://raw.githubusercontent.com/openrisc/doc/master/openrisc-arch-1.4-rev0.pdf Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-05-03 18:28:18 +01:00
Stafford Horne	b57adfa49b	or1k: Add hard float libm-test-ulps This patch adds the ulps test file to prepare for the upcoming hard float patch. This is separated out to make the hard float patch smaller. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-05-03 18:28:18 +01:00
Gabi Falk	5a2cf833f5	i686: Fix multiple definitions of __memmove_chk and __memset_chk Commit `c73c96a4a1` updated memcpy.S and mempcpy.S, but omitted memmove.S and memset.S. As a result, the static library built as PIC, whether with or without multiarch support, contains two definitions for each of the __memmove_chk and __memset_chk symbols. /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../i686-pc-linux-gnu/bin/ld: /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset-ia32.o): in function `__memset_chk': /var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/string/../sysdeps/i386/i686/memset.S:32: multiple definition of `__memset_chk'; /usr/lib/gcc/i686-pc-linux-gnu/14/../../../../lib/libc.a(memset_chk.o):/var/tmp/portage/sys-libs/glibc-2.39-r3/work/glibc-2.39/debug/../sysdeps/i386/i686/multiarch/memset_chk.c:24: first defined here After this change, regardless of PIC options, the static library, built for i686 with multiarch contains implementations of these functions respectively from debug/memmove_chk.c and debug/memset_chk.c, and without multiarch contains implementations of these functions respectively from sysdeps/i386/memmove_chk.S and sysdeps/i386/memset_chk.S. This ensures that memmove and memset won't pull in __chk_fail and the routines it calls. Reported-by: Sam James <sam@gentoo.org> Tested-by: Sam James <sam@gentoo.org> Fixes: `c73c96a4a1` ("i686: Fix build with --disable-multiarch") Signed-off-by: Gabi Falk <gabifalk@gmx.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>	2024-05-02 11:51:10 +01:00
Gabi Falk	0fdf4ba48c	i586: Fix multiple definitions of __memcpy_chk and __mempcpy_chk /home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy_chk.o): in function `__memcpy_chk': /home/bmg/src/glibc/debug/../sysdeps/i386/memcpy_chk.S:29: multiple definition of `__memcpy_chk';/home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(memcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here /home/bmg/install/compilers/x86_64-linux-gnu/lib/gcc/x86_64-glibc-linux-gnu/13.2.1/../../../../x86_64-glibc-linux-gnu/bin/ld: /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy_chk.o): in function `__mempcpy_chk': /home/bmg/src/glibc/debug/../sysdeps/i386/mempcpy_chk.S:28: multiple definition of `__mempcpy_chk'; /home/bmg/build/glibcs/i586-linux-gnu/glibc/libc.a(mempcpy.o):/home/bmg/src/glibc/string/../sysdeps/i386/i586/memcpy.S:31: first defined here After this change, the static library built for i586, regardless of PIC options, contains implementations of these functions respectively from sysdeps/i386/memcpy_chk.S and sysdeps/i386/mempcpy_chk.S. This ensures that memcpy and mempcpy won't pull in __chk_fail and the routines it calls. Reported-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Gabi Falk <gabifalk@gmx.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>	2024-05-02 11:50:21 +01:00
Carlos O'Donell	91695ee459	time: Allow later version licensing. The FSF's Licensing and Compliance Lab noted a discrepancy in the licensing of several files in the glibc package. When timespec_get.c was implemented the license did not include the standard ", or (at your option) any later version." text. Change the license in timespec_get.c and all copied files to match the expected license. This change was previously approved in principle by the FSF in RT ticket #1316403. And a similar instance was fixed in commit `46703efa02`.	2024-05-01 09:03:26 -04:00
Wilco Dijkstra	6dae61567f	AArch64: Remove unused defines of CPU names Remove unused defines of CPU names in cpu-features.h. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-04-30 13:32:29 +01:00
Florian Weimer	b62928f907	x86: In ld.so, diagnose missing APX support in APX-only builds At this point, this is mainly a tool for testing the early ld.so CPU compatibility diagnostics: GCC uses the new instructions in most functions, so it's easy to spot if some of the early code is not built correctly. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-04-25 17:20:28 +02:00
Florian Weimer	3a3a449742	i386: ulp update for SSE2 --disable-multi-arch configurations	2024-04-25 12:56:48 +02:00
H.J. Lu	46c9997413	x86: Define MINIMUM_X86_ISA_LEVEL in config.h [BZ #31676 ] Define MINIMUM_X86_ISA_LEVEL at configure time to avoid /usr/bin/ld: …/build/elf/librtld.os: in function `init_cpu_features': …/git/elf/../sysdeps/x86/cpu-features.c:1202: undefined reference to `_dl_runtime_resolve_fxsave' /usr/bin/ld: …/build/elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object /usr/bin/ld: final link failed: bad value collect2: error: ld returned 1 exit status when glibc is built with -march=x86-64-v3 and configured with --with-rtld-early-cflags=-march=x86-64, which is used to allow ld.so to print an error message on unsupported CPUs: Fatal glibc error: CPU does not support x86-64-v3 This fixes BZ #31676. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-04-24 04:50:56 -07:00
caiyinyu	095067efdf	LoongArch: Add glibc.cpu.hwcap support. The current IFUNC selection is always using the most recent features which are available via AT_HWCAP. But in some scenarios it is useful to adjust this selection. The environment variable: GLIBC_TUNABLES=glibc.cpu.hwcaps=-xxx,yyy,zzz,.... can be used to enable HWCAP feature yyy, disable HWCAP feature xxx, where the feature name is case-sensitive and has to match the ones used in sysdeps/loongarch/cpu-tunables.c. Signed-off-by: caiyinyu <caiyinyu@loongson.cn>	2024-04-24 18:22:38 +08:00
Florian Weimer	f4724843ad	nptl: Fix tst-cancel30 on kernels without ppoll_time64 support Fall back to ppoll if ppoll_time64 fails with ENOSYS. Fixes commit `370da8a121` ("nptl: Fix tst-cancel30 on sparc64"). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-04-23 21:16:32 +02:00
Florian Weimer	5361ad3910	login: Use unsigned 32-bit types for seconds-since-epoch These fields store timestamps when the system was running. No Linux systems existed before 1970, so these values are unused. Switching to unsigned types allows continued use of the existing struct layouts beyond the year 2038. The intent is to give distributions more time to switch to improved interfaces that also avoid locking/data corruption issues. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-04-19 14:38:17 +02:00
Florian Weimer	9abdae94c7	login: structs utmp, utmpx, lastlog _TIME_BITS independence (bug 30701) These structs describe file formats under /var/log, and should not depend on the definition of _TIME_BITS. This is achieved by defining __WORDSIZE_TIME64_COMPAT32 to 1 on 32-bit ports that support 32-bit time_t values (where __time_t is 32 bits). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-04-19 14:38:17 +02:00
Florian Weimer	4d4da5aab9	login: Check default sizes of structs utmp, utmpx, lastlog The default <utmp-size.h> is for ports with a 64-bit time_t. Ports with a 32-bit time_t or with __WORDSIZE_TIME64_COMPAT32=1 need to override it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-04-19 14:38:17 +02:00
Florian Weimer	14e56bd4ce	powerpc: Fix ld.so address determination for PCREL mode (bug 31640) This seems to have stopped working with some GCC 14 versions, which clobber r2. With other compilers, the kernel-provided r2 value is still available at this point. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-04-14 08:24:51 +02:00
Florian Weimer	aea52e3d2b	Revert "x86_64: Suppress false positive valgrind error" This reverts commit `a1735e0aa8`. The test failure is a real valgrind bug that needs to be fixed before valgrind is usable with a glibc that has been built with CC="gcc -march=x86-64-v3". The proposed valgrind patch teaches valgrind to replace ld.so strcmp with an unoptimized scalar implementation, thus avoiding any AVX2-related problems. Valgrind bug: <https://bugs.kde.org/show_bug.cgi?id=485487> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-04-13 17:42:13 +02:00
Adhemerval Zanella	686d542025	posix: Sync tempname with gnulib The gnulib version contains an important change (9ce573cde), which fixes some problems with multithreading, entropy loss, and ASLR leak info. It also fixes an issue where getrandom is not being used when generating some new files (only for __GT_NOCREATE on first try). The 044bf893ac removed __path_search, which is now moved to other gnulib shared files (stdio-common/tmpdir.{c,h}). This patch also fixes direxists to use __stat64_time64 instead of __xstat64, and moves the include of pathmax.h for !_LIBC (since it is not used by glibc). The license is also changed from GPL 3.0 to 2.1, with permission from the authors (Bruno Haible and Paul Eggert). The sync also removed the clock fallback, since clock_gettime with CLOCK_REALTIME is expected to always succeed. It syncs with gnulib commit 323834962817af7b115187e8c9a833437f8d20ec. Checked on x86_64-linux-gnu. Co-authored-by: Bruno Haible <bruno@clisp.org> Co-authored-by: Paul Eggert <eggert@cs.ucla.edu> Reviewed-by: Bruno Haible <bruno@clisp.org>	2024-04-10 14:53:39 -03:00
Florian Weimer	f8d8b1b1e6	aarch64: Enhanced CPU diagnostics for ld.so This prints some information from struct cpu_features, and the midr_el1 and dczid_el0 system register contents on every CPU. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-08 16:48:55 +02:00
Florian Weimer	7a430f40c4	x86: Add generic CPUID data dumper to ld.so --list-diagnostics This is surprisingly difficult to implement if the goal is to produce reasonably sized output. With the current approaches to output compression (suppressing zeros and repeated results between CPUs, folding ranges of identical subleaves, dealing with the %ecx reflection issue), the output is less than 600 KiB even for systems with 256 logical CPUs. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-04-08 16:48:55 +02:00
Florian Weimer	5653ccd847	elf: Add CPU iteration support for future use in ld.so diagnostics Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-08 16:48:55 +02:00
H.J. Lu	9e1f4aef86	x86-64: Exclude FMA4 IFUNC functions for -mapxf When -mapxf is used to build glibc, the resulting glibc will never run on FMA4 machines. Exclude FMA4 IFUNC functions when -mapxf is used. This requires GCC which defines __APX_F__ for -mapxf with commit: 1df56719bd8 x86: Define __APX_F__ for -mapxf Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-04-06 05:03:55 -07:00
Adhemerval Zanella	c27f8763cf	Reinstate generic features-time64.h The `a4ed0471d7` commit removed the generic version which is included by features.h and used by Hurd. Checked by building i686-gnu and x86_64-gnu with build-many-glibcs.py.	2024-04-05 09:02:36 -03:00
Adhemerval Zanella	460d9e2dfe	Cleanup __tls_get_addr on alpha/microblaze localplt.data They are not required. Checked with a make check for both ABIs.	2024-04-04 17:20:33 -03:00
Adhemerval Zanella	95700e7998	arm: Remove ld.so __tls_get_addr plt usage Use the hidden alias instead. Checked on arm-linux-gnueabihf.	2024-04-04 17:03:32 -03:00
Adhemerval Zanella	50c2be2390	aarch64: Remove ld.so __tls_get_addr plt usage Use the hidden alias instead. Checked on aarch64-linux-gnu.	2024-04-04 17:02:32 -03:00
Adhemerval Zanella	44ccc2465c	math: x86 trunc traps when FE_INEXACT is enabled (BZ 31603) The implementations of trunc functions using x87 floating point (i386 and x86_64 long double only) trap when FE_INEXACT is enabled. Although this is a GNU extension outside the scope of the C standard, other architectures that also support traps do not show this behavior. The fix moves the implementation to a common one that holds any exceptions with a 'fnclex' (libc_feholdexcept_setround_387). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-04-04 14:29:28 -03:00
Adhemerval Zanella	932544efa4	math: x86 floor traps when FE_INEXACT is enabled (BZ 31601) The implementations of floor functions using x87 floating point (i386 and x86_64 long double only) trap when FE_INEXACT is enabled. Although this is a GNU extension outside the scope of the C standard, other architectures that also support traps do not show this behavior. The fix moves the implementation to a common one that holds any exceptions with a 'fnclex' (libc_feholdexcept_setround_387). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-04-04 14:29:28 -03:00
Adhemerval Zanella	637bfc392f	math: x86 ceill traps when FE_INEXACT is enabled (BZ 31600) The implementations of ceil functions using x87 floating point (i386 and x86_64 long double only) trap when FE_INEXACT is enabled. Although this is a GNU extension outside the scope of the C standard, other architectures that also support traps do not show this behavior. The fix moves the implementation to a common one that holds any exceptions with a 'fnclex' (libc_feholdexcept_setround_387). Checked on x86_64-linux-gnu and i686-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-04-04 14:29:28 -03:00
Joe Ramsay	87cb1dfcd6	aarch64/fpu: Add vector variants of erfc Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:24 +01:00
Joe Ramsay	3d3a4fb8e4	aarch64/fpu: Add vector variants of tanh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:20 +01:00
Joe Ramsay	eedbbca0bf	aarch64/fpu: Add vector variants of sinh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:16 +01:00
Joe Ramsay	8b67920528	aarch64/fpu: Add vector variants of atanh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:12 +01:00
Joe Ramsay	81406ea3c5	aarch64/fpu: Add vector variants of asinh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:33:02 +01:00
Joe Ramsay	b09fee1d21	aarch64/fpu: Add vector variants of acosh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:32:58 +01:00
Joe Ramsay	bdb5705b7b	aarch64/fpu: Add vector variants of cosh Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:32:52 +01:00
Joe Ramsay	cb5d84f1f8	aarch64/fpu: Add vector variants of erf Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-04-04 10:32:48 +01:00
Stafford Horne	3db9d208dd	misc: Add support for Linux uio.h RWF_NOAPPEND flag In Linux 6.9 a new flag is added to allow per-I/O operations to disable append mode even if a file was opened with the flag O_APPEND. This is done with the new RWF_NOAPPEND flag. This caused two test failures as these tests expected the flag 0x00000020 to be unused. Adding the flag definition now fixes these tests on Linux 6.9 (v6.9-rc1). FAIL: misc/tst-preadvwritev2 FAIL: misc/tst-preadvwritev64v2 This patch adds the flag, adjusts the test and adds details to documentation. Link: https://lore.kernel.org/all/20200831153207.GO3265@brightrain.aerifal.cx/ Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-04-04 09:41:27 +01:00
Adhemerval Zanella	4dcd674b66	powerpc: Add missing arch flags on rounding ifunc variants The ifunc variants now use the powerpc implementation which in turn uses the compiler builtin. Without the proper -mcpu switch the builtin does not generate the expected optimization. Checked on powerpc-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-04-02 15:49:31 -03:00
Adhemerval Zanella	a4ed0471d7	Always define __USE_TIME_BITS64 when 64 bit time_t is used It was raised on libc-help [1] that some Linux kernel interfaces expect the libc to define __USE_TIME_BITS64 to indicate the time_t size for the kABI. Different than defined by the initial y2038 design document [2], the __USE_TIME_BITS64 is only defined for ABIs that support more than one time_t size (by defining the _TIME_BITS for each module). The 64 bit time_t redirects are now enabled using a different internal define (__USE_TIME64_REDIRECTS). There is no expected change in semantic or code generation. Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu, and arm-linux-gnueabi [1] https://sourceware.org/pipermail/libc-help/2024-January/006557.html [2] https://sourceware.org/glibc/wiki/Y2038ProofnessDesign Reviewed-by: DJ Delorie <dj@redhat.com>	2024-04-02 15:28:36 -03:00
Adhemerval Zanella	721314c980	x86_64: Remove avx512 strstr implementation As indicated in a recent thread, this is a simple brute-force algorithm that checks the whole needle at a matching character pair (and does so 1 byte at a time after the first 64 bytes of a needle). Also it never skips ahead and thus can match at every haystack position after trying to match all of the needle, which the generic implementation avoids. As indicated by Wilco, a 4x larger needle and 16x larger haystack gives a clear 65x slowdown for both basic_strstr and __strstr_avx512: "ifuncs": ["basic_strstr", "twoway_strstr", "__strstr_avx512", "__strstr_sse2_unaligned", "__strstr_generic"], { "len_haystack": 65536, "len_needle": 1024, "align_haystack": 0, "align_needle": 0, "fail": 1, "desc": "Difficult bruteforce needle", "timings": [4.0948e+07, 15094.5, 3.20818e+07, 108558, 10839.2] }, { "len_haystack": 1048576, "len_needle": 4096, "align_haystack": 0, "align_needle": 0, "fail": 1, "desc": "Difficult bruteforce needle", "timings": [2.69767e+09, 100797, 2.08535e+09, 495706, 82666.9] } PS: I don't have an AVX512 capable machine to verify this issue, but skimming through the code it does seem to follow what Wilco has described. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-03-27 13:48:16 -03:00
Adhemerval Zanella	2e53eb9234	signal: Avoid system signal disposition to interfere with tests Both tst-sigset2 and tst-signal1 expect that the SIGINT disposition is set to SIG_DFL.	2024-03-27 13:47:09 -03:00
Palmer Dabbelt	96d1b9ac23	RISC-V: Fix the static-PIE non-relocated object check The value of l_scope is only valid post relocation, so this original check was triggering undefined behavior. Instead just directly check to see if the object has been relocated, at which point using l_scope is safe. Reported-by: Andreas Schwab <schwab@suse.de> Closes: BZ #31317 Fixes: `e0590f41fe` ("RISC-V: Enable static-pie.") Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-25 15:17:13 +01:00
Sergey Bugaev	dc1a77269c	htl: Implement some support for TLS_DTV_AT_TP Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240323173301.151066-19-bugaevc@gmail.com>	2024-03-23 23:00:30 +01:00
Sergey Bugaev	a4273efa21	htl: Respect GL(dl_stack_flags) when allocating stacks Previously, HTL would always allocate non-executable stacks. This has never been noticed, since GNU Mach on x86 ignores VM_PROT_EXECUTE and makes all pages implicitly executable. Since GNU Mach on AArch64 supports non-executable pages, HTL forgetting to pass VM_PROT_EXECUTE immediately breaks any code that (unfortunately, still) relies on executable stacks. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240323173301.151066-7-bugaevc@gmail.com>	2024-03-23 22:48:44 +01:00
Sergey Bugaev	b467cfcaee	hurd: Use the RETURN_ADDRESS macro This gives us PAC stripping on AArch64. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240323173301.151066-6-bugaevc@gmail.com>	2024-03-23 22:48:01 +01:00
Sergey Bugaev	6afeac1289	hurd: Disable Prefer_MAP_32BIT_EXEC on non-x86_64 for now While we could support it on any architecture, the tunable is currently only defined on x86_64. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240323173301.151066-5-bugaevc@gmail.com>	2024-03-23 22:47:46 +01:00
Sergey Bugaev	7f02511e5b	hurd: Move internal functions to internal header Move _hurd_self_sigstate (), _hurd_critical_section_lock (), and _hurd_critical_section_unlock () inline implementations (that were already guarded by #if defined _LIBC) to the internal version of the header. While at it, add <tls.h> to the includes, and use __LIBC_NO_TLS () unconditionally. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Message-ID: <20240323173301.151066-2-bugaevc@gmail.com>	2024-03-23 22:43:07 +01:00
Stafford Horne	ad05a42370	or1k: Add prctl wrapper to unwrap variadic args On OpenRISC variadic functions and regular functions have different calling conventions so this wrapper is needed to translate. This wrapper is copied from x86_64/x32. I don't know the build system enough to find a cleaner way to share the code between x86_64/x32 and or1k (maybe Implies?), so I went with the straight copy. This fixes test failures: misc/tst-prctl nptl/tst-setgetname	2024-03-22 15:43:34 +00:00
Stafford Horne	df7e29e2a4	or1k: Only define fpu rounding and exceptions with hard-float This fixes the test failure math/test-fenv. If rounding mode and exception macros are defined then the fenv tests run and always fail. This patch adds an ifdef using the __or1k_hard_float__ macro provided by gcc to avoid defining these fenv macros when they cannot be used. This is similar to what is done in csky. Note, I will post the or1k hard-float support soon. So, I prefer to leave the hard-float bits here for now.	2024-03-22 15:43:34 +00:00
Stafford Horne	2e982a3937	or1k: Update libm test ulps To fix test failures: FAIL: math/test-float-hypot FAIL: math/test-float32-hypot	2024-03-22 15:43:34 +00:00
Wilco Dijkstra	2e94e2f5d2	AArch64: Check kernel version for SVE ifuncs Old Linux kernels disable SVE after every system call. Calling the SVE-optimized memcpy afterwards will then cause a trap to reenable SVE. As a result, applications with a high use of syscalls may run slower with the SVE memcpy. This is true for kernels between 4.15.0 and before 6.2.0, except for 5.14.0 which was patched. Avoid this by checking the kernel version and selecting the SVE ifunc on modern kernels. Parse the kernel version reported by uname() into a 24-bit kernel.major.minor value without calling any library functions. If uname() is not supported or if the version format is not recognized, assume the kernel is modern. Tested-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2024-03-21 16:50:51 +00:00
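The version check described in the entry above can be sketched in portable C. This is a minimal illustration, not the code from the commit: the names `demo_parse_kernel_version` and `demo_sve_ifunc_ok`, the clamping of fields to 255, and the `0xffffff` "assume modern" sentinel are all assumptions made for this example. It packs the first three numeric fields of a release string (as reported by uname()) into a 24-bit value using only character arithmetic, then applies the ranges quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sentinel: an unrecognized format is treated as "modern".  */
#define DEMO_KVERSION_UNKNOWN 0xffffffu

/* Pack "A.B.C..." into (A << 16) | (B << 8) | C without library calls.
   Each field is clamped to 255 so it fits in 8 bits.  */
static uint32_t
demo_parse_kernel_version (const char *release)
{
  assert (release != NULL);
  uint32_t version = 0;
  for (int field = 0; field < 3; field++)
    {
      uint32_t value = 0;
      int digits = 0;
      while (*release >= '0' && *release <= '9')
        {
          value = value * 10 + (uint32_t) (*release - '0');
          if (value > 255)
            value = 255;
          release++;
          digits++;
        }
      if (digits == 0)
        return DEMO_KVERSION_UNKNOWN;
      version = (version << 8) | value;
      /* The first two fields must be followed by '.'; anything after
         the third field (such as "-generic") is ignored.  */
      if (field < 2)
        {
          if (*release != '.')
            return DEMO_KVERSION_UNKNOWN;
          release++;
        }
    }
  return version;
}

/* Nonzero when the SVE variant may be selected: kernels from 4.15.0 up
   to, but not including, 6.2.0 are avoided, except the patched 5.14.0.  */
static int
demo_sve_ifunc_ok (const char *release)
{
  uint32_t v = demo_parse_kernel_version (release);
  if (v == 0x050e00)
    return 1;
  return v < 0x040f00 || v >= 0x060200;
}
```

For instance, `demo_sve_ifunc_ok ("5.10.0-21-amd64")` is 0, while `demo_sve_ifunc_ok ("6.2.0")` and an unparsable string both yield 1.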
Amrita H S	1ea0511456	powerpc: Placeholder and infrastructure/build support to add Power11 related changes. The following three changes have been added to provide initial Power11 support. 1. Add the directories to hold Power11 files. 2. Add support to select Power11 libraries based on AT_PLATFORM. 3. Let submachine=power11 be set automatically. Reviewed-by: Florian Weimer <fweimer@redhat.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-03-19 21:11:34 -05:00
Manjunath Matti	3ab9b88e2a	powerpc: Add HWCAP3/HWCAP4 data to TCB for Power Architecture. This patch adds a new feature for powerpc. In order to get faster access to the HWCAP3/HWCAP4 masks, similar to HWCAP/HWCAP2 (i.e. for implementing __builtin_cpu_supports() in GCC) without the overhead of reading them from the auxiliary vector, we now reserve space for them in the TCB. This is an ABI change for GLIBC 2.39. Suggested-by: Peter Bergner <bergner@linux.ibm.com> Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-03-19 17:19:27 -05:00
Adhemerval Zanella	3d53d18fc7	elf: Enable TLS descriptor tests on aarch64 The aarch64 uses 'trad' for traditional tls and 'desc' for tls descriptors, but unlike other targets it defaults to 'desc'. The gnutls2 configure check does not set aarch64 as an ABI that uses TLS descriptors, which then disables some tests. Also rename the internal machinery from gnu2 to tls descriptors. Checked on aarch64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-03-19 14:53:30 -03:00
Adhemerval Zanella	64c7e34428	arm: Update _dl_tlsdesc_dynamic to preserve caller-saved registers (BZ 31372) ARM _dl_tlsdesc_dynamic slow path has two issues: * The ip/r12 is defined by AAPCS as a scratch register, and gcc uses it to save the stack pointer before some function calls, so it should be saved/restored as well. It fixes the tst-gnu2-tls2. * None of the possible VFP registers are saved/restored. ARM has the additional complexity to have different VFP bank sizes (depending on VFP support by the chip). The tst-gnu2-tls2 test is extended to check for VFP registers, although only for hardfp builds. Different than setcontext, _dl_tlsdesc_dynamic does not have HWCAP_ARM_IWMMXT (I don't have a way to properly test it and it is almost a decade since newer hardware was released). With this patch there is no need to mark tst-gnu2-tls2 as XFAIL. Checked on arm-linux-gnueabihf. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-03-19 14:53:30 -03:00
Andreas Schwab	fd7ee2e6c5	Add tst-gnu2-tls2mod1 to test-internal-extras That allows sysdeps/x86_64/tst-gnu2-tls2mod1.S to use internal headers. Fixes: `717ebfa85c` ("x86-64: Allocate state buffer space for RDI, RSI and RBX")	2024-03-19 14:28:28 +01:00
H.J. Lu	717ebfa85c	x86-64: Allocate state buffer space for RDI, RSI and RBX _dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack. After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11. Define TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to STATE_SAVE_OFFSET(%rsp). +==================+<- stack frame start aligned at 8 or 16 bytes \| \|<- RDI saved in the red zone \| \|<- RSI saved in the red zone \| \|<- RBX saved in the red zone \| \|<- paddings for stack realignment of 64 bytes \|------------------\|<- xsave buffer end aligned at 64 bytes \| \|<- \| \|<- \| \|<- \|------------------\|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp) \| \|<- 8-byte padding for 64-byte alignment \| \|<- 8-byte padding for 64-byte alignment \| \|<- R11 \| \|<- R10 \| \|<- R9 \| \|<- R8 \| \|<- RDX \| \|<- RCX +==================+<- RSP aligned at 64 bytes Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI and RBX are saved onto stack without adjusting stack pointer first, using the red-zone. This fixes BZ #31501. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-03-18 19:45:13 -07:00
Darius Rad	f44f3aed31	riscv: Update nofpu libm test ulps Fix two test failures. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-03-18 11:28:50 +01:00
Florian Weimer	7a76f21867	linux: Use rseq area unconditionally in sched_getcpu (bug 31479) Originally, nptl/descr.h included <sys/rseq.h>, but we removed that in commit `2c6b4b272e` ("nptl: Unconditionally use a 32-byte rseq area"). After that, it was not ensured that the RSEQ_SIG macro was defined during sched_getcpu.c compilation that provided a definition. This commit always checks the rseq area for CPU number information before using the other approaches. This adds an unnecessary (but well-predictable) branch on architectures which do not define RSEQ_SIG, but its cost is small compared to the system call. Most architectures that have vDSO acceleration for getcpu also have rseq support. Fixes: `2c6b4b272e` Fixes: `1d350aa060` Reviewed-by: Arjun Shankar <arjun@redhat.com>	2024-03-15 19:08:24 +01:00
Szabolcs Nagy	73c26018ed	aarch64: fix check for SVE support in assembler Due to GCC bug 110901 -mcpu can override -march setting when compiling asm code and thus a compiler targeting a specific cpu can fail the configure check even when binutils gas supports SVE. The workaround is that explicit .arch directive overrides both -mcpu and -march, and since that's what the actual SVE memcpy uses the configure check should use that too even if the GCC issue is fixed independently. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2024-03-14 14:27:56 +00:00
Joseph Myers	2367bf468c	Update kernel version to 6.8 in header constant tests This patch updates the kernel version in the tests tst-mman-consts.py, tst-mount-consts.py and tst-pidfd-consts.py to 6.8. (There are no new constants covered by these tests in 6.8 that need any other header changes.) Tested with build-many-glibcs.py.	2024-03-13 19:46:21 +00:00
Joseph Myers	3de2f8755c	Update syscall lists for Linux 6.8 Linux 6.8 adds five new syscalls. Update syscall-names.list and regenerate the arch-syscall.h headers with build-many-glibcs.py update-syscalls. Tested with build-many-glibcs.py.	2024-03-13 13:57:56 +00:00
Adhemerval Zanella	4a76fb1da8	powerpc: Remove power8 strcasestr optimization Similar to strstr (`1e9a550ba4`), power8 strcasestr does not show much improvement compared to the generic implementation. The geomean on bench-strcasestr shows (__strcasestr_power8 vs. __strcasestr_ppc): power10 1159 vs. 1120; power9 1640 vs. 1469; power8 1787 vs. 1904. The strcasestr uses the same 'trick' as power7 strstr to detect potential quadratic behavior, which only adds overheads for input that trigger quadratic behavior and it is really a hack. Checked on powerpc64le-linux-gnu. Reviewed-by: DJ Delorie <dj@redhat.com>	2024-03-12 17:11:01 -03:00
Adhemerval Zanella	2149da3683	riscv: Fix alignment-ignorant memcpy implementation The memcpy optimization (commit `587a1290a1`) has a series of mistakes: - The implementation is wrong: the chunk size calculation is wrong leading to invalid memory access. - It adds ifunc support as default, so --disable-multi-arch does not work as expected for riscv. - It mixes Linux files (memcpy ifunc selection which requires the vDSO/syscall mechanism) with generic support (the memcpy optimization itself). - There is no __libc_ifunc_impl_list, which makes testing only check the selected implementation instead of all supported by the system. This patch also simplifies the required bits to enable ifunc: there is no need for memcopy.h; nor to add Linux-specific files. The __memcpy_noalignment tail handling now uses a branchless strategy similar to aarch64 (overlap 32-bit copies for sizes 4..7 and byte copies for size 1..3). Checked on riscv64 and riscv32 by explicitly enabling the function on __libc_ifunc_impl_list on qemu-system. Changes from v1: * Implement the memcpy in assembly to correctly handle RISCV strict-alignment. Reviewed-by: Evan Green <evan@rivosinc.com> Acked-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-12 14:38:08 -03:00
Andreas Schwab	2173173d57	linux/sigsetops: fix type confusion (bug 31468) Each mask in the sigset array is an unsigned long, so fix __sigisemptyset to use that instead of int. The __sigword function returns a simple array index, so it can return int instead of unsigned long.	2024-03-12 10:00:22 +01:00
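The type issue in the entry above can be illustrated with a small stand-alone sketch. It assumes a simplified signal-set layout and uses hypothetical `demo_` names throughout; it is not glibc's `sigsetops.h`. Each mask word is an `unsigned long`, so the emptiness test accumulates into an `unsigned long`: an `int` accumulator could drop the upper bits of a word on targets where `long` is wider than `int`. The word index, by contrast, is a small array index and fits in `int`.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Simplified stand-in for a signal set: 1024 bits of unsigned long words.  */
#define DEMO_BITS_PER_WORD (CHAR_BIT * sizeof (unsigned long))
#define DEMO_NWORDS (1024 / DEMO_BITS_PER_WORD)

typedef struct
{
  unsigned long val[DEMO_NWORDS];
} demo_sigset;

/* Index of the word holding SIG: a plain array index, so int suffices.  */
static int
demo_sigword (int sig)
{
  assert (sig >= 1);
  return (int) ((size_t) (sig - 1) / DEMO_BITS_PER_WORD);
}

/* Bit for SIG inside its word: must have the width of a mask word.  */
static unsigned long
demo_sigmask (int sig)
{
  assert (sig >= 1);
  return 1UL << ((size_t) (sig - 1) % DEMO_BITS_PER_WORD);
}

/* Emptiness test that keeps the full width of every mask word.  */
static int
demo_sigisemptyset (const demo_sigset *set)
{
  unsigned long acc = 0;
  for (size_t i = 0; i < DEMO_NWORDS; i++)
    acc |= set->val[i];
  return acc == 0;
}
```

With this shape, a set containing only a signal whose bit lies in the upper half of a 64-bit word is still reported as non-empty.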
caiyinyu	aeee41f1cf	LoongArch: Correct {__ieee754, _}_scalb -> {__ieee754, _}_scalbf	2024-03-12 14:07:27 +08:00
Sunil K Pandey	b6e3898194	x86-64: Simplify minimum ISA check ifdef conditional with if Replace minimum ISA check ifdef conditional with if. Since MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants, compiler will perform constant folding optimization, getting same results. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-03-03 15:47:53 -08:00
Evan Green	587a1290a1	riscv: Add and use alignment-ignorant memcpy For CPU implementations that can perform unaligned accesses with little or no performance penalty, create a memcpy implementation that does not bother aligning buffers. It will use a block of integer registers, a single integer register, and fall back to bytewise copy for the remainder. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-01 07:15:01 -08:00
Evan Green	a2b47f7d46	riscv: Add ifunc helper method to hwprobe.h Add a little helper method so it's easier to fetch a single value from the hwprobe function when used within an ifunc selector. Signed-off-by: Evan Green <evan@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-01 07:15:00 -08:00
Evan Green	a29bb320a1	riscv: Enable multi-arg ifunc resolvers RISC-V is apparently the first architecture to pass more than one argument to ifunc resolvers. The helper macros in libc-symbols.h, __ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible with this. These macros have an "arg" (non-final) parameter that represents the parameter signature of the ifunc resolver. The result is an inability to pass the required comma through in a single preprocessor argument. Rearrange the __ifunc_resolver() macro to be variadic, and pass the types as those variable parameters. Move the guts of __ifunc() and __ifunc_hidden() into new macros, __ifunc_args(), and __ifunc_args_hidden(), that pass the variable arguments down through to __ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which are used in a bunch of places, to simply shuffle the arguments down into __ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which provides convenience macros to those looking to write ifunc selectors that use both arguments. Signed-off-by: Evan Green <evan@rivosinc.com> Reviewed-by: Florian Weimer <fweimer@redhat.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-01 07:14:59 -08:00
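The preprocessor limitation described in the entry above, and the variadic rearrangement, can be shown with a toy pair of macros. The `DEMO_` macros and `demo_` functions below are hypothetical stand-ins, not the `__ifunc_resolver()` family from libc-symbols.h, and the resolvers return a plain `int` rather than a function pointer. A parameter list containing a comma cannot travel through a single macro parameter, whereas `__VA_ARGS__` forwards it unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Single-parameter form: PARAMS is one preprocessor argument, so a
   signature such as "uint64_t hwcap, demo_probe_fn probe" would be
   split at the comma and rejected.  */
#define DEMO_RESOLVER_ONE(ret, name, params) static ret name (params)

/* Variadic form: the whole parameter list is forwarded as-is.  */
#define DEMO_RESOLVER(ret, name, ...) static ret name (__VA_ARGS__)

/* Hypothetical probe callback passed as the second resolver argument.  */
typedef int (*demo_probe_fn) (void);

DEMO_RESOLVER_ONE (int, demo_one_arg, uint64_t hwcap)
{
  return (hwcap & 1) != 0;
}

DEMO_RESOLVER (int, demo_two_args, uint64_t hwcap, demo_probe_fn probe)
{
  /* Older callers may not pass the second argument, so check for NULL
     and fall back to the hwcap bits alone.  */
  if (probe == NULL)
    return (hwcap & 1) != 0;
  return probe ();
}

/* Stand-in probe used only to exercise the two-argument resolver.  */
static int
demo_probe_yes (void)
{
  return 1;
}

/* Usage example.  */
static void
demo_usage (void)
{
  assert (demo_two_args (0, demo_probe_yes) == 1);
}
```

Keeping the single-parameter macro as a thin wrapper around the variadic one lets existing one-argument callers compile unchanged, which is the approach the commit message describes for `__ifunc()` and `__ifunc_hidden()`.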
Evan Green	78308ce77a	riscv: Add __riscv_hwprobe pointer to ifunc calls The new __riscv_hwprobe() function is designed to be used by ifunc selector functions. This presents a challenge for applications and libraries, as ifunc selectors are invoked before all relocations have been performed, so an external call to __riscv_hwprobe() from an ifunc selector won't work. To address this, pass a pointer to the __riscv_hwprobe() function into ifunc selectors as the second argument (alongside dl_hwcap, which was already being passed). Include a typedef as well for convenience, so that ifunc users don't have to go through contortions to call this routine. Users will need to remember to check the second argument for NULL, to account for older glibcs that don't pass the function. Signed-off-by: Evan Green <evan@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-01 07:14:58 -08:00
Evan Green	e7919e0db2	riscv: Add hwprobe vdso call support The new riscv_hwprobe syscall also comes with a vDSO for faster answers to your most common questions. Call in today to speak with a kernel representative near you! Signed-off-by: Evan Green <evan@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-01 07:14:57 -08:00
Evan Green	c6c33339b4	linux: Introduce INTERNAL_VSYSCALL Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to a regular syscall, but without setting errno. Instead, the return value is plumbed straight out of the macro. Signed-off-by: Evan Green <evan@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-01 07:14:56 -08:00
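The control flow described in the entry above can be sketched with ordinary function pointers. This is an illustration under stated assumptions, not the `INTERNAL_VSYSCALL()` macro itself: the `demo_` names are hypothetical, and using `-ENOSYS` as the trigger for the fallback is an assumption of this sketch. What it shows is the contract from the commit message: try the vDSO entry, fall back to the regular system call, and hand the raw result (negative values being error codes) back to the caller without touching errno.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical signature shared by the vDSO entry and the raw syscall.  */
typedef long (*demo_call_fn) (long arg);

/* Try the vDSO entry first; fall back to the system call when there is
   no vDSO symbol or when it reports -ENOSYS (assumed trigger).  The
   result is returned as-is and errno is never written.  */
static long
demo_internal_vsyscall (demo_call_fn vdso_entry, demo_call_fn raw_syscall,
                        long arg)
{
  assert (raw_syscall != NULL);
  long ret = -ENOSYS;
  if (vdso_entry != NULL)
    ret = vdso_entry (arg);
  if (ret == -ENOSYS)
    ret = raw_syscall (arg);
  return ret;
}

/* Stand-ins used only to exercise the helper.  */
static long demo_vdso_ok (long arg) { return arg * 2; }
static long demo_vdso_missing (long arg) { (void) arg; return -ENOSYS; }
static long demo_raw_syscall (long arg) { return arg + 100; }
static long demo_raw_fails (long arg) { (void) arg; return -EINVAL; }
```

A caller inspects the return value directly, for example treating `demo_internal_vsyscall (NULL, demo_raw_fails, 1)` returning `-EINVAL` as the error, which is useful in contexts where errno is not usable.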
Evan Green	426d0e1aa8	riscv: Add Linux hwprobe syscall support Add awareness and a thin wrapper function around a new Linux system call that allows callers to get architecture and microarchitecture information about the CPUs from the kernel. This can be used to do things like dynamically choose a memcpy implementation. Signed-off-by: Evan Green <evan@rivosinc.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>	2024-03-01 07:14:55 -08:00
H.J. Lu	9b7091415a	x86-64: Update _dl_tlsdesc_dynamic to preserve AMX registers _dl_tlsdesc_dynamic should also preserve AMX registers which are caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size and save it in xsave_state_full_size which is only used by _dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes the AMX part of BZ #31372. Tested on AMX processor. AMX test is enabled only for compilers with the fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098 GCC 14 and GCC 11/12/13 branches have the bug fix. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-02-29 04:30:01 -08:00
H.J. Lu	a1735e0aa8	x86_64: Suppress false positive valgrind error When strcmp-avx2.S is used as the default, elf/tst-valgrind-smoke fails with ==1272761== Conditional jump or move depends on uninitialised value(s) ==1272761== at 0x4022C98: strcmp (strcmp-avx2.S:462) ==1272761== by 0x400B05B: _dl_name_match_p (dl-misc.c:75) ==1272761== by 0x40085F3: _dl_map_object (dl-load.c:1966) ==1272761== by 0x401AEA4: map_doit (rtld.c:644) ==1272761== by 0x4001488: _dl_catch_exception (dl-catch.c:237) ==1272761== by 0x40015AE: _dl_catch_error (dl-catch.c:256) ==1272761== by 0x401B38F: do_preload (rtld.c:816) ==1272761== by 0x401C116: handle_preload_list (rtld.c:892) ==1272761== by 0x401EDF5: dl_main (rtld.c:1842) ==1272761== by 0x401A79E: _dl_sysdep_start (dl-sysdep.c:140) ==1272761== by 0x401BEEE: _dl_start_final (rtld.c:494) ==1272761== by 0x401BEEE: _dl_start (rtld.c:581) ==1272761== by 0x401AD87: ??? (in /elf/ld.so) The assembly codes are: 0x0000000004022c80 <+144>: vmovdqu 0x20(%rdi),%ymm0 0x0000000004022c85 <+149>: vpcmpeqb 0x20(%rsi),%ymm0,%ymm1 0x0000000004022c8a <+154>: vpcmpeqb %ymm0,%ymm15,%ymm2 0x0000000004022c8e <+158>: vpandn %ymm1,%ymm2,%ymm1 0x0000000004022c92 <+162>: vpmovmskb %ymm1,%ecx 0x0000000004022c96 <+166>: inc %ecx => 0x0000000004022c98 <+168>: jne 0x4022c32 <strcmp+66> strcmp-avx2.S has 32-byte vector loads of strings which are shorter than 32 bytes: (gdb) p (char *) ($rdi + 0x20) $6 = 0x1ffeffea20 "memcheck-amd64-linux.so" (gdb) p (char *) ($rsi + 0x20) $7 = 0x4832640 "core-amd64-linux.so" (gdb) call (int) strlen ((char *) ($rsi + 0x20)) $8 = 19 (gdb) call (int) strlen ((char *) ($rdi + 0x20)) $9 = 23 (gdb) It triggers the valgrind error. The above code is safe since the loads don't cross the page boundary. Update tst-valgrind-smoke.sh to accept an optional suppression file and pass a suppression file to valgrind when strcmp-avx2.S is the default implementation of strcmp. Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>	2024-02-28 13:40:55 -08:00
H.J. Lu	8c7c188d62	x86: Don't check XFD against /proc/cpuinfo Since /proc/cpuinfo doesn't report XFD, don't check it against /proc/cpuinfo.	2024-02-28 11:50:38 -08:00
H.J. Lu	befe2d3c4d	x86-64: Don't use SSE resolvers for ISA level 3 or above When glibc is built with ISA level 3 or above enabled, SSE resolvers aren't available and glibc fails to build: ld: .../elf/librtld.os: in function `init_cpu_features': .../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave' ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object /usr/local/bin/ld: final link failed: bad value For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor _dl_tlsdesc_dynamic_fxsave. This fixes BZ #31429. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-02-28 11:49:30 -08:00
H.J. Lu	0aac205a81	x86: Update _dl_tlsdesc_dynamic to preserve caller-saved registers Compiler generates the following instruction sequence for GNU2 dynamic TLS access: leaq tls_var@TLSDESC(%rip), %rax call tls_var@TLSCALL(%rax) or leal tls_var@TLSDESC(%ebx), %eax call tls_var@TLSCALL(%eax) CALL instruction is transparent to compiler which assumes all registers, except for EFLAGS and RAX/EAX, are unchanged after CALL. When _dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow path. __tls_get_addr is a normal function which doesn't preserve any caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer caller-saved registers, but didn't preserve any other caller-saved registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE, XSAVE and XSAVEC to save and restore all caller-saved registers. This fixes BZ #31372. Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic) to optimize elf_machine_runtime_setup. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-02-28 09:02:56 -08:00
H.J. Lu	e6350be7e9	sysdeps/unix/sysv/linux/x86_64/Makefile: Add the end marker Add the end marker to tests, tests-container and modules-names.	2024-02-28 05:48:27 -08:00
Adhemerval Zanella	b53e73ea80	s390: Improve static-pie configure tests Instead of tying based on the linker name and version, check for the required support: * whether it does not generate dynamic TLS relocations in PIE (binutils PR ld/22263); * if it accepts --no-dynamic-linker (by using -static-pie); * and if it adds a DT_JMPREL pointing to .rela.iplt with static pie. The patch also trims the comments, for binutils one of the tests should already cover it. The kernel ones are not clear which version should have the backport, nor it is something that glibc can do much about it. Finally, the glibc is somewhat confusing, since it refers to commits not related to s390x. Checked with a build for s390x-linux-gnu. Reviewed-by: Stefan Liebler <stli@linux.ibm.com>	2024-02-28 10:09:53 -03:00
H.J. Lu	24c8db87c9	x86: Change ENQCMD test to CHECK_FEATURE_PRESENT Since ENQCMD is mainly used in kernel, change the ENQCMD test to CHECK_FEATURE_PRESENT. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-02-27 11:50:52 -08:00
Joe Ramsay	e302e10213	aarch64/fpu: Sync libmvec routines from 2.39 and before with AOR This includes a fix for big-endian in AdvSIMD log, some cosmetic changes, and numerous small optimisations mainly around inlining and using indexed variants of MLA intrinsics. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-02-26 09:45:50 -03:00
Stefan Liebler	02782fd128	S390: Do not clobber r7 in clone [BZ #31402 ] Starting with commit `e57d8fc97b` "S390: Always use svc 0" clone clobbers the call-saved register r7 in error case: function or stack is NULL. This patch restores the saved registers also in the error case. Furthermore the existing test misc/tst-clone is extended to check all error cases and that clone does not clobber registers in this error case.	2024-02-26 13:37:46 +01:00
Sunil K Pandey	9f78a7c1d0	x86_64: Exclude SSE, AVX and FMA4 variants in libm multiarch When glibc is built with ISA level 3 or higher by default, the resulting glibc binaries won't run on SSE or FMA4 processors. Exclude SSE, AVX and FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by default. When glibc is built with ISA level 2 enabled by default, only keep SSE4.1 variant. Fixes BZ 31335. NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind doesn't support AVX512 instructions: https://bugs.kde.org/show_bug.cgi?id=383010 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-02-25 13:20:51 -08:00
H.J. Lu	dfb05f8e70	x86-64: Save APX registers in ld.so trampoline Add APX registers to STATE_SAVE_MASK so that APX registers are saved in ld.so trampoline. This fixes BZ #31371. Also update STATE_SAVE_OFFSET and STATE_SAVE_MASK for i386 which will be used by i386 _dl_tlsdesc_dynamic. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-02-25 09:22:15 -08:00
Simon Chopin	59e0441d4a	tests: gracefully handle AppArmor userns containment Recent AppArmor containment allows restricting unprivileged user namespaces, which is enabled by default on recent Ubuntu systems. When this happens, as is common with Linux Security Modules, the syscall will fail with -EACCES. When that happens, the affected tests will now be considered unsupported rather than simply failing. Further information: * https://gitlab.com/apparmor/apparmor/-/wikis/unprivileged_userns_restriction * https://ubuntu.com/blog/ubuntu-23-10-restricted-unprivileged-user-namespaces * https://manpages.ubuntu.com/manpages/jammy/man5/apparmor.d.5.html (for the return code) V2: * Fix duplicated line in check_unshare_hints * Also handle similar failure in tst-pidfd_getpid V3: * Comment formatting * Added some more documentation on syscall return value Signed-off-by: Simon Chopin <simon.chopin@canonical.com>	2024-02-23 08:50:00 -03:00
Adhemerval Zanella	1e9a550ba4	powerpc: Remove power7 strstr optimization The optimization is not faster than the generic algorithm, using the bench-strstr the geometric mean running on a POWER10 machine using gcc 13.1.1 is 482.47 while the default __strstr_ppc is 340.97 (which uses the generic implementation). Also, there is no need to redirect the internal str/mem call to optimized version, internal ifunc is supported and enabled for internal calls (meaning that the generic implementation will use any asm optimization if available). Checked on powerpc64le-linux-gnu. Reviewed-by: Peter Bergner <bergner@linux.ibm.com>	2024-02-23 08:50:00 -03:00
Adhemerval Zanella	f4c142bb9f	arm: Use _dl_find_object on __gnu_Unwind_Find_exidx (BZ 31405) Instead of __dl_iterate_phdr. On ARM dlfo_eh_frame/dlfo_eh_count maps to PT_ARM_EXIDX vaddr start / length. On a Neoverse N1 machine with 160 cores, the following program: $ cat test.c #include <stdlib.h> #include <pthread.h> #include <assert.h> enum { niter = 1024, ntimes = 128, }; static void * tf (void *arg) { int a = (int) arg; for (int i = 0; i < niter; i++) { void *p[ntimes]; for (int j = 0; j < ntimes; j++) p[j] = malloc (a * 128); for (int j = 0; j < ntimes; j++) free (p[j]); } return NULL; } int main (int argc, char *argv[]) { enum { nthreads = 16 }; pthread_t t[nthreads]; for (int i = 0; i < nthreads; i ++) assert (pthread_create (&t[i], NULL, tf, (void *) i) == 0); for (int i = 0; i < nthreads; i++) { void *r; assert (pthread_join (t[i], &r) == 0); assert (r == NULL); } return 0; } $ arm-linux-gnueabihf-gcc -fsanitize=address test.c -o test Improves from ~15s to 0.5s. Checked on arm-linux-gnueabihf.	2024-02-23 08:50:00 -03:00
Xi Ruoyao	e2a65ecc4b	math: Update mips64 ulps Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>	2024-02-22 21:28:25 +01:00
Daniel Cederman	aa4106db1d	sparc: Treat the version field in the FPU control word as reserved The FSR version field is read-only and might be non-zero. This allows math/test-fpucw* to correctly pass when the version is non-zero. Signed-off-by: Daniel Cederman <cederman@gaisler.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-02-19 10:55:50 -03:00
Flavio Cruz	88b771ab5e	Implement setcontext/getcontext/makecontext/swapcontext for Hurd x86_64 Tested with the tests provided by glibc plus some other toy examples. Message-ID: <20240217202535.1860803-1-flaviocruz@gmail.com>	2024-02-17 21:45:35 +01:00
Flavio Cruz	e3da8f9bad	Use proc_getchildren_rusage when available in getrusage and times. Message-ID: <20240217164846.1837223-1-flaviocruz@gmail.com>	2024-02-17 21:14:39 +01:00
Florian Weimer	6a04404521	Linux: Switch back to assembly syscall wrapper for prctl (bug 29770) Commit `ff026950e2` ("Add a C wrapper for prctl [BZ #25896]") replaced the assembler wrapper with a C function. However, on powerpc64le-linux-gnu, the C variadic function implementation requires extra work in the caller to set up the parameter save area. Calling a function that needs a parameter save area without one (because the prototype used indicates the function is not variadic) corrupts the caller's stack. The Linux manual pages project documents prctl as a non-variadic function. This has resulted in various projects over the years using non-variadic prototypes, including the sanitizer libraries in LLVM and GCC (GCC PR 113728). This commit switches back to the assembler implementation on most targets and only keeps the C implementation for x86-64 x32. Also add the __prctl_time64 alias from commit `b39ffab860` ("Linux: Add time64 alias for prctl") to sysdeps/unix/sysv/linux/syscalls.list; it was not yet present in commit `ff026950e2`. This restores the old ABI on powerpc64le-linux-gnu, thus fixing bug 29770. Reviewed-By: Simon Chopin <simon.chopin@canonical.com>	2024-02-17 09:17:04 +01:00
Florian Weimer	0d9166c224	i386: Use generic memrchr in libc (bug 31316) Before this change, we incorrectly used the SSE2 variant in the implementation, without checking that the system actually supports SSE2. Tested-by: Sam James <sam@gentoo.org>	2024-02-16 07:41:04 +01:00
H.J. Lu	ef7f4b1fef	Apply the Makefile sorting fix Apply the Makefile sorting fix generated by sort-makefile-lines.py.	2024-02-15 11:19:56 -08:00
H.J. Lu	71d133c500	sysdeps/x86_64/Makefile (tests): Add the end marker	2024-02-15 11:12:13 -08:00

16417 Commits