glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-24 14:00:30 +00:00

Author	SHA1	Message	Date
H.J. Lu	a1bbee9fd1	x86-64/cet: Move dl-cet.[ch] to x86_64 directories Since CET is only enabled for x86-64, move dl-cet.[ch] to x86_64 directories. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-10 05:19:32 -08:00
H.J. Lu	b45115a666	x86: Move x86-64 shadow stack startup codes Move sysdeps/x86/libc-start.h to sysdeps/x86_64/libc-start.h and use sysdeps/generic/libc-start.h for i386. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-10 05:19:32 -08:00
Adhemerval Zanella	25f1e16ef0	i386: Remove CET support CET is only support for x86_64, this patch reverts: - `faaee1f07e` x86: Support shadow stack pointer in setjmp/longjmp. - `be9ccd27c0` i386: Add _CET_ENDBR to indirect jump targets in add_n.S/sub_n.S - `c02695d776` x86/CET: Update vfork to prevent child return - `5d844e1b72` i386: Enable CET support in ucontext functions - `124bcde683` x86: Add _CET_ENDBR to functions in crti.S - `562837c002` x86: Add _CET_ENDBR to functions in dl-tlsdesc.S - `f753fa7dea` x86: Support IBT and SHSTK in Intel CET [BZ #21598] - `825b58f3fb` i386-mcount.S: Add _CET_ENDBR to _mcount and __fentry__ - `7e119cd582` i386: Use _CET_NOTRACK in i686/memcmp.S - `177824e232` i386: Use _CET_NOTRACK in memcmp-sse4.S - `0a899af097` i386: Use _CET_NOTRACK in memcpy-ssse3-rep.S - `7fb613361c` i386: Use _CET_NOTRACK in memcpy-ssse3.S - `77a8ae0948` i386: Use _CET_NOTRACK in memset-sse2-rep.S - `00e7b76a8f` i386: Use _CET_NOTRACK in memset-sse2.S - `90d15dc577` i386: Use _CET_NOTRACK in strcat-sse2.S - `f1574581c7` i386: Use _CET_NOTRACK in strcpy-sse2.S - `4031d7484a` i386/sub_n.S: Add a missing _CET_ENDBR to indirect jump - target - Checked on i686-linux-gnu.	2024-01-09 13:55:51 -03:00
Adhemerval Zanella	b7fc4a07f2	x86: Move CET infrastructure to x86_64 The CET is only supported for x86_64 and there is no plan to add kernel support for i386. Move the Makefile rules and files from the generic x86 folder to x86_64 one. Checked on x86_64-linux-gnu and i686-linux-gnu.	2024-01-09 13:55:51 -03:00
Adhemerval Zanella	460860f457	Remove ia64-linux-gnu Linux 6.7 removed ia64 from the official tree [1], following the general principle that a glibc port needs upstream support for the architecture in all the components it depends on (binutils, GCC, and the Linux kernel). Apart from the removal of sysdeps/ia64 and sysdeps/unix/sysv/linux/ia64, there are updates to various comments referencing ia64 for which removal of those references seemed appropriate. The configuration is removed from README and build-many-glibcs.py. The CONTRIBUTED-BY, elf/elf.h, manual/contrib.texi (the porting mention), *.po files, config.guess, and longlong.h are not changed. For Linux it allows cleanup some clone2 support on multiple files. The following bug can be closed as WONTFIX: BZ 22634 [2], BZ 14250 [3], BZ 21634 [4], BZ 10163 [5], BZ 16401 [6], and BZ 11585 [7]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43ff221426d33db909f7159fdf620c3b052e2d1c [2] https://sourceware.org/bugzilla/show_bug.cgi?id=22634 [3] https://sourceware.org/bugzilla/show_bug.cgi?id=14250 [4] https://sourceware.org/bugzilla/show_bug.cgi?id=21634 [5] https://sourceware.org/bugzilla/show_bug.cgi?id=10163 [6] https://sourceware.org/bugzilla/show_bug.cgi?id=16401 [7] https://sourceware.org/bugzilla/show_bug.cgi?id=11585 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-01-08 17:09:36 -03:00
H.J. Lu	848746e88e	elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLT Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after relocation. For x86-64, add #define DT_X86_64_PLT (DT_LOPROC + 0) #define DT_X86_64_PLTSZ (DT_LOPROC + 1) #define DT_X86_64_PLTENT (DT_LOPROC + 3) 1. DT_X86_64_PLT: The address of the procedure linkage table. 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage table. 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table entry. With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the memory offset of the indirect branch instruction. Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section with direct branch after relocation when the lazy binding is disabled. PLT rewrite is disabled by default since SELinux may disallow modifying code pages and ld.so can't detect it in all cases. Use $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 to enable PLT rewrite with 32-bit direct jump at run-time or $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2 to enable PLT rewrite with 32-bit direct jump and on APX processors with 64-bit absolute jump at run-time. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-05 05:49:49 -08:00
H.J. Lu	35694d3416	x86-64/cet: Check the restore token in longjmp setcontext and swapcontext put a restore token on the old shadow stack which is used to restore the target shadow stack when switching user contexts. When longjmp from a user context, the target shadow stack can be different from the current shadow stack and INCSSP can't be used to restore the shadow stack pointer to the target shadow stack. Update longjmp to search for a restore token. If found, use the token to restore the shadow stack pointer before using INCSSP to pop the shadow stack. Stop the token search and use INCSSP if the shadow stack entry value is the same as the current shadow stack pointer. It is a user error if there is a shadow stack switch without leaving a restore token on the old shadow stack. The only difference between __longjmp.S and __longjmp_chk.S is that __longjmp_chk.S has a check for invalid longjmp usages. Merge __longjmp.S and __longjmp_chk.S by adding the CHECK_INVALID_LONGJMP macro. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-04 13:38:26 -08:00
H.J. Lu	bbfb54930c	i386: Ignore --enable-cet Since shadow stack is only supported for x86-64, ignore --enable-cet for i386. Always setting $(enable-cet) for i386 to "no" to support ifneq ($(enable-cet),no) in x86 Makefiles. We can't use ifeq ($(enable-cet),yes) since $(enable-cet) can be "yes", "no" or "permissive". Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-04 06:08:55 -08:00
H.J. Lu	b5dcccfb12	x86/cet: Add -fcf-protection=none before -fcf-protection=branch When shadow stack is enabled, some CET tests failed when compiled with GCC 14: FAIL: elf/tst-cet-legacy-4 FAIL: elf/tst-cet-legacy-5a FAIL: elf/tst-cet-legacy-6a which are caused by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113039 These tests use -fcf-protection -fcf-protection=branch and assume that -fcf-protection=branch will override -fcf-protection. But this GCC 14 commit: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1c6231c05bdcca changed the -fcf-protection behavior such that -fcf-protection -fcf-protection=branch is treated the same as -fcf-protection Use -fcf-protection -fcf-protection=none -fcf-protection=branch as the workaround. This fixes BZ #31187. Tested with GCC 13 and GCC 14 on Intel Tiger Lake. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-01 15:53:52 -08:00
Paul Eggert	dff8da6b3e	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
H.J. Lu	cf9481724b	x86/cet: Run some CET tests with shadow stack When CET is disabled by default, run some CET tests with shadow stack enabled using $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK	2024-01-01 05:22:48 -08:00
H.J. Lu	55d63e7312	x86/cet: Don't set CET active by default Not all CET enabled applications and libraries have been properly tested in CET enabled environments. Some CET enabled applications or libraries will crash or misbehave when CET is enabled. Don't set CET active by default so that all applications and libraries will run normally regardless of whether CET is active or not. Shadow stack can be enabled by $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK at run-time if shadow stack can be enabled by kernel. NB: This commit can be reverted if it is OK to enable CET by default for all applications and libraries.	2024-01-01 05:22:48 -08:00
H.J. Lu	d360dcc001	x86/cet: Check feature_1 in TCB for active IBT and SHSTK Initially, IBT and SHSTK are marked as active when CPU supports them and CET are enabled in glibc. They can be disabled early by tunables before relocation. Since after relocation, GLRO(dl_x86_cpu_features) becomes read-only, we can't update GLRO(dl_x86_cpu_features) to mark IBT and SHSTK as inactive. Instead, check the feature_1 field in TCB to decide if IBT and SHST are active.	2024-01-01 05:22:48 -08:00
H.J. Lu	541641a3de	x86/cet: Enable shadow stack during startup Previously, CET was enabled by kernel before passing control to user space and the startup code must disable CET if applications or shared libraries aren't CET enabled. Since the current kernel only supports shadow stack and won't enable shadow stack before passing control to user space, we need to enable shadow stack during startup if the application and all shared library are shadow stack enabled. There is no need to disable shadow stack at startup. Shadow stack can only be enabled in a function which will never return. Otherwise, shadow stack will underflow at the function return. 1. GL(dl_x86_feature_1) is set to the CET features which are supported by the processor and are not disabled by the tunable. Only non-zero features in GL(dl_x86_feature_1) should be enabled. After enabling shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check if shadow stack is really enabled. 2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is safe since RTLD_START never returns. 3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static executable. Since the start function using ARCH_SETUP_TLS never returns, it is safe to enable shadow stack in ARCH_SETUP_TLS.	2024-01-01 05:22:48 -08:00
H.J. Lu	edb5e0c8f9	x86/cet: Sync with Linux kernel 6.6 shadow stack interface Sync with Linux kernel 6.6 shadow stack interface. Since only x86-64 is supported, i386 shadow stack codes are unchanged and CET shouldn't be enabled for i386. 1. When the shadow stack base in TCB is unset, the default shadow stack is in use. Use the current shadow stack pointer as the marker for the default shadow stack. It is used to identify if the current shadow stack is the same as the target shadow stack when switching ucontexts. If yes, INCSSP will be used to unwind shadow stack. Otherwise, shadow stack restore token will be used. 2. Allocate shadow stack with the map_shadow_stack syscall. Since there is no function to explicitly release ucontext, there is no place to release shadow stack allocated by map_shadow_stack in ucontext functions. Such shadow stacks will be leaked. 3. Rename arch_prctl CET commands to ARCH_SHSTK_XXX. 4. Rewrite the CET control functions with the current kernel shadow stack interface. Since CET is no longer enabled by kernel, a separate patch will enable shadow stack during startup.	2024-01-01 05:22:48 -08:00
H.J. Lu	41560a9312	x86/cet: Don't disable CET if not single threaded In permissive mode, don't disable IBT nor SHSTK when dlopening a legacy shared library if not single threaded since IBT and SHSTK may be still enabled in other threads. Other threads with IBT or SHSTK enabled will crash when calling functions in the legacy shared library. Instead, an error will be issued.	2023-12-20 05:03:37 -08:00
H.J. Lu	c04035809a	x86: Modularize sysdeps/x86/dl-cet.c Improve readability and make maintenance easier for dl-feature.c by modularizing sysdeps/x86/dl-cet.c: 1. Support processors with: a. Only IBT. Or b. Only SHSTK. Or c. Both IBT and SHSTK. 2. Lock CET features only if IBT or SHSTK are enabled and are not enabled permissively.	2023-12-20 04:57:55 -08:00
H.J. Lu	50bef9bd63	Fix elf: Do not duplicate the GLIBC_TUNABLES string commit `2a969b53c0` Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Wed Dec 6 10:24:01 2023 -0300 elf: Do not duplicate the GLIBC_TUNABLES string has @@ -38,7 +39,7 @@ which isn't available. */ #define CHECK_GLIBC_IFUNC_PREFERRED_OFF(f, cpu_features, name, len) \ _Static_assert (sizeof (#name) - 1 == len, #name " != " #len); \ - if (memcmp (f, #name, len) == 0) \ + if (tunable_str_comma_strcmp_cte (&f, #name) == 0) \ { \ cpu_features->preferred[index_arch_##name] \ &= ~bit_arch_##name; \ @@ -46,12 +47,11 @@ Fix it by removing "== 0" after tunable_str_comma_strcmp_cte.	2023-12-19 16:01:33 -08:00
H.J. Lu	cad5703e4f	Fix elf: Do not duplicate the GLIBC_TUNABLES string Fix issues in sysdeps/x86/tst-hwcap-tunables.c added by Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Wed Dec 6 10:24:01 2023 -0300 elf: Do not duplicate the GLIBC_TUNABLES string 1. -AVX,-AVX2,-AVX512F should be used to disable AVX, AVX2 and AVX512. 2. AVX512 IFUNC functions check AVX512VL. -AVX512VL should be added to disable these functions. This fixed: FAIL: elf/tst-hwcap-tunables ... [0] Spawned test for -Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,-SSSE3,-Fast_Unaligned_Load,-ERMS,-AVX_Fast_Unaligned_Load error: subprocess failed: tst-tunables error: unexpected output from subprocess ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false [1] Spawned test for ,-,-Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,,-SSSE3,-Fast_Unaligned_Load,,-,-ERMS,-AVX_Fast_Unaligned_Load,-, error: subprocess failed: tst-tunables error: unexpected output from subprocess ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false error: 2 test failures on Intel Tiger Lake.	2023-12-19 13:34:14 -08:00
Adhemerval Zanella	47a9eeb9ba	i686: Do not raise exception traps on fesetexcept (BZ 30989) According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. To set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-12-19 15:12:38 -03:00
Adhemerval Zanella	2a969b53c0	elf: Do not duplicate the GLIBC_TUNABLES string The tunable parsing duplicates the tunable environment variable so it null-terminates each one since it simplifies the later parsing. It has the drawback of adding another point of failure (__minimal_malloc failing), and the memory copy requires tuning the compiler to avoid mem operations calls. The parsing now tracks the tunable start and its size. The dl-tunable-parse.h adds helper functions to help parsing, like a strcmp that also checks for size and an iterator for suboptions that are comma-separated (used on hwcap parsing by x86, powerpc, and s390x). Since the environment variable is allocated on the stack by the kernel, it is safe to keep the references to the suboptions for later parsing of string tunables (as done by set_hwcaps by multiple architectures). Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-12-19 13:25:45 -03:00
H.J. Lu	4d8a01d2b0	x86/cet: Check CPU_FEATURE_ACTIVE in permissive mode Verify that CPU_FEATURE_ACTIVE works properly in permissive mode.	2023-12-19 06:58:05 -08:00
H.J. Lu	28bd6f832d	x86/cet: Check legacy shadow stack code in .init_array section Verify that legacy shadow stack code in .init_array section in application and shared library, which are marked as shadow stack enabled, will trigger segfault.	2023-12-19 06:57:47 -08:00
H.J. Lu	9424ce80c2	x86/cet: Add tests for GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK Verify that GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK turns off shadow stack properly.	2023-12-19 06:57:39 -08:00
H.J. Lu	71c0cc3357	x86/cet: Check CPU_FEATURE_ACTIVE when CET is disabled Verify that CPU_FEATURE_ACTIVE (SHSTK) works properly when CET is disabled.	2023-12-19 06:57:32 -08:00
H.J. Lu	f418fe6f97	x86/cet: Check legacy shadow stack applications Add tests to verify that legacy shadow stack applications run properly when shadow stack is enabled in Linux kernel.	2023-12-19 06:57:27 -08:00
H.J. Lu	442983319b	x86/cet: Don't assume that SHSTK implies IBT Since shadow stack (SHSTK) is enabled in the Linux kernel without enabling indirect branch tracking (IBT), don't assume that SHSTK implies IBT. Use "CPU_FEATURE_ACTIVE (IBT)" to check if IBT is active and "CPU_FEATURE_ACTIVE (SHSTK)" to check if SHSTK is active.	2023-12-18 07:04:18 -08:00
H.J. Lu	0b850186fd	x86/cet: Check user_shstk in /proc/cpuinfo Linux kernel reports CPU shadow stack feature in /proc/cpuinfo as user_shstk, instead of shstk.	2023-12-17 10:42:06 -08:00
H.J. Lu	4753e92868	x86: Check PT_GNU_PROPERTY early The PT_GNU_PROPERTY segment is scanned before PT_NOTE. For binaries with the PT_GNU_PROPERTY segment, we can check it to avoid scan of the PT_NOTE segment. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-12-11 11:05:46 -08:00
H.J. Lu	7e03e0de7e	sysdeps/x86/Makefile: Split and sort tests Put each test on a separate line and sort tests.	2023-12-11 08:49:57 -08:00
Adhemerval Zanella	4862d546c0	x86: Use dl-symbol-redir-ifunc.h on cpu-tunables The dl-symbol-redir-ifunc.h redirects compiler-generated libcalls to arch-specific memory implementations to avoid ifunc calls where it is not yet possible. The memcmp-isa-default-impl.h aims to fix the same issue by calling the specific memset implementation directly. Using the memcmp symbol directly allows the compiler to inline the memset calls (especially because _dl_tunable_set_hwcaps uses constants values), generating better code. Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-11-21 16:15:42 -03:00
Noah Goldstein	d90b43a4ed	x86: Add support for AVX10 preset and vec size in cpu-features This commit add support for the new AVX10 cpu features: https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf We add checks for: - `AVX10`: Check if AVX10 is present. - `AVX10_{X,Y,Z}MM`: Check if a given vec class has AVX10 support. `make check` passes and cpuid output was checked against GNR/DMR on an emulator.	2023-09-29 14:18:42 -05:00
H.J. Lu	1493622f4f	x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643 ] The old Intel software developer manual specified that the low byte of EAX of CPUID leaf 2 returned 1 which indicated the number of rounds of CPUDID leaf 2 was needed to retrieve the complete cache information. The newer Intel manual has been changed to that it should always return 1 and be ignored. If the lower byte isn't 1, CPUID leaf 2 can't be used. In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead. If CPUID leaf 4 doesn't contain the cache information, cache information isn't available at all. This addresses BZ #30643.	2023-08-29 12:57:41 -07:00
Noah Goldstein	084fb31bc2	x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745] The: ``` if (shared_per_thread > 0 && threads > 0) shared_per_thread /= threads; ``` Code was accidentally moved to inside the else scope. This doesn't match how it was previously (before `af992e7abd`). This patch fixes that by putting the division after the `else` block.	2023-08-11 15:33:08 -05:00
Sajan Karumanchi	dcad5c8578	x86: Fix for cache computation on AMD legacy cpus. Some legacy AMD CPUs and hypervisors have the _cpuid_ '0x8000_001D' set to Zero, thus resulting in zeroed-out computed cache values. This patch reintroduces the old way of cache computation as a fail-safe option to handle these exceptions. Fixed 'level4_cache_size' value through handle_amd(). Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com> Tested-by: Florian Weimer <fweimer@redhat.com>	2023-08-06 19:10:42 +05:30
H.J. Lu	1547d6a64f	<sys/platform/x86.h>: Add APX support Add support for Intel Advanced Performance Extensions: https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html to <sys/platform/x86.h>.	2023-07-27 08:42:32 -07:00
Noah Goldstein	8b9a0af8ca	[PATCH v1] x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold. On some machines we end up with incomplete cache information. This can make the new calculation of `sizeof(total-L3)/custom-divisor` end up lower than intended (and lower than the prior value). So reintroduce the old bound as a lower bound to avoid potentially regressing code where we don't have complete information to make the decision. Reviewed-by: DJ Delorie <dj@redhat.com>	2023-07-18 22:34:34 -05:00
Noah Goldstein	47f7472178	x86: Fix slight bug in `shared_per_thread` cache size calculation. After: ``` commit `af992e7abd` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 7 13:18:01 2023 -0500 x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` ``` Split `shared` (cumulative cache size) from `shared_per_thread` (cache size per socket), the `shared_per_thread` can be slightly off from the previous calculation. Previously we added `core` even if `threads_l2` was invalid, and only used `threads_l2` to divide `core` if it was present. The changed version only included `core` if `threads_l2` was valid. This change restores the old behavior if `threads_l2` is invalid by adding the entire value of `core`. Reviewed-by: DJ Delorie <dj@redhat.com>	2023-07-18 20:56:25 -05:00
Siddhesh Poyarekar	c6cb8783b5	configure: Use autoconf 2.71 Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-07-17 10:08:10 -04:00
Sergey Bugaev	45e2483a6c	x86: Make dl-cache.h and readelflib.c not Linux-specific These files could be useful to any port that wants to use ld.so.cache. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-06-26 10:04:31 -03:00
Paul Pluzhnikov	4290aed051	Fix misspellings -- BZ 25337	2023-06-19 21:58:33 +00:00
Noah Goldstein	180897c161	x86: Make the divisor in setting `non_temporal_threshold` cpu specific Different systems prefer a different divisors. From benchmarks[1] so far the following divisors have been found: ICX : 2 SKX : 2 BWD : 8 For Intel, we are generalizing that BWD and older prefers 8 as a divisor, and SKL and newer prefers 2. This number can be further tuned as benchmarks are run. [1]: https://github.com/goldsteinn/memcpy-nt-benchmarks Reviewed-by: DJ Delorie <dj@redhat.com>	2023-06-12 11:33:39 -05:00
Noah Goldstein	f193ea20ed	x86: Refactor Intel `init_cpu_features` This patch should have no affect on existing functionality. The current code, which has a single switch for model detection and setting prefered features, is difficult to follow/extend. The cases use magic numbers and many microarchitectures are missing. This makes it difficult to reason about what is implemented so far and/or how/where to add support for new features. This patch splits the model detection and preference setting stages so that CPU preferences can be set based on a complete list of available microarchitectures, rather than based on model magic numbers. Reviewed-by: DJ Delorie <dj@redhat.com>	2023-06-12 11:33:39 -05:00
Noah Goldstein	af992e7abd	x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 / ncores_per_socket'. This patch updates that value to roughly 'sizeof_L3 / 4` The original value (specifically dividing the `ncores_per_socket`) was done to limit the amount of other threads' data a `memcpy`/`memset` could evict. Dividing by 'ncores_per_socket', however leads to exceedingly low non-temporal thresholds and leads to using non-temporal stores in cases where REP MOVSB is multiple times faster. Furthermore, non-temporal stores are written directly to main memory so using it at a size much smaller than L3 can place soon to be accessed data much further away than it otherwise could be. As well, modern machines are able to detect streaming patterns (especially if REP MOVSB is used) and provide LRU hints to the memory subsystem. This in affect caps the total amount of eviction at 1/cache_associativity, far below meaningfully thrashing the entire cache. As best I can tell, the benchmarks that lead this small threshold where done comparing non-temporal stores versus standard cacheable stores. A better comparison (linked below) is to be REP MOVSB which, on the measure systems, is nearly 2x faster than non-temporal stores at the low-end of the previous threshold, and within 10% for over 100MB copies (well past even the current threshold). In cases with a low number of threads competing for bandwidth, REP MOVSB is ~2x faster up to `sizeof_L3`. The divisor of `4` is a somewhat arbitrary value. From benchmarks it seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs such as Broadwell prefer something closer to `8`. This patch is meant to be followed up by another one to make the divisor cpu-specific, but in the meantime (and for easier backporting), this patch settles on `4` as a middle-ground. Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable stores where done using: https://github.com/goldsteinn/memcpy-nt-benchmarks Sheets results (also available in pdf on the github): https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-06-12 11:33:39 -05:00
Paul Pluzhnikov	2cbeda847b	Fix a few more typos I missed in previous round -- BZ 25337	2023-06-02 23:46:32 +00:00
Paul Pluzhnikov	65cc53fe7c	Fix misspellings in sysdeps/ -- BZ 25337	2023-05-30 23:02:29 +00:00
Noah Goldstein	ed2f9dc942	x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429 ] If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, it almost certainly means we failed to read the systems cache info. In this case, rather than defaulting the minimum correct value, we should default to a value that gets at least reasonable performance. 64MB is chosen conservatively to be at the very high end. This should never cause non-temporal stores when, if we had read cache info, we wouldn't have otherwise. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2023-05-27 21:32:57 -05:00
H.J. Lu	81a3cc956e	<sys/platform/x86.h>: Add PREFETCHI support Add PREFETCHI support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	b05521c916	<sys/platform/x86.h>: Add AMX-COMPLEX support Add AMX-COMPLEX support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	609b7b2d3c	<sys/platform/x86.h>: Add AVX-NE-CONVERT support Add AVX-NE-CONVERT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	4c120c88a6	<sys/platform/x86.h>: Add AVX-VNNI-INT8 support Add AVX-VNNI-INT8 support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	b39741b45f	<sys/platform/x86.h>: Add MSRLIST support Add MSRLIST support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	96037c697d	<sys/platform/x86.h>: Add AVX-IFMA support Add AVX-IFMA support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	8b4cc05eab	<sys/platform/x86.h>: Add AMX-FP16 support Add AMX-FP16 support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	227983551d	<sys/platform/x86.h>: Add WRMSRNS support Add WRMSRNS support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	a00db8305d	<sys/platform/x86.h>: Add ArchPerfmonExt support Add Architectural Performance Monitoring Extended Leaf (EAX = 23H) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	2f02d0d8e1	<sys/platform/x86.h>: Add CMPCCXADD support Add CMPCCXADD support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	aa528a579b	<sys/platform/x86.h>: Add LASS support Add Linear Address Space Separation (LASS) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	231bf916ce	<sys/platform/x86.h>: Add RAO-INT support Add RAO-INT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	fb90dc8513	<sys/platform/x86.h>: Add LBR support Add architectural LBR support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	f47b7d96fb	<sys/platform/x86.h>: Add RTM_FORCE_ABORT support Add RTM_FORCE_ABORT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	f6790a489d	<sys/platform/x86.h>: Add SGX-KEYS support Add SGX-KEYS support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	09cc5fee21	<sys/platform/x86.h>: Add BUS_LOCK_DETECT support Add Bus lock debug exceptions (BUS_LOCK_DETECT) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	8c8e391166	<sys/platform/x86.h>: Add LA57 support Add 57-bit linear addresses and five-level paging (LA57) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	2d8c590a5e	<bits/platform/x86.h>: Rename to x86_cpu_INDEX_7_ECX_15 Rename x86_cpu_INDEX_7_ECX_1 to x86_cpu_INDEX_7_ECX_15 for the unused bit 15 in ECX from CPUID with EAX == 0x7 and ECX == 0. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
Andreas Schwab	856bab7717	x86/dl-cacheinfo: remove unsused parameter from handle_amd Also replace an unreachable assert with __builtin_unreachable.	2023-04-04 16:16:21 +02:00
H.J. Lu	743113d42e	x86: Set FSGSBASE to active if enabled by kernel Linux kernel uses AT_HWCAP2 to indicate if FSGSBASE instructions are enabled. If the HWCAP2_FSGSBASE bit in AT_HWCAP2 is set, FSGSBASE instructions can be used in user space. Define dl_check_hwcap2 to set the FSGSBASE feature to active on Linux when the HWCAP2_FSGSBASE bit is set. Add a test to verify that FSGSBASE is active on current kernels. NB: This test will fail if the kernel doesn't set the HWCAP2_FSGSBASE bit in AT_HWCAP2 while fsgsbase shows up in /proc/cpuinfo. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2023-04-03 11:36:48 -07:00
Adhemerval Zanella Netto	33237fe83d	Remove --enable-tunables configure option And make always supported. The configure option was added on glibc 2.25 and some features require it (such as hwcap mask, huge pages support, and lock elisition tuning). It also simplifies the build permutations. Changes from v1: * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs more discussion. * Cleanup more code. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-03-29 14:33:06 -03:00
DJ Delorie	db9b47e9f9	x86: Don't check PREFETCHWT1 in tst-cpu-features-cpuinfo.c Don't check PREFETCHWT1 against /proc/cpuinfo since kernel doesn't report PREFETCHWT1 in /proc/cpuinfo. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-03-21 17:49:49 -04:00
caiyinyu	4c721f24fc	x86: Fix bug about glibc.cpu.hwcaps. Recorded in [BZ #30183]: 1. export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512 2. Add _dl_printf("p -- %s\n", p); just before switch(nl) in sysdeps/x86/cpu-tunables.c 3. compiled and run ./testrun.sh /usr/bin/ls you will get: p -- -AVX512 p -- LC_ADDRESS=en_US.UTF-8 p -- LC_NUMERIC=C ... The function, TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp), checks far more than it should and it should stop at end of "-AVX512".	2023-03-07 21:42:25 +08:00
H.J. Lu	317f1c0a8a	x86-64: Add glibc.cpu.prefer_map_32bit_exec [BZ #28656 ] Crossing 2GB boundaries with indirect calls and jumps can use more branch prediction resources on Intel Golden Cove CPU (see the "Misprediction for Branches >2GB" section in Intel 64 and IA-32 Architectures Optimization Reference Manual.) There is visible performance improvement on workloads with many PLT calls when executable and shared libraries are mmapped below 2GB. Add the Prefer_MAP_32BIT_EXEC bit so that mmap will try to map executable or denywrite pages in shared libraries with MAP_32BIT first. NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space layout randomization (ASLR), which is always disabled for SUID programs and can only be enabled by the tunable, glibc.cpu.prefer_map_32bit_exec, or the environment variable, LD_PREFER_MAP_32BIT_EXEC. This works only between shared libraries or between shared libraries and executables with addresses below 2GB. PIEs are usually loaded at a random address above 4GB by the kernel.	2023-02-22 18:28:37 -08:00
Adhemerval Zanella	a9b3b770f5	string: Remove string_private.h Now that _STRING_ARCH_unaligned is not used anymore. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-02-17 15:56:54 -03:00
Samuel Thibault	bfb583e791	htl: Generalize i386 pt-machdep.h to x86	2023-02-12 16:33:39 +01:00
Sajan Karumanchi	103a469dc7	x86: Cache computation for AMD architecture. All AMD architectures cache details will be computed based on __cpuid__ `0x8000_001D` and the reference to __cpuid__ `0x8000_0006` will be zeroed out for future architectures. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>	2023-01-18 19:28:54 +01:00
Joseph Myers	6d7e8eda9b	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
H.J. Lu	48b74865c6	x86: Check minimum/maximum of non_temporal_threshold [BZ #29953 ] The minimum non_temporal_threshold is 0x4040. non_temporal_threshold may be set to less than the minimum value when the shared cache size isn't available (e.g., in an emulator) or by the tunable. Add checks for minimum and maximum of non_temporal_threshold. This fixes BZ #29953.	2023-01-03 13:25:50 -08:00
Adhemerval Zanella	8d6083717c	x86_64: State assembler is being tested on sysdeps/x86/configure	2022-12-06 13:47:47 -03:00
Adhemerval Zanella	2b0da5028d	configure: Remove AS check The assembler is not issued directly, but rather always through CC wrapper. The binutils version check if done with LD instead. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2022-12-06 09:40:19 -03:00
Javier Pello	ab40f20364	elf: Remove _dl_string_hwcap Removal of legacy hwcaps support from the dynamic loader left no users of _dl_string_hwcap. Signed-off-by: Javier Pello <devel@otheo.eu> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-10-06 07:59:48 -03:00
Aurelien Jarno	7e8283170c	x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations The AVX2 strrchr and wcsrchr implementation uses the 'blsmsk' instruction which belongs to the BMI1 CPU feature and the 'shrx' instruction, which belongs to the BMI2 CPU feature. Fixes: `df7e295d18` ("x86: Optimize {str\|wcs}rchr-avx2") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	3c0c78afab	x86-64: Require BMI2 and LZCNT for AVX2 memrchr implementation The AVX2 memrchr implementation uses the 'shlxl' instruction, which belongs to the BMI2 CPU feature and uses the 'lzcnt' instruction, which belongs to the LZCNT CPU feature. Fixes: `af5306a735` ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	b80f16adbd	x86: include BMI1 and BMI2 in x86-64-v3 level The "System V Application Binary Interface AMD64 Architecture Processor Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3 level. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Joseph Myers	3e5760fcb4	Update _FloatN header support for C++ in GCC 13 GCC 13 adds support for _FloatN and _FloatNx types in C++, so breaking the installed glibc headers that assume such support is not present. GCC mostly works around this with fixincludes, but that doesn't help for building glibc and its tests (glibc doesn't itself contain C++ code, but there's C++ code built for tests). Update glibc's bits/floatn-common.h and bits/floatn.h headers to handle the GCC 13 support directly. In general the changes match those made by fixincludes, though I think the ones in sysdeps/powerpc/bits/floatn.h, where the header tests __LDBL_MANT_DIG__ == 113 or uses #elif, wouldn't match the existing fixincludes patterns. Some places involving special C++ handling in relation to _FloatN support are not changed. There's no need to change the __HAVE_FLOATN_NOT_TYPEDEF definition (also in a form that wouldn't be matched by the fixincludes fixes) because it's only used in relation to macro definitions using features not supported for C++ (__builtin_types_compatible_p and _Generic). And there's no need to change the inline function overloads for issignaling, iszero and iscanonical in C++ because cases where types have the same format but are no longer compatible types are handled automatically by the C++ overload resolution rules. This patch also does not change the overload handling for iseqsig, and there I think changes are needed, beyond those in this patch or made by fixincludes. The way that overload is defined, via a template parameter to a structure type, requires overloads whenever the types are incompatible, even if they have the same format. So I think we need to add overloads with GCC 13 for every supported _FloatN and _FloatNx type, rather than just having one for _Float128 when it has a different ABI to long double as at present (but for older GCC, such overloads must not be defined for types that end up defined as typedefs for another type). Tested with build-many-glibcs.py: compilers build for aarch64-linux-gnu ia64-linux-gnu mips64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu; glibcs build for aarch64-linux-gnu ia64-linux-gnu i686-linux-gnu mips-linux-gnu mips64-linux-gnu-n32 powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu.	2022-09-28 20:10:08 +00:00
Adhemerval Zanella	2fc7320668	math: x86: Use prefix for FP_INIT_ROUNDMODE Not all compilers support the inline asm prefix '%v' to emit the avx instruction if AVX is enable. Use a prefix instead. Checked on x86_64-linux-gnu and i686-linux-gnu.	2022-09-05 10:54:41 -03:00
Adhemerval Zanella	8cd559cf5a	nptl: x86_64: Use same code for CURRENT_STACK_FRAME and stackinfo_get_sp It avoids the possible warning of uninitialized 'frame' variable when building with clang: ../sysdeps/nptl/jmp-unwind.c:27:42: error: variable 'frame' is uninitialized when used here [-Werror,-Wuninitialized] __pthread_cleanup_upto (env->__jmpbuf, CURRENT_STACK_FRAME); The resulting code is similar to CURRENT_STACK_FRAME. Checked on x86_64-linux-gnu.	2022-08-31 09:04:27 -03:00
Noah Goldstein	ceabdcd130	x86: Add support to build strcmp/strlen/strchr with explicit ISA level 1. Add default ISA level selection in non-multiarch/rtld implementations. 2. Add ISA level build guards to different implementations. - I.e strcmp-avx2.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (strcmp-evex.S). 3. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-16 03:07:59 -07:00
Noah Goldstein	7c8ca17893	x86: Add missing rtm tests for strcmp family Add new tests for: strcasecmp strncasecmp strcmp wcscmp These functions all have avx2_rtm implementations so should be tested.	2022-07-13 14:55:31 -07:00
Noah Goldstein	ae308947ff	x86: Add support for building {w}memcmp{eq} with explicit ISA level 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memcmp sse2 from memcmp.S to multiarch/memcmp-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memcmp-evex-movbe.S). 3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the non-multiarch {w}memcmp{eq}.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-05 16:42:42 -07:00
Noah Goldstein	a3563f3f36	x86: Add more feature definitions to isa-level.h This commit doesn't change anything in itself. It is just to add definitions that will be needed by future patches.	2022-06-28 08:24:56 -07:00
H.J. Lu	cfdc4df66c	x86-64: Only define used SSE/AVX/AVX512 run-time resolvers When glibc is built with x86-64 ISA level v3, SSE run-time resolvers aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers are unused. Check the minimum x86-64 ISA level to exclude the unused run-time resolvers.	2022-06-27 14:17:52 -07:00
H.J. Lu	f56c497d2b	x86: Move CPU_FEATURE{S}_{USABLE\|ARCH}_P to isa-level.h Move X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P to where MINIMUM_X86_ISA_LEVEL and XXX_X86_ISA_LEVEL are defined.	2022-06-27 12:52:58 -07:00
Noah Goldstein	4fc321dc58	x86: Fix backwards Prefer_No_VZEROUPPER check in ifunc-evex.h Add third argument to X86_ISA_CPU_FEATURES_ARCH_P macro so the runtime CPU_FEATURES_ARCH_P check can be inverted if the MINIMUM_X86_ISA_LEVEL is not high enough to constantly evaluate the check. Use this new macro to correct the backwards check in ifunc-evex.h	2022-06-27 08:35:51 -07:00
Noah Goldstein	703f434108	x86: Add defines / utilities for making ISA specific x86 builds 1. Factor out some of the ISA level defines in isa-level.c to standalone header isa-level.h 2. Add new headers with ISA level dependent macros for handling ifuncs. Note, this file does not change any code. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-06-22 19:41:35 -07:00
Noah Goldstein	8da9f346cb	x86: Add BMI1/BMI2 checks for ISA_V3 check BMI1/BMI2 are part of the ISA V3 requirements: https://en.wikipedia.org/wiki/X86-64 And defined by GCC when building with `-march=x86-64-v3`	2022-06-16 20:17:45 -07:00
Noah Goldstein	b446822b6a	x86: Add bounds `x86_non_temporal_threshold` The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed by memmove-vec-unaligned-erms. The lower-bound is needed because memmove-vec-unaligned-erms unrolls the loop aggressively in the L(large_memset_4x) case. The upper-bound is needed because memmove-vec-unaligned-erms right-shifts the value of `x86_non_temporal_threshold` by LOG_4X_MEMCPY_THRESH (4) which without a bound may overflow. The lack of lower-bound can be a correctness issue. The lack of upper-bound cannot.	2022-06-15 14:25:55 -07:00
Fangrui Song	de38b2a343	elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA If an executable has copy relocations for extern protected data, that can only work if the library containing the definition is built with assumptions (a) the compiler emits GOT-generating relocations (b) the linker produces R__GLOB_DAT instead of R__RELATIVE. Otherwise the library uses its own definition directly and the executable accesses a stale copy. Note: the GOT relocations defeat the purpose of protected visibility as an optimization, but allow rtld to make the executable and library use the same copy when copy relocations are present, but it turns out this never worked perfectly. ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics when both a.so and b.so define protected var and the executable copy relocates var: b.so accesses its own copy even with GLOB_DAT. The behavior change is from commit `62da1e3b00` (x86) and then copied to nios2 (`ae5eae7cfc`) and arc (`0e7d930c4c`). Without ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA, b.so accesses the copy relocated data like a.so. There is now a warning for copy relocation on protected symbol since commit `7374c02b68`. It's extremely unlikely anyone relies on the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA behavior, so let's remove it: this removes a check in the symbol lookup code.	2022-06-15 11:29:55 -07:00
Noah Goldstein	0355915514	x86: Fix misordered logic for setting `rep_movsb_stop_threshold` Move the setting of `rep_movsb_stop_threshold` to after the tunables have been collected so that the `rep_movsb_stop_threshold` (which is used to redirect control flow to the non_temporal case) will use any user value for `non_temporal_threshold` (set using glibc.cpu.x86_non_temporal_threshold)	2022-06-14 20:58:07 -07:00
Noah Goldstein	9a421348cd	elf: Optimize _dl_new_hash in dl-new-hash.h Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. The unrolling allows for pipelined multiplies. As well, as an optional sysdep, reorder the operations and prevent reassosiation for better scheduling and higher ILP. This commit only adds the barrier for x86, although it should be either no change or a win for any architecture. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=30 runs Geometric of all benchmark New / Old: 0.674 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 2.865, 2.72, 1.053 fixed, 1, 3.567, 2.489, 1.433 fixed, 2, 2.577, 3.649, 0.706 fixed, 3, 3.644, 5.983, 0.609 fixed, 4, 4.211, 6.833, 0.616 fixed, 5, 4.741, 9.372, 0.506 fixed, 6, 5.415, 9.561, 0.566 fixed, 7, 6.649, 10.789, 0.616 fixed, 8, 8.081, 11.808, 0.684 fixed, 9, 8.427, 12.935, 0.651 fixed, 10, 8.673, 14.134, 0.614 fixed, 11, 10.69, 15.408, 0.694 fixed, 12, 10.789, 16.982, 0.635 fixed, 13, 12.169, 18.411, 0.661 fixed, 14, 12.659, 19.914, 0.636 fixed, 15, 13.526, 21.541, 0.628 fixed, 16, 14.211, 23.088, 0.616 fixed, 32, 29.412, 52.722, 0.558 fixed, 64, 65.41, 142.351, 0.459 fixed, 128, 138.505, 295.625, 0.469 fixed, 256, 291.707, 601.983, 0.485 random, 2, 12.698, 12.849, 0.988 random, 4, 16.065, 15.857, 1.013 random, 8, 19.564, 21.105, 0.927 random, 16, 23.919, 26.823, 0.892 random, 32, 31.987, 39.591, 0.808 random, 64, 49.282, 71.487, 0.689 random, 128, 82.23, 145.364, 0.566 random, 256, 152.209, 298.434, 0.51 Co-authored-by: Alexander Monakov <amonakov@ispras.ru> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2022-05-23 10:38:40 -05:00
Fangrui Song	098a657fe4	elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage variables and hidden visibility variables in a shared object (ld.so) need dynamic relocations (usually R_*_RELATIVE). PI (position independent) in the macro name is a misnomer: a code sequence using GOT is typically position-independent as well, but using dynamic relocations does not meet the requirement. Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new ports will define PI_STATIC_AND_HIDDEN. Current ports defining PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure default. No functional change. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-04-26 09:26:22 -07:00
Noah Goldstein	9fef7039a7	x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896 ] Overflow case for __wcsncmp_avx2_rtm should be __wcscmp_avx2_rtm not __wcscmp_avx2. commit `ddf0992cf5` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:21 2022 -0600 x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] Set the wrong fallback function for `__wcsncmp_avx2_rtm`. It was set to fallback on to `__wcscmp_avx2` instead of `__wcscmp_avx2_rtm` which can cause spurious aborts. This change will need to be backported. All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-03-25 11:46:13 -05:00

1 2 3 4 5 ...

545 Commits