glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-26 15:00:06 +00:00

Author	SHA1	Message	Date
H.J. Lu	b45115a666	x86: Move x86-64 shadow stack startup codes Move sysdeps/x86/libc-start.h to sysdeps/x86_64/libc-start.h and use sysdeps/generic/libc-start.h for i386. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-10 05:19:32 -08:00
Adhemerval Zanella	25f1e16ef0	i386: Remove CET support CET is only support for x86_64, this patch reverts: - `faaee1f07e` x86: Support shadow stack pointer in setjmp/longjmp. - `be9ccd27c0` i386: Add _CET_ENDBR to indirect jump targets in add_n.S/sub_n.S - `c02695d776` x86/CET: Update vfork to prevent child return - `5d844e1b72` i386: Enable CET support in ucontext functions - `124bcde683` x86: Add _CET_ENDBR to functions in crti.S - `562837c002` x86: Add _CET_ENDBR to functions in dl-tlsdesc.S - `f753fa7dea` x86: Support IBT and SHSTK in Intel CET [BZ #21598] - `825b58f3fb` i386-mcount.S: Add _CET_ENDBR to _mcount and __fentry__ - `7e119cd582` i386: Use _CET_NOTRACK in i686/memcmp.S - `177824e232` i386: Use _CET_NOTRACK in memcmp-sse4.S - `0a899af097` i386: Use _CET_NOTRACK in memcpy-ssse3-rep.S - `7fb613361c` i386: Use _CET_NOTRACK in memcpy-ssse3.S - `77a8ae0948` i386: Use _CET_NOTRACK in memset-sse2-rep.S - `00e7b76a8f` i386: Use _CET_NOTRACK in memset-sse2.S - `90d15dc577` i386: Use _CET_NOTRACK in strcat-sse2.S - `f1574581c7` i386: Use _CET_NOTRACK in strcpy-sse2.S - `4031d7484a` i386/sub_n.S: Add a missing _CET_ENDBR to indirect jump - target - Checked on i686-linux-gnu.	2024-01-09 13:55:51 -03:00
Adhemerval Zanella	b7fc4a07f2	x86: Move CET infrastructure to x86_64 The CET is only supported for x86_64 and there is no plan to add kernel support for i386. Move the Makefile rules and files from the generic x86 folder to x86_64 one. Checked on x86_64-linux-gnu and i686-linux-gnu.	2024-01-09 13:55:51 -03:00
Adhemerval Zanella	460860f457	Remove ia64-linux-gnu Linux 6.7 removed ia64 from the official tree [1], following the general principle that a glibc port needs upstream support for the architecture in all the components it depends on (binutils, GCC, and the Linux kernel). Apart from the removal of sysdeps/ia64 and sysdeps/unix/sysv/linux/ia64, there are updates to various comments referencing ia64 for which removal of those references seemed appropriate. The configuration is removed from README and build-many-glibcs.py. The CONTRIBUTED-BY, elf/elf.h, manual/contrib.texi (the porting mention), *.po files, config.guess, and longlong.h are not changed. For Linux it allows cleanup some clone2 support on multiple files. The following bug can be closed as WONTFIX: BZ 22634 [2], BZ 14250 [3], BZ 21634 [4], BZ 10163 [5], BZ 16401 [6], and BZ 11585 [7]. [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43ff221426d33db909f7159fdf620c3b052e2d1c [2] https://sourceware.org/bugzilla/show_bug.cgi?id=22634 [3] https://sourceware.org/bugzilla/show_bug.cgi?id=14250 [4] https://sourceware.org/bugzilla/show_bug.cgi?id=21634 [5] https://sourceware.org/bugzilla/show_bug.cgi?id=10163 [6] https://sourceware.org/bugzilla/show_bug.cgi?id=16401 [7] https://sourceware.org/bugzilla/show_bug.cgi?id=11585 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2024-01-08 17:09:36 -03:00
H.J. Lu	848746e88e	elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLT Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after relocation. For x86-64, add #define DT_X86_64_PLT (DT_LOPROC + 0) #define DT_X86_64_PLTSZ (DT_LOPROC + 1) #define DT_X86_64_PLTENT (DT_LOPROC + 3) 1. DT_X86_64_PLT: The address of the procedure linkage table. 2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage table. 3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table entry. With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the memory offset of the indirect branch instruction. Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section with direct branch after relocation when the lazy binding is disabled. PLT rewrite is disabled by default since SELinux may disallow modifying code pages and ld.so can't detect it in all cases. Use $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1 to enable PLT rewrite with 32-bit direct jump at run-time or $ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2 to enable PLT rewrite with 32-bit direct jump and on APX processors with 64-bit absolute jump at run-time. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-05 05:49:49 -08:00
H.J. Lu	35694d3416	x86-64/cet: Check the restore token in longjmp setcontext and swapcontext put a restore token on the old shadow stack which is used to restore the target shadow stack when switching user contexts. When longjmp from a user context, the target shadow stack can be different from the current shadow stack and INCSSP can't be used to restore the shadow stack pointer to the target shadow stack. Update longjmp to search for a restore token. If found, use the token to restore the shadow stack pointer before using INCSSP to pop the shadow stack. Stop the token search and use INCSSP if the shadow stack entry value is the same as the current shadow stack pointer. It is a user error if there is a shadow stack switch without leaving a restore token on the old shadow stack. The only difference between __longjmp.S and __longjmp_chk.S is that __longjmp_chk.S has a check for invalid longjmp usages. Merge __longjmp.S and __longjmp_chk.S by adding the CHECK_INVALID_LONGJMP macro. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-04 13:38:26 -08:00
H.J. Lu	bbfb54930c	i386: Ignore --enable-cet Since shadow stack is only supported for x86-64, ignore --enable-cet for i386. Always setting $(enable-cet) for i386 to "no" to support ifneq ($(enable-cet),no) in x86 Makefiles. We can't use ifeq ($(enable-cet),yes) since $(enable-cet) can be "yes", "no" or "permissive". Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-04 06:08:55 -08:00
H.J. Lu	b5dcccfb12	x86/cet: Add -fcf-protection=none before -fcf-protection=branch When shadow stack is enabled, some CET tests failed when compiled with GCC 14: FAIL: elf/tst-cet-legacy-4 FAIL: elf/tst-cet-legacy-5a FAIL: elf/tst-cet-legacy-6a which are caused by https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113039 These tests use -fcf-protection -fcf-protection=branch and assume that -fcf-protection=branch will override -fcf-protection. But this GCC 14 commit: https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=1c6231c05bdcca changed the -fcf-protection behavior such that -fcf-protection -fcf-protection=branch is treated the same as -fcf-protection Use -fcf-protection -fcf-protection=none -fcf-protection=branch as the workaround. This fixes BZ #31187. Tested with GCC 13 and GCC 14 on Intel Tiger Lake. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-01-01 15:53:52 -08:00
Paul Eggert	dff8da6b3e	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
H.J. Lu	cf9481724b	x86/cet: Run some CET tests with shadow stack When CET is disabled by default, run some CET tests with shadow stack enabled using $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK	2024-01-01 05:22:48 -08:00
H.J. Lu	55d63e7312	x86/cet: Don't set CET active by default Not all CET enabled applications and libraries have been properly tested in CET enabled environments. Some CET enabled applications or libraries will crash or misbehave when CET is enabled. Don't set CET active by default so that all applications and libraries will run normally regardless of whether CET is active or not. Shadow stack can be enabled by $ export GLIBC_TUNABLES=glibc.cpu.hwcaps=SHSTK at run-time if shadow stack can be enabled by kernel. NB: This commit can be reverted if it is OK to enable CET by default for all applications and libraries.	2024-01-01 05:22:48 -08:00
H.J. Lu	d360dcc001	x86/cet: Check feature_1 in TCB for active IBT and SHSTK Initially, IBT and SHSTK are marked as active when CPU supports them and CET are enabled in glibc. They can be disabled early by tunables before relocation. Since after relocation, GLRO(dl_x86_cpu_features) becomes read-only, we can't update GLRO(dl_x86_cpu_features) to mark IBT and SHSTK as inactive. Instead, check the feature_1 field in TCB to decide if IBT and SHST are active.	2024-01-01 05:22:48 -08:00
H.J. Lu	541641a3de	x86/cet: Enable shadow stack during startup Previously, CET was enabled by kernel before passing control to user space and the startup code must disable CET if applications or shared libraries aren't CET enabled. Since the current kernel only supports shadow stack and won't enable shadow stack before passing control to user space, we need to enable shadow stack during startup if the application and all shared library are shadow stack enabled. There is no need to disable shadow stack at startup. Shadow stack can only be enabled in a function which will never return. Otherwise, shadow stack will underflow at the function return. 1. GL(dl_x86_feature_1) is set to the CET features which are supported by the processor and are not disabled by the tunable. Only non-zero features in GL(dl_x86_feature_1) should be enabled. After enabling shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check if shadow stack is really enabled. 2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable. It is safe since RTLD_START never returns. 3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static executable. Since the start function using ARCH_SETUP_TLS never returns, it is safe to enable shadow stack in ARCH_SETUP_TLS.	2024-01-01 05:22:48 -08:00
H.J. Lu	edb5e0c8f9	x86/cet: Sync with Linux kernel 6.6 shadow stack interface Sync with Linux kernel 6.6 shadow stack interface. Since only x86-64 is supported, i386 shadow stack codes are unchanged and CET shouldn't be enabled for i386. 1. When the shadow stack base in TCB is unset, the default shadow stack is in use. Use the current shadow stack pointer as the marker for the default shadow stack. It is used to identify if the current shadow stack is the same as the target shadow stack when switching ucontexts. If yes, INCSSP will be used to unwind shadow stack. Otherwise, shadow stack restore token will be used. 2. Allocate shadow stack with the map_shadow_stack syscall. Since there is no function to explicitly release ucontext, there is no place to release shadow stack allocated by map_shadow_stack in ucontext functions. Such shadow stacks will be leaked. 3. Rename arch_prctl CET commands to ARCH_SHSTK_XXX. 4. Rewrite the CET control functions with the current kernel shadow stack interface. Since CET is no longer enabled by kernel, a separate patch will enable shadow stack during startup.	2024-01-01 05:22:48 -08:00
H.J. Lu	41560a9312	x86/cet: Don't disable CET if not single threaded In permissive mode, don't disable IBT nor SHSTK when dlopening a legacy shared library if not single threaded since IBT and SHSTK may be still enabled in other threads. Other threads with IBT or SHSTK enabled will crash when calling functions in the legacy shared library. Instead, an error will be issued.	2023-12-20 05:03:37 -08:00
H.J. Lu	c04035809a	x86: Modularize sysdeps/x86/dl-cet.c Improve readability and make maintenance easier for dl-feature.c by modularizing sysdeps/x86/dl-cet.c: 1. Support processors with: a. Only IBT. Or b. Only SHSTK. Or c. Both IBT and SHSTK. 2. Lock CET features only if IBT or SHSTK are enabled and are not enabled permissively.	2023-12-20 04:57:55 -08:00
H.J. Lu	50bef9bd63	Fix elf: Do not duplicate the GLIBC_TUNABLES string commit `2a969b53c0` Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Wed Dec 6 10:24:01 2023 -0300 elf: Do not duplicate the GLIBC_TUNABLES string has @@ -38,7 +39,7 @@ which isn't available. */ #define CHECK_GLIBC_IFUNC_PREFERRED_OFF(f, cpu_features, name, len) \ _Static_assert (sizeof (#name) - 1 == len, #name " != " #len); \ - if (memcmp (f, #name, len) == 0) \ + if (tunable_str_comma_strcmp_cte (&f, #name) == 0) \ { \ cpu_features->preferred[index_arch_##name] \ &= ~bit_arch_##name; \ @@ -46,12 +47,11 @@ Fix it by removing "== 0" after tunable_str_comma_strcmp_cte.	2023-12-19 16:01:33 -08:00
H.J. Lu	cad5703e4f	Fix elf: Do not duplicate the GLIBC_TUNABLES string Fix issues in sysdeps/x86/tst-hwcap-tunables.c added by Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Wed Dec 6 10:24:01 2023 -0300 elf: Do not duplicate the GLIBC_TUNABLES string 1. -AVX,-AVX2,-AVX512F should be used to disable AVX, AVX2 and AVX512. 2. AVX512 IFUNC functions check AVX512VL. -AVX512VL should be added to disable these functions. This fixed: FAIL: elf/tst-hwcap-tunables ... [0] Spawned test for -Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,-SSSE3,-Fast_Unaligned_Load,-ERMS,-AVX_Fast_Unaligned_Load error: subprocess failed: tst-tunables error: unexpected output from subprocess ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false [1] Spawned test for ,-,-Prefer_ERMS,-Prefer_FSRM,-AVX,-AVX2,-AVX_Usable,-AVX2_Usable,-AVX512F_Usable,-SSE4_1,-SSE4_2,,-SSSE3,-Fast_Unaligned_Load,,-,-ERMS,-AVX_Fast_Unaligned_Load,-, error: subprocess failed: tst-tunables error: unexpected output from subprocess ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false ../sysdeps/x86/tst-hwcap-tunables.c:91: numeric comparison failure left: 1 (0x1); from: impls[i].usable right: 0 (0x0); from: false error: 2 test failures on Intel Tiger Lake.	2023-12-19 13:34:14 -08:00
Adhemerval Zanella	47a9eeb9ba	i686: Do not raise exception traps on fesetexcept (BZ 30989) According to ISO C23 (7.6.4.4), fesetexcept is supposed to set floating-point exception flags without raising a trap (unlike feraiseexcept, which is supposed to raise a trap if feenableexcept was called with the appropriate argument). The flags can be set in the 387 unit or in the SSE unit. To set a flag, it is sufficient to do it in the SSE unit, because that is guaranteed to not trap. However, on i386 CPUs that have only a 387 unit, set the flags in the 387, as long as this cannot trap. Checked on i686-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-12-19 15:12:38 -03:00
Adhemerval Zanella	2a969b53c0	elf: Do not duplicate the GLIBC_TUNABLES string The tunable parsing duplicates the tunable environment variable so it null-terminates each one since it simplifies the later parsing. It has the drawback of adding another point of failure (__minimal_malloc failing), and the memory copy requires tuning the compiler to avoid mem operations calls. The parsing now tracks the tunable start and its size. The dl-tunable-parse.h adds helper functions to help parsing, like a strcmp that also checks for size and an iterator for suboptions that are comma-separated (used on hwcap parsing by x86, powerpc, and s390x). Since the environment variable is allocated on the stack by the kernel, it is safe to keep the references to the suboptions for later parsing of string tunables (as done by set_hwcaps by multiple architectures). Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-12-19 13:25:45 -03:00
H.J. Lu	4d8a01d2b0	x86/cet: Check CPU_FEATURE_ACTIVE in permissive mode Verify that CPU_FEATURE_ACTIVE works properly in permissive mode.	2023-12-19 06:58:05 -08:00
H.J. Lu	28bd6f832d	x86/cet: Check legacy shadow stack code in .init_array section Verify that legacy shadow stack code in .init_array section in application and shared library, which are marked as shadow stack enabled, will trigger segfault.	2023-12-19 06:57:47 -08:00
H.J. Lu	9424ce80c2	x86/cet: Add tests for GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK Verify that GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK turns off shadow stack properly.	2023-12-19 06:57:39 -08:00
H.J. Lu	71c0cc3357	x86/cet: Check CPU_FEATURE_ACTIVE when CET is disabled Verify that CPU_FEATURE_ACTIVE (SHSTK) works properly when CET is disabled.	2023-12-19 06:57:32 -08:00
H.J. Lu	f418fe6f97	x86/cet: Check legacy shadow stack applications Add tests to verify that legacy shadow stack applications run properly when shadow stack is enabled in Linux kernel.	2023-12-19 06:57:27 -08:00
H.J. Lu	442983319b	x86/cet: Don't assume that SHSTK implies IBT Since shadow stack (SHSTK) is enabled in the Linux kernel without enabling indirect branch tracking (IBT), don't assume that SHSTK implies IBT. Use "CPU_FEATURE_ACTIVE (IBT)" to check if IBT is active and "CPU_FEATURE_ACTIVE (SHSTK)" to check if SHSTK is active.	2023-12-18 07:04:18 -08:00
H.J. Lu	0b850186fd	x86/cet: Check user_shstk in /proc/cpuinfo Linux kernel reports CPU shadow stack feature in /proc/cpuinfo as user_shstk, instead of shstk.	2023-12-17 10:42:06 -08:00
H.J. Lu	4753e92868	x86: Check PT_GNU_PROPERTY early The PT_GNU_PROPERTY segment is scanned before PT_NOTE. For binaries with the PT_GNU_PROPERTY segment, we can check it to avoid scan of the PT_NOTE segment. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-12-11 11:05:46 -08:00
H.J. Lu	7e03e0de7e	sysdeps/x86/Makefile: Split and sort tests Put each test on a separate line and sort tests.	2023-12-11 08:49:57 -08:00
Adhemerval Zanella	4862d546c0	x86: Use dl-symbol-redir-ifunc.h on cpu-tunables The dl-symbol-redir-ifunc.h redirects compiler-generated libcalls to arch-specific memory implementations to avoid ifunc calls where it is not yet possible. The memcmp-isa-default-impl.h aims to fix the same issue by calling the specific memset implementation directly. Using the memcmp symbol directly allows the compiler to inline the memset calls (especially because _dl_tunable_set_hwcaps uses constants values), generating better code. Checked on x86_64-linux-gnu. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-11-21 16:15:42 -03:00
Noah Goldstein	d90b43a4ed	x86: Add support for AVX10 preset and vec size in cpu-features This commit add support for the new AVX10 cpu features: https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf We add checks for: - `AVX10`: Check if AVX10 is present. - `AVX10_{X,Y,Z}MM`: Check if a given vec class has AVX10 support. `make check` passes and cpuid output was checked against GNR/DMR on an emulator.	2023-09-29 14:18:42 -05:00
H.J. Lu	1493622f4f	x86: Check the lower byte of EAX of CPUID leaf 2 [BZ #30643 ] The old Intel software developer manual specified that the low byte of EAX of CPUID leaf 2 returned 1 which indicated the number of rounds of CPUDID leaf 2 was needed to retrieve the complete cache information. The newer Intel manual has been changed to that it should always return 1 and be ignored. If the lower byte isn't 1, CPUID leaf 2 can't be used. In this case, we ignore CPUID leaf 2 and use CPUID leaf 4 instead. If CPUID leaf 4 doesn't contain the cache information, cache information isn't available at all. This addresses BZ #30643.	2023-08-29 12:57:41 -07:00
Noah Goldstein	084fb31bc2	x86: Fix incorrect scope of setting `shared_per_thread` [BZ# 30745] The: ``` if (shared_per_thread > 0 && threads > 0) shared_per_thread /= threads; ``` Code was accidentally moved to inside the else scope. This doesn't match how it was previously (before `af992e7abd`). This patch fixes that by putting the division after the `else` block.	2023-08-11 15:33:08 -05:00
Sajan Karumanchi	dcad5c8578	x86: Fix for cache computation on AMD legacy cpus. Some legacy AMD CPUs and hypervisors have the _cpuid_ '0x8000_001D' set to Zero, thus resulting in zeroed-out computed cache values. This patch reintroduces the old way of cache computation as a fail-safe option to handle these exceptions. Fixed 'level4_cache_size' value through handle_amd(). Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com> Tested-by: Florian Weimer <fweimer@redhat.com>	2023-08-06 19:10:42 +05:30
H.J. Lu	1547d6a64f	<sys/platform/x86.h>: Add APX support Add support for Intel Advanced Performance Extensions: https://www.intel.com/content/www/us/en/developer/articles/technical/advanced-performance-extensions-apx.html to <sys/platform/x86.h>.	2023-07-27 08:42:32 -07:00
Noah Goldstein	8b9a0af8ca	[PATCH v1] x86: Use `3/4*sizeof(per-thread-L3)` as low bound for NT threshold. On some machines we end up with incomplete cache information. This can make the new calculation of `sizeof(total-L3)/custom-divisor` end up lower than intended (and lower than the prior value). So reintroduce the old bound as a lower bound to avoid potentially regressing code where we don't have complete information to make the decision. Reviewed-by: DJ Delorie <dj@redhat.com>	2023-07-18 22:34:34 -05:00
Noah Goldstein	47f7472178	x86: Fix slight bug in `shared_per_thread` cache size calculation. After: ``` commit `af992e7abd` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Wed Jun 7 13:18:01 2023 -0500 x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` ``` Split `shared` (cumulative cache size) from `shared_per_thread` (cache size per socket), the `shared_per_thread` can be slightly off from the previous calculation. Previously we added `core` even if `threads_l2` was invalid, and only used `threads_l2` to divide `core` if it was present. The changed version only included `core` if `threads_l2` was valid. This change restores the old behavior if `threads_l2` is invalid by adding the entire value of `core`. Reviewed-by: DJ Delorie <dj@redhat.com>	2023-07-18 20:56:25 -05:00
Siddhesh Poyarekar	c6cb8783b5	configure: Use autoconf 2.71 Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-07-17 10:08:10 -04:00
Sergey Bugaev	45e2483a6c	x86: Make dl-cache.h and readelflib.c not Linux-specific These files could be useful to any port that wants to use ld.so.cache. Signed-off-by: Sergey Bugaev <bugaevc@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-06-26 10:04:31 -03:00
Paul Pluzhnikov	4290aed051	Fix misspellings -- BZ 25337	2023-06-19 21:58:33 +00:00
Noah Goldstein	180897c161	x86: Make the divisor in setting `non_temporal_threshold` cpu specific Different systems prefer a different divisors. From benchmarks[1] so far the following divisors have been found: ICX : 2 SKX : 2 BWD : 8 For Intel, we are generalizing that BWD and older prefers 8 as a divisor, and SKL and newer prefers 2. This number can be further tuned as benchmarks are run. [1]: https://github.com/goldsteinn/memcpy-nt-benchmarks Reviewed-by: DJ Delorie <dj@redhat.com>	2023-06-12 11:33:39 -05:00
Noah Goldstein	f193ea20ed	x86: Refactor Intel `init_cpu_features` This patch should have no affect on existing functionality. The current code, which has a single switch for model detection and setting prefered features, is difficult to follow/extend. The cases use magic numbers and many microarchitectures are missing. This makes it difficult to reason about what is implemented so far and/or how/where to add support for new features. This patch splits the model detection and preference setting stages so that CPU preferences can be set based on a complete list of available microarchitectures, rather than based on model magic numbers. Reviewed-by: DJ Delorie <dj@redhat.com>	2023-06-12 11:33:39 -05:00
Noah Goldstein	af992e7abd	x86: Increase `non_temporal_threshold` to roughly `sizeof_L3 / 4` Current `non_temporal_threshold` set to roughly '3/4 * sizeof_L3 / ncores_per_socket'. This patch updates that value to roughly 'sizeof_L3 / 4` The original value (specifically dividing the `ncores_per_socket`) was done to limit the amount of other threads' data a `memcpy`/`memset` could evict. Dividing by 'ncores_per_socket', however leads to exceedingly low non-temporal thresholds and leads to using non-temporal stores in cases where REP MOVSB is multiple times faster. Furthermore, non-temporal stores are written directly to main memory so using it at a size much smaller than L3 can place soon to be accessed data much further away than it otherwise could be. As well, modern machines are able to detect streaming patterns (especially if REP MOVSB is used) and provide LRU hints to the memory subsystem. This in affect caps the total amount of eviction at 1/cache_associativity, far below meaningfully thrashing the entire cache. As best I can tell, the benchmarks that lead this small threshold where done comparing non-temporal stores versus standard cacheable stores. A better comparison (linked below) is to be REP MOVSB which, on the measure systems, is nearly 2x faster than non-temporal stores at the low-end of the previous threshold, and within 10% for over 100MB copies (well past even the current threshold). In cases with a low number of threads competing for bandwidth, REP MOVSB is ~2x faster up to `sizeof_L3`. The divisor of `4` is a somewhat arbitrary value. From benchmarks it seems Skylake and Icelake both prefer a divisor of `2`, but older CPUs such as Broadwell prefer something closer to `8`. This patch is meant to be followed up by another one to make the divisor cpu-specific, but in the meantime (and for easier backporting), this patch settles on `4` as a middle-ground. Benchmarks comparing non-temporal stores, REP MOVSB, and cacheable stores where done using: https://github.com/goldsteinn/memcpy-nt-benchmarks Sheets results (also available in pdf on the github): https://docs.google.com/spreadsheets/d/e/2PACX-1vS183r0rW_jRX6tG_E90m9qVuFiMbRIJvi5VAE8yYOvEOIEEc3aSNuEsrFbuXw5c3nGboxMmrupZD7K/pubhtml Reviewed-by: DJ Delorie <dj@redhat.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-06-12 11:33:39 -05:00
Paul Pluzhnikov	2cbeda847b	Fix a few more typos I missed in previous round -- BZ 25337	2023-06-02 23:46:32 +00:00
Paul Pluzhnikov	65cc53fe7c	Fix misspellings in sysdeps/ -- BZ 25337	2023-05-30 23:02:29 +00:00
Noah Goldstein	ed2f9dc942	x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429 ] If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, it almost certainly means we failed to read the systems cache info. In this case, rather than defaulting the minimum correct value, we should default to a value that gets at least reasonable performance. 64MB is chosen conservatively to be at the very high end. This should never cause non-temporal stores when, if we had read cache info, we wouldn't have otherwise. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2023-05-27 21:32:57 -05:00
H.J. Lu	81a3cc956e	<sys/platform/x86.h>: Add PREFETCHI support Add PREFETCHI support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	b05521c916	<sys/platform/x86.h>: Add AMX-COMPLEX support Add AMX-COMPLEX support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	609b7b2d3c	<sys/platform/x86.h>: Add AVX-NE-CONVERT support Add AVX-NE-CONVERT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	4c120c88a6	<sys/platform/x86.h>: Add AVX-VNNI-INT8 support Add AVX-VNNI-INT8 support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00

1 2 3 4 5 ...

494 Commits