glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-12-11 22:00:08 +00:00

Author	SHA1	Message	Date
H.J. Lu	f3a99b2216	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used. (cherry picked from commit `ceeffe968c`)	2022-04-26 18:18:16 -07:00
Noah Goldstein	cecbac5212	x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h No bug. This patch doubles the rep_movsb_threshold when using ERMS. Based on benchmarks the vector copy loop, especially now that it handles 4k aliasing, is better for these medium ranged. On Skylake with ERMS: Size, Align1, Align2, dst>src,(rep movsb) / (vec copy) 4096, 0, 0, 0, 0.975 4096, 0, 0, 1, 0.953 4096, 12, 0, 0, 0.969 4096, 12, 0, 1, 0.872 4096, 44, 0, 0, 0.979 4096, 44, 0, 1, 0.83 4096, 0, 12, 0, 1.006 4096, 0, 12, 1, 0.989 4096, 0, 44, 0, 0.739 4096, 0, 44, 1, 0.942 4096, 12, 12, 0, 1.009 4096, 12, 12, 1, 0.973 4096, 44, 44, 0, 0.791 4096, 44, 44, 1, 0.961 4096, 2048, 0, 0, 0.978 4096, 2048, 0, 1, 0.951 4096, 2060, 0, 0, 0.986 4096, 2060, 0, 1, 0.963 4096, 2048, 12, 0, 0.971 4096, 2048, 12, 1, 0.941 4096, 2060, 12, 0, 0.977 4096, 2060, 12, 1, 0.949 8192, 0, 0, 0, 0.85 8192, 0, 0, 1, 0.845 8192, 13, 0, 0, 0.937 8192, 13, 0, 1, 0.939 8192, 45, 0, 0, 0.932 8192, 45, 0, 1, 0.927 8192, 0, 13, 0, 0.621 8192, 0, 13, 1, 0.62 8192, 0, 45, 0, 0.53 8192, 0, 45, 1, 0.516 8192, 13, 13, 0, 0.664 8192, 13, 13, 1, 0.659 8192, 45, 45, 0, 0.593 8192, 45, 45, 1, 0.575 8192, 2048, 0, 0, 0.854 8192, 2048, 0, 1, 0.834 8192, 2061, 0, 0, 0.863 8192, 2061, 0, 1, 0.857 8192, 2048, 13, 0, 0.63 8192, 2048, 13, 1, 0.629 8192, 2061, 13, 0, 0.627 8192, 2061, 13, 1, 0.62 Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `475b63702e`)	2022-04-26 18:18:16 -07:00
H.J. Lu	a182bb7a39	x86-64: Remove Prefer_AVX2_STRCMP Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2 32-byte strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake. (cherry picked from commit `14dbbf46a0`)	2022-04-26 18:18:16 -07:00
Noah Goldstein	b5a44a6a47	x86: Modify ENTRY in sysdep.h so that p2align can be specified No bug. This change adds a new macro ENTRY_P2ALIGN which takes a second argument, log2 of the desired function alignment. The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this doesn't affect any existing functionality. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `fc5bd179ef`)	2022-04-26 18:18:16 -07:00
Noah Goldstein	15b00d2af0	x86: Fix TEST_NAME to make it a string in tst-strncmp-rtm.c Previously TEST_NAME was passing a function pointer. This didn't fail because of the -Wno-error flag (to allow for overflow sizes passed to strncmp/wcsncmp) Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `b98d0bbf74`)	2022-02-18 15:34:40 -08:00
Noah Goldstein	d093b677c3	x86: Test wcscmp RTM in the wcsncmp overflow case [BZ #28896 ] In the overflow fallback, strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. This would have no checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `7835d611af`)	2022-02-18 14:59:54 -08:00
Noah Goldstein	38e0d24794	x86: Fallback {str|wcs}cmp RTM in the ncmp overflow case [BZ #28896 ] In the overflow fallback, strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. This would have no checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `c627209832`)	2022-02-18 14:59:47 -08:00
H.J. Lu	aa601d0244	x86: Use CHECK_FEATURE_PRESENT to check HLE [BZ #27398 ] HLE is disabled on blacklisted CPUs. Use CHECK_FEATURE_PRESENT, instead of CHECK_FEATURE_ACTIVE, to check HLE. (cherry picked from commit `501246c5e2`)	2022-02-01 05:44:27 -08:00
H.J. Lu	b952c25dc7	x86: Black list more Intel CPUs for TSX [BZ #27398 ] Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html This fixes BZ #27398. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com> (cherry picked from commit `1e000d3d33`)	2022-02-01 05:44:12 -08:00
Aurelien Jarno	1d401d1fcc	x86: use default cache size if it cannot be determined [BZ #28784 ] In some cases (e.g. QEMU, non-Intel/AMD CPU) the cache information cannot be retrieved and the corresponding values are set to 0. Commit `2d651eb926` ("x86: Move x86 processor cache info to cpu_features") changed the behaviour in such cases by defining the __x86_shared_cache_size and __x86_data_cache_size variables to 0 instead of using the default values. This causes an issue with the i686 SSE2 optimized bzero/routine which assumes that the cache size is at least 128 bytes, and otherwise tries to zero/set the whole address space minus 128 bytes. Fix that by restoring the original code to only update __x86_shared_cache_size and __x86_data_cache_size variables if the corresponding cache sizes are not zero. Fixes bug 28784 Fixes commit `2d651eb926` Reviewed-by: H.J. Lu <hjl.tools@gmail.com> (cherry picked from commit `c242fcce06`)	2022-01-17 19:46:24 +01:00
Aurelien Jarno	03de6917bd	elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704 ] On KVM guests running on some AMD systems, the IBRS feature is reported as a synthetic feature using the Intel feature, while the cpuinfo entry keeps the same. Handle that by first checking the presence of the Intel feature on AMD systems. Fixes bug 28704. (cherry picked from commit `94058f6cde`)	2021-12-17 20:35:45 +01:00
H.J. Lu	91cc803d27	x86-64: Add Avoid_Short_Distance_REP_MOVSB commit `3ec5d83d2a` Author: H.J. Lu <hjl.tools@gmail.com> Date: Sat Jan 25 14:19:40 2020 -0800 x86-64: Avoid rep movsb with short distance [BZ #27130] introduced some regressions on Intel processors without Fast Short REP MOV (FSRM). Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with short distance only on Intel processors with FSRM. bench-memmove-large on Skylake server shows that cycles of __memmove_evex_unaligned_erms improves for the following data size: before after Improvement length=4127, align1=3, align2=0: 479.38 349.25 27% length=4223, align1=9, align2=5: 405.62 333.25 18% length=8223, align1=3, align2=0: 786.12 496.38 37% length=8319, align1=9, align2=5: 727.50 501.38 31% length=16415, align1=3, align2=0: 1436.88 840.00 41% length=16511, align1=9, align2=5: 1375.50 836.38 39% length=32799, align1=3, align2=0: 2890.00 1860.12 36% length=32895, align1=9, align2=5: 2891.38 1931.88 33%	2021-07-28 13:23:57 -07:00
H.J. Lu	7c124e3714	x86: Install <bits/platform/x86.h> [BZ #27958 ] 1. Install <bits/platform/x86.h> for <sys/platform/x86.h> which includes <bits/platform/x86.h>. 2. Rename HAS_CPU_FEATURE to CPU_FEATURE_PRESENT which checks if the processor has the feature. 3. Rename CPU_FEATURE_USABLE to CPU_FEATURE_ACTIVE which checks if the feature is active. There may be other preconditions, like sufficient stack space or further setup for AMX, which must be satisfied before the feature can be used. This fixes BZ #27958. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-07-23 05:12:51 -07:00
Siddhesh Poyarekar	5b8d271571	Fix build and tests with --disable-tunables Remove unused code and declare __libc_mallopt when !IS_IN (libc) to allow the debug hook to build with --disable-tunables. Also, run tst-ifunc-isa-2* tests only when tunables are enabled since the result depends on it. Tested on x86_64. Reported-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-07-23 13:57:56 +05:30
Adhemerval Zanella	469761eac8	elf: Fix tst-cpu-features-cpuinfo on some AMD systems (BZ #28090 ) The SSBD feature is implemented in 2 different ways on AMD processors: newer systems (Zen3) provide AMD_SSBD (function 8000_0008, EBX[24]), while older systems provide AMD_VIRT_SSBD (function 8000_0008, EBX[25]). However for AMD_VIRT_SSBD, the kernel shows both 'ssbd' and 'virt_ssbd' in /proc/cpuinfo, while for AMD_SSBD only 'ssbd' is provided. The test now checks if AMD_SSBD is set to check for 'ssbd', otherwise it checks if AMD_VIRT_SSBD is set to check for 'virt_ssbd'. Checked on x86_64-linux-gnu on a Ryzen 9 5900x. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-07-19 14:12:29 -03:00
H.J. Lu	ea8e465a6b	x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033 ] From https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html * Intel TSX will be disabled by default. * The processor will force abort all Restricted Transactional Memory (RTM) transactions by default. * A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated, which is set to indicate to updated software that the loaded microcode is forcing RTM abort. * On processors that enumerate support for RTM, the CPUID enumeration bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to be set by default after microcode update. * Workloads that were benefited from Intel TSX might experience a change in performance. * System software may use a new bit in Model-Specific Register (MSR) 0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock Elision (HLE) and RTM bits to indicate to software that Intel TSX is disabled. 1. Add RTM_ALWAYS_ABORT to CPUID features. 2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the string/tst-memchr-rtm etc. testcases on the affected processors, which always fail after a microcode update. 3. Check RTM feature, instead of usability, against /proc/cpuinfo. This fixes BZ #28033.	2021-07-01 10:47:35 -07:00
Adhemerval Zanella	e3e3eb0a2e	x86: Fix tst-cpu-features-cpuinfo on Ryzen 9 (BZ #27873 ) AMD defines different flags for IBPB, IBRS, and STIBP [1], so new x86_64_cpu are added and IBRS_IBPB is only tested for Intel. SSBD is also defined and implemented differently on AMD [2], and a new AMD_SSBD flag is added as well. It should map to the cpuinfo 'ssbd' on recent AMD cpus. It fixes tst-cpu-features-cpuinfo and tst-cpu-features-cpuinfo-static on recent AMD cpus. Checked on x86_64-linux-gnu on AMD Ryzen 9 5900X. [1] https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf [2] https://bugzilla.kernel.org/show_bug.cgi?id=199889 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-06-24 09:57:46 -03:00
H.J. Lu	ea26ff0322	x86: Copy IBT and SHSTK usable only if CET is enabled IBT and SHSTK usable bits are copied from CPUID feature bits and later cleared if kernel doesn't support CET. Copy IBT and SHSTK usable only if CET is enabled so that they aren't set on CET capable processors with non-CET enabled glibc.	2021-06-23 17:35:47 -07:00
Florian Weimer	6f1c701026	dlfcn: Cleanups after -ldl is no longer required This commit removes the ELF constructor and internal variables from dlfcn/dlfcn.c. The file now serves the same purpose as nptl/libpthread-compat.c, so it is renamed to dlfcn/libdl-compat.c. The use of libdl-shared-only-routines ensures that libdl.a is empty. This commit adjusts the test suite not to use $(libdl). The libdl.so symbolic link is no longer installed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-06-03 09:11:45 +02:00
H.J. Lu	79aec84102	Properly check stack alignment [BZ #27901 ] 1. Replace if ((((uintptr_t) &_d) & (__alignof (double) - 1)) != 0) which may be optimized out by compiler, with int __attribute__ ((weak, noclone, noinline)) is_aligned (void *p, int align) { return (((uintptr_t) p) & (align - 1)) != 0; } 2. Add TEST_STACK_ALIGN_INIT to TEST_STACK_ALIGN. 3. Add a common TEST_STACK_ALIGN_INIT to check 16-byte stack alignment for both i386 and x86-64. 4. Update powerpc to use TEST_STACK_ALIGN_INIT. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-05-24 07:42:12 -07:00
H.J. Lu	cf2c57526b	x86: Set rep_movsb_threshold to 2112 on processors with FSRM The glibc memcpy benchmark on Intel Core i7-1065G7 (Ice Lake) showed that REP MOVSB became faster after 2112 bytes: Vector Move REP MOVSB length=2112, align1=0, align2=0: 24.20 24.40 length=2112, align1=1, align2=0: 26.07 23.13 length=2112, align1=0, align2=1: 27.18 28.13 length=2112, align1=1, align2=1: 26.23 25.16 length=2176, align1=0, align2=0: 23.18 22.52 length=2176, align1=2, align2=0: 25.45 22.52 length=2176, align1=0, align2=2: 27.14 27.82 length=2176, align1=2, align2=2: 22.73 25.56 length=2240, align1=0, align2=0: 24.62 24.25 length=2240, align1=3, align2=0: 29.77 27.15 length=2240, align1=0, align2=3: 35.55 29.93 length=2240, align1=3, align2=3: 34.49 25.15 length=2304, align1=0, align2=0: 34.75 26.64 length=2304, align1=4, align2=0: 32.09 22.63 length=2304, align1=0, align2=4: 28.43 31.24 Use REP MOVSB for data size > 2112 bytes in memcpy on processors with fast short REP MOVSB (FSRM). * sysdeps/x86/dl-cacheinfo.h (dl_init_cacheinfo): Set rep_movsb_threshold to 2112 on processors with fast short REP MOVSB (FSRM).	2021-05-03 05:08:22 -07:00
H.J. Lu	7fc9152e83	x86: tst-cpu-features-supports.c: Update AMX check Pass "amx-bf16", "amx-int8" and "amx-tile", instead of "amx_bf16", "amx_int8" and "amx_tile", to __builtin_cpu_supports for GCC 11.	2021-04-22 10:09:49 -07:00
Florian Weimer	81dfc6694c	nptl: Remove longjmp, siglongjmp from libpthread The definitions in libc are sufficient, the forwarders are no longer needed. The symbols have been moved using scripts/move-symbol-to-libc.py. s390-linux-gnu and s390x-linux-gnu need a new version placeholder to keep the GLIBC_2.19 symbol version in libpthread. Tested on i386-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu, x86_64-linux-gnu. Built with build-many-glibcs.py. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-21 19:49:50 +02:00
Siddhesh Poyarekar	abadbef5c8	Move __isnanf128 to libc.so All of the isnan functions are in libc.so due to printf_fp, so move __isnanf128 there too for consistency. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-03-30 14:58:19 +05:30
H.J. Lu	4bd660be40	x86: Add string/memory function tests in RTM region At function exit, AVX optimized string/memory functions have VZEROUPPER which triggers RTM abort. When such functions are called inside a transactionally executing RTM region, RTM abort causes severe performance degradation. Add tests to verify that string/memory functions won't cause RTM abort in RTM region.	2021-03-29 07:40:17 -07:00
H.J. Lu	1da50d4bda	x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP 1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered by VZEROUPPER inside a transactionally executing RTM region. 2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.	2021-03-29 07:40:17 -07:00
H.J. Lu	27f7463675	x86: Properly disable XSAVE related features [BZ #27605 ] 1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE. 2. Disable all features which depend on XSAVE: a. If OSXSAVE is disabled by glibc tunables. Or b. If both XSAVE and XSAVEC aren't usable.	2021-03-29 06:04:17 -07:00
Samuel Thibault	16b597807d	elf: Fix not compiling ifunc tests that need gcc ifunc support	2021-03-24 01:52:46 +01:00
Siddhesh Poyarekar	941ea10f80	Build get-cpuid-feature-leaf.c without stack-protector [BZ #27555 ] __x86_get_cpuid_feature_leaf is called during early startup, before the stack check guard is initialized and is hence not safe to build with stack-protector. Additionally, IFUNC resolvers for static tst-ifunc-isa tests get called too early for stack protector to be useful, so fix them to disable stack protector for the resolver functions. This fixes all failures seen with --enable-stack-protector=all configuration. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-03-15 20:24:45 +05:30
H.J. Lu	f53ffc9b90	x86: Handle _SC_LEVEL1_ICACHE_LINESIZE [BZ #27444 ] commit `2d651eb926` Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Sep 18 07:55:14 2020 -0700 x86: Move x86 processor cache info to cpu_features missed _SC_LEVEL1_ICACHE_LINESIZE. 1. Add level1_icache_linesize to struct cpu_features. 2. Initialize level1_icache_linesize by calling handle_intel, handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE. 3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-03-15 05:43:26 -07:00
H.J. Lu	339bf918ea	x86: Set minimum x86-64 level marker [BZ #27318 ] Since the full ISA set used in an ELF binary is unknown to compiler, an x86-64 ISA level marker indicates the minimum, not maximum, ISA set required to run such an ELF binary. We never guarantee a library with an x86-64 ISA level v3 marker doesn't contain other ISAs beyond x86-64 ISA level v3, like AVX VNNI. We check the x86-64 ISA level marker for the minimum ISA set. Since -march=sandybridge enables only some ISAs in x86-64 ISA level v3, we should set the needed ISA marker to v2. Otherwise, libc is compiled with -march=sandybridge will fail to run on Sandy Bridge: $ ./elf/ld.so ./libc.so ./libc.so: (p) CPU ISA level is lower than required: needed: 7; got: 3 Set the minimum, instead of maximum, x86-64 ISA level marker should have no impact on the glibc-hwcaps directory assignment logic in ldconfig nor ld.so.	2021-03-06 07:49:30 -08:00
Florian Weimer	01a5746b6c	x86: Add CPU-specific diagnostics to ld.so --list-diagnostics	2021-03-02 15:01:10 +01:00
Florian Weimer	e4933c8a92	x86: Automate generation of PREFERRED_FEATURE_INDEX_1 bitfield Use a .def file to define the bitfield layout, so that it is possible to iterate over field members using the preprocessor.	2021-03-02 15:01:06 +01:00
H.J. Lu	89de9d3958	x86: Use x86/nptl/pthreaddef.h 1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h. 2. Remove sysdeps/x86_64/nptl/pthreaddef.h. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-02-22 15:52:56 -08:00
Florian Weimer	feb741bb81	x86: Remove unused variables for raw cache sizes from cacheinfo.h	2021-02-22 17:36:03 +01:00
H.J. Lu	ba230b6387	<bits/platform/x86.h>: Correct x86_cpu_TBM x86_cpu_TBM should be x86_cpu_index_80000001_ecx + 21.	2021-02-22 04:31:51 -08:00
H.J. Lu	ce4a94b12e	x86: Remove the extra space between "# endif" Remove the extra space between "# endif" left over from commit `f380868f6d` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Dec 24 15:43:34 2020 -0800 Remove _ISOMAC check from <cpu-features.h>	2021-02-12 07:50:29 -08:00
Siddhesh Poyarekar	a1b8b06a55	x86: Use SIZE_MAX instead of (long int)-1 for tunable range value The tunable types are SIZE_T, so set the ranges to the correct maximum value, i.e. SIZE_MAX.	2021-02-10 19:08:33 +05:30
Siddhesh Poyarekar	61117bfa1b	tunables: Simplify TUNABLE_SET interface The TUNABLE_SET interface took a primitive C type argument, which resulted in inconsistent type conversions internally due to incorrect dereferencing of types, especially on 32-bit architectures. This change simplifies the TUNABLE setting logic along with the interfaces. Now all numeric tunable values are stored as signed numbers in tunable_num_t, which is intmax_t. All calls to set tunables cast the input value to its primitive type and then to tunable_num_t for storage. This relies on gcc-specific (although I suspect other compilers would also do the same) unsigned to signed integer conversion semantics, i.e. the bit pattern is conserved. The reverse conversion is guaranteed by the standard.	2021-02-10 19:08:33 +05:30
H.J. Lu	5ab25c8875	x86: Add PTWRITE feature detection [BZ #27346 ] 1. Add CPUID_INDEX_14_ECX_0 for CPUID leaf 0x14 to detect PTWRITE feature in EBX of CPUID leaf 0x14 with ECX == 0. 2. Add PTWRITE detection to CPU feature tests. 3. Add 2 static CPU feature tests.	2021-02-07 08:01:14 -08:00
Sajan Karumanchi	6e02b3e932	x86: Adding an upper bound for Enhanced REP MOVSB. In the process of optimizing memcpy for AMD machines, we have found the vector move operations are outperforming enhanced REP MOVSB for data transfers above the L2 cache size on Zen3 architectures. To handle this use case, we are adding an upper bound parameter on enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'. As per large-bench results, we are configuring this parameter to the L2 cache size for AMD machines and applicable from Zen3 architecture supporting the ERMS feature. For architectures other than AMD, it is the computed value of non-temporal threshold parameter. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>	2021-02-02 12:42:15 +01:00
H.J. Lu	6c57d32048	sysconf: Add _SC_MINSIGSTKSZ/_SC_SIGSTKSZ [BZ #20305 ] Add _SC_MINSIGSTKSZ for the minimum signal stack size derived from AT_MINSIGSTKSZ, which is the minimum number of bytes of free stack space required in order to guarantee successful, non-nested handling of a single signal whose handler is an empty function, and _SC_SIGSTKSZ which is the suggested minimum number of bytes of stack space required for a signal stack. If AT_MINSIGSTKSZ isn't available, sysconf (_SC_MINSIGSTKSZ) returns MINSIGSTKSZ. On Linux/x86 with XSAVE, the signal frame used by kernel is composed of the following areas and laid out as: ------------------------------ | alignment padding | ------------------------------ | xsave buffer | ------------------------------ | fsave header (32-bit only) | ------------------------------ | siginfo + ucontext | ------------------------------ Compute AT_MINSIGSTKSZ value as size of xsave buffer + size of fsave header (32-bit only) + size of siginfo and ucontext + alignment padding. If _SC_SIGSTKSZ_SOURCE or _GNU_SOURCE are defined, MINSIGSTKSZ and SIGSTKSZ are redefined as /* Default stack size for a signal handler: sysconf (_SC_SIGSTKSZ). */ # undef SIGSTKSZ # define SIGSTKSZ sysconf (_SC_SIGSTKSZ) /* Minimum stack size for a signal handler: SIGSTKSZ. */ # undef MINSIGSTKSZ # define MINSIGSTKSZ SIGSTKSZ Compilation will fail if the source assumes constant MINSIGSTKSZ or SIGSTKSZ. The reason for not simply increasing the kernel's MINSIGSTKSZ #define (apart from the fact that it is rarely used, due to glibc's shadowing definitions) was that userspace binaries will have baked in the old value of the constant and may be making assumptions about it. For example, the type (char [MINSIGSTKSZ]) changes if this #define changes. This could be a problem if a newly built library tries to memcpy() or dump such an object defined by an old binary. Bounds-checking and the stack sizes passed to things like sigaltstack() and makecontext() could similarly go wrong.	2021-02-01 11:00:52 -08:00
H.J. Lu	04dff6fc0d	x86: Properly set usable CET feature bits [BZ #26625 ] commit `94cd37ebb2` Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Sep 16 05:27:32 2020 -0700 x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625] broke GLIBC_TUNABLES=glibc.cpu.hwcaps=-IBT,-SHSTK since it can no longer disable IBT nor SHSTK. Handle IBT and SHSTK with: 1. Revert commit `94cd37ebb2`. 2. Clears the usable CET feature bits if kernel doesn't support CET. 3. Add GLIBC_TUNABLES tests without dlopen. 4. Add tests to verify that CPU_FEATURE_USABLE on IBT and SHSTK matches _get_ssp. 5. Update GLIBC_TUNABLES tests with dlopen to verify that CET is disabled with GLIBC_TUNABLES. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-01-29 03:58:11 -08:00
Andreas Schwab	31f6488722	Fix misplaced const Constify __x86_cacheinfo_p and __x86_cpu_features_p, not their pointer target types.	2021-01-25 15:09:02 +01:00
H.J. Lu	5f478eb0fb	x86: Properly match CPU features in /proc/cpuinfo [BZ #27222 ] Search " YYY " and " YYY\n", instead of "YYY", to avoid matching "XXXYYYZZZ" with "YYY". Update /proc/cpuinfo CPU feature names: /proc/cpuinfo glibc ------------------------------------------------ avx512vbmi AVX512_VBMI dts DS pni SSE3 tsc_deadline_timer TSC_DEADLINE	2021-01-22 10:15:46 -08:00
H.J. Lu	7a5ab88e21	x86: Check ifunc resolver with CPU_FEATURE_USABLE [BZ #27072 ] Check ifunc resolver with CPU_FEATURE_USABLE and tunables in dynamic and static executables to verify that CPUID features are initialized early in static PIE. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-21 10:22:26 -08:00
Szabolcs Nagy	47618209d0	Use hidden visibility for early static PIE code Extern symbol access in position independent code usually involves GOT indirection which needs RELATIVE reloc in a static linked PIE. (On some targets this is avoided e.g. because the linker can relax a GOT access to a pc-relative access, but this is not generally true.) Code that runs before static PIE self relocation must avoid relying on dynamic relocations which can be ensured by using hidden visibility. However we cannot just make all symbols hidden: On i386, all calls to IFUNC functions must go through PLT and calls to hidden functions CANNOT go through PLT in PIE since EBX used in PIE PLT may not be set up for local calls to hidden IFUNC functions. This patch aims to make symbol references hidden in code that is used before and by _dl_relocate_static_pie when building a static PIE libc. Note: for an object that is used in the startup code, its references and definition may not have consistent visibility: it is only forced hidden in the startup code. This is needed for fixing bug 27072. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-21 15:55:01 +00:00
H.J. Lu	ff6d62e9ed	<sys/platform/x86.h>: Remove the C preprocessor magic In <sys/platform/x86.h>, define CPU features as enum instead of using the C preprocessor magic to make it easier to wrap this functionality in other languages. Move the C preprocessor magic to internal header for better GCC codegen when more than one features are checked in a single expression as in x86-64 dl-hwcaps-subdirs.c. 1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX. 2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h. 3. Remove struct cpu_features and __x86_get_cpu_features from <sys/platform/x86.h>. 4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it in libc. 5. Make __get_cpu_features() private to glibc. 6. Replace __x86_get_cpu_features(N) with __get_cpu_features(). 7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE. 8. Use a single enum index for each CPU feature detection. 9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf. 10. Return zero struct cpuid_feature for the older glibc binary with a smaller CPUID_INDEX_MAX [BZ #27104]. 11. Inside glibc, use the C preprocessor magic so that cpu_features data can be loaded just once leading to more compact code for glibc. 256 bits are used for each CPUID leaf. Some leaves only contain a few features. We can add exceptions to such leaves. But it will increase code sizes and it is harder to provide backward/forward compatibilities when new features are added to such leaves in the future. When new leaves are added, _rtld_global_ro offsets will change which leads to race condition during in-place updates. We may avoid in-place updates by 1. Rename the old glibc. 2. Install the new glibc. 3. Remove the old glibc. NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy relocation issue with IFUNC resolver as shown in IFUNC resolver tests.	2021-01-21 05:58:17 -08:00
H.J. Lu	2d651eb926	x86: Move x86 processor cache info to cpu_features 1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so. 2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS. 3. Move x86 cache info initialization to dl-cacheinfo.h and initialize x86 cache info in init_cpu_features (). 4. Put x86 cache info for libc in cacheinfo.h, which is included in libc-start.c in libc.a and is included in cacheinfo.c in libc.so. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-14 11:38:45 -08:00
Adhemerval Zanella	d18f59bf92	Fix x86 build with --enable-tunable=no Checked on x86_64-linux-gnu.	2021-01-14 16:04:05 -03:00
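
The stack-alignment check described in commit 79aec84102 above can be sketched in C roughly as follows. This is only an illustration, not the code from the commit: the names is_misaligned, local_double_misaligned and check_local_double are hypothetical, and only noinline is used here, whereas the helper quoted in the commit message is called is_aligned and also carries the weak and noclone attributes (presumably so the compiler cannot fold the check away).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns nonzero when P is NOT aligned to ALIGN.
   ALIGN is assumed to be a power of two.  Keeping the test in a
   non-inlined function is meant to stop the compiler from evaluating
   it at compile time.  */
int __attribute__ ((noinline))
is_misaligned (const void *p, uintptr_t align)
{
  return (((uintptr_t) p) & (align - 1)) != 0;
}

/* Hypothetical helper: check the alignment of a local double, in the
   spirit of the expression the commit message replaces.  */
int
local_double_misaligned (void)
{
  double d = 0.0;
  return is_misaligned (&d, _Alignof (double));
}

/* Hypothetical self-check: aborts if the local double is misaligned.  */
void
check_local_double (void)
{
  assert (!local_double_misaligned ());
}
```

A caller would invoke check_local_double () early in a test, or call is_misaligned () directly on any address together with the alignment it expects.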


355 Commits