glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 04:50:07 +00:00

Author	SHA1	Message	Date
Noah Goldstein	ed2f9dc942	x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429] If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, it almost certainly means we failed to read the system's cache info. In this case, rather than defaulting to the minimum correct value, we should default to a value that gets at least reasonable performance. 64MB is chosen conservatively to be at the very high end. This should never cause non-temporal stores when, if we had read cache info, we wouldn't have otherwise. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2023-05-27 21:32:57 -05:00
H.J. Lu	81a3cc956e	<sys/platform/x86.h>: Add PREFETCHI support Add PREFETCHI support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	b05521c916	<sys/platform/x86.h>: Add AMX-COMPLEX support Add AMX-COMPLEX support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	609b7b2d3c	<sys/platform/x86.h>: Add AVX-NE-CONVERT support Add AVX-NE-CONVERT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	4c120c88a6	<sys/platform/x86.h>: Add AVX-VNNI-INT8 support Add AVX-VNNI-INT8 support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	b39741b45f	<sys/platform/x86.h>: Add MSRLIST support Add MSRLIST support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	96037c697d	<sys/platform/x86.h>: Add AVX-IFMA support Add AVX-IFMA support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	8b4cc05eab	<sys/platform/x86.h>: Add AMX-FP16 support Add AMX-FP16 support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	227983551d	<sys/platform/x86.h>: Add WRMSRNS support Add WRMSRNS support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	a00db8305d	<sys/platform/x86.h>: Add ArchPerfmonExt support Add Architectural Performance Monitoring Extended Leaf (EAX = 23H) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	2f02d0d8e1	<sys/platform/x86.h>: Add CMPCCXADD support Add CMPCCXADD support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	aa528a579b	<sys/platform/x86.h>: Add LASS support Add Linear Address Space Separation (LASS) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	231bf916ce	<sys/platform/x86.h>: Add RAO-INT support Add RAO-INT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	fb90dc8513	<sys/platform/x86.h>: Add LBR support Add architectural LBR support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	f47b7d96fb	<sys/platform/x86.h>: Add RTM_FORCE_ABORT support Add RTM_FORCE_ABORT support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	f6790a489d	<sys/platform/x86.h>: Add SGX-KEYS support Add SGX-KEYS support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	09cc5fee21	<sys/platform/x86.h>: Add BUS_LOCK_DETECT support Add Bus lock debug exceptions (BUS_LOCK_DETECT) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	8c8e391166	<sys/platform/x86.h>: Add LA57 support Add 57-bit linear addresses and five-level paging (LA57) support to <sys/platform/x86.h>. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
H.J. Lu	2d8c590a5e	<bits/platform/x86.h>: Rename to x86_cpu_INDEX_7_ECX_15 Rename x86_cpu_INDEX_7_ECX_1 to x86_cpu_INDEX_7_ECX_15 for the unused bit 15 in ECX from CPUID with EAX == 0x7 and ECX == 0. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-04-05 14:46:10 -07:00
Andreas Schwab	856bab7717	x86/dl-cacheinfo: remove unused parameter from handle_amd Also replace an unreachable assert with __builtin_unreachable.	2023-04-04 16:16:21 +02:00
H.J. Lu	743113d42e	x86: Set FSGSBASE to active if enabled by kernel Linux kernel uses AT_HWCAP2 to indicate if FSGSBASE instructions are enabled. If the HWCAP2_FSGSBASE bit in AT_HWCAP2 is set, FSGSBASE instructions can be used in user space. Define dl_check_hwcap2 to set the FSGSBASE feature to active on Linux when the HWCAP2_FSGSBASE bit is set. Add a test to verify that FSGSBASE is active on current kernels. NB: This test will fail if the kernel doesn't set the HWCAP2_FSGSBASE bit in AT_HWCAP2 while fsgsbase shows up in /proc/cpuinfo. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2023-04-03 11:36:48 -07:00
Adhemerval Zanella Netto	33237fe83d	Remove --enable-tunables configure option And make tunables always supported. The configure option was added in glibc 2.25 and some features require it (such as hwcap mask, huge pages support, and lock elision tuning). It also simplifies the build permutations. Changes from v1: * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs more discussion. * Cleanup more code. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-03-29 14:33:06 -03:00
DJ Delorie	db9b47e9f9	x86: Don't check PREFETCHWT1 in tst-cpu-features-cpuinfo.c Don't check PREFETCHWT1 against /proc/cpuinfo since kernel doesn't report PREFETCHWT1 in /proc/cpuinfo. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2023-03-21 17:49:49 -04:00
caiyinyu	4c721f24fc	x86: Fix bug about glibc.cpu.hwcaps. Recorded in [BZ #30183]: 1. export GLIBC_TUNABLES=glibc.cpu.hwcaps=-AVX512 2. Add _dl_printf("p -- %s\n", p); just before switch(nl) in sysdeps/x86/cpu-tunables.c 3. compiled and run ./testrun.sh /usr/bin/ls you will get: p -- -AVX512 p -- LC_ADDRESS=en_US.UTF-8 p -- LC_NUMERIC=C ... The function, TUNABLE_CALLBACK (set_hwcaps) (tunable_val_t *valp), checks far more than it should and it should stop at end of "-AVX512".	2023-03-07 21:42:25 +08:00
H.J. Lu	317f1c0a8a	x86-64: Add glibc.cpu.prefer_map_32bit_exec [BZ #28656] Crossing 2GB boundaries with indirect calls and jumps can use more branch prediction resources on Intel Golden Cove CPU (see the "Misprediction for Branches >2GB" section in Intel 64 and IA-32 Architectures Optimization Reference Manual.) There is visible performance improvement on workloads with many PLT calls when executable and shared libraries are mmapped below 2GB. Add the Prefer_MAP_32BIT_EXEC bit so that mmap will try to map executable or denywrite pages in shared libraries with MAP_32BIT first. NB: Prefer_MAP_32BIT_EXEC reduces bits available for address space layout randomization (ASLR), which is always disabled for SUID programs and can only be enabled by the tunable, glibc.cpu.prefer_map_32bit_exec, or the environment variable, LD_PREFER_MAP_32BIT_EXEC. This works only between shared libraries or between shared libraries and executables with addresses below 2GB. PIEs are usually loaded at a random address above 4GB by the kernel.	2023-02-22 18:28:37 -08:00
Adhemerval Zanella	a9b3b770f5	string: Remove string_private.h Now that _STRING_ARCH_unaligned is not used anymore. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-02-17 15:56:54 -03:00
Samuel Thibault	bfb583e791	htl: Generalize i386 pt-machdep.h to x86	2023-02-12 16:33:39 +01:00
Sajan Karumanchi	103a469dc7	x86: Cache computation for AMD architecture. All AMD architectures cache details will be computed based on __cpuid__ `0x8000_001D` and the reference to __cpuid__ `0x8000_0006` will be zeroed out for future architectures. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>	2023-01-18 19:28:54 +01:00
Joseph Myers	6d7e8eda9b	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
H.J. Lu	48b74865c6	x86: Check minimum/maximum of non_temporal_threshold [BZ #29953] The minimum non_temporal_threshold is 0x4040. non_temporal_threshold may be set to less than the minimum value when the shared cache size isn't available (e.g., in an emulator) or by the tunable. Add checks for minimum and maximum of non_temporal_threshold. This fixes BZ #29953.	2023-01-03 13:25:50 -08:00
Adhemerval Zanella	8d6083717c	x86_64: State assembler is being tested on sysdeps/x86/configure	2022-12-06 13:47:47 -03:00
Adhemerval Zanella	2b0da5028d	configure: Remove AS check The assembler is not invoked directly, but rather always through the CC wrapper. The binutils version check is done with LD instead. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2022-12-06 09:40:19 -03:00
Javier Pello	ab40f20364	elf: Remove _dl_string_hwcap Removal of legacy hwcaps support from the dynamic loader left no users of _dl_string_hwcap. Signed-off-by: Javier Pello <devel@otheo.eu> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-10-06 07:59:48 -03:00
Aurelien Jarno	7e8283170c	x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations The AVX2 strrchr and wcsrchr implementation uses the 'blsmsk' instruction which belongs to the BMI1 CPU feature and the 'shrx' instruction, which belongs to the BMI2 CPU feature. Fixes: `df7e295d18` ("x86: Optimize {str|wcs}rchr-avx2") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	3c0c78afab	x86-64: Require BMI2 and LZCNT for AVX2 memrchr implementation The AVX2 memrchr implementation uses the 'shlxl' instruction, which belongs to the BMI2 CPU feature and uses the 'lzcnt' instruction, which belongs to the LZCNT CPU feature. Fixes: `af5306a735` ("x86: Optimize memrchr-avx2.S") Partially resolves: BZ #29611 Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Aurelien Jarno	b80f16adbd	x86: include BMI1 and BMI2 in x86-64-v3 level The "System V Application Binary Interface AMD64 Architecture Processor Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3 level. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-10-03 23:46:11 +02:00
Joseph Myers	3e5760fcb4	Update _FloatN header support for C++ in GCC 13 GCC 13 adds support for _FloatN and _FloatNx types in C++, so breaking the installed glibc headers that assume such support is not present. GCC mostly works around this with fixincludes, but that doesn't help for building glibc and its tests (glibc doesn't itself contain C++ code, but there's C++ code built for tests). Update glibc's bits/floatn-common.h and bits/floatn.h headers to handle the GCC 13 support directly. In general the changes match those made by fixincludes, though I think the ones in sysdeps/powerpc/bits/floatn.h, where the header tests __LDBL_MANT_DIG__ == 113 or uses #elif, wouldn't match the existing fixincludes patterns. Some places involving special C++ handling in relation to _FloatN support are not changed. There's no need to change the __HAVE_FLOATN_NOT_TYPEDEF definition (also in a form that wouldn't be matched by the fixincludes fixes) because it's only used in relation to macro definitions using features not supported for C++ (__builtin_types_compatible_p and _Generic). And there's no need to change the inline function overloads for issignaling, iszero and iscanonical in C++ because cases where types have the same format but are no longer compatible types are handled automatically by the C++ overload resolution rules. This patch also does not change the overload handling for iseqsig, and there I think changes are needed, beyond those in this patch or made by fixincludes. The way that overload is defined, via a template parameter to a structure type, requires overloads whenever the types are incompatible, even if they have the same format. So I think we need to add overloads with GCC 13 for every supported _FloatN and _FloatNx type, rather than just having one for _Float128 when it has a different ABI to long double as at present (but for older GCC, such overloads must not be defined for types that end up defined as typedefs for another type). Tested with build-many-glibcs.py: compilers build for aarch64-linux-gnu ia64-linux-gnu mips64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu; glibcs build for aarch64-linux-gnu ia64-linux-gnu i686-linux-gnu mips-linux-gnu mips64-linux-gnu-n32 powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu.	2022-09-28 20:10:08 +00:00
Adhemerval Zanella	2fc7320668	math: x86: Use prefix for FP_INIT_ROUNDMODE Not all compilers support the inline asm prefix '%v' to emit the AVX instruction if AVX is enabled. Use a prefix instead. Checked on x86_64-linux-gnu and i686-linux-gnu.	2022-09-05 10:54:41 -03:00
Adhemerval Zanella	8cd559cf5a	nptl: x86_64: Use same code for CURRENT_STACK_FRAME and stackinfo_get_sp It avoids the possible warning of uninitialized 'frame' variable when building with clang: ../sysdeps/nptl/jmp-unwind.c:27:42: error: variable 'frame' is uninitialized when used here [-Werror,-Wuninitialized] __pthread_cleanup_upto (env->__jmpbuf, CURRENT_STACK_FRAME); The resulting code is similar to CURRENT_STACK_FRAME. Checked on x86_64-linux-gnu.	2022-08-31 09:04:27 -03:00
Noah Goldstein	ceabdcd130	x86: Add support to build strcmp/strlen/strchr with explicit ISA level 1. Add default ISA level selection in non-multiarch/rtld implementations. 2. Add ISA level build guards to different implementations. - I.e strcmp-avx2.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (strcmp-evex.S). 3. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-16 03:07:59 -07:00
Noah Goldstein	7c8ca17893	x86: Add missing rtm tests for strcmp family Add new tests for: strcasecmp strncasecmp strcmp wcscmp These functions all have avx2_rtm implementations so should be tested.	2022-07-13 14:55:31 -07:00
Noah Goldstein	ae308947ff	x86: Add support for building {w}memcmp{eq} with explicit ISA level 1. Refactor files so that all implementations are in the multiarch directory - Moved the implementation portion of memcmp sse2 from memcmp.S to multiarch/memcmp-sse2.S - The non-multiarch file now only includes one of the implementations in the multiarch directory based on the compiled ISA level (only used for non-multiarch builds. Otherwise we go through the ifunc selector). 2. Add ISA level build guards to different implementations. - I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if compiled ISA level <= 3. Otherwise there is no reason to include it as we will always use one of the ISA level 4 implementations (memcmp-evex-movbe.S). 3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the non-multiarch {w}memcmp{eq}.S which will in turn select the best implementation based on the compiled ISA level. 4. Refactor the ifunc selector and ifunc implementation list to use the ISA level aware wrapper macros that allow functions below the compiled ISA level (with a guaranteed replacement) to be skipped. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-07-05 16:42:42 -07:00
Noah Goldstein	a3563f3f36	x86: Add more feature definitions to isa-level.h This commit doesn't change anything in itself. It is just to add definitions that will be needed by future patches.	2022-06-28 08:24:56 -07:00
H.J. Lu	cfdc4df66c	x86-64: Only define used SSE/AVX/AVX512 run-time resolvers When glibc is built with x86-64 ISA level v3, SSE run-time resolvers aren't used. For x86-64 ISA level v4 build, both SSE and AVX resolvers are unused. Check the minimum x86-64 ISA level to exclude the unused run-time resolvers.	2022-06-27 14:17:52 -07:00
H.J. Lu	f56c497d2b	x86: Move CPU_FEATURE{S}_{USABLE|ARCH}_P to isa-level.h Move X86_ISA_CPU_FEATURE_USABLE_P and X86_ISA_CPU_FEATURES_ARCH_P to where MINIMUM_X86_ISA_LEVEL and XXX_X86_ISA_LEVEL are defined.	2022-06-27 12:52:58 -07:00
Noah Goldstein	4fc321dc58	x86: Fix backwards Prefer_No_VZEROUPPER check in ifunc-evex.h Add third argument to X86_ISA_CPU_FEATURES_ARCH_P macro so the runtime CPU_FEATURES_ARCH_P check can be inverted if the MINIMUM_X86_ISA_LEVEL is not high enough to constantly evaluate the check. Use this new macro to correct the backwards check in ifunc-evex.h	2022-06-27 08:35:51 -07:00
Noah Goldstein	703f434108	x86: Add defines / utilities for making ISA specific x86 builds 1. Factor out some of the ISA level defines in isa-level.c to standalone header isa-level.h 2. Add new headers with ISA level dependent macros for handling ifuncs. Note, this file does not change any code. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-06-22 19:41:35 -07:00
Noah Goldstein	8da9f346cb	x86: Add BMI1/BMI2 checks for ISA_V3 check BMI1/BMI2 are part of the ISA V3 requirements: https://en.wikipedia.org/wiki/X86-64 And defined by GCC when building with `-march=x86-64-v3`	2022-06-16 20:17:45 -07:00
Noah Goldstein	b446822b6a	x86: Add bounds `x86_non_temporal_threshold` The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed by memmove-vec-unaligned-erms. The lower-bound is needed because memmove-vec-unaligned-erms unrolls the loop aggressively in the L(large_memset_4x) case. The upper-bound is needed because memmove-vec-unaligned-erms right-shifts the value of `x86_non_temporal_threshold` by LOG_4X_MEMCPY_THRESH (4) which without a bound may overflow. The lack of lower-bound can be a correctness issue. The lack of upper-bound cannot.	2022-06-15 14:25:55 -07:00
Fangrui Song	de38b2a343	elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA If an executable has copy relocations for extern protected data, that can only work if the library containing the definition is built with assumptions (a) the compiler emits GOT-generating relocations (b) the linker produces R_*_GLOB_DAT instead of R_*_RELATIVE. Otherwise the library uses its own definition directly and the executable accesses a stale copy. Note: the GOT relocations defeat the purpose of protected visibility as an optimization, but allow rtld to make the executable and library use the same copy when copy relocations are present, but it turns out this never worked perfectly. ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics when both a.so and b.so define protected var and the executable copy relocates var: b.so accesses its own copy even with GLOB_DAT. The behavior change is from commit `62da1e3b00` (x86) and then copied to nios2 (`ae5eae7cfc`) and arc (`0e7d930c4c`). Without ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA, b.so accesses the copy relocated data like a.so. There is now a warning for copy relocation on protected symbol since commit `7374c02b68`. It's extremely unlikely anyone relies on the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA behavior, so let's remove it: this removes a check in the symbol lookup code.	2022-06-15 11:29:55 -07:00
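
Three of the commits listed above (b446822b6a, 48b74865c6, ed2f9dc942) deal with bounding `non_temporal_threshold`: a lower bound of 0x4040 (16448), an upper bound of SIZE_MAX / 16, and a 64MB fallback when the value is below the minimum. The following is a minimal sketch in C of how those three rules could combine, based only on the commit messages. The macro names and the helper `sanitize_nt_threshold` are hypothetical and are not glibc's actual identifiers or code.

```c
#include <stddef.h>
#include <stdint.h>

/* Values quoted in the commit messages; the macro names are illustrative.  */
#define MIN_NT_THRESHOLD ((size_t) 0x4040)                /* 16448 */
#define MAX_NT_THRESHOLD (SIZE_MAX >> 4)                  /* SIZE_MAX / 16 */
#define FALLBACK_NT_THRESHOLD ((size_t) 64 * 1024 * 1024) /* 64MB */

/* Hypothetical helper: returns a threshold that respects both bounds.
   A value below the minimum is treated as "cache info was not read" and
   replaced by the conservative 64MB fallback rather than by the minimum.  */
static size_t
sanitize_nt_threshold (size_t threshold)
{
  if (threshold < MIN_NT_THRESHOLD)
    return FALLBACK_NT_THRESHOLD;
  if (threshold > MAX_NT_THRESHOLD)
    return MAX_NT_THRESHOLD;
  return threshold;
}
```

Under these assumptions, a threshold of 0 (for example, from an emulator that reports no shared cache size) maps to 64MB, an in-range value is returned unchanged, and SIZE_MAX is clamped to SIZE_MAX / 16.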


449 Commits