glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-29 00:01:12 +00:00

Author	SHA1	Message	Date
Noah Goldstein	4fc321dc58	x86: Fix backwards Prefer_No_VZEROUPPER check in ifunc-evex.h Add third argument to X86_ISA_CPU_FEATURES_ARCH_P macro so the runtime CPU_FEATURES_ARCH_P check can be inverted if the MINIMUM_X86_ISA_LEVEL is not high enough to constantly evaluate the check. Use this new macro to correct the backwards check in ifunc-evex.h	2022-06-27 08:35:51 -07:00
Noah Goldstein	703f434108	x86: Add defines / utilities for making ISA specific x86 builds 1. Factor out some of the ISA level defines in isa-level.c to standalone header isa-level.h 2. Add new headers with ISA level dependent macros for handling ifuncs. Note, this file does not change any code. Tested with and without multiarch on x86_64 for ISA levels: {generic, x86-64-v2, x86-64-v3, x86-64-v4} And m32 with and without multiarch.	2022-06-22 19:41:35 -07:00
Noah Goldstein	8da9f346cb	x86: Add BMI1/BMI2 checks for ISA_V3 check BMI1/BMI2 are part of the ISA V3 requirements: https://en.wikipedia.org/wiki/X86-64 And defined by GCC when building with `-march=x86-64-v3`	2022-06-16 20:17:45 -07:00
Noah Goldstein	b446822b6a	x86: Add bounds `x86_non_temporal_threshold` The lower-bound (16448) and upper-bound (SIZE_MAX / 16) are assumed by memmove-vec-unaligned-erms. The lower-bound is needed because memmove-vec-unaligned-erms unrolls the loop aggressively in the L(large_memset_4x) case. The upper-bound is needed because memmove-vec-unaligned-erms right-shifts the value of `x86_non_temporal_threshold` by LOG_4X_MEMCPY_THRESH (4) which without a bound may overflow. The lack of lower-bound can be a correctness issue. The lack of upper-bound cannot.	2022-06-15 14:25:55 -07:00
Fangrui Song	de38b2a343	elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA If an executable has copy relocations for extern protected data, that can only work if the library containing the definition is built with assumptions (a) the compiler emits GOT-generating relocations (b) the linker produces R__GLOB_DAT instead of R__RELATIVE. Otherwise the library uses its own definition directly and the executable accesses a stale copy. Note: the GOT relocations defeat the purpose of protected visibility as an optimization, but allow rtld to make the executable and library use the same copy when copy relocations are present, but it turns out this never worked perfectly. ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics when both a.so and b.so define protected var and the executable copy relocates var: b.so accesses its own copy even with GLOB_DAT. The behavior change is from commit `62da1e3b00` (x86) and then copied to nios2 (`ae5eae7cfc`) and arc (`0e7d930c4c`). Without ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA, b.so accesses the copy relocated data like a.so. There is now a warning for copy relocation on protected symbol since commit `7374c02b68`. It's extremely unlikely anyone relies on the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA behavior, so let's remove it: this removes a check in the symbol lookup code.	2022-06-15 11:29:55 -07:00
Noah Goldstein	0355915514	x86: Fix misordered logic for setting `rep_movsb_stop_threshold` Move the setting of `rep_movsb_stop_threshold` to after the tunables have been collected so that the `rep_movsb_stop_threshold` (which is used to redirect control flow to the non_temporal case) will use any user value for `non_temporal_threshold` (set using glibc.cpu.x86_non_temporal_threshold)	2022-06-14 20:58:07 -07:00
Noah Goldstein	9a421348cd	elf: Optimize _dl_new_hash in dl-new-hash.h Unroll slightly and enforce good instruction scheduling. This improves performance on out-of-order machines. The unrolling allows for pipelined multiplies. As well, as an optional sysdep, reorder the operations and prevent reassosiation for better scheduling and higher ILP. This commit only adds the barrier for x86, although it should be either no change or a win for any architecture. Unrolling further started to induce slowdowns for sizes [0, 4] but can help the loop so if larger sizes are the target further unrolling can be beneficial. Results for _dl_new_hash Benchmarked on Tigerlake: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz Time as Geometric Mean of N=30 runs Geometric of all benchmark New / Old: 0.674 type, length, New Time, Old Time, New Time / Old Time fixed, 0, 2.865, 2.72, 1.053 fixed, 1, 3.567, 2.489, 1.433 fixed, 2, 2.577, 3.649, 0.706 fixed, 3, 3.644, 5.983, 0.609 fixed, 4, 4.211, 6.833, 0.616 fixed, 5, 4.741, 9.372, 0.506 fixed, 6, 5.415, 9.561, 0.566 fixed, 7, 6.649, 10.789, 0.616 fixed, 8, 8.081, 11.808, 0.684 fixed, 9, 8.427, 12.935, 0.651 fixed, 10, 8.673, 14.134, 0.614 fixed, 11, 10.69, 15.408, 0.694 fixed, 12, 10.789, 16.982, 0.635 fixed, 13, 12.169, 18.411, 0.661 fixed, 14, 12.659, 19.914, 0.636 fixed, 15, 13.526, 21.541, 0.628 fixed, 16, 14.211, 23.088, 0.616 fixed, 32, 29.412, 52.722, 0.558 fixed, 64, 65.41, 142.351, 0.459 fixed, 128, 138.505, 295.625, 0.469 fixed, 256, 291.707, 601.983, 0.485 random, 2, 12.698, 12.849, 0.988 random, 4, 16.065, 15.857, 1.013 random, 8, 19.564, 21.105, 0.927 random, 16, 23.919, 26.823, 0.892 random, 32, 31.987, 39.591, 0.808 random, 64, 49.282, 71.487, 0.689 random, 128, 82.23, 145.364, 0.566 random, 256, 152.209, 298.434, 0.51 Co-authored-by: Alexander Monakov <amonakov@ispras.ru> Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2022-05-23 10:38:40 -05:00
Fangrui Song	098a657fe4	elf: Replace PI_STATIC_AND_HIDDEN with opposite HIDDEN_VAR_NEEDS_DYNAMIC_RELOC PI_STATIC_AND_HIDDEN indicates whether accesses to internal linkage variables and hidden visibility variables in a shared object (ld.so) need dynamic relocations (usually R_*_RELATIVE). PI (position independent) in the macro name is a misnomer: a code sequence using GOT is typically position-independent as well, but using dynamic relocations does not meet the requirement. Not defining PI_STATIC_AND_HIDDEN is legacy and we expect that all new ports will define PI_STATIC_AND_HIDDEN. Current ports defining PI_STATIC_AND_HIDDEN are more than the opposite. Change the configure default. No functional change. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-04-26 09:26:22 -07:00
Noah Goldstein	9fef7039a7	x86: Fix fallback for wcsncmp_avx2 in strcmp-avx2.S [BZ #28896 ] Overflow case for __wcsncmp_avx2_rtm should be __wcscmp_avx2_rtm not __wcscmp_avx2. commit `ddf0992cf5` Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Sun Jan 9 16:02:21 2022 -0600 x86: Fix __wcsncmp_avx2 in strcmp-avx2.S [BZ# 28755] Set the wrong fallback function for `__wcsncmp_avx2_rtm`. It was set to fallback on to `__wcscmp_avx2` instead of `__wcscmp_avx2_rtm` which can cause spurious aborts. This change will need to be backported. All string/memory tests pass. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-03-25 11:46:13 -05:00
Noah Goldstein	b98d0bbf74	x86: Fix TEST_NAME to make it a string in tst-strncmp-rtm.c Previously TEST_NAME was passing a function pointer. This didn't fail because of the -Wno-error flag (to allow for overflow sizes passed to strncmp/wcsncmp) Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-02-18 15:24:50 -08:00
Noah Goldstein	7835d611af	x86: Test wcscmp RTM in the wcsncmp overflow case [BZ #28896 ] In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. This would have not checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-02-18 16:35:18 -06:00
Noah Goldstein	c627209832	x86: Fallback {str\|wcs}cmp RTM in the ncmp overflow case [BZ #28896 ] In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would call strcmp-avx2 and wcscmp-avx2 respectively. This would have not checks around vzeroupper and would trigger spurious aborts. This commit fixes that. test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on AVX2 machines with and without RTM. Co-authored-by: H.J. Lu <hjl.tools@gmail.com>	2022-02-17 15:43:05 -06:00
H.J. Lu	f9db5433f3	x86/configure.ac: Define PI_STATIC_AND_HIDDEN/SUPPORT_STATIC_PIE Move PI_STATIC_AND_HIDDEN and SUPPORT_STATIC_PIE to sysdeps/x86/configure.ac.	2022-02-14 07:34:54 -08:00
H.J. Lu	6229aa74fb	x86: Use CHECK_FEATURE_PRESENT on PCONFIG PCONFIG is a privileged instruction. Use CHECK_FEATURE_PRESENT, instead of CHECK_FEATURE_ACTIVE, on PCONFIG in tst-cpu-features-supports.c.	2022-02-14 05:53:03 -08:00
H.J. Lu	61a4425dd4	x86: Don't check PTWRITE in tst-cpu-features-cpuinfo.c Don't check PTWRITE against /proc/cpuinfo since kernel doesn't report PTWRITE in /proc/cpuinfo.	2022-02-14 05:53:03 -08:00
H.J. Lu	1283948f23	x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ))	2022-02-05 16:42:17 -08:00
H.J. Lu	501246c5e2	x86: Use CHECK_FEATURE_PRESENT to check HLE [BZ #27398 ] HLE is disabled on blacklisted CPUs. Use CHECK_FEATURE_PRESENT, instead of CHECK_FEATURE_ACTIVE, to check HLE.	2022-01-26 21:08:05 -08:00
H.J. Lu	1e000d3d33	x86: Black list more Intel CPUs for TSX [BZ #27398 ] Disable TSX and enable RTM_ALWAYS_ABORT for Intel CPUs listed in: https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html This fixes BZ #27398. Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2022-01-18 14:20:09 -08:00
Aurelien Jarno	c242fcce06	x86: use default cache size if it cannot be determined [BZ #28784 ] In some cases (e.g QEMU, non-Intel/AMD CPU) the cache information can not be retrieved and the corresponding values are set to 0. Commit `2d651eb926` ("x86: Move x86 processor cache info to cpu_features") changed the behaviour in such case by defining the __x86_shared_cache_size and __x86_data_cache_size variables to 0 instead of using the default values. This cause an issue with the i686 SSE2 optimized bzero/routine which assumes that the cache size is at least 128 bytes, and otherwise tries to zero/set the whole address space minus 128 bytes. Fix that by restoring the original code to only update __x86_shared_cache_size and __x86_data_cache_size variables if the corresponding cache sizes are not zero. Fixes bug 28784 Fixes commit `2d651eb926` Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-17 19:42:46 +01:00
Florian Weimer	990c953bce	x86: Add x86-64-vN check to early startup This ISA level covers the glibc build itself. <dl-hwcap-check.h> cannot be used because this check (by design) happens before DL_PLATFORM_INIT and the x86 CPU flags initialization. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-14 20:17:49 +01:00
Florian Weimer	5732a881aa	x86: HAVE_X86_LAHF_SAHF, HAVE_X86_MOVBE and -march=x86-64-vN (bug 28782) HAVE_X86_LAHF_SAHF is implied by x86-64-v2, and HAVE_X86_MOVBE by x86-64-v3. The individual flag does not appear in -fverbose-asm flag output even if the ISA level implies it. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2022-01-14 16:09:20 +01:00
Paul Eggert	581c785bf3	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: * 912-#endif remote: * 913: remote: * 914- remote: * error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines	2022-01-01 11:40:24 -08:00
Sunil K Pandey	c21c7bc24e	x86-64: Add vector tan/tanf implementation to libmvec Implement vectorized tan/tanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tan/tanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-30 10:19:13 -08:00
Sunil K Pandey	8881cca8fb	x86-64: Add vector erfc/erfcf implementation to libmvec Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erfc/erfcf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-30 10:19:03 -08:00
Sunil K Pandey	e682d01578	x86-64: Add vector asinh/asinhf implementation to libmvec Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asinh/asinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:56 -08:00
Sunil K Pandey	c0f36fc303	x86-64: Add vector tanh/tanhf implementation to libmvec Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector tanh/tanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:50 -08:00
Sunil K Pandey	f9ce13fdac	x86-64: Add vector erf/erff implementation to libmvec Implement vectorized erf/erff containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector erf/erff with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:44 -08:00
Sunil K Pandey	0625489ccc	x86-64: Add vector acosh/acoshf implementation to libmvec Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acosh/acoshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:39 -08:00
Sunil K Pandey	6dea4dd3da	x86-64: Add vector atanh/atanhf implementation to libmvec Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atanh/atanhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:34 -08:00
Sunil K Pandey	74265c16ab	x86-64: Add vector log1p/log1pf implementation to libmvec Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log1p/log1pf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:27 -08:00
Sunil K Pandey	7e1722fec8	x86-64: Add vector log2/log2f implementation to libmvec Implement vectorized log2/log2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log2/log2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:21 -08:00
Sunil K Pandey	8f8566026d	x86-64: Add vector log10/log10f implementation to libmvec Implement vectorized log10/log10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector log10/log10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:15 -08:00
Sunil K Pandey	2941a24f8c	x86-64: Add vector atan2/atan2f implementation to libmvec Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan2/atan2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:09 -08:00
Sunil K Pandey	2bf02c5843	x86-64: Add vector cbrt/cbrtf implementation to libmvec Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:38:02 -08:00
Sunil K Pandey	aa1809a1df	x86-64: Add vector sinh/sinhf implementation to libmvec Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector sinh/sinhf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:55 -08:00
Sunil K Pandey	76ddc74e86	x86-64: Add vector expm1/expm1f implementation to libmvec Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector expm1/expm1f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:49 -08:00
Sunil K Pandey	ef7ea9c132	x86-64: Add vector cosh/coshf implementation to libmvec Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector cosh/coshf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:42 -08:00
Sunil K Pandey	8b726453d5	x86-64: Add vector exp10/exp10f implementation to libmvec Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp10/exp10f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:35 -08:00
Sunil K Pandey	3fc9ccc20b	x86-64: Add vector exp2/exp2f implementation to libmvec Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector exp2/exp2f with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:29 -08:00
Sunil K Pandey	37475ba883	x86-64: Add vector hypot/hypotf implementation to libmvec Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector hypot/hypotf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:21 -08:00
Sunil K Pandey	11c01de14c	x86-64: Add vector asin/asinf implementation to libmvec Implement vectorized asin/asinf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector asin/asinf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:37:03 -08:00
Sunil K Pandey	146310177a	x86-64: Add vector atan/atanf implementation to libmvec Implement vectorized atan/atanf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector atan/atanf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-29 11:36:46 -08:00
Florian Weimer	5d28a8962d	elf: Add _dl_find_object function It can be used to speed up the libgcc unwinder, and the internal _dl_find_dso_for_object function (which is used for caller identification in dlopen and related functions, and in dladdr). _dl_find_object is in the internal namespace due to bug 28503. If libgcc switches to _dl_find_object, this namespace issue will be fixed. It is located in libc for two reasons: it is necessary to forward the call to the static libc after static dlopen, and there is a link ordering issue with -static-libgcc and libgcc_eh.a because libc.so is not a linker script that includes ld.so in the glibc build tree (so that GCC's internal -lc after libgcc_eh.a does not pick up ld.so). It is necessary to do the i386 customization in the sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because otherwise, multilib installations are broken. The implementation uses software transactional memory, as suggested by Torvald Riegel. Two copies of the supporting data structures are used, also achieving full async-signal-safety. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-12-28 22:52:56 +01:00
Adhemerval Zanella	92ff345137	Remove atomic-machine.h atomic typedefs Now that memusage.c uses generic types we can remove them.	2021-12-28 14:57:57 -03:00
Sunil K Pandey	f20f980c71	x86-64: Add vector acos/acosf implementation to libmvec Implement vectorized acos/acosf containing SSE, AVX, AVX2 and AVX512 versions for libmvec as per vector ABI. It also contains accuracy and ABI tests for vector acos/acosf with regenerated ulps. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-12-22 13:03:14 -08:00
Aurelien Jarno	94058f6cde	elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704 ] On KVM guests running on some AMD systems, the IBRS feature is reported as a synthetic feature using the Intel feature, while the cpuinfo entry keeps the same. Handle that by first checking the presence of the Intel feature on AMD systems. Fixes bug 28704.	2021-12-17 20:20:15 +01:00
H.J. Lu	ea5814467a	x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656 ] Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since the first PT_LOAD segment is no longer executable due to defaulting to -z separate-code. This fixes [BZ #28656]. Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-12-10 14:01:34 -08:00
Florian Weimer	8dbeb0561e	nptl: Add <thread_pointer.h> for defining __thread_pointer <tls.h> already contains a definition that is quite similar, but it is not consistent across architectures. Only architectures for which rseq support is added are covered. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2021-12-09 09:49:32 +01:00
H.J. Lu	ceeffe968c	x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since they won't lower CPU frequency when ZMM load and store instructions are used.	2021-12-06 07:14:12 -08:00
Noah Goldstein	475b63702e	x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h No bug. This patch doubles the rep_movsb_threshold when using ERMS. Based on benchmarks the vector copy loop, especially now that it handles 4k aliasing, is better for these medium ranged. On Skylake with ERMS: Size, Align1, Align2, dst>src,(rep movsb) / (vec copy) 4096, 0, 0, 0, 0.975 4096, 0, 0, 1, 0.953 4096, 12, 0, 0, 0.969 4096, 12, 0, 1, 0.872 4096, 44, 0, 0, 0.979 4096, 44, 0, 1, 0.83 4096, 0, 12, 0, 1.006 4096, 0, 12, 1, 0.989 4096, 0, 44, 0, 0.739 4096, 0, 44, 1, 0.942 4096, 12, 12, 0, 1.009 4096, 12, 12, 1, 0.973 4096, 44, 44, 0, 0.791 4096, 44, 44, 1, 0.961 4096, 2048, 0, 0, 0.978 4096, 2048, 0, 1, 0.951 4096, 2060, 0, 0, 0.986 4096, 2060, 0, 1, 0.963 4096, 2048, 12, 0, 0.971 4096, 2048, 12, 1, 0.941 4096, 2060, 12, 0, 0.977 4096, 2060, 12, 1, 0.949 8192, 0, 0, 0, 0.85 8192, 0, 0, 1, 0.845 8192, 13, 0, 0, 0.937 8192, 13, 0, 1, 0.939 8192, 45, 0, 0, 0.932 8192, 45, 0, 1, 0.927 8192, 0, 13, 0, 0.621 8192, 0, 13, 1, 0.62 8192, 0, 45, 0, 0.53 8192, 0, 45, 1, 0.516 8192, 13, 13, 0, 0.664 8192, 13, 13, 1, 0.659 8192, 45, 45, 0, 0.593 8192, 45, 45, 1, 0.575 8192, 2048, 0, 0, 0.854 8192, 2048, 0, 1, 0.834 8192, 2061, 0, 0, 0.863 8192, 2061, 0, 1, 0.857 8192, 2048, 13, 0, 0.63 8192, 2048, 13, 1, 0.629 8192, 2061, 13, 0, 0.627 8192, 2061, 13, 1, 0.62 Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-11-06 16:18:08 -05:00
Florian Weimer	cca75bd8b5	i386: Explain why __HAVE_64B_ATOMICS has to be 0	2021-11-02 10:26:23 +01:00
H.J. Lu	14dbbf46a0	x86-64: Remove Prefer_AVX2_STRCMP Remove Prefer_AVX2_STRCMP to enable EVEX strcmp. When comparing 2 32-byte strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1 VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1 VPMOVMSKB and 1 TESTL. EVEX strcmp is now faster than AVX2 strcmp by up to 40% on Tiger Lake and Ice Lake.	2021-11-01 07:53:04 -07:00
Fangrui Song	bf433b849a	elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) Intel MPX failed to gain wide adoption and has been deprecated for a while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in 2019. This patch removes the support code from the dynamic loader. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-10-11 11:14:02 -07:00
Noah Goldstein	fc5bd179ef	x86: Modify ENTRY in sysdep.h so that p2align can be specified No bug. This change adds a new macro ENTRY_P2ALIGN which takes a second argument, log2 of the desired function alignment. The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this doesn't affect any existing functionality. Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>	2021-10-08 11:30:52 -05:00
H.J. Lu	1bd888d0b7	Initial support for GNU_PROPERTY_1_NEEDED 1. Add GNU_PROPERTY_1_NEEDED: #define GNU_PROPERTY_1_NEEDED GNU_PROPERTY_UINT32_OR_LO to indicate the needed properties by the object file. 2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS: #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0) to indicate that the object file requires canonical function pointers and cannot be used with copy relocation. 3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-10-07 10:26:08 -07:00
Joseph Myers	b26901b26e	Fix sysdeps/x86/fpu/s_ffma.c for 32-bit FMA processor case It turns out the __SSE2_MATH__ conditional in sysdeps/x86/fpu/s_ffma.c does not cover all cases where the x86 fenv_private.h macros might manipulate one of the SSE and 387 floating-point state, while the actual fma implementation uses the other. Specifically, in the 32-bit case, with a compiler not defaulting to -mfpmath=sse, but testing on a processor with hardware FMA support, the multiarch fma function implementations will end up using SSE, while the fenv_private.h macros will use the 387 state for double. Change the conditional to use the default macros rather than the optimized ones in all cases except when the compiler inlines an fma instruction (in which case, since all those instructions are SSE instructions and -mfpmath=sse must be in effect for them to be inlined, the optimized macros will only use the SSE state and it's OK for them to only use the SSE state). Tested for x86_64 and x86. H.J. reports in <https://sourceware.org/pipermail/libc-alpha/2021-September/131367.html> that it fixes the problems he observed.	2021-09-24 17:59:22 +00:00
Joseph Myers	4ed7a383f9	Fix ffma use of round-to-odd on x86 On 32-bit x86 with -mfpmath=sse, and on x86_64 with --disable-multi-arch, the tests of ffma and its aliases (fma narrowing from binary64 to binary32) fail. This is probably the issue reported by H.J. in <https://sourceware.org/pipermail/libc-alpha/2021-September/131277.html>. The problem is the use of fenv_private.h macros in the round-to-odd implementation. Those macros are set up to manipulate only one of the SSE and 387 floating-point state, whichever is relevant for the type indicated by the suffix on the macro name. But x86 configurations sometimes use the ldbl-96 implementation of binary64 fma (that's where --disable-multi-arch is relevant for x86_64: it causes the ldbl-96 implementation to be used, instead of an IFUNC implementation that falls back to the dbl-64 version), contrary to the expectations of those macros for functions operating on double when __SSE2_MATH__ is defined. This can be addressed by using the default versions of those macros (giving x86 its own version of s_ffma.c), as is done for the *f128 macro variants where it depends on the details of how GCC was configured when building libgcc which floating-point state is affected by _Float128 arithmetic. The issue only applies when __SSE2_MATH__ is defined, and doesn't apply when __FP_FAST_FMA is defined (because in that case, fma will be inlined by the compiler, meaning it's definitely an SSE operation; for the same reason, this is not an issue for narrowing sqrt, as hardware sqrt is always inlined in that implementation for x86), but in other cases it's safest to use the default versions of the fenv_private.h macros to ensure things work whichever fma implementation is used. Tested for x86_64 (with and without --disable-multi-arch) and x86 (with and without -mfpmath=sse).	2021-09-23 21:18:31 +00:00
Siddhesh Poyarekar	30891f35fa	Remove "Contributed by" lines We stopped adding "Contributed by" or similar lines in sources in 2012 in favour of git logs and keeping the Contributors section of the glibc manual up to date. Removing these lines makes the license header a bit more consistent across files and also removes the possibility of error in attribution when license blocks or files are copied across since the contributed-by lines don't actually reflect reality in those cases. Move all "Contributed by" and similar lines (Written by, Test by, etc.) into a new file CONTRIBUTED-BY to retain record of these contributions. These contributors are also mentioned in manual/contrib.texi, so we just maintain this additional record as a courtesy to the earlier developers. The following scripts were used to filter a list of files to edit in place and to clean up the CONTRIBUTED-BY file respectively. These were not added to the glibc sources because they're not expected to be of any use in future given that this is a one time task: https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-09-03 22:06:44 +05:30
Fangrui Song	224edada60	configure: Allow LD to be LLD 13.0.0 or above [BZ #26558 ] When using LLD (LLVM linker) as the linker, configure prints a confusing message. *** These critical programs are missing or too old: GNU ld LLD>=13.0.0 can build glibc --enable-static-pie. (8.0.0 needs one workaround for -Wl,-defsym=_begin=0. 9.0.0 works with --disable-static-pie). XFAIL two tests sysdeps/x86/tst-ifunc-isa-* which have the BZ #28154 issue (LLD follows the PowerPC port of GNU ld for ifunc by placing IRELATIVE relocations in .rela.dyn, triggering a glibc ifunc fragility). The set of dynamic symbols is the same with GNU ld and LLD, modulo unused SHN_ABS version node symbols. For comparison, gold does not support --enable-static-pie yet (--no-dynamic-linker is unsupported BZ #22221), yet has 6 failures more than LLD. gold linked libc.so has larger .dynsym differences with GNU ld and LLD (non-default version symbols are changed to default versions by a version script BZ #28196).	2021-08-31 20:23:34 -07:00
Matt Whitlock	0835c0f0ba	x86: fix Autoconf caching of instruction support checks [BZ #27991 ] The Autoconf documentation for the AC_CACHE_CHECK macro states: The commands-to-set-it must have no side effects except for setting the variable cache-id, see below. However, the tests for support of -msahf and -mmovbe were embedded in the commands-to-set-it for lib_cv_include_x86_isa_level. This had the consequence that libc_cv_have_x86_lahf_sahf and libc_cv_have_x86_movbe were not defined whenever lib_cv_include_x86_isa_level was read from cache. These variables' being undefined meant that their unquoted use in later test expressions led to the 'test' built-in's misparsing its arguments and emitting errors like "test: =: unexpected operator" or "test: =: unary operator expected", depending on the particular shell. This commit refactors the tests for LAHF/SAHF and MOVBE instruction support into their own AC_CACHE_CHECK macro invocations to obey the rule that the commands-to-set-it must have no side effects other than setting the variable named by cache-id. Signed-off-by: Matt Whitlock <sourceware@mattwhitlock.name> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-08-19 09:11:35 -03:00
H.J. Lu	91cc803d27	x86-64: Add Avoid_Short_Distance_REP_MOVSB commit `3ec5d83d2a` Author: H.J. Lu <hjl.tools@gmail.com> Date: Sat Jan 25 14:19:40 2020 -0800 x86-64: Avoid rep movsb with short distance [BZ #27130] introduced some regressions on Intel processors without Fast Short REP MOV (FSRM). Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with short distance only on Intel processors with FSRM. bench-memmove-large on Skylake server shows that cycles of __memmove_evex_unaligned_erms improves for the following data size: before after Improvement length=4127, align1=3, align2=0: 479.38 349.25 27% length=4223, align1=9, align2=5: 405.62 333.25 18% length=8223, align1=3, align2=0: 786.12 496.38 37% length=8319, align1=9, align2=5: 727.50 501.38 31% length=16415, align1=3, align2=0: 1436.88 840.00 41% length=16511, align1=9, align2=5: 1375.50 836.38 39% length=32799, align1=3, align2=0: 2890.00 1860.12 36% length=32895, align1=9, align2=5: 2891.38 1931.88 33%	2021-07-28 13:23:57 -07:00
H.J. Lu	7c124e3714	x86: Install <bits/platform/x86.h> [BZ #27958 ] 1. Install <bits/platform/x86.h> for <sys/platform/x86.h> which includes <bits/platform/x86.h>. 2. Rename HAS_CPU_FEATURE to CPU_FEATURE_PRESENT which checks if the processor has the feature. 3. Rename CPU_FEATURE_USABLE to CPU_FEATURE_ACTIVE which checks if the feature is active. There may be other preconditions, like sufficient stack space or further setup for AMX, which must be satisfied before the feature can be used. This fixes BZ #27958. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-07-23 05:12:51 -07:00
Siddhesh Poyarekar	5b8d271571	Fix build and tests with --disable-tunables Remove unused code and declare __libc_mallopt when !IS_IN (libc) to allow the debug hook to build with --disable-tunables. Also, run tst-ifunc-isa-2* tests only when tunables are enabled since the result depends on it. Tested on x86_64. Reported-by: Matheus Castanho <msc@linux.ibm.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-07-23 13:57:56 +05:30
Adhemerval Zanella	469761eac8	elf: Fix tst-cpu-features-cpuinfo on some AMD systems (BZ #28090 ) The SSBD feature is implemented in 2 different ways on AMD processors: newer systems (Zen3) provides AMD_SSBD (function 8000_0008, EBX[24]), while older system provides AMD_VIRT_SSBD (function 8000_0008, EBX[25]). However for AMD_VIRT_SSBD, kernel shows both 'ssdb' and 'virt_ssdb' on /proc/cpuinfo; while for AMD_SSBD only 'ssdb' is provided. This now check is AMD_SSBD is set to check for 'ssbd', otherwise check if AMD_VIRT_SSDB is set to check for 'virt_ssbd'. Checked on x86_64-linux-gnu on a Ryzen 9 5900x. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-07-19 14:12:29 -03:00
H.J. Lu	ea8e465a6b	x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033 ] From https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html * Intel TSX will be disabled by default. * The processor will force abort all Restricted Transactional Memory (RTM) transactions by default. * A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated, which is set to indicate to updated software that the loaded microcode is forcing RTM abort. * On processors that enumerate support for RTM, the CPUID enumeration bits for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to be set by default after microcode update. * Workloads that were benefited from Intel TSX might experience a change in performance. * System software may use a new bit in Model-Specific Register (MSR) 0x10F TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock Elision (HLE) and RTM bits to indicate to software that Intel TSX is disabled. 1. Add RTM_ALWAYS_ABORT to CPUID features. 2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set. This skips the string/tst-memchr-rtm etc. testcases on the affected processors, which always fail after a microcde update. 3. Check RTM feature, instead of usability, against /proc/cpuinfo. This fixes BZ #28033.	2021-07-01 10:47:35 -07:00
Adhemerval Zanella	e3e3eb0a2e	x86: Fix tst-cpu-features-cpuinfo on Ryzen 9 (BZ #27873 ) AMD define different flags for IRPB, IBRS, and STIPBP [1], so new x86_64_cpu are added and IBRS_IBPB is only tested for Intel. The SSDB is also defined and implemented different on AMD [2], and also a new AMD_SSDB flag is added. It should map to the cpuinfo 'ssdb' on recent AMD cpus. It fixes tst-cpu-features-cpuinfo and tst-cpu-features-cpuinfo-static on recent AMD cpus. Checked on x86_64-linux-gnu on AMD Ryzen 9 5900X. [1] https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf [2] https://bugzilla.kernel.org/show_bug.cgi?id=199889 Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-06-24 09:57:46 -03:00
H.J. Lu	ea26ff0322	x86: Copy IBT and SHSTK usable only if CET is enabled IBT and SHSTK usable bits are copied from CPUID feature bits and later cleared if kernel doesn't support CET. Copy IBT and SHSTK usable only if CET is enabled so that they aren't set on CET capable processors with non-CET enabled glibc.	2021-06-23 17:35:47 -07:00
Florian Weimer	6f1c701026	dlfcn: Cleanups after -ldl is no longer required This commit removes the ELF constructor and internal variables from dlfcn/dlfcn.c. The file now serves the same purpose as nptl/libpthread-compat.c, so it is renamed to dlfcn/libdl-compat.c. The use of libdl-shared-only-routines ensures that libdl.a is empty. This commit adjusts the test suite not to use $(libdl). The libdl.so symbolic link is no longer installed. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-06-03 09:11:45 +02:00
H.J. Lu	79aec84102	Properly check stack alignment [BZ #27901 ] 1. Replace if ((((uintptr_t) &_d) & (__alignof (double) - 1)) != 0) which may be optimized out by compiler, with int __attribute__ ((weak, noclone, noinline)) is_aligned (void *p, int align) { return (((uintptr_t) p) & (align - 1)) != 0; } 2. Add TEST_STACK_ALIGN_INIT to TEST_STACK_ALIGN. 3. Add a common TEST_STACK_ALIGN_INIT to check 16-byte stack alignment for both i386 and x86-64. 4. Update powerpc to use TEST_STACK_ALIGN_INIT. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-05-24 07:42:12 -07:00
H.J. Lu	cf2c57526b	x86: Set rep_movsb_threshold to 2112 on processors with FSRM The glibc memcpy benchmark on Intel Core i7-1065G7 (Ice Lake) showed that REP MOVSB became faster after 2112 bytes: Vector Move REP MOVSB length=2112, align1=0, align2=0: 24.20 24.40 length=2112, align1=1, align2=0: 26.07 23.13 length=2112, align1=0, align2=1: 27.18 28.13 length=2112, align1=1, align2=1: 26.23 25.16 length=2176, align1=0, align2=0: 23.18 22.52 length=2176, align1=2, align2=0: 25.45 22.52 length=2176, align1=0, align2=2: 27.14 27.82 length=2176, align1=2, align2=2: 22.73 25.56 length=2240, align1=0, align2=0: 24.62 24.25 length=2240, align1=3, align2=0: 29.77 27.15 length=2240, align1=0, align2=3: 35.55 29.93 length=2240, align1=3, align2=3: 34.49 25.15 length=2304, align1=0, align2=0: 34.75 26.64 length=2304, align1=4, align2=0: 32.09 22.63 length=2304, align1=0, align2=4: 28.43 31.24 Use REP MOVSB for data size > 2112 bytes in memcpy on processors with fast short REP MOVSB (FSRM). * sysdeps/x86/dl-cacheinfo.h (dl_init_cacheinfo): Set rep_movsb_threshold to 2112 on processors with fast short REP MOVSB (FSRM).	2021-05-03 05:08:22 -07:00
H.J. Lu	7fc9152e83	x86: tst-cpu-features-supports.c: Update AMX check Pass "amx-bf16", "amx-int8" and "amx-tile", instead of "amx_bf16", "amx_int8" and "amx_tile", to __builtin_cpu_supports for GCC 11.	2021-04-22 10:09:49 -07:00
Florian Weimer	81dfc6694c	nptl: Remove longjmp, siglongjmp from libpthread The definitions in libc are sufficient, the forwarders are no longer needed. The symbols have been moved using scripts/move-symbol-to-libc.py. s390-linux-gnu and s390x-linux-gnu need a new version placeholder to keep the GLIBC_2.19 symbol version in libpthread. Tested on i386-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu, x86_64-linux-gnu. Built with build-many-glibcs.py. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-21 19:49:50 +02:00
Siddhesh Poyarekar	abadbef5c8	Move __isnanf128 to libc.so All of the isnan functions are in libc.so due to printf_fp, so move __isnanf128 there too for consistency. Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br> Reviewed-by: Florian Weimer <fweimer@redhat.com>	2021-03-30 14:58:19 +05:30
H.J. Lu	4bd660be40	x86: Add string/memory function tests in RTM region At function exit, AVX optimized string/memory functions have VZEROUPPER which triggers RTM abort. When such functions are called inside a transactionally executing RTM region, RTM abort causes severe performance degradation. Add tests to verify that string/memory functions won't cause RTM abort in RTM region.	2021-03-29 07:40:17 -07:00
H.J. Lu	1da50d4bda	x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP 1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered by VZEROUPPER inside a transactionally executing RTM region. 2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2 loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.	2021-03-29 07:40:17 -07:00
H.J. Lu	27f7463675	x86: Properly disable XSAVE related features [BZ #27605 ] 1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE. 2. Disable all features which depend on XSAVE: a. If OSXSAVE is disabled by glibc tunables. Or b. If both XSAVE and XSAVEC aren't usable.	2021-03-29 06:04:17 -07:00
Samuel Thibault	16b597807d	elf: Fix not compiling ifunc tests that need gcc ifunc support	2021-03-24 01:52:46 +01:00
Siddhesh Poyarekar	941ea10f80	Build get-cpuid-feature-leaf.c without stack-protector [BZ #27555 ] __x86_get_cpuid_feature_leaf is called during early startup, before the stack check guard is initialized and is hence not safe to build with stack-protector. Additionally, IFUNC resolvers for static tst-ifunc-isa tests get called too early for stack protector to be useful, so fix them to disable stack protector for the resolver functions. This fixes all failures seen with --enable-stack-protector=all configuration. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-03-15 20:24:45 +05:30
H.J. Lu	f53ffc9b90	x86: Handle _SC_LEVEL1_ICACHE_LINESIZE [BZ #27444 ] commit `2d651eb926` Author: H.J. Lu <hjl.tools@gmail.com> Date: Fri Sep 18 07:55:14 2020 -0700 x86: Move x86 processor cache info to cpu_features missed _SC_LEVEL1_ICACHE_LINESIZE. 1. Add level1_icache_linesize to struct cpu_features. 2. Initialize level1_icache_linesize by calling handle_intel, handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE. 3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-03-15 05:43:26 -07:00
H.J. Lu	339bf918ea	x86: Set minimum x86-64 level marker [BZ #27318 ] Since the full ISA set used in an ELF binary is unknown to compiler, an x86-64 ISA level marker indicates the minimum, not maximum, ISA set required to run such an ELF binary. We never guarantee a library with an x86-64 ISA level v3 marker doesn't contain other ISAs beyond x86-64 ISA level v3, like AVX VNNI. We check the x86-64 ISA level marker for the minimum ISA set. Since -march=sandybridge enables only some ISAs in x86-64 ISA level v3, we should set the needed ISA marker to v2. Otherwise, libc is compiled with -march=sandybridge will fail to run on Sandy Bridge: $ ./elf/ld.so ./libc.so ./libc.so: (p) CPU ISA level is lower than required: needed: 7; got: 3 Set the minimum, instead of maximum, x86-64 ISA level marker should have no impact on the glibc-hwcaps directory assignment logic in ldconfig nor ld.so.	2021-03-06 07:49:30 -08:00
Florian Weimer	01a5746b6c	x86: Add CPU-specific diagnostics to ld.so --list-diagnostics	2021-03-02 15:01:10 +01:00
Florian Weimer	e4933c8a92	x86: Automate generation of PREFERRED_FEATURE_INDEX_1 bitfield Use a .def file to define the bitfield layout, so that it is possible to iterate over field members using the preprocessor.	2021-03-02 15:01:06 +01:00
H.J. Lu	89de9d3958	x86: Use x86/nptl/pthreaddef.h 1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h. 2. Remove sysdeps/x86_64/nptl/pthreaddef.h. Reviewed-by: DJ Delorie <dj@redhat.com>	2021-02-22 15:52:56 -08:00
Florian Weimer	feb741bb81	x86: Remove unused variables for raw cache sizes from cacheinfo.h	2021-02-22 17:36:03 +01:00
H.J. Lu	ba230b6387	<bits/platform/x86.h>: Correct x86_cpu_TBM x86_cpu_TBM should be x86_cpu_index_80000001_ecx + 21.	2021-02-22 04:31:51 -08:00
H.J. Lu	ce4a94b12e	x86: Remove the extra space between "# endif" Remove the extra space between "# endif" left over from commit `f380868f6d` Author: H.J. Lu <hjl.tools@gmail.com> Date: Thu Dec 24 15:43:34 2020 -0800 Remove _ISOMAC check from <cpu-features.h>	2021-02-12 07:50:29 -08:00
Siddhesh Poyarekar	a1b8b06a55	x86: Use SIZE_MAX instead of (long int)-1 for tunable range value The tunable types are SIZE_T, so set the ranges to the correct maximum value, i.e. SIZE_MAX.	2021-02-10 19:08:33 +05:30
Siddhesh Poyarekar	61117bfa1b	tunables: Simplify TUNABLE_SET interface The TUNABLE_SET interface took a primitive C type argument, which resulted in inconsistent type conversions internally due to incorrect dereferencing of types, especialy on 32-bit architectures. This change simplifies the TUNABLE setting logic along with the interfaces. Now all numeric tunable values are stored as signed numbers in tunable_num_t, which is intmax_t. All calls to set tunables cast the input value to its primitive type and then to tunable_num_t for storage. This relies on gcc-specific (although I suspect other compilers woul also do the same) unsigned to signed integer conversion semantics, i.e. the bit pattern is conserved. The reverse conversion is guaranteed by the standard.	2021-02-10 19:08:33 +05:30
H.J. Lu	5ab25c8875	x86: Add PTWRITE feature detection [BZ #27346 ] 1. Add CPUID_INDEX_14_ECX_0 for CPUID leaf 0x14 to detect PTWRITE feature in EBX of CPUID leaf 0x14 with ECX == 0. 2. Add PTWRITE detection to CPU feature tests. 3. Add 2 static CPU feature tests.	2021-02-07 08:01:14 -08:00
Sajan Karumanchi	6e02b3e932	x86: Adding an upper bound for Enhanced REP MOVSB. In the process of optimizing memcpy for AMD machines, we have found the vector move operations are outperforming enhanced REP MOVSB for data transfers above the L2 cache size on Zen3 architectures. To handle this use case, we are adding an upper bound parameter on enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'. As per large-bench results, we are configuring this parameter to the L2 cache size for AMD machines and applicable from Zen3 architecture supporting the ERMS feature. For architectures other than AMD, it is the computed value of non-temporal threshold parameter. Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>	2021-02-02 12:42:15 +01:00
H.J. Lu	6c57d32048	sysconf: Add _SC_MINSIGSTKSZ/_SC_SIGSTKSZ [BZ #20305 ] Add _SC_MINSIGSTKSZ for the minimum signal stack size derived from AT_MINSIGSTKSZ, which is the minimum number of bytes of free stack space required in order to gurantee successful, non-nested handling of a single signal whose handler is an empty function, and _SC_SIGSTKSZ which is the suggested minimum number of bytes of stack space required for a signal stack. If AT_MINSIGSTKSZ isn't available, sysconf (_SC_MINSIGSTKSZ) returns MINSIGSTKSZ. On Linux/x86 with XSAVE, the signal frame used by kernel is composed of the following areas and laid out as: ------------------------------ \| alignment padding \| ------------------------------ \| xsave buffer \| ------------------------------ \| fsave header (32-bit only) \| ------------------------------ \| siginfo + ucontext \| ------------------------------ Compute AT_MINSIGSTKSZ value as size of xsave buffer + size of fsave header (32-bit only) + size of siginfo and ucontext + alignment padding. If _SC_SIGSTKSZ_SOURCE or _GNU_SOURCE are defined, MINSIGSTKSZ and SIGSTKSZ are redefined as /* Default stack size for a signal handler: sysconf (SC_SIGSTKSZ). / # undef SIGSTKSZ # define SIGSTKSZ sysconf (_SC_SIGSTKSZ) / Minimum stack size for a signal handler: SIGSTKSZ. */ # undef MINSIGSTKSZ # define MINSIGSTKSZ SIGSTKSZ Compilation will fail if the source assumes constant MINSIGSTKSZ or SIGSTKSZ. The reason for not simply increasing the kernel's MINSIGSTKSZ #define (apart from the fact that it is rarely used, due to glibc's shadowing definitions) was that userspace binaries will have baked in the old value of the constant and may be making assumptions about it. For example, the type (char [MINSIGSTKSZ]) changes if this #define changes. This could be a problem if an newly built library tries to memcpy() or dump such an object defined by and old binary. Bounds-checking and the stack sizes passed to things like sigaltstack() and makecontext() could similarly go wrong.	2021-02-01 11:00:52 -08:00
H.J. Lu	04dff6fc0d	x86: Properly set usable CET feature bits [BZ #26625 ] commit `94cd37ebb2` Author: H.J. Lu <hjl.tools@gmail.com> Date: Wed Sep 16 05:27:32 2020 -0700 x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625] broke GLIBC_TUNABLES=glibc.cpu.hwcaps=-IBT,-SHSTK since it can no longer disable IBT nor SHSTK. Handle IBT and SHSTK with: 1. Revert commit `94cd37ebb2`. 2. Clears the usable CET feature bits if kernel doesn't support CET. 3. Add GLIBC_TUNABLES tests without dlopen. 4. Add tests to verify that CPU_FEATURE_USABLE on IBT and SHSTK matches _get_ssp. 5. Update GLIBC_TUNABLES tests with dlopen to verify that CET is disabled with GLIBC_TUNABLES. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-01-29 03:58:11 -08:00
Andreas Schwab	31f6488722	Fix misplaced const Constify __x86_cacheinfo_p and __x86_cpu_features_p, not their pointer target types.	2021-01-25 15:09:02 +01:00
H.J. Lu	5f478eb0fb	x86: Properly match CPU features in /proc/cpuinfo [BZ #27222 ] Search " YYY " and " YYY\n", instead of "YYY", to avoid matching "XXXYYYZZZ" with "YYY". Update /proc/cpuinfo CPU feature names: /proc/cpuinfo glibc ------------------------------------------------ avx512vbmi AVX512_VBMI dts DS pni SSE3 tsc_deadline_timer TSC_DEADLINE	2021-01-22 10:15:46 -08:00
H.J. Lu	7a5ab88e21	x86: Check ifunc resolver with CPU_FEATURE_USABLE [BZ #27072 ] Check ifunc resolver with CPU_FEATURE_USABLE and tunables in dynamic and static executables to verify that CPUID features are initialized early in static PIE. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-21 10:22:26 -08:00
Szabolcs Nagy	47618209d0	Use hidden visibility for early static PIE code Extern symbol access in position independent code usually involves GOT indirection which needs RELATIVE reloc in a static linked PIE. (On some targets this is avoided e.g. because the linker can relax a GOT access to a pc-relative access, but this is not generally true.) Code that runs before static PIE self relocation must avoid relying on dynamic relocations which can be ensured by using hidden visibility. However we cannot just make all symbols hidden: On i386, all calls to IFUNC functions must go through PLT and calls to hidden functions CANNOT go through PLT in PIE since EBX used in PIE PLT may not be set up for local calls to hidden IFUNC functions. This patch aims to make symbol references hidden in code that is used before and by _dl_relocate_static_pie when building a static PIE libc. Note: for an object that is used in the startup code, its references and definition may not have consistent visibility: it is only forced hidden in the startup code. This is needed for fixing bug 27072. Co-authored-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-21 15:55:01 +00:00
H.J. Lu	ff6d62e9ed	<sys/platform/x86.h>: Remove the C preprocessor magic In <sys/platform/x86.h>, define CPU features as enum instead of using the C preprocessor magic to make it easier to wrap this functionality in other languages. Move the C preprocessor magic to internal header for better GCC codegen when more than one features are checked in a single expression as in x86-64 dl-hwcaps-subdirs.c. 1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX. 2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h. 3. Remove struct cpu_features and __x86_get_cpu_features from <sys/platform/x86.h>. 4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it in libc. 5. Make __get_cpu_features() private to glibc. 6. Replace __x86_get_cpu_features(N) with __get_cpu_features(). 7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE. 8. Use a single enum index for each CPU feature detection. 9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf. 10. Return zero struct cpuid_feature for the older glibc binary with a smaller CPUID_INDEX_MAX [BZ #27104]. 11. Inside glibc, use the C preprocessor magic so that cpu_features data can be loaded just once leading to more compact code for glibc. 256 bits are used for each CPUID leaf. Some leaves only contain a few features. We can add exceptions to such leaves. But it will increase code sizes and it is harder to provide backward/forward compatibilities when new features are added to such leaves in the future. When new leaves are added, _rtld_global_ro offsets will change which leads to race condition during in-place updates. We may avoid in-place updates by 1. Rename the old glibc. 2. Install the new glibc. 3. Remove the old glibc. NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy relocation issue with IFUNC resolver as shown in IFUNC resolver tests.	2021-01-21 05:58:17 -08:00
H.J. Lu	2d651eb926	x86: Move x86 processor cache info to cpu_features 1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so. 2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS. 3. Move x86 cache info initialization to dl-cacheinfo.h and initialize x86 cache info in init_cpu_features (). 4. Put x86 cache info for libc in cacheinfo.h, which is included in libc-start.c in libc.a and is included in cacheinfo.c in libc.so. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-14 11:38:45 -08:00
Adhemerval Zanella	d18f59bf92	Fix x86 build with --enable-tunable=no Checked on x86_64-linux-gnu.	2021-01-14 16:04:05 -03:00
H.J. Lu	efbbd9c33a	ldconfig/x86: Store ISA level in cache and aux cache Store ISA level in the portion of the unused upper 32 bits of the hwcaps field in cache and the unused pad field in aux cache. ISA level is stored and checked only for shared objects in glibc-hwcaps subdirectories. The shared objects in the default directories aren't checked since there are no fallbacks for these shared objects. Tested on x86-64-v2, x86-64-v3 and x86-64-v4 machines with --disable-hardcoded-path-in-tests and --enable-hardcoded-path-in-tests.	2021-01-13 05:51:17 -08:00

1 2 3 4 5 ...

454 Commits