glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 13:00:06 +00:00

Author	SHA1	Message	Date
H.J. Lu	94cd37ebb2	x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625] Commit 04bba1e5d8 ("x86: Set CPU usable feature bits conservatively [BZ #26552]", H.J. Lu <hjl.tools@gmail.com>, Wed Aug 5 13:51:56 2020 -0700), which sets CPU usable feature bits only for CPU features that are usable in user space and whose usability can be detected from user space, excluding features like FSGSBASE whose enable bit can only be checked in the kernel, no longer turns on the usable bits of IBT and SHSTK, since we don't know whether IBT and SHSTK are usable until much later. Use HAS_CPU_FEATURE to check if the processor supports IBT and SHSTK.	2020-09-17 05:18:36 -07:00
H.J. Lu	f2c679d4b2	<sys/platform/x86.h>: Add Intel Key Locker support Add Intel Key Locker: https://software.intel.com/content/www/us/en/develop/download/intel-key-locker-specification.html support to <sys/platform/x86.h>. Intel Key Locker has 1. KL: AES Key Locker instructions. 2. WIDE_KL: AES wide Key Locker instructions. 3. AESKLE: AES Key Locker instructions are enabled by OS. Applications should use if (CPU_FEATURE_USABLE (KL)) and if (CPU_FEATURE_USABLE (WIDE_KL)) to check if AES Key Locker instructions and AES wide Key Locker instructions are usable.	2020-09-16 05:56:10 -07:00
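As an illustration of the gating rule above, here is a hypothetical pair of helpers that work on raw register values, so the logic can be exercised without executing cpuid. It is not glibc's code. The bit positions (CPUID.(EAX=07H,ECX=0):ECX[23] for KL, CPUID.(EAX=19H):EBX[0] for AESKLE and EBX[2] for WIDE_KL) are my reading of Intel's Key Locker specification and should be checked against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit positions; verify against Intel's Key Locker specification. */
#define LEAF7_ECX_KL       (1u << 23) /* CPUID.(EAX=07H,ECX=0):ECX */
#define LEAF19_EBX_AESKLE  (1u << 0)  /* CPUID.(EAX=19H):EBX, enabled by OS */
#define LEAF19_EBX_WIDE_KL (1u << 2)  /* CPUID.(EAX=19H):EBX */

/* Hypothetical helper: KL counts as usable only when the CPU reports it
   and the OS has enabled the AES Key Locker instructions (AESKLE).  */
static bool
kl_usable (uint32_t leaf7_ecx, uint32_t leaf19_ebx)
{
  return (leaf7_ecx & LEAF7_ECX_KL) && (leaf19_ebx & LEAF19_EBX_AESKLE);
}

/* Hypothetical helper: WIDE_KL is likewise gated on AESKLE.  */
static bool
wide_kl_usable (uint32_t leaf19_ebx)
{
  return (leaf19_ebx & LEAF19_EBX_AESKLE)
         && (leaf19_ebx & LEAF19_EBX_WIDE_KL);
}
```

The CPUID bit alone only says the processor implements the instructions; the AESKLE bit is what reflects the OS-side enablement that the commit distinguishes.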
H.J. Lu	04bba1e5d8	x86: Set CPU usable feature bits conservatively [BZ #26552] Set CPU usable feature bits only for CPU features which are usable in user space and whose usability can be detected from user space, excluding features like FSGSBASE whose enable bit can only be checked in the kernel.	2020-09-03 04:36:20 -07:00
H.J. Lu	107e6a3c22	x86: Support usable check for all CPU features Support usable check for all CPU features with the following changes: 1. Change struct cpu_features to struct cpuid_features { struct cpuid_registers cpuid; struct cpuid_registers usable; }; struct cpu_features { struct cpu_features_basic basic; struct cpuid_features features[COMMON_CPUID_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; so that there is a usable bit for each cpuid bit. 2. After the cpuid bits have been initialized, copy the known bits to the usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for CPU feature detection. 3. Clear the usable bits which require OS support. 4. If the feature is supported by the OS, copy its cpuid bit to its usable bit. 5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE and CPU_FEATURE_USABLE_P to check if a feature is usable. 6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13. 7. Unset MPX feature since it has been deprecated. The results are: 1. If the feature is known and doesn't require OS support, its usable bit is copied from the cpuid bit. 2. Otherwise, its usable bit is copied from the cpuid bit only if the feature is known to be supported by the OS. 3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the feature can be used. 4. HAS_CPU_FEATURE/CPU_FEATURES_CPU_P are used to check if the CPU supports the feature.	2020-07-13 06:05:16 -07:00
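The cpuid/usable split described in this commit can be illustrated with AVX, whose register state must be enabled by the OS in XCR0. This is a simplified, hypothetical sketch rather than glibc's implementation: it takes the CPUID.1:ECX value and the XCR0 value (normally read with xgetbv) as plain inputs, models only AVX as needing OS support, and copies every other bit straight through, whereas glibc copies only bits it knows about. The bit positions (ECX[27] OSXSAVE, ECX[28] AVX, XCR0 bits 1 and 2 for SSE and AVX state) follow the Intel SDM.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID1_ECX_OSXSAVE (1u << 27) /* OS has enabled XSAVE/XGETBV */
#define CPUID1_ECX_AVX     (1u << 28)
#define XCR0_SSE_STATE     (1u << 1)
#define XCR0_AVX_STATE     (1u << 2)

/* Hypothetical mirror of the per-leaf "cpuid" and "usable" words.  */
struct feature_word
{
  uint32_t cpuid;  /* what the processor reports */
  uint32_t usable; /* what may actually be used */
};

/* Build the word for CPUID.1:ECX.  The AVX usable bit starts cleared
   and is copied from the cpuid bit only when XCR0 shows that the OS
   saves both SSE and AVX register state.  */
static struct feature_word
init_leaf1_ecx (uint32_t ecx, uint64_t xcr0)
{
  struct feature_word w = { ecx, ecx & ~CPUID1_ECX_AVX };
  uint64_t need = XCR0_SSE_STATE | XCR0_AVX_STATE;

  if ((ecx & CPUID1_ECX_OSXSAVE) && (xcr0 & need) == need)
    w.usable |= ecx & CPUID1_ECX_AVX;
  return w;
}
```

A CPU that reports AVX while the OS leaves XCR0 bit 2 clear ends up with the AVX cpuid bit set and the usable bit clear, which is exactly the case HAS_CPU_FEATURE and CPU_FEATURE_USABLE are meant to tell apart.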
H.J. Lu	3f4b61a0b8	x86: Add thresholds for "rep movsb/stosb" to tunables Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables to update thresholds for "rep movsb" and "rep stosb" at run-time. Note that a user-specified threshold for "rep movsb" that is smaller than the minimum threshold is ignored. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2020-07-06 11:48:42 -07:00
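The "ignore values below the minimum" rule can be sketched as a small selection function. The helper name is hypothetical, and the constants (a default of 2048 * (vec_size / 16) bytes and a minimum of vec_size * 8 bytes, with vec_size 16, 32 or 64 for SSE2, AVX and AVX-512 code paths) are my recollection of glibc's cache-info code from this period and should be treated as assumptions. At run time the value would come from GLIBC_TUNABLES, for example `GLIBC_TUNABLES=glibc.cpu.x86_rep_movsb_threshold=4096`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: pick the "rep movsb" threshold for a vector
   size.  USER_VALUE is the tunable (0 when unset).  A value below the
   minimum, including 0, is ignored and the default is kept.  */
static size_t
rep_movsb_threshold (size_t vec_size, size_t user_value)
{
  assert (vec_size == 16 || vec_size == 32 || vec_size == 64);

  size_t threshold = 2048 * (vec_size / 16); /* assumed default */
  size_t minimum = vec_size * 8;             /* assumed minimum */

  if (user_value >= minimum)
    threshold = user_value;
  return threshold;
}
```

Because the minimum is always at least 128 bytes, an unset tunable (0) naturally falls into the "ignored" branch without a separate check.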
H.J. Lu	4fdd4d41a1	x86: Detect Intel Advanced Matrix Extensions Intel Advanced Matrix Extensions (Intel AMX) is a new programming paradigm consisting of two components: a set of 2-dimensional registers (tiles) representing sub-arrays from a larger 2-dimensional memory image, and accelerators able to operate on tiles. Intel AMX is an extensible architecture. New accelerators can be added and the existing accelerator may be enhanced to provide higher performance. The initial features are AMX-BF16, AMX-TILE and AMX-INT8, which are usable only if the operating system supports both XTILECFG state and XTILEDATA state. Add AMX-BF16, AMX-TILE and AMX-INT8 support to HAS_CPU_FEATURE and CPU_FEATURE_USABLE.	2020-06-26 06:53:05 -07:00
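A minimal sketch of the OS-support condition: each AMX feature counts as usable only when its CPUID bit is set and XCR0 has both the XTILECFG (bit 17) and XTILEDATA (bit 18) state components enabled. The CPUID.(EAX=07H,ECX=0):EDX bit positions (22 AMX-BF16, 24 AMX-TILE, 25 AMX-INT8) follow Intel's documentation; the function itself is hypothetical and takes register values as inputs rather than executing cpuid or xgetbv.

```c
#include <assert.h>
#include <stdint.h>

#define LEAF7_EDX_AMX_BF16 (1u << 22)
#define LEAF7_EDX_AMX_TILE (1u << 24)
#define LEAF7_EDX_AMX_INT8 (1u << 25)

#define XCR0_XTILECFG  (UINT64_C (1) << 17)
#define XCR0_XTILEDATA (UINT64_C (1) << 18)

/* Hypothetical helper: return the subset of the AMX bits in LEAF7_EDX
   that are usable given the XCR0 value.  */
static uint32_t
amx_usable_bits (uint32_t leaf7_edx, uint64_t xcr0)
{
  const uint64_t tile_state = XCR0_XTILECFG | XCR0_XTILEDATA;
  const uint32_t amx_bits
    = LEAF7_EDX_AMX_BF16 | LEAF7_EDX_AMX_TILE | LEAF7_EDX_AMX_INT8;

  /* Both tile state components must be enabled; one is not enough.  */
  if ((xcr0 & tile_state) != tile_state)
    return 0;
  return leaf7_edx & amx_bits;
}
```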
H.J. Lu	ecbbadbf10	x86: Update CPU feature detection [BZ #26149] 1. Divide architecture features into the usable features and the preferred features. The usable features are for correctness and can be exported in a stable ABI. The preferred features are for performance and only for glibc internal use. 2. Change struct cpu_features to struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; unsigned int usable[USABLE_FEATURE_INDEX_MAX]; unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX]; ... }; and initialize usable_p to point to the usable array so that struct cpu_features { struct cpu_features_basic basic; unsigned int *usable_p; struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX]; }; can be exported via a stable ABI. The cpuid and usable arrays can be expanded with backward binary compatibility for both .o and .so files. 3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16. 4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID, TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16. 5. Rename CAPABILITIES to ARCH_CAPABILITIES. 6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable. 7. Update CPU feature detection test.	2020-06-22 13:09:33 -07:00
H.J. Lu	27f8864bd4	x86: Update F16C detection [BZ #26133] Since F16C requires AVX, set F16C usable only when AVX is usable.	2020-06-18 07:01:58 -07:00
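The dependency can be sketched as a one-line check. F16C instructions are VEX-encoded and can operate on YMM registers, so they rely on the same OS-enabled register state as AVX. The helper is hypothetical; CPUID.1:ECX bit 29 is the F16C flag, and `avx_usable` stands for the result of an earlier AVX usability check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_F16C (1u << 29)

/* Hypothetical helper: F16C is usable only if the CPU advertises it
   and AVX itself has already been found usable.  */
static bool
f16c_usable (uint32_t leaf1_ecx, bool avx_usable)
{
  return avx_usable && (leaf1_ecx & CPUID1_ECX_F16C) != 0;
}
```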
H.J. Lu	76d5b2f002	x86: Update Intel Atom processor family optimization Enable Intel Silvermont optimization for Intel Goldmont Plus. Detect more Intel Airmont processors. Optimize Intel Tremont like Intel Silvermont with rep string instructions.	2020-05-21 13:36:54 -07:00
H.J. Lu	674ea88294	x86: Move CET control to _dl_x86_feature_control [BZ #25887] 1. Include <dl-procruntime.c> to get architecture specific initializer in rtld_global. 2. Change _dl_x86_feature_1[2] to _dl_x86_feature_1. 3. Add _dl_x86_feature_control after _dl_x86_feature_1, which is a struct of 2 bitfields for IBT and SHSTK control. This fixes [BZ #25887].	2020-05-18 06:15:02 -07:00
mayshao	32ac0b9884	x86: Add CPU Vendor ID detection support for Zhaoxin processors To recognize Zhaoxin CPU Vendor ID, add a new architecture type arch_kind_zhaoxin for Vendor Zhaoxin detection.	2020-04-30 06:36:48 -07:00
Joseph Myers	d614a75396	Update copyright dates with scripts/update-copyrights.	2020-01-01 00:14:33 +00:00
Paul Eggert	5a82c74822	Prefer https to http for gnu.org and fsf.org URLs Also, change sources.redhat.com to sourceware.org. This patch was automatically generated by running the following shell script, which uses GNU sed, and which avoids modifying files imported from upstream: sed -ri ' s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g ' \ $(find $(git ls-files) -prune -type f \ ! -name '*.po' \ ! -name 'ChangeLog*' \ ! -path COPYING ! -path COPYING.LIB \ ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \ ! -path manual/texinfo.tex ! -path scripts/config.guess \ ! -path scripts/config.sub ! -path scripts/install-sh \ ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \ ! -path INSTALL ! -path locale/programs/charmap-kw.h \ ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \ ! '(' -name configure \ -execdir test -f configure.ac -o -f configure.in ';' ')' \ ! '(' -name preconfigure \ -execdir test -f preconfigure.ac ';' ')' \ -print) and then by running 'make dist-prepare' to regenerate files built from the altered files, and then executing the following to clean up: chmod a+x sysdeps/unix/sysv/linux/riscv/configure # Omit irrelevant whitespace and comment-only changes, # perhaps from a slightly-different Autoconf version. 
git checkout -f \ sysdeps/csky/configure \ sysdeps/hppa/configure \ sysdeps/riscv/configure \ sysdeps/unix/sysv/linux/csky/configure # Omit changes that caused a pre-commit check to fail like this: # remote: * error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines git checkout -f \ sysdeps/powerpc/powerpc64/ppc-mcount.S \ sysdeps/unix/sysv/linux/s390/s390-64/syscall.S # Omit change that caused a pre-commit check to fail like this: # remote: * error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S	2019-09-07 02:43:31 -07:00
Joseph Myers	a04549c194	Break more lines before not after operators. This patch makes further coding style fixes where code was breaking lines after an operator, contrary to the GNU Coding Standards. As with the previous patch, it is limited to files following a reasonable approximation to GNU style already, and is not exhaustive; more such issues remain to be fixed. Tested for x86_64, and with build-many-glibcs.py. * dirent/dirent.h [!_DIRENT_HAVE_D_NAMLEN && _DIRENT_HAVE_D_RECLEN] (_D_ALLOC_NAMLEN): Break lines before rather than after operators. * elf/cache.c (print_cache): Likewise. * gshadow/fgetsgent_r.c (__fgetsgent_r): Likewise. * htl/pt-getattr.c (__pthread_getattr_np): Likewise. * hurd/hurdinit.c (_hurd_setproc): Likewise. * hurd/hurdkill.c (_hurd_sig_post): Likewise. * hurd/hurdlookup.c (__file_name_lookup_under): Likewise. * hurd/hurdsig.c (_hurd_internal_post_signal): Likewise. (reauth_proc): Likewise. * hurd/lookup-at.c (__file_name_lookup_at): Likewise. (__file_name_split_at): Likewise. (__directory_name_split_at): Likewise. * hurd/lookup-retry.c (__hurd_file_name_lookup_retry): Likewise. * hurd/port2fd.c (_hurd_port2fd): Likewise. * iconv/gconv_dl.c (do_print): Likewise. * inet/netinet/in.h (struct sockaddr_in): Likewise. * libio/wstrops.c (_IO_wstr_seekoff): Likewise. * locale/setlocale.c (new_composite_name): Likewise. * malloc/memusagestat.c (main): Likewise. * misc/fstab.c (fstab_convert): Likewise. * nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_usercnt): Likewise. * nss/nss_compat/compat-grp.c (getgrent_next_nss): Likewise. (getgrent_next_file): Likewise. (internal_getgrnam_r): Likewise. (internal_getgrgid_r): Likewise. * nss/nss_compat/compat-initgroups.c (getgrent_next_nss): Likewise. (internal_getgrent_r): Likewise. * nss/nss_compat/compat-pwd.c (getpwent_next_nss_netgr): Likewise. (getpwent_next_nss): Likewise. (getpwent_next_file): Likewise. (internal_getpwnam_r): Likewise. (internal_getpwuid_r): Likewise. 
* nss/nss_compat/compat-spwd.c (getspent_next_nss_netgr): Likewise. (getspent_next_nss): Likewise. (internal_getspnam_r): Likewise. * pwd/fgetpwent_r.c (__fgetpwent_r): Likewise. * shadow/fgetspent_r.c (__fgetspent_r): Likewise. * string/strchr.c (STRCHR): Likewise. * string/strchrnul.c (STRCHRNUL): Likewise. * sysdeps/aarch64/fpu/fpu_control.h (_FPU_FPCR_IEEE): Likewise. * sysdeps/aarch64/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/csky/dl-machine.h (elf_machine_rela): Likewise. * sysdeps/generic/memcopy.h (PAGE_COPY_FWD_MAYBE): Likewise. * sysdeps/generic/symbol-hacks.h (__stack_chk_fail_local): Likewise. * sysdeps/gnu/netinet/ip_icmp.h (ICMP_INFOTYPE): Likewise. * sysdeps/gnu/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/gnu/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/hppa/jmpbuf-unwind.h (_JMPBUF_UNWINDS): Likewise. * sysdeps/mach/hurd/bits/stat.h (S_ISPARE): Likewise. * sysdeps/mach/hurd/dl-sysdep.c (_dl_sysdep_start): Likewise. (open_file): Likewise. * sysdeps/mach/hurd/htl/pt-mutexattr-setprotocol.c (pthread_mutexattr_setprotocol): Likewise. * sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise. * sysdeps/mach/hurd/mmap.c (__mmap): Likewise. * sysdeps/mach/hurd/ptrace.c (ptrace): Likewise. * sysdeps/mach/hurd/spawni.c (__spawni): Likewise. * sysdeps/microblaze/dl-machine.h (elf_machine_type_class): Likewise. (elf_machine_rela): Likewise. * sysdeps/mips/mips32/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/mips/mips64/sfp-machine.h (_FP_CHOOSENAN): Likewise. * sysdeps/mips/sys/asm.h (multiple #if conditionals): Likewise. * sysdeps/posix/rename.c (rename): Likewise. * sysdeps/powerpc/novmx-sigjmp.c (__novmx__sigjmp_save): Likewise. * sysdeps/powerpc/sigjmp.c (__vmx__sigjmp_save): Likewise. * sysdeps/s390/fpu/fenv_libc.h (FPC_VALID_MASK): Likewise. * sysdeps/s390/utf8-utf16-z9.c (gconv_end): Likewise. * sysdeps/unix/grantpt.c (grantpt): Likewise. * sysdeps/unix/sysv/linux/a.out.h (N_TXTOFF): Likewise. 
* sysdeps/unix/sysv/linux/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/unix/sysv/linux/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise. * sysdeps/x86/cpu-features.c (get_common_indices): Likewise. * time/tzfile.c (__tzfile_compute): Likewise.	2019-02-25 13:19:19 +00:00
Joseph Myers	32db86d558	Add fall-through comments. This patch adds fall-through comments in some cases where -Wextra produces implicit-fallthrough warnings. The patch is non-exhaustive. Apart from architecture-specific code for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy code, probably should have such changes, but left to be dealt with separately), or places that already had comments about the fall-through but not matching the form expected by -Wimplicit-fallthrough=3 (the default level with -Wextra; my inclination is to adjust those comments to match rather than downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one place where I thought the implicit fallthrough was not correct and so should be handled separately as a bug fix. I think the key thing to consider in review of this patch is whether the fall-through is indeed intended and correct in each place where such a comment is added. Tested for x86_64. * elf/dl-exception.c (_dl_exception_create_format): Add fall-through comments. * elf/ldconfig.c (parse_conf_include): Likewise. * elf/rtld.c (print_statistics): Likewise. * locale/programs/charmap.c (parse_charmap): Likewise. * misc/mntent_r.c (__getmntent_r): Likewise. * posix/wordexp.c (parse_arith): Likewise. (parse_backtick): Likewise. * resolv/ns_ttl.c (ns_parse_ttl): Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.	2019-02-12 10:30:34 +00:00
Joseph Myers	04277e02d7	Update copyright dates with scripts/update-copyrights. * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.	2019-01-01 00:11:28 +00:00
Carlos O'Donell	ade8b817fe	x86: Add Hygon Dhyana support. This patch fixes a Hygon Dhyana CPU Vendor ID detection problem in the glibc sysdep module. Current glibc code doesn't recognize the Dhyana CPU Vendor ID ("HygonGenuine") and sets kind to arch_kind_other, which results in an incorrect zero value from __cache_sysconf(). As Hygon Dhyana shares most architecture features with AMD Family 17h, this patch adds a Hygon CPU Vendor ID check, sets kind to arch_kind_amd and reuses the AMD code path, which leads to the correct return value from __cache_sysconf(). We ran the glibc test suite on both Hygon Dhyana and AMD EPYC and found no failures. Background: Chengdu Haiguang IC Design Co., Ltd (Hygon) is a joint venture between AMD and Haiguang Information Technology Co., Ltd., aiming to provide high-performance x86 processors for the Chinese server market. Its first-generation processor, codenamed Dhyana, originates from AMD technology and shares most of its architecture with AMD's Family 17h, but has a different CPU Vendor ID ("HygonGenuine") and family number (Family 18h). The related Hygon kernel patch can be found at http://lkml.kernel.org/r/5ce86123a7b9dad925ac583d88d2f921040e859b.1538583282.git.puwen@hygon.cn Signed-off-by: fanjinke <fanjinke@hygon.cn> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2018-12-13 09:25:20 -05:00
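Vendor detection reads the 12-byte vendor string that CPUID leaf 0 returns in EBX, EDX and ECX (in that order) and compares it with known IDs. Below is a simplified, hypothetical sketch of the mapping described in the Hygon and Zhaoxin commits. The enum reuses the arch_kind_* names quoted in this log but is not glibc's definition, and the two Zhaoxin strings ("CentaurHauls" and "  Shanghai  ") are what I understand glibc to match; treat that as an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced version of the vendor kinds named in the log.  */
enum arch_kind { arch_kind_intel, arch_kind_amd, arch_kind_zhaoxin, arch_kind_other };

/* Store the 4 bytes of REG into OUT, least significant byte first,
   independent of host byte order.  */
static void
put_reg (char *out, uint32_t reg)
{
  for (int i = 0; i < 4; i++)
    out[i] = (char) ((reg >> (8 * i)) & 0xff);
}

static enum arch_kind
classify_vendor (uint32_t ebx, uint32_t ecx, uint32_t edx)
{
  char v[13];
  put_reg (v, ebx);     /* e.g. "Genu" */
  put_reg (v + 4, edx); /* e.g. "ineI" */
  put_reg (v + 8, ecx); /* e.g. "ntel" */
  v[12] = '\0';

  if (strcmp (v, "GenuineIntel") == 0)
    return arch_kind_intel;
  /* Hygon Dhyana reuses the AMD code path.  */
  if (strcmp (v, "AuthenticAMD") == 0 || strcmp (v, "HygonGenuine") == 0)
    return arch_kind_amd;
  if (strcmp (v, "CentaurHauls") == 0 || strcmp (v, "  Shanghai  ") == 0)
    return arch_kind_zhaoxin;
  return arch_kind_other;
}
```

Mapping "HygonGenuine" to the AMD kind is the whole fix: everything downstream (cache-size queries included) then follows the existing AMD path instead of falling into arch_kind_other.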
H.J. Lu	c22e4c2a14	x86: Extend CPUID support in struct cpu_features Extend CPUID support for all feature bits from CPUID. Add a new macro, CPU_FEATURE_USABLE, which can be used to check if a feature is usable at run-time, instead of HAS_CPU_FEATURE and HAS_ARCH_FEATURE. Add COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007 and COMMON_CPUID_INDEX_80000008 to check CPU feature bits in them. Tested on i686 and x86-64 as well as using build-many-glibcs.py with x86 targets. * sysdeps/x86/cacheinfo.c (intel_check_word): Updated for cpu_features_basic. (__cache_sysconf): Likewise. (init_cacheinfo): Likewise. * sysdeps/x86/cpu-features.c (get_extended_indices): Also populate COMMON_CPUID_INDEX_80000007 and COMMON_CPUID_INDEX_80000008. (get_common_indices): Also populate COMMON_CPUID_INDEX_D_ECX_1. Use CPU_FEATURES_CPU_P (cpu_features, XSAVEC) to check if XSAVEC is available. Set the bit_arch_XXX_Usable bits. (init_cpu_features): Use _Static_assert on index_arch_Fast_Unaligned_Load. __get_cpuid_registers and __get_arch_feature. Updated for cpu_features_basic. Set stepping in cpu_features. * sysdeps/x86/cpu-features.h: (FEATURE_INDEX_1): Changed to enum. (FEATURE_INDEX_2): New. (FEATURE_INDEX_MAX): Changed to enum. (COMMON_CPUID_INDEX_D_ECX_1): New. (COMMON_CPUID_INDEX_80000007): Likewise. (COMMON_CPUID_INDEX_80000008): Likewise. (cpuid_registers): Likewise. (cpu_features_basic): Likewise. (CPU_FEATURE_USABLE): Likewise. (bit_arch_XXX_Usable): Likewise. (cpu_features): Use cpuid_registers and cpu_features_basic. (bit_arch_XXX): Rewritten. (bit_cpu_XXX): Likewise. (index_cpu_XXX): Likewise. (reg_XXX): Likewise. * sysdeps/x86/tst-get-cpu-features.c: Include <stdio.h> and <support/check.h>. (CHECK_CPU_FEATURE): New. (CHECK_CPU_FEATURE_USABLE): Likewise. (cpu_kinds): Likewise. (do_test): Print vendor, family, model and stepping. Check HAS_CPU_FEATURE and CPU_FEATURE_USABLE. (TEST_FUNCTION): Removed. Include <support/test-driver.c> instead of "../../test-skeleton.c". 
* sysdeps/x86_64/multiarch/sched_cpucount.c (__sched_cpucount): Check POPCNT instead of POPCOUNT. * sysdeps/x86_64/multiarch/test-multiarch.c (do_test): Likewise.	2018-12-03 05:54:56 -08:00
Adhemerval Zanella	c3d8dc45c9	x86: Fix Haswell strong flags (BZ#23709) The commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the default flags for Haswell models. Previously, new models were handled by the default switch path, which assumed a Core i3/i5/i7 if AVX is available. After the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and Prefer_PMINUB_for_stringop (only the TSX one). This patch fixes it by disentangling the TSX flag handling from the memory optimization ones. The strstr case cited in the patch now selects __strstr_sse2_unaligned as expected for the Haswell CPU. Checked on x86_64-linux-gnu. [BZ #23709] * sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits independently of other flags.	2018-10-23 14:57:02 -03:00
Siddhesh Poyarekar	dce452dc52	Rename the glibc.tune namespace to glibc.cpu The glibc.tune namespace is vaguely named since it is a 'tunable', so give it a more specific name that describes what it refers to. Rename the tunable namespace to 'cpu' to more accurately reflect what it encompasses. Also rename glibc.tune.cpu to glibc.cpu.name since glibc.cpu.cpu is weird. * NEWS: Mention the change. * elf/dl-tunables.list: Rename tune namespace to cpu. * sysdeps/powerpc/dl-tunables.list: Likewise. * sysdeps/x86/dl-tunables.list: Likewise. * sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to cpu.name. * elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust. * elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise. * manual/README.tunables: Likewise. * manual/tunables.texi: Likewise. * sysdeps/powerpc/cpu-features.c: Likewise. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.c: Likewise. * sysdeps/x86/cpu-features.h: Likewise. * sysdeps/x86/cpu-tunables.c: Likewise. * sysdeps/x86_64/Makefile: Likewise. * sysdeps/x86/dl-cet.c: Likewise. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2018-08-02 23:49:19 +05:30
H.J. Lu	82c80ac2eb	x86: Rename get_common_indeces to get_common_indices Reviewed-by: Carlos O'Donell <carlos@redhat.com> * sysdeps/x86/cpu-features.c (get_common_indeces): Renamed to ... (get_common_indices): This. (init_cpu_features): Updated.	2018-08-01 04:57:50 -07:00
H.J. Lu	be525a69a6	x86: Populate COMMON_CPUID_INDEX_80000001 for Intel CPUs [BZ #23459] Reviewed-by: Carlos O'Donell <carlos@redhat.com> [BZ #23459] * sysdeps/x86/cpu-features.c (get_extended_indices): New function. (init_cpu_features): Call get_extended_indices for both Intel and AMD CPUs. * sysdeps/x86/cpu-features.h (COMMON_CPUID_INDEX_80000001): Remove "for AMD" comment.	2018-07-26 13:31:11 -07:00
H.J. Lu	ba2ea23d05	x86: Always include <dl-cet.h>/<cet-tunables.h> for --enable-cet Always include <dl-cet.h> and <cet-tunables.h> when CET is enabled. Otherwise, glibc configured with --enable-cet --disable-tunables fails to build. * sysdeps/x86/cpu-features.c: Always include <dl-cet.h> and <cet-tunables.h> when CET is enabled.	2018-07-17 04:16:35 -07:00
H.J. Lu	f753fa7dea	x86: Support IBT and SHSTK in Intel CET [BZ #21598] Intel Control-flow Enforcement Technology (CET) instructions: https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK). GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to indicate that all executable sections are compatible with IBT when ENDBR instruction starts each valid target where an indirect branch instruction can land. Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on output only if it is set on all relocatable inputs. On an IBT capable processor, the following steps should be taken: 1. When loading an executable without an interpreter, enable IBT and lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable. 2. When loading an executable with an interpreter, enable IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter. a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable, disable IBT. b. Lock IBT. 3. If IBT is enabled, when loading a shared object without GNU_PROPERTY_X86_FEATURE_1_IBT: a. If legacy interwork is allowed, then mark all pages in executable PT_LOAD segments in legacy code page bitmap. Failure of legacy code page bitmap allocation causes an error. b. If legacy interwork isn't allowed, it causes an error. GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to indicate that all executable sections are compatible with SHSTK where return address popped from shadow stack always matches return address popped from normal stack. Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK on output only if it is set on all relocatable inputs. On a SHSTK capable processor, the following steps should be taken: 1. When loading an executable without an interpreter, enable SHSTK if GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable. 2. 
When loading an executable with an interpreter, enable SHSTK if GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter. a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable or any shared objects loaded via the DT_NEEDED tag, disable SHSTK. b. Otherwise lock SHSTK. 3. After SHSTK is enabled, it is an error to load a shared object without GNU_PROPERTY_X86_FEATURE_1_SHSTK. To enable CET support in glibc, --enable-cet is required to configure glibc. When CET is enabled, both compiler and assembler must support CET. Otherwise, it is a configure-time error. To support CET run-time control, 1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate if IBT or SHSTK are enabled at run-time. It should be initialized by init_cpu_features. 2. For dynamic executables: a. A l_cet field is added to struct link_map to indicate if IBT or SHSTK is enabled in an ELF module. _dl_process_pt_note or _rtld_process_pt_note is called to process PT_NOTE segment for GNU program property and set l_cet. b. _dl_open_check is added to check IBT and SHSTK compatibility when dlopening a shared object. 3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with _dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if SHSTK is enabled. CET run-time control can be changed via GLIBC_TUNABLES with $ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off] $ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off] 1. permissive: SHSTK is disabled when dlopening a legacy ELF module. 2. on: IBT or SHSTK are always enabled, regardless of whether there are IBT or SHSTK bits in GNU program property. 3. off: IBT or SHSTK are always disabled, regardless of whether there are IBT or SHSTK bits in GNU program property. <cet.h> from CET-enabled GCC is automatically included by assembly codes to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK to GNU program property. 
_CET_ENDBR is added at the entrance of all assembly functions whose address may be taken. _CET_NOTRACK is used to insert NOTRACK prefix with indirect jump table to support IBT. It is defined as notrack when _CET_NOTRACK is defined in <cet.h>. [BZ #21598] * configure.ac: Add --enable-cet. * configure: Regenerated. * elf/Makefile (all-built-dso): Add a comment. * elf/dl-load.c (filebuf): Moved before "dynamic-link.h". Include <dl-prop.h>. (_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE segment. * elf/dl-open.c: Include <dl-prop.h>. (dl_open_worker): Call _dl_open_check. * elf/rtld.c: Include <dl-prop.h>. (dl_main): Call _rtld_process_pt_note on PT_NOTE segment. Call _rtld_main_check. * sysdeps/generic/dl-prop.h: New file. * sysdeps/i386/dl-cet.c: Likewise. * sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise. * sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise. * sysdeps/x86/cet-tunables.h: Likewise. * sysdeps/x86/check-cet.awk: Likewise. * sysdeps/x86/configure: Likewise. * sysdeps/x86/configure.ac: Likewise. * sysdeps/x86/dl-cet.c: Likewise. * sysdeps/x86/dl-procruntime.c: Likewise. * sysdeps/x86/dl-prop.h: Likewise. * sysdeps/x86/libc-start.h: Likewise. * sysdeps/x86/link_map.h: Likewise. * sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add _CET_ENDBR. (_dl_runtime_profile): Likewise. (_dl_runtime_resolve_shstk): New. (_dl_runtime_profile_shstk): Likewise. * sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet if CET is enabled. (CFLAGS-.o): Add -fcf-protection if CET is enabled. (CFLAGS-.os): Likewise. (CFLAGS-.op): Likewise. (CFLAGS-.oS): Likewise. (asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET is enabled. (tests-special): Add $(objpfx)check-cet.out. (cet-built-dso): New. (+$(cet-built-dso:=.note)): Likewise. (common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note). ($(objpfx)check-cet.out): New. (generated): Add check-cet.out. * sysdeps/x86/cpu-features.c: Include <dl-cet.h> and <cet-tunables.h>. 
(TUNABLE_CALLBACK (set_x86_ibt)): New prototype. (TUNABLE_CALLBACK (set_x86_shstk)): Likewise. (init_cpu_features): Call get_cet_status to check CET status and update dl_x86_feature_1 with CET status. Call TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK (set_x86_shstk). Disable and lock CET in libc.a. * sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>. (TUNABLE_CALLBACK (set_x86_ibt)): New function. (TUNABLE_CALLBACK (set_x86_shstk)): Likewise. * sysdeps/x86/sysdep.h (_CET_NOTRACK): New. (_CET_ENDBR): Define if not defined. (ENTRY): Add _CET_ENDBR. * sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and x86_shstk. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add _CET_ENDBR. (_dl_runtime_profile): Likewise.	2018-07-16 14:08:27 -07:00
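GLIBC_TUNABLES holds colon-separated name=value pairs. The sketch below is a minimal, hypothetical lookup that maps the CET control values listed in this commit (permissive, on, off) to an enum; it is not glibc's tunables parser, which is separate and more involved (for example, it filters tunables for setuid programs). Note that the glibc.tune namespace used in this commit was later renamed to glibc.cpu by commit dce452dc52 in this log, so newer releases spell these tunables glibc.cpu.x86_ibt and glibc.cpu.x86_shstk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical enum for the CET control values.  cet_default means the
   tunable is absent, so the GNU program property bits decide.  */
enum cet_control { cet_default, cet_on, cet_off, cet_permissive, cet_invalid };

/* Look up NAME in a GLIBC_TUNABLES-style string ("a=x:b=y") and
   translate its value.  */
static enum cet_control
lookup_cet_control (const char *tunables, const char *name)
{
  assert (tunables != NULL && name != NULL);
  size_t name_len = strlen (name);
  const char *p = tunables;

  while (p != NULL && *p != '\0')
    {
      const char *end = strchr (p, ':');
      size_t len = end != NULL ? (size_t) (end - p) : strlen (p);

      /* Match "NAME=" exactly, so a longer name with the same prefix
         is not mistaken for NAME.  */
      if (len > name_len && strncmp (p, name, name_len) == 0
          && p[name_len] == '=')
        {
          const char *val = p + name_len + 1;
          size_t val_len = len - name_len - 1;

          if (val_len == 2 && strncmp (val, "on", 2) == 0)
            return cet_on;
          if (val_len == 3 && strncmp (val, "off", 3) == 0)
            return cet_off;
          if (val_len == 10 && strncmp (val, "permissive", 10) == 0)
            return cet_permissive;
          return cet_invalid;
        }
      p = end != NULL ? end + 1 : NULL;
    }
  return cet_default;
}
```

For example, with `GLIBC_TUNABLES=glibc.cpu.x86_ibt=on:glibc.cpu.x86_shstk=permissive`, looking up glibc.cpu.x86_shstk yields cet_permissive and glibc.cpu.x86_ibt yields cet_on.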
Amit Pawar	bce5911b67	Use AVX_Fast_Unaligned_Load from Zen onwards. From Zen onwards this will be enabled. It was disabled for the Excavator case and will remain disabled. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2018-07-06 09:55:36 -04:00
Joseph Myers	688903eb3e	Update copyright dates with scripts/update-copyrights. * All files with FSF copyright notices: Update copyright dates using scripts/update-copyrights. * locale/programs/charmap-kw.h: Regenerated. * locale/programs/locfile-kw.h: Likewise.	2018-01-01 00:32:25 +00:00
H.J. Lu	b52b0d793d	x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector, mask and bound registers. It simplifies _dl_runtime_resolve and supports different calling conventions. ld.so code size is reduced by more than 1 KB. However, using fxsave/xsave/xsavec takes a few more cycles than saving and restoring vector and bound registers individually. Latency for _dl_runtime_resolve to look up the function, foo, from one shared library plus libc.so (before / after / change): Westmere (SSE)/fxsave 345 / 866 / 151%; IvyBridge (AVX)/xsave 420 / 643 / 53%; Haswell (AVX)/xsave 713 / 1252 / 75%; Skylake (AVX+MPX)/xsavec 559 / 719 / 28%; Skylake (AVX512+MPX)/xsavec 145 / 272 / 87%; Ryzen (AVX)/xsavec 280 / 553 / 97%. This is the worst case, where the portion of time spent saving and restoring registers is bigger than in the majority of cases. With the smaller _dl_runtime_resolve code size, the overall performance impact is negligible. On IvyBridge, differences in build and test time of binutils with lazy binding GCC and binutils are noise. On Westmere, differences in bootstrap and "make check" time of GCC 7 with lazy binding GCC and binutils are also noise. [BZ #21265] * sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET): New. * sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>. (get_common_indeces): Set xsave_state_size, xsave_state_full_size and bit_arch_XSAVEC_Usable if needed. (init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow and bit_arch_Use_dl_runtime_resolve_opt. * sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): Removed. (bit_arch_Use_dl_runtime_resolve_slow): Likewise. (bit_arch_Prefer_No_AVX512): Updated. (bit_arch_MathVec_Prefer_No_AVX512): Likewise. (bit_arch_XSAVEC_Usable): New. (STATE_SAVE_OFFSET): Likewise. (STATE_SAVE_MASK): Likewise. [__ASSEMBLER__]: Include <cpu-features-offsets.h>. (cpu_features): Add xsave_state_size and xsave_state_full_size. 
(index_arch_Use_dl_runtime_resolve_opt): Removed. (index_arch_Use_dl_runtime_resolve_slow): Likewise. (index_arch_XSAVEC_Usable): New. * sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)): Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow. * sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables is enabled. * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx, _dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt, _dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and _dl_runtime_resolve_xsavec. * sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE): Removed. (DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT instead of VEC_SIZE. (REGISTER_SAVE_BND0): Removed. (REGISTER_SAVE_BND1): Likewise. (REGISTER_SAVE_BND3): Likewise. (REGISTER_SAVE_RAX): Always defined to 0. (VMOV): Removed. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_slow): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_avx512): Likewise. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (USE_FXSAVE): New. (_dl_runtime_resolve_fxsave): Likewise. (USE_XSAVE): Likewise. (_dl_runtime_resolve_xsave): Likewise. (USE_XSAVEC): Likewise. (_dl_runtime_resolve_xsavec): Likewise. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512): Removed. (_dl_runtime_resolve_avx512_opt): Likewise. (_dl_runtime_resolve_avx): Likewise. (_dl_runtime_resolve_avx_opt): Likewise. (_dl_runtime_resolve_sse): Likewise. (_dl_runtime_resolve_sse_vex): Likewise. (_dl_runtime_resolve_fxsave): New. (_dl_runtime_resolve_xsave): Likewise. (_dl_runtime_resolve_xsavec): Likewise.	2017-10-20 11:00:34 -07:00
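The change figures in the _dl_runtime_resolve latency numbers above are the relative increase (after - before) / before, truncated to a whole percent. A small, illustrative check of that arithmetic using the figures from the commit message (the helper is only for demonstration):

```c
#include <assert.h>

/* Relative increase from BEFORE to AFTER in whole percent,
   truncated toward zero by integer division.  */
static unsigned int
percent_increase (unsigned int before, unsigned int after)
{
  assert (before > 0 && after >= before);
  return (after - before) * 100 / before;
}
```

For instance, Haswell goes from 713 to 1252, an increase of about 75.6%, which is reported as 75%.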
H.J. Lu	4d916f0f12	x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299 ] Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set GLRO(dl_platform) to NULL. [BZ #22299] * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set GLRO(dl_platform) to NULL. * sysdeps/x86_64/Makefile (tests): Add tst-platform-1. (modules-names): Add tst-platformmod-1 and x86_64/tst-platformmod-2. (CFLAGS-tst-platform-1.c): New. (CFLAGS-tst-platformmod-1.c): Likewise. (CFLAGS-tst-platformmod-2.c): Likewise. (LDFLAGS-tst-platformmod-2.so): Likewise. ($(objpfx)tst-platform-1): Likewise. ($(objpfx)tst-platform-1.out): Likewise. (tst-platform-1-ENV): Likewise. ($(objpfx)x86_64/tst-platformmod-2.os): Likewise. * sysdeps/x86_64/tst-platform-1.c: New file. * sysdeps/x86_64/tst-platformmod-1.c: Likewise. * sysdeps/x86_64/tst-platformmod-2.c: Likewise.	2017-10-19 08:28:26 -07:00
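To show why a NULL platform string is a problem, here is a simplified C sketch of $PLATFORM substitution in a search-path element. It is not ld.so's code: only the bare "$PLATFORM" spelling is handled (ld.so also accepts "${PLATFORM}" as well as $ORIGIN and $LIB), and the function name is hypothetical. When no platform string exists, the element simply cannot be expanded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Expand every "$PLATFORM" in TMPL with PLATFORM, writing into OUT.
   Returns false when there is no platform string to substitute or when
   OUT is too small.  */
static bool
expand_platform (const char *tmpl, const char *platform,
                 char *out, size_t outsz)
{
  static const char token[] = "$PLATFORM";
  const size_t toklen = sizeof token - 1;
  size_t used = 0;

  assert (tmpl != NULL && out != NULL && outsz > 0);
  if (platform == NULL)
    return false; /* Nothing to expand $PLATFORM with.  */

  while (*tmpl != '\0')
    {
      const char *piece = tmpl;
      size_t len = 1;

      if (strncmp (tmpl, token, toklen) == 0)
        {
          piece = platform;
          len = strlen (platform);
          tmpl += toklen;
        }
      else
        tmpl++;

      if (used + len >= outsz)
        return false; /* Keep room for the terminating NUL.  */
      memcpy (out + used, piece, len);
      used += len;
    }
  out[used] = '\0';
  return true;
}
```

For example, "/opt/lib/$PLATFORM" with platform "haswell" becomes "/opt/lib/haswell".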
H.J. Lu	45ff34638f	x86: Add x86_64 to x86-64 HWCAP [BZ #22093 ] Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the "x86_64" subdirectory when loading a shared library. ld.so in glibc 2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based on supported ISAs. This led to shared library loading failure for shared libraries placed under the "x86_64" subdirectory. This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always search the "x86_64" subdirectory when loading a shared library. NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi" machines. Tested on i686 and x86-64. [BZ #22093] * sysdeps/x86/cpu-features.c (init_cpu_features): Initialize GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64. * sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated. (HWCAP_IMPORTANT): Likewise. (HWCAP_X86_64): New enum. (HWCAP_X86_AVX512_1): Updated. * sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64". * sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1. (modules-names): Add x86_64/tst-x86_64mod-1. (LDFLAGS-tst-x86_64mod-1.so): New. ($(objpfx)tst-x86_64-1): Likewise. ($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise. (tst-x86_64-1-clean): Likewise. * sysdeps/x86_64/tst-x86_64-1.c: New file. * sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.	2017-09-11 08:18:32 -07:00
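A simplified C model of why adding "x86_64" to dl_hwcap makes ld.so search that subdirectory. It assumes every combination of the platform name and the hwcap names becomes a candidate subdirectory. The real bookkeeping and ordering in _dl_important_hwcaps differ and are not reproduced. The function name and buffer limits are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 4
#define MAX_DIRS (1u << MAX_NAMES)
#define DIR_LEN 64

/* Build one relative subdirectory per subset of NAMES (the empty subset
   gives ""), in descending bitmask order.  Returns how many were built.  */
static size_t
hwcap_subdirs (const char *const names[], size_t n, char dirs[][DIR_LEN])
{
  size_t count = 0;

  assert (n <= MAX_NAMES);
  for (unsigned int mask = 1u << n; mask-- > 0;)
    {
      char *d = dirs[count++];
      d[0] = '\0';
      for (size_t i = 0; i < n; i++)
        if (mask & (1u << i))
          {
            assert (strlen (d) + strlen (names[i]) + 2 <= DIR_LEN);
            if (d[0] != '\0')
              strcat (d, "/");
            strcat (d, names[i]);
          }
    }
  return count;
}
```

With names { "haswell", "x86_64" } this yields "haswell/x86_64", "x86_64", "haswell" and "", so "x86_64" is searched whichever platform is chosen.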
H.J. Lu	d2cf37c0a2	x86-64: Use _dl_runtime_resolve_opt only with AVX512F [BZ #21871 ] On AVX machines with XGETBV (ECX == 1) like Skylake processors, (gdb) disass _dl_runtime_resolve_avx_opt Dump of assembler code for function _dl_runtime_resolve_avx_opt: 0x0000000000015890 <+0>: push %rax 0x0000000000015891 <+1>: push %rcx 0x0000000000015892 <+2>: push %rdx 0x0000000000015893 <+3>: mov $0x1,%ecx 0x0000000000015898 <+8>: xgetbv 0x000000000001589b <+11>: mov %eax,%r11d 0x000000000001589e <+14>: pop %rdx 0x000000000001589f <+15>: pop %rcx 0x00000000000158a0 <+16>: pop %rax 0x00000000000158a1 <+17>: and $0x4,%r11d 0x00000000000158a5 <+21>: bnd je 0x16200 <_dl_runtime_resolve_sse_vex> End of assembler dump. is slower than: (gdb) disass _dl_runtime_resolve_avx_slow Dump of assembler code for function _dl_runtime_resolve_avx_slow: 0x0000000000015850 <+0>: vorpd %ymm0,%ymm1,%ymm8 0x0000000000015854 <+4>: vorpd %ymm2,%ymm3,%ymm9 0x0000000000015858 <+8>: vorpd %ymm4,%ymm5,%ymm10 0x000000000001585c <+12>: vorpd %ymm6,%ymm7,%ymm11 0x0000000000015860 <+16>: vorpd %ymm8,%ymm9,%ymm9 0x0000000000015865 <+21>: vorpd %ymm10,%ymm11,%ymm10 0x000000000001586a <+26>: vpcmpeqd %xmm8,%xmm8,%xmm8 0x000000000001586f <+31>: vorpd %ymm9,%ymm10,%ymm10 0x0000000000015874 <+36>: vptest %ymm10,%ymm8 0x0000000000015879 <+41>: bnd jae 0x158b0 <_dl_runtime_resolve_avx> 0x000000000001587c <+44>: vzeroupper 0x000000000001587f <+47>: bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex> End of assembler dump. (gdb) since xgetbv takes many more cycles than single-cycle operations like vorpd/vpcmpeqd/vptest. _dl_runtime_resolve_opt should be used only with AVX512, where AVX512 instructions lead to lower CPU frequency on Skylake server. [BZ #21871] * sysdeps/x86/cpu-features.c (init_cpu_features): Set bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.	2017-08-04 11:14:33 -07:00
H.J. Lu	03feacb562	x86: Rename glibc.tune.ifunc to glibc.tune.hwcaps Rename glibc.tune.ifunc to glibc.tune.hwcaps and move it to sysdeps/x86/dl-tunables.list since it is x86-specific. Also change the type of data_cache_size, shared_cache_size and non_temporal_threshold to unsigned long int to match size_t. Remove use of DEFAULT_STRLEN from cpu-tunables.c. * elf/dl-tunables.list (glibc.tune.ifunc): Removed. * sysdeps/x86/dl-tunables.list (glibc.tune.hwcaps): New. Remove security_level on all fields. * manual/tunables.texi: Replace ifunc with hwcaps. * sysdeps/x86/cpu-features.c (TUNABLE_CALLBACK (set_ifunc)): Renamed to ... (TUNABLE_CALLBACK (set_hwcaps)): This. (init_cpu_features): Updated. * sysdeps/x86/cpu-features.h (cpu_features): Change type of data_cache_size, shared_cache_size and non_temporal_threshold to unsigned long int. * sysdeps/x86/cpu-tunables.c (DEFAULT_STRLEN): Removed. (TUNABLE_CALLBACK (set_ifunc)): Renamed to ... (TUNABLE_CALLBACK (set_hwcaps)): This. Update comments. Don't use DEFAULT_STRLEN.	2017-06-21 10:21:37 -07:00
H.J. Lu	905947c304	tunables: Add IFUNC selection and cache sizes The current IFUNC selection is based on microbenchmarks in glibc. It should give the best performance for most workloads. But other choices may perform better for a particular workload or on hardware that wasn't available when the selection was made. The environment variable GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz.... can be used to enable CPU/ARCH feature yyy and disable CPU/ARCH features xxx and zzz, where the feature names are case-sensitive and must match the ones in cpu-features.h. It can be used by glibc developers to override the IFUNC selection to tune for a new processor or to improve performance for a particular workload. It isn't intended for normal end users. NOTE: the IFUNC selection may change over time. Please check all multiarch implementations when experimenting. Also, GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=NUMBER sets the threshold for using non-temporal stores to NUMBER, GLIBC_TUNABLES=glibc.tune.x86_data_cache_size=NUMBER sets the data cache size, and GLIBC_TUNABLES=glibc.tune.x86_shared_cache_size=NUMBER sets the shared cache size. * elf/dl-tunables.list (tune): Add ifunc, x86_non_temporal_threshold, x86_data_cache_size and x86_shared_cache_size. * manual/tunables.texi: Document glibc.tune.ifunc, glibc.tune.x86_data_cache_size, glibc.tune.x86_shared_cache_size and glibc.tune.x86_non_temporal_threshold. * sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file. * sysdeps/x86/cpu-tunables.c: Likewise. * sysdeps/x86/cacheinfo.c (init_cacheinfo): Check and get data cache size, shared cache size and non temporal threshold from cpu_features. * sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE): New. [HAVE_TUNABLES] Include <unistd.h>. [HAVE_TUNABLES] Include <elf/dl-tunables.h>. [HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise. 
[HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set IFUNC selection, data cache size, shared cache size and non temporal threshold. * sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size, shared_cache_size and non_temporal_threshold.	2017-06-20 08:37:28 -07:00
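A hypothetical C sketch of applying a list such as -xxx,yyy,-zzz to a feature mask. A leading '-' clears a feature, a bare name sets it, and names are compared case-sensitively. The three-entry table is an assumption standing in for the names in cpu-features.h. glibc's own parser in cpu-tunables.c handles each name explicitly and is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical feature table; bit i of the mask corresponds to entry i.  */
static const char *const feature_names[] = {
  "AVX2_Usable", "Prefer_No_AVX512", "Fast_Unaligned_Copy",
};
#define N_FEATURES (sizeof feature_names / sizeof feature_names[0])

/* Apply a comma-separated SPEC to MASK and return the new mask.
   Unknown names and empty items are ignored.  */
static unsigned int
apply_hwcaps_tunable (const char *spec, unsigned int mask)
{
  assert (spec != NULL);
  while (*spec != '\0')
    {
      int disable = (*spec == '-');
      const char *name = spec + disable;
      size_t len = strcspn (name, ",");

      for (size_t i = 0; i < N_FEATURES; i++)
        if (strlen (feature_names[i]) == len
            && strncmp (feature_names[i], name, len) == 0)
          {
            if (disable)
              mask &= ~(1u << i);
            else
              mask |= 1u << i;
          }

      spec = name + len;
      if (*spec == ',')
        spec++;
    }
  return mask;
}
```

For example, starting from mask 1 (AVX2_Usable set), "-AVX2_Usable,Prefer_No_AVX512" yields mask 2.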
Siddhesh Poyarekar	511c5a1087	Make LD_HWCAP_MASK usable for static binaries The LD_HWCAP_MASK environment variable was ignored in static binaries, which is inconsistent with the behaviour of dynamically linked binaries. This seems to have been because ld_hwcap_mask could not be read early enough to influence anything, but now that it is a tunable, the mask is usable in static binaries as well. This feature is important for aarch64, which relies on HWCAP_CPUID being masked out to disable multiarch. A sanity test on x86_64 shows that there are no failures. Likewise for aarch64. * elf/dl-hwcaps.h [HAVE_TUNABLES]: Always read hwcap_mask. * sysdeps/sparc/sparc32/dl-machine.h [HAVE_TUNABLES]: Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Always set up hwcap and hwcap_mask.	2017-06-07 11:11:40 +05:30
Siddhesh Poyarekar	ff08fc59e3	tunables: Use glibc.tune.hwcap_mask tunable instead of _dl_hwcap_mask Drop _dl_hwcap_mask when building with tunables. This completes the transition of hwcap_mask reading from _dl_hwcap_mask to tunables. * elf/dl-hwcaps.h: New file. * elf/dl-hwcaps.c: Include it. (_dl_important_hwcaps)[HAVE_TUNABLES]: Read and update glibc.tune.hwcap_mask. * elf/dl-cache.c: Include dl-hwcaps.h. (_dl_load_cache_lookup)[HAVE_TUNABLES]: Read glibc.tune.hwcap_mask. * sysdeps/sparc/sparc32/dl-machine.h: Likewise. * elf/dl-support.c (_dl_hwcap2)[HAVE_TUNABLES]: Drop _dl_hwcap_mask. * elf/rtld.c (rtld_global_ro)[HAVE_TUNABLES]: Drop _dl_hwcap_mask. (process_envvars)[HAVE_TUNABLES]: Likewise. * sysdeps/generic/ldsodefs.h (rtld_global_ro)[HAVE_TUNABLES]: Likewise. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't initialize dl_hwcap_mask when tunables are enabled.	2017-06-07 11:11:38 +05:30
H.J. Lu	1432d38ea0	x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391 ] dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very early during startup. They are used by the dynamic linker to determine the platform and build an array of hardware capability names, which are added to the search path when loading a shared object. dl_platform and dl_hwcap are unused on x86-64. On i386, the i386, i486, i586 and i686 platforms were supported and only the SSE2 capability was used. On x86, using AT_PLATFORM and AT_HWCAP to determine the platform and processor capabilities is obsolete since all information is available in dl_x86_cpu_features. This patch sets dl_platform and dl_hwcap from dl_x86_cpu_features in the dynamic linker. On i386, the available platforms are changed to i586 and i686 since i386 has been deprecated. On x86-64, the available platforms are haswell, which is for Haswell class processors with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and AVX512PF. A capability, avx512_1, is also added to x86-64 for AVX512 ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL. [BZ #21391] * sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]: Only call init_cpu_features. [!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed. * sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise. * sysdeps/i386/dl-procinfo.h: Removed. * sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include <sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>. Include <sysdeps/x86/dl-procinfo.h>. (_dl_procinfo): Replace _DL_HWCAP_COUNT with 32. * sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]: Include <sysdeps/x86/dl-procinfo.h> instead of <sysdeps/generic/dl-procinfo.h>. * sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>. (init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask. * sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New. (bit_cpu_MOVBE): Likewise. 
(bit_cpu_BMI1): Likewise. (bit_cpu_BMI2): Likewise. (index_cpu_BMI1): Likewise. (index_cpu_BMI2): Likewise. (index_cpu_LZCNT): Likewise. (index_cpu_MOVBE): Likewise. (index_cpu_POPCNT): Likewise. (reg_BMI1): Likewise. (reg_BMI2): Likewise. (reg_LZCNT): Likewise. (reg_MOVBE): Likewise. (reg_POPCNT): Likewise. * sysdeps/x86/dl-hwcap.h: New file. * sysdeps/x86/dl-procinfo.h: Likewise. * sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New. (_dl_x86_platforms): Likewise.	2017-05-03 13:44:35 -07:00
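The x86-64 platform choice described above can be modeled as a small C function. This sketch rests on several assumptions. The flags are hypothetical booleans for the usable bits the commit names. xeon_phi is tested before haswell. PLATFORM_NONE means the existing platform string is left as it was.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flat view of the usable-feature bits named in the commit.  */
struct isa_flags
{
  bool avx512f, avx512cd, avx512er, avx512pf, avx512bw, avx512dq, avx512vl;
  bool avx2, fma, bmi1, bmi2, lzcnt, movbe, popcnt;
};

enum x86_64_platform
{
  PLATFORM_NONE,    /* Keep whatever platform string was already set.  */
  PLATFORM_HASWELL,
  PLATFORM_XEON_PHI
};

/* Choose the platform; *avx512_1 reports whether the avx512_1
   capability would be added.  */
static enum x86_64_platform
pick_x86_64_platform (const struct isa_flags *f, bool *avx512_1)
{
  assert (f != NULL && avx512_1 != NULL);
  *avx512_1 = false;
  if (f->avx512f && f->avx512cd)
    {
      if (f->avx512er && f->avx512pf)
        return PLATFORM_XEON_PHI;
      if (f->avx512bw && f->avx512dq && f->avx512vl)
        *avx512_1 = true;
    }
  if (f->avx2 && f->fma && f->bmi1 && f->bmi2 && f->lzcnt && f->movbe
      && f->popcnt)
    return PLATFORM_HASWELL;
  return PLATFORM_NONE;
}
```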
H.J. Lu	4cb334c4d6	x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396 ] On Skylake server, AVX512 load/store instructions in memcpy/memset may lead to lower CPU turbo frequency in certain situations. Use of AVX2 in memcpy/memset has been observed to have improved overall performance in many workloads due to the higher frequency. Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512 if AVX512ER isn't available so that AVX2 versions of memcpy/memset are used on Skylake server. [BZ #21396] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Prefer_No_AVX512 if AVX512ER isn't available. * sysdeps/x86/cpu-features.h (bit_arch_Prefer_No_AVX512): New. (index_arch_Prefer_No_AVX512): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Don't use AVX512 version if Prefer_No_AVX512 is set. * sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise. * sysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Likewise. * sysdeps/x86_64/multiarch/memmove_chk.S (__memmove_chk): Likewise. * sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise. * sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise. * sysdeps/x86_64/multiarch/memset.S (memset): Likewise. * sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk): Likewise.	2017-04-18 14:01:45 -07:00
H.J. Lu	1c53cb49de	x86: Set Prefer_No_VZEROUPPER if AVX512ER is available AVX512ER won't be implemented in any Xeon processors and will be in all Xeon Phi processors. Don't check CPU model number when setting Prefer_No_VZEROUPPER for Xeon Phi. Instead, set Prefer_No_VZEROUPPER if AVX512ER is available. It works with current and future Xeon Phi and non-Xeon Phi processors. * sysdeps/x86/cpu-features.c (init_cpu_features): Set Prefer_No_VZEROUPPER if AVX512ER is available. * sysdeps/x86/cpu-features.h (bit_cpu_AVX512PF): New. (bit_cpu_AVX512ER): Likewise. (bit_cpu_AVX512CD): Likewise. (bit_cpu_AVX512BW): Likewise. (bit_cpu_AVX512VL): Likewise. (index_cpu_AVX512PF): Likewise. (index_cpu_AVX512ER): Likewise. (index_cpu_AVX512CD): Likewise. (index_cpu_AVX512BW): Likewise. (index_cpu_AVX512VL): Likewise. (reg_AVX512PF): Likewise. (reg_AVX512ER): Likewise. (reg_AVX512CD): Likewise. (reg_AVX512BW): Likewise. (reg_AVX512VL): Likewise.	2017-04-18 08:27:32 -07:00
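A C sketch of keying the preference on a CPUID bit instead of a model list. The bit positions follow the documented layout of CPUID.(EAX=07H,ECX=0):EBX. The helper names are hypothetical. The optional reader uses the GCC/Clang <cpuid.h> helpers and is compiled only on x86.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* CPUID.(EAX=07H,ECX=0):EBX bits for the AVX512 subsets named above.  */
#define CPUID7_EBX_AVX512F (1u << 16)
#define CPUID7_EBX_AVX512PF (1u << 26)
#define CPUID7_EBX_AVX512ER (1u << 27)
#define CPUID7_EBX_AVX512CD (1u << 28)
#define CPUID7_EBX_AVX512BW (1u << 30)
#define CPUID7_EBX_AVX512VL (1u << 31)

/* Prefer_No_VZEROUPPER keyed on AVX512ER, which only Xeon Phi has.  */
static bool
prefer_no_vzeroupper (uint32_t leaf7_ebx)
{
  return (leaf7_ebx & CPUID7_EBX_AVX512ER) != 0;
}

#if defined __x86_64__ || defined __i386__
# include <cpuid.h>
/* Read leaf 7, subleaf 0, EBX on the running CPU (0 if unsupported).  */
static uint32_t
read_leaf7_ebx (void)
{
  unsigned int eax, ebx, ecx, edx;

  if (__get_cpuid_max (0, NULL) < 7)
    return 0;
  __cpuid_count (7, 0, eax, ebx, ecx, edx);
  (void) eax;
  (void) ecx;
  (void) edx;
  return ebx;
}
#endif
```

Checking a feature bit rather than model numbers means future Xeon Phi parts are handled without another table update, which is the point the commit makes.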
H.J. Lu	b170d2e7ab	Use CPU_FEATURES_CPU_P to check if AVX is available Don't use bit_cpu_AVX directly. * sysdeps/x86/cpu-features.c (init_cpu_features): Check AVX with CPU_FEATURES_CPU_P.	2017-03-17 11:38:13 -07:00
H.J. Lu	52ac22365a	Use index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit * sysdeps/x86/cpu-features.c (init_cpu_features): Use index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit.	2017-02-17 11:53:26 -08:00
Joseph Myers	bfff8b1bec	Update copyright dates with scripts/update-copyrights.	2017-01-01 00:14:16 +00:00
Andrew Senkevich	2702856bf4	Disable TSX on some Haswell processors. This patch disables Intel TSX on some Haswell processors to avoid using TSX on kernels that weren't updated with the latest microcode package (which disables the broken feature by default). * sysdeps/x86/cpu-features.c (get_common_indeces): Add stepping identification. (init_cpu_features): Add handling of Haswell.	2016-12-19 14:15:57 +03:00
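A C sketch of a stepping-aware TSX check, assuming the vendor has already been confirmed as Intel. The family/model/stepping decoding of CPUID leaf 1 EAX follows the documented layout, and RTM is CPUID.(EAX=07H,ECX=0):EBX bit 11. The list of Haswell models (0x3c, 0x3f, 0x45, 0x46, with 0x3f exempt from stepping 4 on) is an assumption to verify against cpu-features.c. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID7_EBX_RTM (1u << 11)

struct x86_signature
{
  unsigned int family, model, stepping;
};

/* Decode CPUID leaf 1 EAX.  The extended-model field only applies to
   families 0x6 and 0xf; family 0xf also adds the extended family.  */
static struct x86_signature
decode_leaf1_eax (uint32_t eax)
{
  struct x86_signature s;

  s.stepping = eax & 0xf;
  s.model = (eax >> 4) & 0xf;
  s.family = (eax >> 8) & 0xf;
  if (s.family == 0x6 || s.family == 0xf)
    s.model += ((eax >> 16) & 0xf) << 4;
  if (s.family == 0xf)
    s.family += (eax >> 20) & 0xff;
  return s;
}

/* Clear RTM on the assumed list of affected Haswell models.  */
static uint32_t
mask_haswell_tsx (uint32_t leaf1_eax, uint32_t leaf7_ebx)
{
  struct x86_signature s = decode_leaf1_eax (leaf1_eax);

  if (s.family != 0x6)
    return leaf7_ebx;
  switch (s.model)
    {
    case 0x3f:
      if (s.stepping >= 4)
        break;
      /* Fall through.  */
    case 0x3c:
    case 0x45:
    case 0x46:
      leaf7_ebx &= ~CPUID7_EBX_RTM;
      break;
    default:
      break;
    }
  return leaf7_ebx;
}
```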
Carlos O'Donell	b3d17c1cf2	Bug 20689: Fix FMA and AVX2 detection on Intel In the Intel Architecture Instruction Set Extensions Programming reference the recommended way to test for FMA in section '2.2.1 Detection of FMA' is: "Application Software must identify that hardware supports AVX as explained in ... after that it must also detect support for FMA..." We don't do that in glibc. We use osxsave to detect the use of xgetbv, and after that we check for AVX and FMA orthogonally. It is conceivable that you could have the AVX bit clear and the FMA bit in an undefined state. This commit fixes FMA and AVX2 detection to depend on usable AVX as required by the recommended Intel sequences. v1: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00241.html v2: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00265.html	2016-10-17 19:39:54 -04:00
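The dependency the commit introduces can be sketched in C as below. AVX counts as usable only when OSXSAVE is set and the OS enables both XMM and YMM state in XCR0. FMA and AVX2 are examined only after that. Bit positions follow the documented CPUID and XCR0 layouts. The struct and function names are hypothetical, and the register values are passed in rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID1_ECX_FMA (1u << 12)
#define CPUID1_ECX_OSXSAVE (1u << 27)
#define CPUID1_ECX_AVX (1u << 28)
#define CPUID7_EBX_AVX2 (1u << 5)
#define XCR0_SSE_AVX 0x6u /* XMM (bit 1) and upper-YMM (bit 2) state.  */

struct avx_usability
{
  bool avx, fma, avx2;
};

static struct avx_usability
check_avx_family (uint32_t leaf1_ecx, uint32_t leaf7_ebx, uint64_t xcr0)
{
  struct avx_usability u = { false, false, false };

  /* XGETBV may only be used to read XCR0 when OSXSAVE is set.  */
  if ((leaf1_ecx & CPUID1_ECX_OSXSAVE) == 0)
    return u;
  /* The OS must save both XMM and YMM state on context switches.  */
  if ((xcr0 & XCR0_SSE_AVX) != XCR0_SSE_AVX)
    return u;

  u.avx = (leaf1_ecx & CPUID1_ECX_AVX) != 0;
  /* Mirrors the commit: FMA and AVX2 depend on usable AVX.  */
  if (u.avx)
    {
      u.fma = (leaf1_ecx & CPUID1_ECX_FMA) != 0;
      u.avx2 = (leaf7_ebx & CPUID7_EBX_AVX2) != 0;
    }
  return u;
}
```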
H.J. Lu	fb0f7a6755	X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508 ] There is a transition penalty when SSE instructions are mixed with 256-bit AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM registers, there is a transition penalty when SSE instructions are used with lazy binding on AVX and AVX512 processors. To avoid the SSE transition penalty, if only the lower 128 bits of the first 8 vector registers are non-zero, we can preserve registers %xmm0 - %xmm7 with the upper bits zeroed. For AVX and AVX512 processors which support XGETBV with ECX == 1, we can use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers or the upper 256 bits of ZMM registers are zero. We can restore only the non-zero portion of vector registers with AVX/AVX512 load instructions, which zero-extend the upper bits of vector registers. This patch adds _dl_runtime_resolve_sse_vex, which saves and restores XMM registers with 128-bit AVX store/load instructions. It is used to preserve YMM/ZMM registers when only the lower 128 bits are non-zero. _dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so that we store and load only the non-zero portion of vector registers. This avoids the SSE transition penalty caused by _dl_runtime_resolve_avx and _dl_runtime_profile_avx512 when only the lower 128 bits of vector registers are used. _dl_runtime_resolve_avx_slow is added and used for AVX processors which don't support XGETBV with ECX == 1. Since there is no SSE transition penalty on AVX512 processors which don't support XGETBV with ECX == 1, _dl_runtime_resolve_avx512_slow isn't provided. [BZ #20495] [BZ #20508] * sysdeps/x86/cpu-features.c (init_cpu_features): For Intel processors, set Use_dl_runtime_resolve_slow and set Use_dl_runtime_resolve_opt if XGETBV supports ECX == 1. 
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt): New. (bit_arch_Use_dl_runtime_resolve_slow): Likewise. (index_arch_Use_dl_runtime_resolve_opt): Likewise. (index_arch_Use_dl_runtime_resolve_slow): Likewise. * sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use _dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt if Use_dl_runtime_resolve_opt is set. Use _dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set. * sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>. (_dl_runtime_resolve_opt): New. Defined for AVX and AVX512. (_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex. * sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow): New. (_dl_runtime_resolve_opt): Likewise. (_dl_runtime_profile): Define only if _dl_runtime_profile is defined.	2016-09-06 08:51:07 -07:00
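The XGETBV (ECX == 1) test can be sketched as a pure C function over the returned XINUSE bitmap. The disassembly in d2cf37c0a2 shows the same test (and $0x4, then jump to _dl_runtime_resolve_sse_vex when the bit is clear). When bit 2 (the AVX state component) is clear, the upper halves of all YMM registers are in their initial, zero state, so 128-bit VEX saves of %xmm0-%xmm7 preserve them exactly. The enum and function names are hypothetical, and the AVX512 components are left out.

```c
#include <assert.h>
#include <stdint.h>

/* XSAVE state-component bits shared by XCR0 and the XINUSE bitmap that
   XGETBV returns for ECX == 1.  */
#define XSTATE_SSE (1u << 1) /* XMM registers.  */
#define XSTATE_AVX (1u << 2) /* Upper 128 bits of the YMM registers.  */

enum avx_save_path
{
  SAVE_FULL_YMM, /* Upper halves may be live: like _dl_runtime_resolve_avx.  */
  SAVE_XMM_VEX   /* Upper halves are zero: like _dl_runtime_resolve_sse_vex.  */
};

static enum avx_save_path
pick_avx_save_path (uint64_t xinuse)
{
  return (xinuse & XSTATE_AVX) != 0 ? SAVE_FULL_YMM : SAVE_XMM_VEX;
}
```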
H.J. Lu	91655fc307	Check FMA after COMMON_CPUID_INDEX_80000001 Since the FMA4 bit is in COMMON_CPUID_INDEX_80000001 and FMA4 requires AVX, determine if FMA4 is usable after COMMON_CPUID_INDEX_80000001 is available and if AVX is usable. [BZ #20195] * sysdeps/x86/cpu-features.c (get_common_indeces): Move FMA4 check to ... (init_cpu_features): Here.	2016-06-07 08:00:40 -07:00
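The ordering constraint can be shown with a small C sketch. FMA4 is reported in CPUID.80000001H:ECX bit 16, so it can only be judged once that extended leaf is known to exist and AVX usability has already been settled. The function name is hypothetical, and the register values are passed in rather than read from hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CPUID_80000001_ECX_FMA4 (1u << 16)

/* MAX_EXT_LEAF is the value CPUID.80000000H returns in EAX.  */
static bool
fma4_usable (uint32_t max_ext_leaf, uint32_t ext1_ecx, bool avx_usable)
{
  if (max_ext_leaf < 0x80000001u)
    return false; /* Leaf 0x80000001 is not available.  */
  return avx_usable && (ext1_ecx & CPUID_80000001_ECX_FMA4) != 0;
}
```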
H.J. Lu	2e2d9796da	Detect Intel Goldmont and Airmont processors Updated from the model numbers of Goldmont and Airmont processors in Intel64 And IA-32 Processor Architectures Software Developer's Manual Volume 3 Revision 058. * sysdeps/x86/cpu-features.c (init_cpu_features): Detect Intel Goldmont and Airmont processors.	2016-04-15 05:23:06 -07:00
H.J. Lu	27d3ce1467	Remove Fast_Copy_Backward from Intel Core processors Intel Core i3, i5 and i7 processors have fast unaligned copy and copy backward is ignored. Remove Fast_Copy_Backward from Intel Core processors to avoid confusion. * sysdeps/x86/cpu-features.c (init_cpu_features): Don't set bit_arch_Fast_Copy_Backward for Intel Core processors.	2016-04-01 15:09:14 -07:00
H.J. Lu	e41b395523	[x86] Add a feature bit: Fast_Unaligned_Copy On AMD processors, memcpy optimized with unaligned SSE load is slower than memcpy optimized with aligned SSSE3, while other string functions are faster with unaligned SSE load. A feature bit, Fast_Unaligned_Copy, is added to select memcpy optimized with unaligned SSE load. [BZ #19583] * sysdeps/x86/cpu-features.c (init_cpu_features): Set Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel processors. Set Fast_Copy_Backward for AMD Excavator processors. * sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy): New. (index_arch_Fast_Unaligned_Copy): Likewise. * sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check Fast_Unaligned_Copy instead of Fast_Unaligned_Load.	2016-03-28 04:40:03 -07:00
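A C sketch of why a separate bit helps. Intel gets Fast_Unaligned_Copy together with Fast_Unaligned_Load, so other string functions and memcpy agree. An AMD part can keep unaligned loads for string functions while memcpy falls back to the aligned SSSE3 variant. The enums, struct and functions are hypothetical, and the memcpy selector is reduced to the three cases needed to show the effect.

```c
#include <assert.h>
#include <stdbool.h>

enum vendor { VENDOR_INTEL, VENDOR_AMD, VENDOR_OTHER };

struct copy_prefs
{
  bool fast_unaligned_load; /* Consulted by other string functions.  */
  bool fast_unaligned_copy; /* Consulted by memcpy only.  */
};

/* Per the commit: only Intel sets Fast_Unaligned_Copy, together with
   Fast_Unaligned_Load.  */
static struct copy_prefs
derive_prefs (enum vendor v, bool fast_unaligned_load)
{
  struct copy_prefs p = { fast_unaligned_load, false };

  if (v == VENDOR_INTEL)
    p.fast_unaligned_copy = fast_unaligned_load;
  return p;
}

enum memcpy_impl { MEMCPY_SSE2_UNALIGNED, MEMCPY_SSSE3, MEMCPY_SSE2 };

static enum memcpy_impl
select_memcpy (const struct copy_prefs *p, bool ssse3)
{
  if (p->fast_unaligned_copy)
    return MEMCPY_SSE2_UNALIGNED;
  if (ssse3)
    return MEMCPY_SSSE3; /* Aligned SSSE3 copy.  */
  return MEMCPY_SSE2;
}
```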
H.J. Lu	f781a9e961	Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors Since only Intel processors with AVX2 have fast unaligned load, we should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors. Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces and call get_common_indeces for other processors. Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to avoid loading GLRO(dl_x86_cpu_features) in cpu-features.c. [BZ #19583] * sysdeps/x86/cpu-features.c (get_common_indeces): Remove inline. Check family before setting family, model and extended_model. Set AVX, AVX2, AVX512, FMA and FMA4 usable bits here. (init_cpu_features): Replace HAS_CPU_FEATURE and HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P. Set index_arch_AVX_Fast_Unaligned_Load for Intel processors with usable AVX2. Call get_common_indeces for other processors with family == NULL. * sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro. (CPU_FEATURES_ARCH_P): Likewise. (HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P. (HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.	2016-03-22 07:47:20 -07:00
H.J. Lu	6aa3e97e25	Add _arch_/_cpu_ to index_/bit_ in x86 cpu-features.h index_* and bit_* macros are used to access the cpuid and feature arrays of struct cpu_features. It is very easy to mistakenly use the bits and indices of the cpuid array on the feature array, especially in assembly code. For example, sysdeps/i386/i686/multiarch/bcopy.S has HAS_CPU_FEATURE (Fast_Rep_String), which should be HAS_ARCH_FEATURE (Fast_Rep_String). We change index_* and bit_* to index_cpu_/index_arch_ and bit_cpu_/bit_arch_ so that we can catch such errors at build time. [BZ #19762] * sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h (EXTRA_LD_ENVVARS): Add _arch_ to index_/bit_. * sysdeps/x86/cpu-features.c (init_cpu_features): Likewise. * sysdeps/x86/cpu-features.h (bit_): Renamed to ... (bit_arch_): This for feature array. (bit_): Renamed to ... (bit_cpu_): This for cpu array. (index_): Renamed to ... (index_arch_): This for feature array. (index_): Renamed to ... (index_cpu_): This for cpu array. [__ASSEMBLER__] (HAS_FEATURE): Add and use field. [__ASSEMBLER__] (HAS_CPU_FEATURE): Pass cpu to HAS_FEATURE. [__ASSEMBLER__] (HAS_ARCH_FEATURE): Pass arch to HAS_FEATURE. [!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and bit_##name with index_cpu_##name and bit_cpu_##name. [!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and bit_##name with index_arch_##name and bit_arch_##name.	2016-03-10 05:27:07 -08:00
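A miniature C illustration of how the _cpu_/_arch_ prefixes turn a mix-up into a build error. Token pasting makes each macro look up names that exist only for its own array. So HAS_CPU_FEATURE (f, Fast_Rep_String) would expand to index_cpu_Fast_Rep_String, which is undeclared. The struct, the explicit pointer argument (closer to CPU_FEATURES_CPU_P than to the GLRO-based HAS_CPU_FEATURE) and the Fast_Rep_String bit position are hypothetical. SSE2 is CPUID.1:EDX bit 26.

```c
#include <assert.h>

/* Hypothetical miniature of struct cpu_features: one word of raw CPUID
   bits and one word of derived "arch" bits.  */
struct mini_features
{
  unsigned int cpuid[1];
  unsigned int feature[1];
};

#define index_cpu_SSE2 0
#define bit_cpu_SSE2 (1u << 26) /* CPUID.1:EDX.SSE2 */
#define index_arch_Fast_Rep_String 0
#define bit_arch_Fast_Rep_String (1u << 0) /* Hypothetical position.  */

/* Each macro pastes its own prefix, so a name defined only for the
   other array fails to compile instead of silently testing the wrong
   bit.  */
#define HAS_CPU_FEATURE(f, name) \
  (((f)->cpuid[index_cpu_##name] & bit_cpu_##name) != 0)
#define HAS_ARCH_FEATURE(f, name) \
  (((f)->feature[index_arch_##name] & bit_arch_##name) != 0)
```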
Amit Pawar	d7890e6947	Set index_Fast_Unaligned_Load for Excavator family CPUs GLIBC benchtest testcases show that SSE2_Unaligned-based implementations perform faster than SSE2-based implementations for the routines strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy and strstr. Flag index_Fast_Unaligned_Load is set for Excavator family 0x15 CPUs. This makes the SSE2_Unaligned-based implementations the default for these routines. [BZ #19467] * sysdeps/x86/cpu-features.c (init_cpu_features): Set index_Fast_Unaligned_Load flag for Excavator family CPUs.	2016-01-14 08:14:31 -08:00

58 Commits