Commit Graph

58 Commits

Author SHA1 Message Date
H.J. Lu
94cd37ebb2 x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625]
commit 04bba1e5d8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Aug 5 13:51:56 2020 -0700

    x86: Set CPU usable feature bits conservatively [BZ #26552]

    Set CPU usable feature bits only for CPU features which are usable in
    user space and whose usability can be detected from user space, excluding
    features like FSGSBASE whose enable bit can only be checked in the kernel.

no longer turns on the usable bits of IBT and SHSTK since we don't know
if IBT and SHSTK are usable until much later.  Use HAS_CPU_FEATURE to
check if the processor supports IBT and SHSTK.
2020-09-17 05:18:36 -07:00
H.J. Lu
f2c679d4b2 <sys/platform/x86.h>: Add Intel Key Locker support
Add Intel Key Locker:

https://software.intel.com/content/www/us/en/develop/download/intel-key-locker-specification.html

support to <sys/platform/x86.h>.  Intel Key Locker has

1. KL: AES Key Locker instructions.
2. WIDE_KL: AES wide Key Locker instructions.
3. AESKLE: AES Key Locker instructions are enabled by OS.

Applications should use

if (CPU_FEATURE_USABLE (KL))

and

if (CPU_FEATURE_USABLE (WIDE_KL))

to check if AES Key Locker instructions and AES wide Key Locker
instructions are usable.
2020-09-16 05:56:10 -07:00
H.J. Lu
04bba1e5d8 x86: Set CPU usable feature bits conservatively [BZ #26552]
Set CPU usable feature bits only for CPU features which are usable in
user space and whose usability can be detected from user space, excluding
features like FSGSBASE whose enable bit can only be checked in the kernel.
2020-09-03 04:36:20 -07:00
H.J. Lu
107e6a3c22 x86: Support usable check for all CPU features
Support usable check for all CPU features with the following changes:

1. Change struct cpu_features to

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};

so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits.  EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.

The results are

1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
2020-07-13 06:05:16 -07:00
H.J. Lu
3f4b61a0b8 x86: Add thresholds for "rep movsb/stosb" to tunables
Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables
to update thresholds for "rep movsb" and "rep stosb" at run-time.

Note that the user specified threshold for "rep movsb" smaller than
the minimum threshold will be ignored.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-07-06 11:48:42 -07:00
H.J. Lu
4fdd4d41a1 x86: Detect Intel Advanced Matrix Extensions
Intel Advanced Matrix Extensions (Intel AMX) is a new programming
paradigm consisting of two components: a set of 2-dimensional registers
(tiles) representing sub-arrays from a larger 2-dimensional memory image,
and accelerators able to operate on tiles.  Intel AMX is an extensible
architecture.  New accelerators can be added and the existing accelerator
may be enhanced to provide higher performance.  The initial features are
AMX-BF16, AMX-TILE and AMX-INT8, which are usable only if the operating
system supports both XTILECFG state and XTILEDATA state.

Add AMX-BF16, AMX-TILE and AMX-INT8 support to HAS_CPU_FEATURE and
CPU_FEATURE_USABLE.
2020-06-26 06:53:05 -07:00
H.J. Lu
ecbbadbf10 x86: Update CPU feature detection [BZ #26149]
1. Divide architecture features into the usable features and the preferred
features.  The usable features are for correctness and can be exported in
a stable ABI.  The preferred features are for performance and only for
glibc internal use.
2. Change struct cpu_features to

struct cpu_features
{
  struct cpu_features_basic basic;
  unsigned int *usable_p;
  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
  unsigned int usable[USABLE_FEATURE_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
  ...
};

and initialize usable_p to pointer to the usable arary so that

struct cpu_features
{
  struct cpu_features_basic basic;
  unsigned int *usable_p;
  struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
};

can be exported via a stable ABI.  The cpuid and usable arrays can be
expanded with backward binary compatibility for both .o and .so files.
3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16.
4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID,
TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16.
5. Rename CAPABILITIES to ARCH_CAPABILITIES.
6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable.
7. Update CPU feature detection test.
2020-06-22 13:09:33 -07:00
H.J. Lu
27f8864bd4 x86: Update F16C detection [BZ #26133]
Since F16C requires AVX, set F16C usable only when AVX is usable.
2020-06-18 07:01:58 -07:00
H.J. Lu
76d5b2f002 x86: Update Intel Atom processor family optimization
Enable Intel Silvermont optimization for Intel Goldmont Plus.  Detect more
Intel Airmont processors.  Optimize Intel Tremont like Intel Silvermont
with rep string instructions.
2020-05-21 13:36:54 -07:00
H.J. Lu
674ea88294 x86: Move CET control to _dl_x86_feature_control [BZ #25887]
1. Include <dl-procruntime.c> to get architecture specific initializer in
rtld_global.
2. Change _dl_x86_feature_1[2] to _dl_x86_feature_1.
3. Add _dl_x86_feature_control after _dl_x86_feature_1, which is a
struct of 2 bitfields for IBT and SHSTK control

This fixes [BZ #25887].
2020-05-18 06:15:02 -07:00
mayshao
32ac0b9884 x86: Add CPU Vendor ID detection support for Zhaoxin processors
To recognize Zhaoxin CPU Vendor ID, add a new architecture type
arch_kind_zhaoxin for Vendor Zhaoxin detection.
2020-04-30 06:36:48 -07:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Joseph Myers
a04549c194 Break more lines before not after operators.
This patch makes further coding style fixes where code was breaking
lines after an operator, contrary to the GNU Coding Standards.  As
with the previous patch, it is limited to files following a reasonable
approximation to GNU style already, and is not exhaustive; more such
issues remain to be fixed.

Tested for x86_64, and with build-many-glibcs.py.

	* dirent/dirent.h [!_DIRENT_HAVE_D_NAMLEN
	&& _DIRENT_HAVE_D_RECLEN] (_D_ALLOC_NAMLEN): Break lines before
	rather than after operators.
	* elf/cache.c (print_cache): Likewise.
	* gshadow/fgetsgent_r.c (__fgetsgent_r): Likewise.
	* htl/pt-getattr.c (__pthread_getattr_np): Likewise.
	* hurd/hurdinit.c (_hurd_setproc): Likewise.
	* hurd/hurdkill.c (_hurd_sig_post): Likewise.
	* hurd/hurdlookup.c (__file_name_lookup_under): Likewise.
	* hurd/hurdsig.c (_hurd_internal_post_signal): Likewise.
	(reauth_proc): Likewise.
	* hurd/lookup-at.c (__file_name_lookup_at): Likewise.
	(__file_name_split_at): Likewise.
	(__directory_name_split_at): Likewise.
	* hurd/lookup-retry.c (__hurd_file_name_lookup_retry): Likewise.
	* hurd/port2fd.c (_hurd_port2fd): Likewise.
	* iconv/gconv_dl.c (do_print): Likewise.
	* inet/netinet/in.h (struct sockaddr_in): Likewise.
	* libio/wstrops.c (_IO_wstr_seekoff): Likewise.
	* locale/setlocale.c (new_composite_name): Likewise.
	* malloc/memusagestat.c (main): Likewise.
	* misc/fstab.c (fstab_convert): Likewise.
	* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_usercnt):
	Likewise.
	* nss/nss_compat/compat-grp.c (getgrent_next_nss): Likewise.
	(getgrent_next_file): Likewise.
	(internal_getgrnam_r): Likewise.
	(internal_getgrgid_r): Likewise.
	* nss/nss_compat/compat-initgroups.c (getgrent_next_nss):
	Likewise.
	(internal_getgrent_r): Likewise.
	* nss/nss_compat/compat-pwd.c (getpwent_next_nss_netgr): Likewise.
	(getpwent_next_nss): Likewise.
	(getpwent_next_file): Likewise.
	(internal_getpwnam_r): Likewise.
	(internal_getpwuid_r): Likewise.
	* nss/nss_compat/compat-spwd.c (getspent_next_nss_netgr):
	Likewise.
	(getspent_next_nss): Likewise.
	(internal_getspnam_r): Likewise.
	* pwd/fgetpwent_r.c (__fgetpwent_r): Likewise.
	* shadow/fgetspent_r.c (__fgetspent_r): Likewise.
	* string/strchr.c (STRCHR): Likewise.
	* string/strchrnul.c (STRCHRNUL): Likewise.
	* sysdeps/aarch64/fpu/fpu_control.h (_FPU_FPCR_IEEE): Likewise.
	* sysdeps/aarch64/sfp-machine.h (_FP_CHOOSENAN): Likewise.
	* sysdeps/csky/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/generic/memcopy.h (PAGE_COPY_FWD_MAYBE): Likewise.
	* sysdeps/generic/symbol-hacks.h (__stack_chk_fail_local):
	Likewise.
	* sysdeps/gnu/netinet/ip_icmp.h (ICMP_INFOTYPE): Likewise.
	* sysdeps/gnu/updwtmp.c (TRANSFORM_UTMP_FILE_NAME): Likewise.
	* sysdeps/gnu/utmp_file.c (TRANSFORM_UTMP_FILE_NAME): Likewise.
	* sysdeps/hppa/jmpbuf-unwind.h (_JMPBUF_UNWINDS): Likewise.
	* sysdeps/mach/hurd/bits/stat.h (S_ISPARE): Likewise.
	* sysdeps/mach/hurd/dl-sysdep.c (_dl_sysdep_start): Likewise.
	(open_file): Likewise.
	* sysdeps/mach/hurd/htl/pt-mutexattr-setprotocol.c
	(pthread_mutexattr_setprotocol): Likewise.
	* sysdeps/mach/hurd/ioctl.c (__ioctl): Likewise.
	* sysdeps/mach/hurd/mmap.c (__mmap): Likewise.
	* sysdeps/mach/hurd/ptrace.c (ptrace): Likewise.
	* sysdeps/mach/hurd/spawni.c (__spawni): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_type_class):
	Likewise.
	(elf_machine_rela): Likewise.
	* sysdeps/mips/mips32/sfp-machine.h (_FP_CHOOSENAN): Likewise.
	* sysdeps/mips/mips64/sfp-machine.h (_FP_CHOOSENAN): Likewise.
	* sysdeps/mips/sys/asm.h (multiple #if conditionals): Likewise.
	* sysdeps/posix/rename.c (rename): Likewise.
	* sysdeps/powerpc/novmx-sigjmp.c (__novmx__sigjmp_save): Likewise.
	* sysdeps/powerpc/sigjmp.c (__vmx__sigjmp_save): Likewise.
	* sysdeps/s390/fpu/fenv_libc.h (FPC_VALID_MASK): Likewise.
	* sysdeps/s390/utf8-utf16-z9.c (gconv_end): Likewise.
	* sysdeps/unix/grantpt.c (grantpt): Likewise.
	* sysdeps/unix/sysv/linux/a.out.h (N_TXTOFF): Likewise.
	* sysdeps/unix/sysv/linux/updwtmp.c (TRANSFORM_UTMP_FILE_NAME):
	Likewise.
	* sysdeps/unix/sysv/linux/utmp_file.c (TRANSFORM_UTMP_FILE_NAME):
	Likewise.
	* sysdeps/x86/cpu-features.c (get_common_indices): Likewise.
	* time/tzfile.c (__tzfile_compute): Likewise.
2019-02-25 13:19:19 +00:00
Joseph Myers
32db86d558 Add fall-through comments.
This patch adds fall-through comments in some cases where -Wextra
produces implicit-fallthrough warnings.

The patch is non-exhaustive.  Apart from architecture-specific code
for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy
code, probably should have such changes, but left to be dealt with
separately), or places that already had comments about the
fall-through but not matching the form expected by
-Wimplicit-fallthrough=3 (the default level with -Wextra; my
inclination is to adjust those comments to match rather than
downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one
place where I thought the implicit fallthrough was not correct and so
should be handled separately as a bug fix.  I think the key thing to
consider in review of this patch is whether the fall-through is indeed
intended and correct in each place where such a comment is added.

Tested for x86_64.

	* elf/dl-exception.c (_dl_exception_create_format): Add
	fall-through comments.
	* elf/ldconfig.c (parse_conf_include): Likewise.
	* elf/rtld.c (print_statistics): Likewise.
	* locale/programs/charmap.c (parse_charmap): Likewise.
	* misc/mntent_r.c (__getmntent_r): Likewise.
	* posix/wordexp.c (parse_arith): Likewise.
	(parse_backtick): Likewise.
	* resolv/ns_ttl.c (ns_parse_ttl): Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
2019-02-12 10:30:34 +00:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Carlos O'Donell
ade8b817fe x86: Add Hygon Dhyana support.
This patch fix Hygon Dhyana processor CPU Vendor ID detection
problem in glibc sysdep module, current glibc codes doesn't
recognize Dhyana CPU Vendor ID("HygonGenuine") and set kind to
arch_kind_other, which result to incorrect zero value for
__cache_sysconf() syscall. As Hygon Dhyana share most
architecture feature as AMD Family 17h, this patch add Hygon CPU
Vendor ID check and setup kind to arch_kind_amd and reuse AMD
code path, which lead to correct return value in
__cache_sysconf() syscall. we run the glibc test suite for both
Hygon Dhyana and AMD EPYC and found no failure case.

Background:
Chengdu Haiguang IC Design Co., Ltd (Hygon) is a Joint Venture
between AMD and Haiguang Information Technology Co.,Ltd., aims at
providing high performance x86 processor for China server market.
Its first generation processor codename is Dhyana, which
originates from AMD technology and shares most of the
architecture with AMD's family 17h, but with different CPU Vendor
ID("HygonGenuine")/Family series number(Family 18h).

Related Hygon kernel patch can be found on
http://lkml.kernel.org/r/5ce86123a7b9dad925ac583d88d2f921040e859b.1538583282.git.puwen@hygon.cn

Signed-off-by: fanjinke <fanjinke@hygon.cn>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-12-13 09:25:20 -05:00
H.J. Lu
c22e4c2a14 x86: Extend CPUID support in struct cpu_features
Extend CPUID support for all feature bits from CPUID.  Add a new macro,
CPU_FEATURE_USABLE, which can be used to check if a feature is usable at
run-time, instead of HAS_CPU_FEATURE and HAS_ARCH_FEATURE.

Add COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007 and
COMMON_CPUID_INDEX_80000008 to check CPU feature bits in them.

Tested on i686 and x86-64 as well as using build-many-glibcs.py with
x86 targets.

	* sysdeps/x86/cacheinfo.c (intel_check_word): Updated for
	cpu_features_basic.
	(__cache_sysconf): Likewise.
	(init_cacheinfo): Likewise.
	* sysdeps/x86/cpu-features.c (get_extended_indeces): Also
	populate COMMON_CPUID_INDEX_80000007 and
	COMMON_CPUID_INDEX_80000008.
	(get_common_indices): Also populate COMMON_CPUID_INDEX_D_ECX_1.
	Use CPU_FEATURES_CPU_P (cpu_features, XSAVEC) to check if
	XSAVEC is available.  Set the bit_arch_XXX_Usable bits.
	(init_cpu_features): Use _Static_assert on
	index_arch_Fast_Unaligned_Load.
	__get_cpuid_registers and __get_arch_feature.  Updated for
	cpu_features_basic.  Set stepping in cpu_features.
	* sysdeps/x86/cpu-features.h: (FEATURE_INDEX_1): Changed to enum.
	(FEATURE_INDEX_2): New.
	(FEATURE_INDEX_MAX): Changed to enum.
	(COMMON_CPUID_INDEX_D_ECX_1): New.
	(COMMON_CPUID_INDEX_80000007): Likewise.
	(COMMON_CPUID_INDEX_80000008): Likewise.
	(cpuid_registers): Likewise.
	(cpu_features_basic): Likewise.
	(CPU_FEATURE_USABLE): Likewise.
	(bit_arch_XXX_Usable): Likewise.
	(cpu_features): Use cpuid_registers and cpu_features_basic.
	(bit_arch_XXX): Reweritten.
	(bit_cpu_XXX): Likewise.
	(index_cpu_XXX): Likewise.
	(reg_XXX): Likewise.
	* sysdeps/x86/tst-get-cpu-features.c: Include <stdio.h> and
	<support/check.h>.
	(CHECK_CPU_FEATURE): New.
	(CHECK_CPU_FEATURE_USABLE): Likewise.
	(cpu_kinds): Likewise.
	(do_test): Print vendor, family, model and stepping.  Check
	HAS_CPU_FEATURE and CPU_FEATURE_USABLE.
	(TEST_FUNCTION): Removed.
	Include <support/test-driver.c> instead of
	"../../test-skeleton.c".
	* sysdeps/x86_64/multiarch/sched_cpucount.c (__sched_cpucount):
	Check POPCNT instead of POPCOUNT.
	* sysdeps/x86_64/multiarch/test-multiarch.c (do_test): Likewise.
2018-12-03 05:54:56 -08:00
Adhemerval Zanella
c3d8dc45c9 x86: Fix Haswell strong flags (BZ#23709)
Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
default flags for Haswell models.  Previously, new models were handled by the
default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
Prefer_PMINUB_for_stringop (only the TSX one).

This patch fixes it by disentangle the TSX flag handling from the memory
optimization ones.  The strstr case cited on patch now selects the
__strstr_sse2_unaligned as expected for the Haswell cpu.

Checked on x86_64-linux-gnu.

	[BZ #23709]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
	independently of other flags.
2018-10-23 14:57:02 -03:00
Siddhesh Poyarekar
dce452dc52 Rename the glibc.tune namespace to glibc.cpu
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to.  Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses.  Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.

	* NEWS: Mention the change.
	* elf/dl-tunables.list: Rename tune namespace to cpu.
	* sysdeps/powerpc/dl-tunables.list: Likewise.
	* sysdeps/x86/dl-tunables.list: Likewise.
	* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
	cpu.name.
	* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
	* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
	* manual/README.tunables: Likewise.
	* manual/tunables.texi: Likewise.
	* sysdeps/powerpc/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
	(init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.c: Likewise.
	* sysdeps/x86/cpu-features.h: Likewise.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86_64/Makefile: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-08-02 23:49:19 +05:30
H.J. Lu
82c80ac2eb x86: Rename get_common_indeces to get_common_indices
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/x86/cpu-features.c (get_common_indeces): Renamed to
	...
	(get_common_indices): This.
	(init_cpu_features): Updated.
2018-08-01 04:57:50 -07:00
H.J. Lu
be525a69a6 x86: Populate COMMON_CPUID_INDEX_80000001 for Intel CPUs [BZ #23459]
Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ #23459]
	* sysdeps/x86/cpu-features.c (get_extended_indices): New
	function.
	(init_cpu_features): Call get_extended_indices for both Intel
	and AMD CPUs.
	* sysdeps/x86/cpu-features.h (COMMON_CPUID_INDEX_80000001):
	Remove "for AMD" comment.
2018-07-26 13:31:11 -07:00
H.J. Lu
ba2ea23d05 x86: Always include <dl-cet.h>/cet-tunables.h> for --enable-cet
Always include <dl-cet.h> and cet-tunables.h> when CET is enabled.
Otherwise, configure glibc with --enable-cet --disable-tunables will
fail to build.

	* sysdeps/x86/cpu-features.c: Always include <dl-cet.h> and
	cet-tunables.h> when CET is enabled.
2018-07-17 04:16:35 -07:00
H.J. Lu
f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598]
Intel Control-flow Enforcement Technology (CET) instructions:

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en
forcement-technology-preview.pdf

includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK).

GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to
indicate that all executable sections are compatible with IBT when
ENDBR instruction starts each valid target where an indirect branch
instruction can land.  Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on
output only if it is set on all relocatable inputs.

On an IBT capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable IBT and
lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable.
2. When loading an executable with an interpreter, enable IBT if
GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable,
     disable IBT.
  b. Lock IBT.
3. If IBT is enabled, when loading a shared object without
GNU_PROPERTY_X86_FEATURE_1_IBT:
  a. If legacy interwork is allowed, then mark all pages in executable
     PT_LOAD segments in legacy code page bitmap.  Failure of legacy code
     page bitmap allocation causes an error.
  b. If legacy interwork isn't allowed, it causes an error.

GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to
indicate that all executable sections are compatible with SHSTK where
return address popped from shadow stack always matches return address
popped from normal stack.  Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK
on output only if it is set on all relocatable inputs.

On a SHSTK capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable.
2. When loading an executable with an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable
     or any shared objects loaded via the DT_NEEDED tag, disable SHSTK.
  b. Otherwise lock SHSTK.
3. After SHSTK is enabled, it is an error to load a shared object
without GNU_PROPERTY_X86_FEATURE_1_SHSTK.

To enable CET support in glibc, --enable-cet is required to configure
glibc.  When CET is enabled, both compiler and assembler must support
CET.  Otherwise, it is a configure-time error.

To support CET run-time control,

1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate
if IBT or SHSTK are enabled at run-time.  It should be initialized by
init_cpu_features.
2. For dynamic executables:
   a. A l_cet field is added to struct link_map to indicate if IBT or
      SHSTK is enabled in an ELF module.  _dl_process_pt_note or
      _rtld_process_pt_note is called to process PT_NOTE segment for
      GNU program property and set l_cet.
   b. _dl_open_check is added to check IBT and SHSTK compatibilty when
      dlopening a shared object.
3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with
_dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if
SHSTK is enabled.

CET run-time control can be changed via GLIBC_TUNABLES with

$ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off]
$ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off]

1. permissive: SHSTK is disabled when dlopening a legacy ELF module.
2. on: IBT or SHSTK are always enabled, regardless if there are IBT or
SHSTK bits in GNU program property.
3. off: IBT or SHSTK are always disabled, regardless if there are IBT or
SHSTK bits in GNU program property.

<cet.h> from CET-enabled GCC is automatically included by assembly codes
to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK
to GNU program property.  _CET_ENDBR is added at the entrance of all
assembly functions whose address may be taken.  _CET_NOTRACK is used to
insert NOTRACK prefix with indirect jump table to support IBT.  It is
defined as notrack when _CET_NOTRACK is defined in <cet.h>.

	 [BZ #21598]
	* configure.ac: Add --enable-cet.
	* configure: Regenerated.
	* elf/Makefille (all-built-dso): Add a comment.
	* elf/dl-load.c (filebuf): Moved before "dynamic-link.h".
	Include <dl-prop.h>.
	(_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE
	segment.
	* elf/dl-open.c: Include <dl-prop.h>.
	(dl_open_worker): Call _dl_open_check.
	* elf/rtld.c: Include <dl-prop.h>.
	(dl_main): Call _rtld_process_pt_note on PT_NOTE segment.  Call
	_rtld_main_check.
	* sysdeps/generic/dl-prop.h: New file.
	* sysdeps/i386/dl-cet.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise.
	* sysdeps/x86/cet-tunables.h: Likewise.
	* sysdeps/x86/check-cet.awk: Likewise.
	* sysdeps/x86/configure: Likewise.
	* sysdeps/x86/configure.ac: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.
	* sysdeps/x86/dl-procruntime.c: Likewise.
	* sysdeps/x86/dl-prop.h: Likewise.
	* sysdeps/x86/libc-start.h: Likewise.
	* sysdeps/x86/link_map.h: Likewise.
	* sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
	(_dl_runtime_resolve_shstk): New.
	(_dl_runtime_profile_shstk): Likewise.
	* sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet
	if CET is enabled.
	(CFLAGS-.o): Add -fcf-protection if CET is enabled.
	(CFLAGS-.os): Likewise.
	(CFLAGS-.op): Likewise.
	(CFLAGS-.oS): Likewise.
	(asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET
	is enabled.
	(tests-special): Add $(objpfx)check-cet.out.
	(cet-built-dso): New.
	(+$(cet-built-dso:=.note)): Likewise.
	(common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note).
	($(objpfx)check-cet.out): New.
	(generated): Add check-cet.out.
	* sysdeps/x86/cpu-features.c: Include <dl-cet.h> and
	<cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New prototype.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	(init_cpu_features): Call get_cet_status to check CET status
	and update dl_x86_feature_1 with CET status.  Call
	TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK
	(set_x86_shstk).  Disable and lock CET in libc.a.
	* sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New function.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	* sysdeps/x86/sysdep.h (_CET_NOTRACK): New.
	(_CET_ENDBR): Define if not defined.
	(ENTRY): Add _CET_ENDBR.
	* sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and
	x86_shstk.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
2018-07-16 14:08:27 -07:00
Amit Pawar
bce5911b67 Use AVX_Fast_Unaligned_Load from Zen onwards.
From Zen onwards this will be enabled. It was disabled for the
Excavator case and will remain disabled.

Reviewd-by: Carlos O'Donell <carlos@redhat.com>
2018-07-06 09:55:36 -04:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
H.J. Lu
b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
H.J. Lu
4d916f0f12 x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299]
Since ld.so expands $PLATFORM with GLRO(dl_platform), don't set
GLRO(dl_platform) to NULL.

	[BZ #22299]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	GLRO(dl_platform) to NULL.
	* sysdeps/x86_64/Makefile (tests): Add tst-platform-1.
	(modules-names): Add tst-platformmod-1 and
	x86_64/tst-platformmod-2.
	(CFLAGS-tst-platform-1.c): New.
	(CFLAGS-tst-platformmod-1.c): Likewise.
	(CFLAGS-tst-platformmod-2.c): Likewise.
	(LDFLAGS-tst-platformmod-2.so): Likewise.
	($(objpfx)tst-platform-1): Likewise.
	($(objpfx)tst-platform-1.out): Likewise.
	(tst-platform-1-ENV): Likewise.
	($(objpfx)x86_64/tst-platformmod-2.os): Likewise.
	* sysdeps/x86_64/tst-platform-1.c: New file.
	* sysdeps/x86_64/tst-platformmod-1.c: Likewise.
	* sysdeps/x86_64/tst-platformmod-2.c: Likewise.
2017-10-19 08:28:26 -07:00
H.J. Lu
45ff34638f x86: Add x86_64 to x86-64 HWCAP [BZ #22093]
Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
"x86_64" subdirectory when loading a shared library.  ld.so in glibc
2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
on supported ISAs.  This led to shared library loading failure for
shared libraries placed under the "x86_64" subdirectory.

This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
search the "x86_64" subdirectory when loading a shared library.

NB: We can't set x86-64 dl_platform to "x86-64" since ld.so will skip
the "haswell" and "xeon_phi" subdirectories on "haswell" and "xeon_phi"
machines.

Tested on i686 and x86-64.

	[BZ #22093]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
	GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
	* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
	(HWCAP_IMPORTANT): Likewise.
	(HWCAP_X86_64): New enum.
	(HWCAP_X86_AVX512_1): Updated.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
	* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
	(modules-names): Add x86_64/tst-x86_64mod-1.
	(LDFLAGS-tst-x86_64mod-1.so): New.
	($(objpfx)tst-x86_64-1): Likewise.
	($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
	(tst-x86_64-1-clean): Likewise.
	* sysdeps/x86_64/tst-x86_64-1.c: New file.
	* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
2017-09-11 08:18:32 -07:00
H.J. Lu
d2cf37c0a2 x86-64: Use _dl_runtime_resolve_opt only with AVX512F [BZ #21871]
On AVX machines with XGETBV (ECX == 1) like Skylake processors,

(gdb) disass _dl_runtime_resolve_avx_opt
Dump of assembler code for function _dl_runtime_resolve_avx_opt:
   0x0000000000015890 <+0>:	push   %rax
   0x0000000000015891 <+1>:	push   %rcx
   0x0000000000015892 <+2>:	push   %rdx
   0x0000000000015893 <+3>:	mov    $0x1,%ecx
   0x0000000000015898 <+8>:	xgetbv
   0x000000000001589b <+11>:	mov    %eax,%r11d
   0x000000000001589e <+14>:	pop    %rdx
   0x000000000001589f <+15>:	pop    %rcx
   0x00000000000158a0 <+16>:	pop    %rax
   0x00000000000158a1 <+17>:	and    $0x4,%r11d
   0x00000000000158a5 <+21>:	bnd je 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.

is slower than:

(gdb) disass _dl_runtime_resolve_avx_slow
Dump of assembler code for function _dl_runtime_resolve_avx_slow:
   0x0000000000015850 <+0>:	vorpd  %ymm0,%ymm1,%ymm8
   0x0000000000015854 <+4>:	vorpd  %ymm2,%ymm3,%ymm9
   0x0000000000015858 <+8>:	vorpd  %ymm4,%ymm5,%ymm10
   0x000000000001585c <+12>:	vorpd  %ymm6,%ymm7,%ymm11
   0x0000000000015860 <+16>:	vorpd  %ymm8,%ymm9,%ymm9
   0x0000000000015865 <+21>:	vorpd  %ymm10,%ymm11,%ymm10
   0x000000000001586a <+26>:	vpcmpeqd %xmm8,%xmm8,%xmm8
   0x000000000001586f <+31>:	vorpd  %ymm9,%ymm10,%ymm10
   0x0000000000015874 <+36>:	vptest %ymm10,%ymm8
   0x0000000000015879 <+41>:	bnd jae 0x158b0 <_dl_runtime_resolve_avx>
   0x000000000001587c <+44>:	vzeroupper
   0x000000000001587f <+47>:	bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.
(gdb)

since xgetbv takes much more cycles than single cycle operations like
vpord/vvpcmpeq/ptest.  _dl_runtime_resolve_opt should be used only with
AVX512 where AVX512 instructions lead to lower CPU frequency on Skylake
server.

	[BZ #21871]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.
2017-08-04 11:14:33 -07:00
H.J. Lu
03feacb562 x86: Rename glibc.tune.ifunc to glibc.tune.hwcaps
Rename glibc.tune.ifunc to glibc.tune.hwcaps and move it to
sysdeps/x86/dl-tunables.list since it is x86 specicifc.  Also
change type of data_cache_size, data_cache_size and
non_temporal_threshold to unsigned long int to match size_t.
Remove usage DEFAULT_STRLEN from cpu-tunables.c.

	* elf/dl-tunables.list (glibc.tune.ifunc): Removed.
	* sysdeps/x86/dl-tunables.list (glibc.tune.hwcaps): New.
	Remove security_level on all fields.
	* manual/tunables.texi: Replace ifunc with hwcaps.
	* sysdeps/x86/cpu-features.c (TUNABLE_CALLBACK (set_ifunc)):
	Renamed to ..
	(TUNABLE_CALLBACK (set_hwcaps)): This.
	(init_cpu_features): Updated.
	* sysdeps/x86/cpu-features.h (cpu_features): Change type of
	data_cache_size, data_cache_size and non_temporal_threshold to
	unsigned long int.
	* sysdeps/x86/cpu-tunables.c (DEFAULT_STRLEN): Removed.
	(TUNABLE_CALLBACK (set_ifunc)): Renamed to ...
	(TUNABLE_CALLBACK (set_hwcaps)): This.  Update comments.  Don't
	use DEFAULT_STRLEN.
2017-06-21 10:21:37 -07:00
H.J. Lu
905947c304 tunables: Add IFUNC selection and cache sizes
The current IFUNC selection is based on microbenchmarks in glibc.  It
should give the best performance for most workloads.  But other choices
may have better performance for a particular workload or on the hardware
which wasn't available at the selection was made.  The environment
variable, GLIBC_TUNABLES=glibc.tune.ifunc=-xxx,yyy,-zzz...., can be used
to enable CPU/ARCH feature yyy, disable CPU/ARCH feature yyy and zzz,
where the feature name is case-sensitive and has to match the ones in
cpu-features.h.  It can be used by glibc developers to override the
IFUNC selection to tune for a new processor or improve performance for
a particular workload.  It isn't intended for normal end users.

NOTE: the IFUNC selection may change over time.  Please check all
multiarch implementations when experimenting.

Also, GLIBC_TUNABLES=glibc.tune.x86_non_temporal_threshold=NUMBER is
provided to set threshold to use non temporal store to NUMBER,
GLIBC_TUNABLES=glibc.tune.x86_data_cache_size=NUMBER to set data cache
size, GLIBC_TUNABLES=glibc.tune.x86_shared_cache_size=NUMBER to set
shared cache size.

	* elf/dl-tunables.list (tune): Add ifunc,
	x86_non_temporal_threshold,
	x86_data_cache_size and x86_shared_cache_size.
	* manual/tunables.texi: Document glibc.tune.ifunc,
	glibc.tune.x86_data_cache_size, glibc.tune.x86_shared_cache_size
	and glibc.tune.x86_non_temporal_threshold.
	* sysdeps/unix/sysv/linux/x86/dl-sysdep.c: New file.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86/cacheinfo.c
	(init_cacheinfo): Check and get data cache size, shared cache
	size and non temporal threshold from cpu_features.
	* sysdeps/x86/cpu-features.c [HAVE_TUNABLES] (TUNABLE_NAMESPACE):
	New.
	[HAVE_TUNABLES] Include <unistd.h>.
	[HAVE_TUNABLES] Include <elf/dl-tunables.h>.
	[HAVE_TUNABLES] (TUNABLE_CALLBACK (set_ifunc)): Likewise.
	[HAVE_TUNABLES] (init_cpu_features): Use TUNABLE_GET to set
	IFUNC selection, data cache size, shared cache size and non
	temporal threshold.
	* sysdeps/x86/cpu-features.h (cpu_features): Add data_cache_size,
	shared_cache_size and non_temporal_threshold.
2017-06-20 08:37:28 -07:00
Siddhesh Poyarekar
511c5a1087 Make LD_HWCAP_MASK usable for static binaries
The LD_HWCAP_MASK environment variable was ignored in static binaries,
which is inconsistent with the behaviour of dynamically linked
binaries.  This seems to have been because of the inability of
ld_hwcap_mask being read early enough to influence anything but now
that it is in tunables, the mask is usable in static binaries as well.

This feature is important for aarch64, which relies on HWCAP_CPUID
being masked out to disable multiarch.  A sanity test on x86_64 shows
that there are no failures.  Likewise for aarch64.

	* elf/dl-hwcaps.h [HAVE_TUNABLES]: Always read hwcap_mask.
	* sysdeps/sparc/sparc32/dl-machine.h [HAVE_TUNABLES]:
	Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Always set
	up hwcap and hwcap_mask.
2017-06-07 11:11:40 +05:30
Siddhesh Poyarekar
ff08fc59e3 tunables: Use glibc.tune.hwcap_mask tunable instead of _dl_hwcap_mask
Drop _dl_hwcap_mask when building with tunables.  This completes the
transition of hwcap_mask reading from _dl_hwcap_mask to tunables.

	* elf/dl-hwcaps.h: New file.
	* elf/dl-hwcaps.c: Include it.
	(_dl_important_hwcaps)[HAVE_TUNABLES]: Read and update
	glibc.tune.hwcap_mask.
	* elf/dl-cache.c: Include dl-hwcaps.h.
	(_dl_load_cache_lookup)[HAVE_TUNABLES]: Read
	glibc.tune.hwcap_mask.
	* sysdeps/sparc/sparc32/dl-machine.h: Likewise.
	* elf/dl-support.c (_dl_hwcap2)[HAVE_TUNABLES]: Drop
	_dl_hwcap_mask.
	* elf/rtld.c (rtld_global_ro)[HAVE_TUNABLES]: Drop
	_dl_hwcap_mask.
	(process_envvars)[HAVE_TUNABLES]: Likewise.
	* sysdeps/generic/ldsodefs.h (rtld_global_ro)[HAVE_TUNABLES]:
	Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't
	initialize dl_hwcap_mask when tunables are enabled.
2017-06-07 11:11:38 +05:30
H.J. Lu
1432d38ea0 x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391]
dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very
early during startup.  They are used by dynamic linker to determine
platform and build an array of hardware capability names, which are
added to search path when loading shared object.  dl_platform and
dl_hwcap are unused on x86-64.  On i386, i386, i486, i586 and i686
platforms were supported and only SSE2 capability was used.

On x86, usage of AT_PLATFORM and AT_HWCAP to determine platform and
processor capabilities is obsolete since all information is available
in dl_x86_cpu_features.  This patch sets dl_platform and dl_hwcap from
dl_x86_cpu_features in dynamic linker.  On i386, the available plaforms
are changed to i586 and i686 since i386 has been deprecated.  On x86-64,
the available plaforms are haswell, which is for Haswell class processors
with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which
is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and
AVX512PF.  A capability, avx512_1, is also added to x86-64 for AVX512
ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL.

	[BZ #21391]
	* sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]:
	Only call init_cpu_features.
	[!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed.
	* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
	* sysdeps/i386/dl-procinfo.h: Removed.
	* sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include
	<sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>.  Include
	<sysdeps/x86/dl-procinfo.h>.
	(_dl_procinfo): Replace _DL_HWCAP_COUNT with 32.
	* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]:
	Include <sysdeps/x86/dl-procinfo.h> instead of
	 <sysdeps/generic/dl-procinfo.h>.
	* sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>.
	(init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask.
	* sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New.
	(bit_cpu_MOVBE): Likewise.
	(bit_cpu_BMI1): Likewise.
	(bit_cpu_BMI2): Likewise.
	(index_cpu_BMI1): Likewise.
	(index_cpu_BMI2): Likewise.
	(index_cpu_LZCNT): Likewise.
	(index_cpu_MOVBE): Likewise.
	(index_cpu_POPCNT): Likewise.
	(reg_BMI1): Likewise.
	(reg_BMI2): Likewise.
	(reg_LZCNT): Likewise.
	(reg_MOVBE): Likewise.
	(reg_POPCNT): Likewise.
	* sysdeps/x86/dl-hwcap.h: New file.
	* sysdeps/x86/dl-procinfo.h: Likewise.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New.
	(_dl_x86_platforms): Likewise.
2017-05-03 13:44:35 -07:00
H.J. Lu
4cb334c4d6 x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396]
On Skylake server, AVX512 load/store instructions in memcpy/memset may
lead to lower CPU turbo frequency in certain situations.  Use of AVX2
in memcpy/memset has been observed to have improved overall performance
in many workloads due to the higher frequency.

Since AVX512ER is unique to Xeon Phi, this patch sets Prefer_No_AVX512
if AVX512ER isn't available so that AVX2 versions of memcpy/memset are
used on Skylake server.

	[BZ #21396]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Prefer_No_AVX512 if AVX512ER isn't available.
	* sysdeps/x86/cpu-features.h (bit_arch_Prefer_No_AVX512): New.
	(index_arch_Prefer_No_AVX512): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Don't use
	AVX512 version if Prefer_No_AVX512 is set.
	* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/memmove.S (__libc_memmove): Likewise.
	* sysdeps/x86_64/multiarch/memmove_chk.S (__memmove_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk):
	Likewise.
	* sysdeps/x86_64/multiarch/memset.S (memset): Likewise.
	* sysdeps/x86_64/multiarch/memset_chk.S (__memset_chk):
	Likewise.
2017-04-18 14:01:45 -07:00
H.J. Lu
1c53cb49de x86: Set Prefer_No_VZEROUPPER if AVX512ER is available
AVX512ER won't be implemented in any Xeon processors and will be in
all Xeon Phi processors.  Don't check CPU model number when setting
Prefer_No_VZEROUPPER for Xeon Phi.  Instead, set Prefer_No_VZEROUPPER
if AVX512ER is available.  It works with current and future Xeon Phi
and non-Xeon Phi processors.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Prefer_No_VZEROUPPER if AVX512ER is available.
	* sysdeps/x86/cpu-features.h
	(bit_cpu_AVX512PF): New.
	(bit_cpu_AVX512ER): Likewise.
	(bit_cpu_AVX512CD): Likewise.
	(bit_cpu_AVX512BW): Likewise.
	(bit_cpu_AVX512VL): Likewise.
	(index_cpu_AVX512PF): Likewise.
	(index_cpu_AVX512ER): Likewise.
	(index_cpu_AVX512CD): Likewise.
	(index_cpu_AVX512BW): Likewise.
	(index_cpu_AVX512VL): Likewise.
	(reg_AVX512PF): Likewise.
	(reg_AVX512ER): Likewise.
	(reg_AVX512CD): Likewise.
	(reg_AVX512BW): Likewise.
	(reg_AVX512VL): Likewise.
2017-04-18 08:27:32 -07:00
H.J. Lu
b170d2e7ab Use CPU_FEATURES_CPU_P to check if AVX is available
Don't use bit_cpu_AVX directly.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Check AVX with
	CPU_FEATURES_CPU_P.
2017-03-17 11:38:13 -07:00
H.J. Lu
52ac22365a Use index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit
* sysdeps/x86/cpu-features.c (init_cpu_features): Use
	index_cpu_RTM and reg_RTM to clear the bit_cpu_RTM bit.
2017-02-17 11:53:26 -08:00
Joseph Myers
bfff8b1bec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Andrew Senkevich
2702856bf4 Disable TSX on some Haswell processors.
Patch disables Intel TSX on some Haswell processors to avoid TSX
on kernels that weren't updated with the latest microcode package
(which disables broken feature by default).

    * sysdeps/x86/cpu-features.c (get_common_indeces): Add
    stepping identification.
    (init_cpu_features): Add handle of Haswell.
2016-12-19 14:15:57 +03:00
Carlos O'Donell
b3d17c1cf2 Bug 20689: Fix FMA and AVX2 detection on Intel
In the Intel Architecture Instruction Set Extensions Programming
reference the recommended way to test for FMA in section
'2.2.1 Detection of FMA' is:

"Application Software must identify that hardware supports AVX as
explained in ... after that it must also detect support for FMA..."

We don't do that in glibc. We use osxsave to detect the use of xgetbv,
and after that we check for AVX and FMA orthogonally. It is conceivable
that you could have the AVX bit clear and the FMA bit in an undefined
state.

This commit fixes FMA and AVX2 detection to depend on usable AVX
as required by the recommended Intel sequences.

v1: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00241.html
v2: https://www.sourceware.org/ml/libc-alpha/2016-10/msg00265.html
2016-10-17 19:39:54 -04:00
H.J. Lu
fb0f7a6755 X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
There is transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.

To avoid SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.

For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero.  We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.

This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions.  It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.

_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1.  Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.

	[BZ #20495]
	[BZ #20508]
	* sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
	processors, set Use_dl_runtime_resolve_slow and set
	Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	New.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_Use_dl_runtime_resolve_opt): Likewise.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
	_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
	if Use_dl_runtime_resolve_opt is set.  Use
	_dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
	(_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
	(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
	New.
	(_dl_runtime_resolve_opt): Likewise.
	(_dl_runtime_profile): Define only if _dl_runtime_profile is
	defined.
2016-09-06 08:51:07 -07:00
H.J. Lu
91655fc307 Check FMA after COMMON_CPUID_INDEX_80000001
Since the FMA4 bit is in COMMON_CPUID_INDEX_80000001 and FMA4 requires
AVX, determine if FMA4 is usable after COMMON_CPUID_INDEX_80000001 is
available and if AVX is usable.

	[BZ #20195]
	* sysdeps/x86/cpu-features.c (get_common_indeces): Move FMA4
	check to ...
	(init_cpu_features): Here.
2016-06-07 08:00:40 -07:00
H.J. Lu
2e2d9796da Detect Intel Goldmont and Airmont processors
Updated from the model numbers of Goldmont and Airmont processors in
Intel64 And IA-32 Processor Architectures Software Developer's Manual
Volume 3 Revision 058.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Detect Intel
	Goldmont and Airmont processors.
2016-04-15 05:23:06 -07:00
H.J. Lu
27d3ce1467 Remove Fast_Copy_Backward from Intel Core processors
Intel Core i3, i5 and i7 processors have fast unaligned copy and
copy backward is ignored.  Remove Fast_Copy_Backward from Intel Core
processors to avoid confusion.

	* sysdeps/x86/cpu-features.c (init_cpu_features): Don't set
	bit_arch_Fast_Copy_Backward for Intel Core proessors.
2016-04-01 15:09:14 -07:00
H.J. Lu
e41b395523 [x86] Add a feature bit: Fast_Unaligned_Copy
On AMD processors, memcpy optimized with unaligned SSE load is
slower than emcpy optimized with aligned SSSE3 while other string
functions are faster with unaligned SSE load.  A feature bit,
Fast_Unaligned_Copy, is added to select memcpy optimized with
unaligned SSE load.

	[BZ #19583]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
	processors.  Set Fast_Copy_Backward for AMD Excavator
	processors.
	* sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
	New.
	(index_arch_Fast_Unaligned_Copy): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
	Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
2016-03-28 04:40:03 -07:00
H.J. Lu
f781a9e961 Set index_arch_AVX_Fast_Unaligned_Load only for Intel processors
Since only Intel processors with AVX2 have fast unaligned load, we
should set index_arch_AVX_Fast_Unaligned_Load only for Intel processors.

Move AVX, AVX2, AVX512, FMA and FMA4 detection into get_common_indeces
and call get_common_indeces for other processors.

Add CPU_FEATURES_CPU_P and CPU_FEATURES_ARCH_P to aoid loading
GLRO(dl_x86_cpu_features) in cpu-features.c.

	[BZ #19583]
	* sysdeps/x86/cpu-features.c (get_common_indeces): Remove
	inline.  Check family before setting family, model and
	extended_model.  Set AVX, AVX2, AVX512, FMA and FMA4 usable
	bits here.
	(init_cpu_features): Replace HAS_CPU_FEATURE and
	HAS_ARCH_FEATURE with CPU_FEATURES_CPU_P and
	CPU_FEATURES_ARCH_P.  Set index_arch_AVX_Fast_Unaligned_Load
	for Intel processors with usable AVX2.  Call get_common_indeces
	for other processors with family == NULL.
	* sysdeps/x86/cpu-features.h (CPU_FEATURES_CPU_P): New macro.
	(CPU_FEATURES_ARCH_P): Likewise.
	(HAS_CPU_FEATURE): Use CPU_FEATURES_CPU_P.
	(HAS_ARCH_FEATURE): Use CPU_FEATURES_ARCH_P.
2016-03-22 07:47:20 -07:00
H.J. Lu
6aa3e97e25 Add _arch_/_cpu_ to index_*/bit_* in x86 cpu-features.h
index_* and bit_* macros are used to access cpuid and feature arrays o
struct cpu_features.  It is very easy to use bits and indices of cpuid
array on feature array, especially in assembly codes.  For example,
sysdeps/i386/i686/multiarch/bcopy.S has

	HAS_CPU_FEATURE (Fast_Rep_String)

which should be

	HAS_ARCH_FEATURE (Fast_Rep_String)

We change index_* and bit_* to index_cpu_*/index_arch_* and
bit_cpu_*/bit_arch_* so that we can catch such error at build time.

	[BZ #19762]
	* sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
	(EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
	(bit_arch_*): This for feature array.
	(bit_*): Renamed to ...
	(bit_cpu_*): This for cpu array.
	(index_*): Renamed to ...
	(index_arch_*): This for feature array.
	(index_*): Renamed to ...
	(index_cpu_*): This for cpu array.
	[__ASSEMBLER__] (HAS_FEATURE): Add and use field.
	[__ASSEMBLER__] (HAS_CPU_FEATURE)): Pass cpu to HAS_FEATURE.
	[__ASSEMBLER__] (HAS_ARCH_FEATURE)): Pass arch to HAS_FEATURE.
	[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and
	bit_##name with index_cpu_##name and bit_cpu_##name.
	[!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and
	bit_##name with index_arch_##name and bit_arch_##name.
2016-03-10 05:27:07 -08:00
Amit Pawar
d7890e6947 Set index_Fast_Unaligned_Load for Excavator family CPUs
GLIBC benchtest testcases shows SSE2_Unaligned based implementations
are performing faster compare to SSE2 based implementations for
routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy
and strstr. Flag index_Fast_Unaligned_Load is set for Excavator family
0x15h CPU's. This makes SSE2_Unaligned based implementations as
default for these routines.

	[BZ #19467]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	index_Fast_Unaligned_Load flag for Excavator family CPUs.
2016-01-14 08:14:31 -08:00