glibc/sysdeps/x86
H.J. Lu 799859f663 x86-64: Use _dl_runtime_resolve_opt only with AVX512F [BZ #21871]
On AVX machines with XGETBV (ECX == 1) like Skylake processors,

(gdb) disass _dl_runtime_resolve_avx_opt
Dump of assembler code for function _dl_runtime_resolve_avx_opt:
   0x0000000000015890 <+0>:	push   %rax
   0x0000000000015891 <+1>:	push   %rcx
   0x0000000000015892 <+2>:	push   %rdx
   0x0000000000015893 <+3>:	mov    $0x1,%ecx
   0x0000000000015898 <+8>:	xgetbv
   0x000000000001589b <+11>:	mov    %eax,%r11d
   0x000000000001589e <+14>:	pop    %rdx
   0x000000000001589f <+15>:	pop    %rcx
   0x00000000000158a0 <+16>:	pop    %rax
   0x00000000000158a1 <+17>:	and    $0x4,%r11d
   0x00000000000158a5 <+21>:	bnd je 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.

is slower than:

(gdb) disass _dl_runtime_resolve_avx_slow
Dump of assembler code for function _dl_runtime_resolve_avx_slow:
   0x0000000000015850 <+0>:	vorpd  %ymm0,%ymm1,%ymm8
   0x0000000000015854 <+4>:	vorpd  %ymm2,%ymm3,%ymm9
   0x0000000000015858 <+8>:	vorpd  %ymm4,%ymm5,%ymm10
   0x000000000001585c <+12>:	vorpd  %ymm6,%ymm7,%ymm11
   0x0000000000015860 <+16>:	vorpd  %ymm8,%ymm9,%ymm9
   0x0000000000015865 <+21>:	vorpd  %ymm10,%ymm11,%ymm10
   0x000000000001586a <+26>:	vpcmpeqd %xmm8,%xmm8,%xmm8
   0x000000000001586f <+31>:	vorpd  %ymm9,%ymm10,%ymm10
   0x0000000000015874 <+36>:	vptest %ymm10,%ymm8
   0x0000000000015879 <+41>:	bnd jae 0x158b0 <_dl_runtime_resolve_avx>
   0x000000000001587c <+44>:	vzeroupper
   0x000000000001587f <+47>:	bnd jmpq 0x16200 <_dl_runtime_resolve_sse_vex>
End of assembler dump.
(gdb)

since xgetbv takes many more cycles than single-cycle operations like
vorpd/vpcmpeqd/vptest.  _dl_runtime_resolve_opt should be used only with
AVX512, where AVX512 instructions lead to lower CPU frequency on Skylake
server.
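
The test performed by the xgetbv-based variant can be sketched in C as follows. This is only an illustration of the `and $0x4,%r11d` / `je` pair in the first dump, not code from glibc: the helper name `upper_ymm_in_use` and the macro `XSTATE_AVX_BIT` are hypothetical, and `xinuse_low` stands for the EAX value that xgetbv returns with ECX == 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 2 of the XGETBV (ECX == 1) result covers the AVX state component,
   i.e. the upper 128 bits of the YMM registers.  */
#define XSTATE_AVX_BIT 0x4u

static_assert (XSTATE_AVX_BIT == (1u << 2), "AVX state is component 2");

/* Hypothetical helper mirroring "and $0x4,%r11d" followed by "je":
   a clear bit means the upper YMM halves are in their initial state,
   so the SSE-only path is sufficient.  */
static bool
upper_ymm_in_use (uint32_t xinuse_low)
{
  return (xinuse_low & XSTATE_AVX_BIT) != 0;
}
```

When the helper returns false, the assembly branches to _dl_runtime_resolve_sse_vex. The check itself is cheap; the cost discussed above comes from executing xgetbv to obtain the value.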

	[BZ #21871]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	bit_arch_Use_dl_runtime_resolve_opt only with AVX512F.
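
The selection rule described by this entry might look roughly like the sketch below. It is a simplified assumption, not the actual body of init_cpu_features: `struct cpu_caps`, its fields, and `use_dl_runtime_resolve_opt` are hypothetical stand-ins for the real feature bits and for setting bit_arch_Use_dl_runtime_resolve_opt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the CPU feature bits consulted at start-up.  */
struct cpu_caps
{
  bool avx;           /* AVX is usable.  */
  bool avx512f;       /* AVX512F is usable.  */
  bool xgetbv_ecx_1;  /* XGETBV with ECX == 1 is supported.  */
};

/* Return true when the xgetbv-based resolver variant should be selected.
   After the change, AVX alone is no longer enough: AVX512F must be
   usable as well.  */
static bool
use_dl_runtime_resolve_opt (const struct cpu_caps *caps)
{
  return caps->xgetbv_ecx_1 && caps->avx512f;
}
```

Under this sketch, an AVX-only machine that supports XGETBV with ECX == 1 no longer selects the xgetbv-based variant, which matches the intent stated in the commit message.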

(cherry picked from commit d2cf37c0a2)
2017-08-06 10:44:44 -07:00
..
bits float128: Add signbit alternative for old compilers 2017-06-30 18:34:29 -03:00
fpu Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
nptl/bits Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
cacheinfo.c tunables: Add IFUNC selection and cache sizes 2017-06-20 08:37:28 -07:00
cpu-features-offsets.sym Remove x86 ifunc-defines.sym and rtld-global-offsets.sym 2016-05-11 05:51:39 -07:00
cpu-features.c x86-64: Use _dl_runtime_resolve_opt only with AVX512F [BZ #21871] 2017-08-06 10:44:44 -07:00
cpu-features.h x86: Rename glibc.tune.ifunc to glibc.tune.hwcaps 2017-06-21 10:21:37 -07:00
cpu-tunables.c x86: Rename glibc.tune.ifunc to glibc.tune.hwcaps 2017-06-21 10:21:37 -07:00
dl-get-cpu-features.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
dl-hwcap.h x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391] 2017-05-03 13:44:35 -07:00
dl-procinfo.c x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391] 2017-05-03 13:44:35 -07:00
dl-procinfo.h x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391] 2017-05-03 13:44:35 -07:00
dl-tunables.list x86: Rename glibc.tune.ifunc to glibc.tune.hwcaps 2017-06-21 10:21:37 -07:00
elide.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
float128-abi.h Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
fpu_control.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
init-arch.h x86: Add macros to implement ifunc selection in C 2017-06-05 08:28:13 -07:00
libc-start.c Delay initialization of CPU features struct in static binaries 2017-05-31 06:38:33 +05:30
linkmap.h Rename bits/linkmap.h to linkmap.h (bug 14912). 2015-09-04 19:44:27 +00:00
Makefile Remove x86 ifunc-defines.sym and rtld-global-offsets.sym 2016-05-11 05:51:39 -07:00
math-tests.h Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
string_private.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tininess.h Use sysdeps/x86/tininess.h for i386 and x86_64 2012-10-30 20:38:31 -07:00
tst-get-cpu-features-static.c Add _dl_x86_cpu_features to rtld_global 2015-08-13 03:41:22 -07:00
tst-get-cpu-features.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Versions Add _dl_x86_cpu_features to rtld_global 2015-08-13 03:41:22 -07:00