glibc/sysdeps
H.J. Lu b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
..
aarch64 [AARCH64] Rewrite elf_machine_load_address using _DYNAMIC symbol 2017-10-18 17:35:16 +01:00
alpha Use libm_alias_double for dbl-64 fma. 2017-10-04 20:32:48 +00:00
arm Update ARM libm-test-ulps. 2017-10-05 22:17:30 +00:00
generic Add _Float128 function aliases. 2017-10-18 17:37:18 +00:00
gnu hurd: Fix getifaddrs' and freeifaddrs' symbol exposition 2017-09-28 01:05:18 +02:00
hppa ld.so: Replace (&bootstrap_map) with BOOTSTRAP_MAP 2017-10-03 01:55:12 -07:00
i386 i386: Regenerate libm-test-ulps 2017-10-19 11:51:57 -07:00
ia64 ld.so: Replace (&bootstrap_map) with BOOTSTRAP_MAP 2017-10-03 01:55:12 -07:00
ieee754 Add _Float128 function aliases. 2017-10-18 17:37:18 +00:00
init_array Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
m68k m68k: Update elf_machine_load_address for static PIE 2017-10-20 03:36:47 -07:00
mach Introduce NO_RTLD_HIDDEN, make hurd use it instead of NO_HIDDEN 2017-10-03 01:33:38 +02:00
microblaze Obsolete pow10 functions. 2017-09-01 21:13:18 +00:00
mips Add MIPS bits/floatn.h. 2017-10-19 17:59:41 +00:00
nios2 Enable unwind info in libc-start.c and backtrace.c 2017-09-19 15:07:58 +01:00
nptl Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
posix sysconf: Fix missing definition of UIO_MAXIOV on Linux [BZ #22321] 2017-10-20 04:10:15 +02:00
powerpc powerpc: fix check-before-set in SET_RESTORE_ROUND 2017-10-18 12:08:28 -02:00
pthread aio: Remove internal_function function attribute 2017-08-31 15:59:06 +02:00
s390 S390: Regenerate ULPs 2017-10-05 12:50:49 +02:00
sh Enable unwind info in libc-start.c and backtrace.c 2017-09-19 15:07:58 +01:00
sparc Fix TLS relocations against local symbols on powerpc32, sparc32 and sparc64 2017-10-13 16:14:16 -03:00
tile Remove ancient __signbit inlines 2017-09-28 19:52:13 +01:00
unix sysconf: Fix missing definition of UIO_MAXIOV on Linux [BZ #22321] 2017-10-20 04:10:15 +02:00
wordsize-32 Build divdi3 only for architecture that required it 2017-04-06 15:14:34 -03:00
wordsize-64 posix: Consolidate Linux glob implementation 2017-09-08 16:34:02 +02:00
x86 x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
x86_64 x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00