glibc/sysdeps/x86
H.J. Lu b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
..
bits Simplify HUGE_VAL definitions. 2017-08-31 15:50:50 +00:00
fpu Remove ancient __signbit inlines 2017-09-28 19:52:13 +01:00
nptl/bits Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
cacheinfo.c tunables: Add IFUNC selection and cache sizes 2017-06-20 08:37:28 -07:00
cpu-features-offsets.sym x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
cpu-features.c x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
cpu-features.h x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
cpu-tunables.c x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
dl-get-cpu-features.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
dl-hwcap.h x86: Add x86_64 to x86-64 HWCAP [BZ #22093] 2017-09-11 08:18:32 -07:00
dl-procinfo.c x86: Add x86_64 to x86-64 HWCAP [BZ #22093] 2017-09-11 08:18:32 -07:00
dl-procinfo.h x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391] 2017-05-03 13:44:35 -07:00
dl-tunables.list x86: Rename glibc.tune.ifunc to glibc.tune.hwcaps 2017-06-21 10:21:37 -07:00
elide.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
float128-abi.h Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
fpu_control.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
init-arch.h Add common ifunc-init.h header 2017-10-17 12:01:22 -02:00
libc-start.c Delay initialization of CPU features struct in static binaries 2017-05-31 06:38:33 +05:30
linkmap.h Rename bits/linkmap.h to linkmap.h (bug 14912). 2015-09-04 19:44:27 +00:00
Makefile Remove x86 ifunc-defines.sym and rtld-global-offsets.sym 2016-05-11 05:51:39 -07:00
math-tests.h Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
string_private.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tininess.h Use sysdeps/x86/tininess.h for i386 and x86_64 2012-10-30 20:38:31 -07:00
tst-get-cpu-features-static.c Add _dl_x86_cpu_features to rtld_global 2015-08-13 03:41:22 -07:00
tst-get-cpu-features.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Versions Add _dl_x86_cpu_features to rtld_global 2015-08-13 03:41:22 -07:00