glibc/sysdeps/x86_64
H.J. Lu b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
..
64 Move architecture shlib-versions files to Linux-specific directories. 2014-07-17 14:31:12 +00:00
fpu Make dbl-64 atan and tan into weak aliases. 2017-10-02 20:20:52 +00:00
multiarch Remove remaining _HAVE_STRING_ARCH_* definitions (BZ #18858) 2017-09-06 14:35:23 -03:00
nptl Remove CALL_THREAD_FCT macro 2017-04-04 18:03:35 -03:00
x32 Remove CALL_THREAD_FCT macro 2017-04-04 18:03:35 -03:00
____longjmp_chk.S ____longjmp_chk is now OS-specific. 2009-07-30 21:42:27 -07:00
__longjmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
_mcount.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
abort-instr.h Update. 2001-09-19 10:37:31 +00:00
add_n.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
addmul_1.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
atomic-machine.h Optimize generic spinlock code and use C11 like atomic macros. 2017-06-06 09:41:56 +02:00
backtrace.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
bsd-_setjmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
bsd-setjmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
bzero.S Make an empty file. 2007-10-16 05:59:15 +00:00
configure Require binutils 2.25 or later to build glibc. 2017-06-28 11:31:50 +00:00
configure.ac Require binutils 2.25 or later to build glibc. 2017-06-28 11:31:50 +00:00
crti.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
crtn.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
dl-irel.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
dl-lookupcfg.h elf: Remove internal_function attribute 2017-08-31 16:59:37 +02:00
dl-machine.h x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
dl-procinfo.c Add sysdeps/x86/dl-procinfo.c 2017-04-10 12:01:45 -07:00
dl-runtime.c * elf/dl-runtime.c (reloc_offset): Define. 2009-03-15 00:26:14 +00:00
dl-tls.c x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
dl-tls.h x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
dl-tlsdesc.h elf: Remove internal_function attribute 2017-08-31 16:59:37 +02:00
dl-tlsdesc.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
dl-trampoline.h x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
dl-trampoline.S x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
ffs.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
ffsll.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
hp-timing.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
htonl.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
ifuncmain8.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
ifuncmod8.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Implies Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
jmpbuf-offsets.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
jmpbuf-unwind.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
l10nflist.c Minor optimization of popcount in l10nflist 2011-08-11 14:07:04 -04:00
ldbl2mpn.c [BZ #4586] 2007-06-08 02:50:59 +00:00
ldsodefs.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
link-defines.sym Replace __int128 with __int128_t 2014-05-30 10:50:21 -07:00
locale-defines.sym Implement optimized strcaecmp for x86-64. 2010-07-30 00:14:04 -07:00
localplt.data ld.so: Introduce struct dl_exception 2017-08-10 16:54:57 +02:00
lshift.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
machine-gmon.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
Makefile x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
memchr.S x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
memcmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcopy.h X86-64: Add dummy memcopy.h and wordcopy.c 2016-06-09 04:38:34 -07:00
memcpy_chk.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcpy.S X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove 2016-06-08 13:58:08 -07:00
memmove_chk.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memmove.S x86-64: Use IFUNC memcpy and mempcpy in libc.a 2017-08-04 12:27:18 -07:00
mempcpy_chk.S x86_64: fix static build of __mempcpy_chk for compilers defaulting to PIC/PIE 2017-03-15 16:10:05 -07:00
mempcpy.S X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove 2016-06-08 13:58:08 -07:00
memrchr.S x86_64: Remove redundant REX bytes from memrchr.S 2017-06-05 07:41:26 -07:00
memset_chk.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memset.S x86: Remove __memset_zero_constant_len_parameter [BZ #21790] 2017-08-04 10:56:51 -07:00
memusage.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
mp_clz_tab.c * sysdeps/x86_64/mp_clz_tab.c: New file. 2009-04-15 04:30:41 +00:00
mul_1.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
preconfigure rename configure.in to configure.ac 2013-10-30 17:32:08 +10:00
preconfigure.ac rename configure.in to configure.ac 2013-10-30 17:32:08 +10:00
rawmemchr.S x86_64: Remove L(return_null) from rawmemchr.S 2017-05-20 06:13:38 -07:00
rshift.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
rtld-offsets.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
sched_cpucount.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
setjmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
stack-aliasing.h Clean up stack-coloring macros. 2014-06-20 19:50:16 -07:00
stackguard-macros.h BZ #15754: CVE-2013-4788 2013-09-23 00:52:09 -04:00
stackinfo.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
start.S x86-64: Check PIC instead of SHARED in start.S 2017-08-02 10:27:34 -07:00
stpcpy.S Update. 2004-05-28 06:56:51 +00:00
strcasecmp_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strcasecmp_l.S Implement optimized strcaecmp for x86-64. 2010-07-30 00:14:04 -07:00
strcasecmp.S Implement optimized strcaecmp for x86-64. 2010-07-30 00:14:04 -07:00
strcat.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strchr.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strchrnul.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcpy.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcspn.S x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C 2017-06-15 08:59:05 -07:00
strlen.S x86-64: Update strlen.S to support wcslen/wcsnlen 2017-06-05 07:58:23 -07:00
strncase_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strncase_l.S Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
strncase.S Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
strncmp.S Add SSE2 support to str{,n}cmp for x86-64. 2009-07-26 13:32:28 -07:00
strnlen.S Faster strlen on x64. 2013-03-18 07:39:12 +01:00
strpbrk.S x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C 2017-06-15 08:59:05 -07:00
strrchr.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strspn.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
sub_n.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
submul_1.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
sysdep.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tls_get_addr.S x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
tls-macros.h Split tls-macros.h into sysdeps directories. 2012-07-17 11:30:58 +00:00
tlsdesc.c elf: Remove internal_function attribute 2017-08-31 16:59:37 +02:00
tlsdesc.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
tst-audit3.c Modify several tests to use test-skeleton.c 2014-11-05 15:24:08 +05:30
tst-audit4-aux.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-audit4.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-audit5.c Modify several tests to use test-skeleton.c 2014-11-05 15:24:08 +05:30
tst-audit6.c Modify several tests to use test-skeleton.c 2015-07-15 15:10:23 +05:30
tst-audit7.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-audit10-aux.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-audit10.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-audit.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-auditmod3a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod3b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod4a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod4b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod5a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod5b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod6b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6c.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod7a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod7b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod10a.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-auditmod10b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-avx512-aux.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avx512.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avx512mod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avx-aux.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avx.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avxmod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-mallocalign1.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-platform-1.c x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299] 2017-10-19 08:28:26 -07:00
tst-platformmod-1.c x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299] 2017-10-19 08:28:26 -07:00
tst-platformmod-2.c x86-64: Don't set GLRO(dl_platform) to NULL [BZ #22299] 2017-10-19 08:28:26 -07:00
tst-quad1.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-quad1pie.c Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quad2.c Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quad2pie.c Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quadmod1.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-quadmod1pie.S Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quadmod2.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-quadmod2pie.S Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-split-dynreloc.c Fix dynamic linker issue with bind-now 2015-08-19 05:37:01 -07:00
tst-split-dynreloc.lds Fix dynamic linker issue with bind-now 2015-08-19 05:37:01 -07:00
tst-sse.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-ssemod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-stack-align.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
tst-x86_64-1.c x86: Add x86_64 to x86-64 HWCAP [BZ #22093] 2017-09-11 08:18:32 -07:00
tst-x86_64mod-1.c x86: Add x86_64 to x86-64 HWCAP [BZ #22093] 2017-09-11 08:18:32 -07:00
Versions Work around old buggy program which cannot cope with memcpy semantics. 2011-04-01 19:38:21 -04:00
wcschr.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcscmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcslen.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcsrchr.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wmemset_chk.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wmemset.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wordcopy.c X86-64: Add dummy memcopy.h and wordcopy.c 2016-06-09 04:38:34 -07:00