In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. ld.so code size is reduced by more than
1 KB. However, using fxsave/xsave/xsavec takes a few more cycles
than saving and restoring vector and bound registers individually.
Latency for _dl_runtime_resolve to look up the function, foo, from one
shared library plus libc.so:
                             Before  After  Change
Westmere (SSE)/fxsave           345    866    151%
IvyBridge (AVX)/xsave           420    643     53%
Haswell (AVX)/xsave             713   1252     75%
Skylake (AVX+MPX)/xsavec        559    719     28%
Skylake (AVX512+MPX)/xsavec     145    272     87%
Ryzen (AVX)/xsavec              280    553     97%
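The Change column can be reproduced from the Before and After columns.
The following is a minimal sketch, assuming the percentages are
truncated rather than rounded (that assumption matches every row of the
table); percent_increase is a hypothetical helper, not part of glibc.

```c
#include <assert.h>

/* Percentage increase from BEFORE to AFTER, truncated toward zero.
   Truncation is an assumption that happens to match the table.  */
int
percent_increase (int before, int after)
{
  assert (before > 0);
  return (after - before) * 100 / before;
}
```

For example, percent_increase (345, 866) gives the Westmere figure and
percent_increase (559, 719) gives the Skylake (AVX+MPX) figure.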
This is the worst case, where the portion of time spent saving and
restoring registers is larger than in the majority of cases. With the
smaller _dl_runtime_resolve code size, the overall performance impact is
negligible. On IvyBridge, differences in build and test time of binutils
with lazy binding GCC and binutils are within noise. On Westmere,
differences in bootstrap and "make check" time of GCC 7 with lazy binding
GCC and binutils are also within noise.
[BZ #21265]
* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
New.
* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
(get_common_indeces): Set xsave_state_size, xsave_state_full_size
and bit_arch_XSAVEC_Usable if needed.
(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
and bit_arch_Use_dl_runtime_resolve_opt.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
Removed.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(bit_arch_Prefer_No_AVX512): Updated.
(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
(bit_arch_XSAVEC_Usable): New.
(STATE_SAVE_OFFSET): Likewise.
(STATE_SAVE_MASK): Likewise.
[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
(cpu_features): Add xsave_state_size and xsave_state_full_size.
(index_arch_Use_dl_runtime_resolve_opt): Removed.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_XSAVEC_Usable): New.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow.
* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
is enabled.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
_dl_runtime_resolve_xsavec.
* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
Removed.
(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
instead of VEC_SIZE.
(REGISTER_SAVE_BND0): Removed.
(REGISTER_SAVE_BND1): Likewise.
(REGISTER_SAVE_BND3): Likewise.
(REGISTER_SAVE_RAX): Always defined to 0.
(VMOV): Removed.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_slow): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_avx512): Likewise.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(USE_FXSAVE): New.
(_dl_runtime_resolve_fxsave): Likewise.
(USE_XSAVE): Likewise.
(_dl_runtime_resolve_xsave): Likewise.
(USE_XSAVEC): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
Removed.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(_dl_runtime_resolve_fxsave): New.
(_dl_runtime_resolve_xsave): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
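The entries above mention xsave_state_size, STATE_SAVE_OFFSET and
STATE_SAVE_ALIGNMENT. As a rough illustration of why a size has to be
rounded up, here is a minimal sketch, assuming the save area is laid out
as some integer-register slots followed by the XSAVE area. XSAVE and
XSAVEC require a 64-byte aligned save area; the helper names and the
layout are hypothetical and are not taken from cpu-features.c.

```c
#include <assert.h>
#include <stddef.h>

/* Round SIZE up to the next multiple of ALIGN, which must be a power
   of two.  */
size_t
align_up (size_t size, size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical helper: bytes to reserve for GPR_BYTES of integer
   register slots followed by an XSAVE area of XSAVE_BYTES, rounded up
   to the 64-byte alignment that XSAVE/XSAVEC need.  */
size_t
state_save_size (size_t gpr_bytes, size_t xsave_bytes)
{
  return align_up (gpr_bytes + xsave_bytes, 64);
}
```

With 64 bytes of register slots and an 832-byte XSAVE area, the sketch
reserves 896 bytes, which is already a multiple of 64.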
Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
"x86_64" subdirectory when loading a shared library. ld.so in glibc
2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
on supported ISAs. This led to shared library loading failure for
shared libraries placed under the "x86_64" subdirectory.
This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
search the "x86_64" subdirectory when loading a shared library.
NB: We can't set x86-64 dl_platform to "x86_64", since ld.so would then
skip the "haswell" and "xeon_phi" subdirectories on "haswell" and
"xeon_phi" machines.
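The effect can be modelled with a small sketch. It assumes a deliberately
simplified rule: a single-level subdirectory is searched if its name
equals the platform name or one of the hwcap names. The real ld.so also
tries combinations of these names; subdir_is_searched is a hypothetical
helper, not a glibc function.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified model: SUBDIR is searched if it matches PLATFORM or one
   of the NHWCAPS names in HWCAPS.  */
int
subdir_is_searched (const char *subdir, const char *platform,
                    const char *const *hwcaps, size_t nhwcaps)
{
  assert (subdir != NULL && platform != NULL);
  if (strcmp (subdir, platform) == 0)
    return 1;
  for (size_t i = 0; i < nhwcaps; i++)
    if (strcmp (subdir, hwcaps[i]) == 0)
      return 1;
  return 0;
}
```

In this model, platform "haswell" with no hwcap names does not reach
"x86_64", while adding "x86_64" as a hwcap name reaches both "x86_64"
and "haswell".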
Tested on i686 and x86-64.
[BZ #22093]
* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
(HWCAP_IMPORTANT): Likewise.
(HWCAP_X86_64): New enum.
(HWCAP_X86_AVX512_1): Updated.
* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
(modules-names): Add x86_64/tst-x86_64mod-1.
(LDFLAGS-tst-x86_64mod-1.so): New.
($(objpfx)tst-x86_64-1): Likewise.
($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
(tst-x86_64-1-clean): Likewise.
* sysdeps/x86_64/tst-x86_64-1.c: New file.
* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
Since gold doesn't support INSERT in linker script:
https://sourceware.org/bugzilla/show_bug.cgi?id=21676
tst-split-dynreloc fails to link with gold. Check if linker supports
INSERT in linker script before using it.
* config.make.in (have-insert): New.
* configure.ac (libc_cv_insert): New. Set to yes if linker
supports INSERT in linker script.
(AC_SUBST(libc_cv_insert)): New.
* configure: Regenerated.
* sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc only
if $(have-insert) == yes.
This change forces realignment of the stack pointer in __tls_get_addr, so
that binaries compiled by GCC versions older than 4.9:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066
continue to work even if glibc uses vector instructions that require the
ABI-mandated stack alignment.
__tls_get_addr_slow is added to handle the slow paths in the default
implementation of __tls_get_addr in elf/dl-tls.c. The new __tls_get_addr
calls __tls_get_addr_slow after realigning the stack. Internal calls
within ld.so go directly to the default implementation of __tls_get_addr
because they do not need stack realignment.
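The realignment itself is a mask operation. The following is a minimal
sketch in C of the arithmetic only, assuming a downward-growing stack
and a power-of-two alignment; the real code is assembly in
tls_get_addr.S, and realign_stack is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Round the stack pointer value SP down to a multiple of ALIGN, which
   must be a power of two.  Rounding down keeps the result inside the
   unused part of a downward-growing stack.  */
uintptr_t
realign_stack (uintptr_t sp, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return sp & ~(align - 1);
}
```

A caller that entered with a misaligned stack, for example an address
ending in 0x8, gets back an address ending in 0x0 for a 16-byte
alignment; an already aligned value is returned unchanged.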
[BZ #21609]
* sysdeps/x86_64/Makefile (sysdep-dl-routines): Add tls_get_addr.
(gen-as-const-headers): Add rtld-offsets.sym.
* sysdeps/x86_64/dl-tls.c: New file.
* sysdeps/x86_64/rtld-offsets.sym: Likewise.
* sysdeps/x86_64/tls_get_addr.S: Likewise.
* sysdeps/x86_64/dl-tls.h: Add multiple inclusion guards.
* sysdeps/x86_64/tlsdesc.sym (TI_MODULE_OFFSET): New.
(TI_OFFSET_OFFSET): Likewise.
With stack protection enabled, these files have external symbol
references for the first time, so the fact that they are not compiled
with -fPIE and are then linked into a -pie binary starts to hurt.
No need to compile x86_64 _mcount.S with -pg. We can just copy the
normal static object.
* gmon/Makefile (noprof): Add $(sysdep_noprof).
* sysdeps/x86_64/Makefile (sysdep_noprof): Add _mcount.
GCC added support for -mno-vzeroupper in version 4.6. Thus the
configure tests for this support are obsolete, and this patch removes
them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/Makefile [$(config-cflags-novzeroupper) = yes]:
Make code unconditional.
Since x86-64 ld.so preserves vector registers now, we can use SSE in
x86-64 ld.so. We should run tst-ld-sse-use.sh only on i386.
* sysdeps/x86/Makefile [$(subdir) == elf] (CFLAGS-.os,
tests-special, $(objpfx)tst-ld-sse-use.out): Moved to ...
* sysdeps/i386/Makefile [$(subdir) == elf] (CFLAGS-.os,
tests-special, $(objpfx)tst-ld-sse-use.out): Here. Update
comments.
* sysdeps/x86_64/Makefile [$(subdir) == elf] (CFLAGS-.os): Add
-mno-mmx for $(all-rtld-routines).
* sysdeps/x86/tst-ld-sse-use.sh: Moved to ...
* sysdeps/i386/tst-ld-sse-use.sh: Here. Replace x86-64 with
i386.
This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve
and _dl_runtime_profile, which save and restore the first 8 vector
registers used for parameter passing. elf_machine_runtime_setup
selects the proper _dl_runtime_resolve or _dl_runtime_profile based
on _dl_x86_cpu_features. This avoids the race condition caused by the
FOREIGN_CALL macros, which are only used for x86-64.
The performance impact of saving and restoring 8 vector registers is
negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when
ld.so is optimized with SSE2.
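The selection step can be pictured as picking the widest supported
variant. The following is a minimal sketch, assuming two feature bits;
the enum names and select_resolver are hypothetical stand-ins for the
HAS_ARCH_FEATURE checks and the _dl_runtime_resolve_* entry points, not
the code in dl-machine.h.

```c
#include <assert.h>

/* Hypothetical feature bits and trampoline identifiers.  */
enum cpu_feature { FEATURE_AVX = 1 << 0, FEATURE_AVX512F = 1 << 1 };
enum trampoline { RESOLVE_SSE, RESOLVE_AVX, RESOLVE_AVX512 };

/* Pick the trampoline that saves the widest vector registers the CPU
   supports, falling back to the SSE version.  */
enum trampoline
select_resolver (unsigned int features)
{
  assert ((features & ~(unsigned int) (FEATURE_AVX | FEATURE_AVX512F)) == 0);
  if (features & FEATURE_AVX512F)
    return RESOLVE_AVX512;
  if (features & FEATURE_AVX)
    return RESOLVE_AVX;
  return RESOLVE_SSE;
}
```

Because the choice is made once at setup time, no per-call state has to
be recorded, which is the kind of shared state the FOREIGN_CALL macros
relied on.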
[BZ #15128]
* sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add
ifuncmain8.
(modules-names): Add ifuncmod8.
($(objpfx)ifuncmain8): New rule.
* sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and
<cpuid.h>.
(elf_machine_runtime_setup): Use _dl_runtime_resolve_sse,
_dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512,
_dl_runtime_profile_sse, _dl_runtime_profile_avx, or
_dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
* sysdeps/x86_64/dl-trampoline.S: Rewrite.
* sysdeps/x86_64/dl-trampoline.h: Likewise.
* sysdeps/x86_64/ifuncmain8.c: New file.
* sysdeps/x86_64/ifuncmod8.c: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE):
Removed.
* sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
(tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1.
Change rtld_savespace_sse to __glibc_unused2.
(RTLD_CHECK_FOREIGN_CALL): Removed.
(RTLD_ENABLE_FOREIGN_CALL): Likewise.
(RTLD_PREPARE_FOREIGN_CALL): Likewise.
(RTLD_FINALIZE_FOREIGN_CALL): Likewise.
Fix the bind-now case when DT_REL and DT_JMPREL sections are separate
and there is a gap between them.
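The idea can be sketched as follows, assuming each section is described
by a start address and a size. The sections are merged only when they
are contiguous; otherwise they stay separate so that the bytes in the
gap are never processed as relocations. struct reloc_range and
build_ranges are hypothetical and simplified, not the code in
elf/dynamic-link.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct reloc_range
{
  uintptr_t start;
  size_t size;
};

/* Fill OUT with the ranges to process for bind-now and return how many
   were written (1 or 2).  */
int
build_ranges (struct reloc_range rel, struct reloc_range jmprel,
              struct reloc_range out[2])
{
  assert (out != NULL);
  if (rel.start + rel.size == jmprel.start)
    {
      /* Contiguous: one merged range covers both sections.  */
      out[0].start = rel.start;
      out[0].size = rel.size + jmprel.size;
      return 1;
    }
  /* Not contiguous: keep the sections apart so the gap is skipped.  */
  out[0] = rel;
  out[1] = jmprel;
  return 2;
}
```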
[BZ #14341]
* elf/dynamic-link.h (elf_machine_lazy_rel): Properly handle the
case when there is a gap between DT_REL and DT_JMPREL sections.
* sysdeps/x86_64/Makefile (tests): Add tst-split-dynreloc.
(LDFLAGS-tst-split-dynreloc): New.
(tst-split-dynreloc-ENV): Likewise.
* sysdeps/x86_64/tst-split-dynreloc.c: New file.
* sysdeps/x86_64/tst-split-dynreloc.lds: Likewise.
AVX-512 ISA adds 512-bit zmm registers. This patch updates
_dl_runtime_profile to pass zmm registers to run-time audit. It also
changes _dl_x86_64_save_sse and _dl_x86_64_restore_sse to support zmm
registers; these are called only when RTLD_PREPARE_FOREIGN_CALL
is used. The performance impact is minimal.
* config.h.in (HAVE_AVX512_SUPPORT): New #undef.
(HAVE_AVX512_ASM_SUPPORT): Likewise.
* sysdeps/x86_64/bits/link.h (La_x86_64_zmm): New.
(La_x86_64_vector): Add zmm.
* sysdeps/x86_64/Makefile (tests): Add tst-audit10.
(modules-names): Add tst-auditmod10a and tst-auditmod10b.
($(objpfx)tst-audit10): New target.
($(objpfx)tst-audit10.out): Likewise.
(tst-audit10-ENV): New.
(AVX512-CFLAGS): Likewise.
(CFLAGS-tst-audit10.c): Likewise.
(CFLAGS-tst-auditmod10a.c): Likewise.
(CFLAGS-tst-auditmod10b.c): Likewise.
* sysdeps/x86_64/configure.ac: Set config-cflags-avx512,
HAVE_AVX512_SUPPORT and HAVE_AVX512_ASM_SUPPORT.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Add
AVX-512 zmm register support.
(_dl_x86_64_save_sse): Likewise.
(_dl_x86_64_restore_sse): Likewise.
* sysdeps/x86_64/dl-trampoline.h: Updated to support different
size vector registers.
* sysdeps/x86_64/link-defines.sym (YMM_SIZE): New.
(ZMM_SIZE): Likewise.
* sysdeps/x86_64/tst-audit10.c: New file.
* sysdeps/x86_64/tst-auditmod10a.c: Likewise.
* sysdeps/x86_64/tst-auditmod10b.c: Likewise.
The test now takes the callgraph into account. Only code called
during runtime relocation is affected by the limitation. We now
determine the affected object files as closely as possible from
the outside. This allowed removing some of the specializations
for some of the string functions, as they are only used in other
code paths.
This patch introduces a test to make sure no function modifies the
xmm/ymm registers, with the exception of the auditing functions.
The test is probably too pessimistic. All code linked into ld.so
is checked. Perhaps at some point the callgraph starting from
_dl_fixup and _dl_profile_fixup is checked and we can start using
faster SSE-using functions in parts of ld.so.
* sysdeps/unix/sysv/linux/x86_64/Makefile: New file.
* sysdeps/unix/sysv/linux/x86_64/Versions: New file.
* sysdeps/unix/sysv/linux/x86_64/bits/fcntl.h: New file.
* sysdeps/unix/sysv/linux/x86_64/bits/mman.h: New file.
* sysdeps/unix/sysv/linux/x86_64/bits/stat.h: New file.
* sysdeps/unix/sysv/linux/x86_64/bits/statfs.h: New file.
* sysdeps/unix/sysv/linux/x86_64/bits/time.h: New file.
* sysdeps/unix/sysv/linux/x86_64/bits/types.h: New file.
* sysdeps/unix/sysv/linux/x86_64/brk.c: New file.
* sysdeps/unix/sysv/linux/x86_64/clone.S: New file.
* sysdeps/unix/sysv/linux/x86_64/fstatfs64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/ftruncate64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/fxstat.c: New file.
* sysdeps/unix/sysv/linux/x86_64/fxstat64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/getdents.c: New file.
* sysdeps/unix/sysv/linux/x86_64/getdents64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/getrlimit64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/gettimeofday.c: New file.
* sysdeps/unix/sysv/linux/x86_64/glob64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/lxstat.c: New file.
* sysdeps/unix/sysv/linux/x86_64/lxstat64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/mmap64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/pread64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/profil-counter.h: New file.
* sysdeps/unix/sysv/linux/x86_64/pwrite64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/readdir.c: New file.
* sysdeps/unix/sysv/linux/x86_64/readdir64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/readdir64_r.c: New file.
* sysdeps/unix/sysv/linux/x86_64/readdir_r.c: New file.
* sysdeps/unix/sysv/linux/x86_64/recv.c: New file.
* sysdeps/unix/sysv/linux/x86_64/register-dump.h: New file.
* sysdeps/unix/sysv/linux/x86_64/send.c: New file.
* sysdeps/unix/sysv/linux/x86_64/setrlimit64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/sigaction.c: New file.
* sysdeps/unix/sysv/linux/x86_64/sigcontextinfo.h: New file.
* sysdeps/unix/sysv/linux/x86_64/sigpending.c: New file.
* sysdeps/unix/sysv/linux/x86_64/sigprocmask.c: New file.
* sysdeps/unix/sysv/linux/x86_64/sigsuspend.c: New file.
* sysdeps/unix/sysv/linux/x86_64/statfs64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/sys/perm.h: New file.
* sysdeps/unix/sysv/linux/x86_64/sys/procfs.h: New file.
* sysdeps/unix/sysv/linux/x86_64/sys/reg.h: New file.
* sysdeps/unix/sysv/linux/x86_64/sys/ucontext.h: New file.
* sysdeps/unix/sysv/linux/x86_64/sys/user.h: New file.
* sysdeps/unix/sysv/linux/x86_64/syscall.S: New file.
* sysdeps/unix/sysv/linux/x86_64/syscalls.list: New file.
* sysdeps/unix/sysv/linux/x86_64/sysdep.S: New file.
* sysdeps/unix/sysv/linux/x86_64/sysdep.h: New file.
* sysdeps/unix/sysv/linux/x86_64/time.c: New file.
* sysdeps/unix/sysv/linux/x86_64/truncate64.c: New file.
* sysdeps/unix/sysv/linux/x86_64/umount.c: New file.
* sysdeps/unix/sysv/linux/x86_64/vfork.S: New file.
* sysdeps/unix/sysv/linux/x86_64/xstat.c: New file.
* sysdeps/unix/sysv/linux/x86_64/xstat64.c: New file.
* sysdeps/unix/x86_64/sysdep.S: New file.
* sysdeps/unix/x86_64/sysdep.h: New file.
* sysdeps/x86_64/Implies: New file.
* sysdeps/x86_64/Makefile: New file.
* sysdeps/x86_64/Versions: New file.
* sysdeps/x86_64/__longjmp.S: New file.
* sysdeps/x86_64/abort-instr.h: New file.
* sysdeps/x86_64/atomicity.h: New file.
* sysdeps/x86_64/bits/endian.h: New file.
* sysdeps/x86_64/bits/setjmp.h: New file.
* sysdeps/x86_64/bits/string.h: New file.
* sysdeps/x86_64/bp-asm.h: New file.
* sysdeps/x86_64/bsd-_setjmp.S: New file.
* sysdeps/x86_64/bsd-setjmp.S: New file.
* sysdeps/x86_64/dl-machine.h: New file.
* sysdeps/x86_64/elf/initfini.c: New file.
* sysdeps/x86_64/elf/start.S: New file.
* sysdeps/x86_64/ffs.c: New file.
* sysdeps/x86_64/ffsll.c: New file.
* sysdeps/x86_64/fpu/bits/fenv.h: New file.
* sysdeps/x86_64/fpu/bits/mathdef.h: New file.
* sysdeps/x86_64/fpu/e_acosl.c: New file.
* sysdeps/x86_64/fpu/e_atan2l.c: New file.
* sysdeps/x86_64/fpu/e_exp2l.S: New file.
* sysdeps/x86_64/fpu/e_expl.c: New file.
* sysdeps/x86_64/fpu/e_fmodl.S: New file.
* sysdeps/x86_64/fpu/e_log10l.S: New file.
* sysdeps/x86_64/fpu/e_log2l.S: New file.
* sysdeps/x86_64/fpu/e_logl.S: New file.
* sysdeps/x86_64/fpu/e_powl.S: New file.
* sysdeps/x86_64/fpu/e_rem_pio2l.c: New file.
* sysdeps/x86_64/fpu/e_scalbl.S: New file.
* sysdeps/x86_64/fpu/e_sqrtl.c: New file.
* sysdeps/x86_64/fpu/fclrexcpt.c: New file.
* sysdeps/x86_64/fpu/fedisblxcpt.c: New file.
* sysdeps/x86_64/fpu/feenablxcpt.c: New file.
* sysdeps/x86_64/fpu/fegetenv.c: New file.
* sysdeps/x86_64/fpu/fegetexcept.c: New file.
* sysdeps/x86_64/fpu/fegetround.c: New file.
* sysdeps/x86_64/fpu/feholdexcpt.c: New file.
* sysdeps/x86_64/fpu/fesetenv.c: New file.
* sysdeps/x86_64/fpu/fesetround.c: New file.
* sysdeps/x86_64/fpu/fgetexcptflg.c: New file.
* sysdeps/x86_64/fpu/fraiseexcpt.c: New file.
* sysdeps/x86_64/fpu/fsetexcptflg.c: New file.
* sysdeps/x86_64/fpu/ftestexcept.c: New file.
* sysdeps/x86_64/fpu/libm-test-ulps: New file.
* sysdeps/x86_64/fpu/math_ldbl.h: New file.
* sysdeps/x86_64/fpu/printf_fphex.c: New file.
* sysdeps/x86_64/fpu/s_atanl.c: New file.
* sysdeps/x86_64/fpu/s_cosl.S: New file.
* sysdeps/x86_64/fpu/s_expm1l.S: New file.
* sysdeps/x86_64/fpu/s_fpclassifyl.c: New file.
* sysdeps/x86_64/fpu/s_isinfl.c: New file.
* sysdeps/x86_64/fpu/s_isnanl.c: New file.
* sysdeps/x86_64/fpu/s_log1pl.S: New file.
* sysdeps/x86_64/fpu/s_logbl.c: New file.
* sysdeps/x86_64/fpu/s_nextafterl.c: New file.
* sysdeps/x86_64/fpu/s_nexttoward.c: New file.
* sysdeps/x86_64/fpu/s_nexttowardf.c: New file.
* sysdeps/x86_64/fpu/s_rintl.c: New file.
* sysdeps/x86_64/fpu/s_significandl.c: New file.
* sysdeps/x86_64/fpu/s_sincosl.S: New file.
* sysdeps/x86_64/fpu/s_sinl.S: New file.
* sysdeps/x86_64/fpu/s_tanl.S: New file.
* sysdeps/x86_64/gmp-mparam.h: New file.
* sysdeps/x86_64/hp-timing.c: New file.
* sysdeps/x86_64/hp-timing.h: New file.
* sysdeps/x86_64/htonl.S: New file.
* sysdeps/x86_64/memusage.h: New file.
* sysdeps/x86_64/setjmp.S: New file.
* sysdeps/x86_64/soft-fp/sfp-machine.h: New file.
* sysdeps/x86_64/stackinfo.h: New file.
* sysdeps/x86_64/sysdep.h: New file.
* sysdeps/unix/sysv/linux/x86_64/ldd-rewrite.sed: New file.