This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve
and _dl_runtime_profile, which save and restore the first 8 vector
registers used for parameter passing. elf_machine_runtime_setup
selects the proper _dl_runtime_resolve or _dl_runtime_profile based
on _dl_x86_cpu_features. It avoids race condition caused by
FOREIGN_CALL macros, which are only used for x86-64.
Performance impact of saving and restoring 8 vector registers are
negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when
ld.so is optimized with SSE2.
[BZ #15128]
* sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add
ifuncmain8.
(modules-names): Add ifuncmod8.
($(objpfx)ifuncmain8): New rule.
* sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and
<cpuid.h>.
(elf_machine_runtime_setup): Use _dl_runtime_resolve_sse,
_dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512,
_dl_runtime_profile_sse, _dl_runtime_profile_avx, or
_dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
* sysdeps/x86_64/dl-trampoline.S: Rewrite.
* sysdeps/x86_64/dl-trampoline.h: Likewise.
* sysdeps/x86_64/ifuncmain8.c: New file.
* sysdeps/x86_64/ifuncmod8.c: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE):
Removed.
* sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
(tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1.
Change rtld_savespace_sse to __glibc_unused2.
(RTLD_CHECK_FOREIGN_CALL): Removed.
(RTLD_ENABLE_FOREIGN_CALL): Likewise.
(RTLD_PREPARE_FOREIGN_CALL): Likewise.
(RTLD_FINALIZE_FOREIGN_CALL): Likewise.
This patch removes configure tests for assembler CFI support (and
thereby eliminates an architecture-specific case in the main
configure.ac), instead assuming that support is present
unconditionally.
The main test was added in 2003 around the time CFI support was added
to the assembler. cfi_personality and cfi_lsda support were added to
the assembler in 2006. cfi_sections support was added in 2009, a few
weeks before binutils 2.20 was released; it's in 2.20, the minimum
supported version, so even that configure test is obsolete.
Tested x86_64 that the installed shared libraries are unchanged by
this patch.
* configure.ac (libc_cv_asm_cfi_directives): Remove configure
test.
* configure: Regenerated.
* config.h.in (HAVE_ASM_CFI_DIRECTIVES): Remove macro undefine.
* sysdeps/arm/configure.ac (libc_cv_asm_cfi_directive_sections):
Remove configure test.
* sysdeps/arm/configure: Regenerated.
* sysdeps/nptl/configure.ac: Do not check
libc_cv_asm_cfi_directives.
* sysdeps/nptl/configure: Regenerated.
* sysdeps/x86_64/nptl/configure.ac: Remove file.
* sysdeps/x86_64/nptl/configure: Remove generated file.
* b/sysdeps/generic/sysdep.h [HAVE_ASM_CFI_DIRECTIVES]: Make code
unconditional.
[!HAVE_ASM_CFI_DIRECTIVES]: Remove conditional code.
This patch makes files using __ASSUME_* macros include
<kernel-features.h> explicitly, rather than relying on some other
header (such as tls.h, lowlevellock.h or pthreadP.h) to include it
implicitly. (I omitted cases where I've already posted or am testing
the patch that stops the file from needing __ASSUME_* at all.) This
accords with the general principle of making source files include the
headers for anything they use, and also helps make it safe to remove
<kernel-features.h> includes from any file that doesn't use
__ASSUME_* (some of those may be stray includes left behind after
increasing the minimum kernel version, others may never have been
needed or may have become obsolete after some other change).
Tested x86_64 that the disassembly of installed shared libraries is
unchanged by this patch.
* nptl/pthread_cond_wait.c: Include <kernel-features.h>.
* nptl/pthread_rwlock_timedrdlock.c: Likewise.
* nptl/pthread_rwlock_timedwrlock.c: Likewise.
* nptl/sysdeps/unix/sysv/linux/lowlevelrobustlock.c: Likewise.
* nscd/nscd.c: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym: Likewise.
* sysdeps/powerpc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sh/nptl/tcb-offsets.sym: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym: Likewise.