dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very
early during startup. They are used by dynamic linker to determine
platform and build an array of hardware capability names, which are
added to search path when loading shared object. dl_platform and
dl_hwcap are unused on x86-64. On i386, i386, i486, i586 and i686
platforms were supported and only SSE2 capability was used.
On x86, usage of AT_PLATFORM and AT_HWCAP to determine platform and
processor capabilities is obsolete since all information is available
in dl_x86_cpu_features. This patch sets dl_platform and dl_hwcap from
dl_x86_cpu_features in dynamic linker. On i386, the available plaforms
are changed to i586 and i686 since i386 has been deprecated. On x86-64,
the available plaforms are haswell, which is for Haswell class processors
with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which
is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and
AVX512PF. A capability, avx512_1, is also added to x86-64 for AVX512
ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL.
[BZ #21391]
* sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]:
Only call init_cpu_features.
[!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed.
* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
* sysdeps/i386/dl-procinfo.h: Removed.
* sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include
<sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>. Include
<sysdeps/x86/dl-procinfo.h>.
(_dl_procinfo): Replace _DL_HWCAP_COUNT with 32.
* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]:
Include <sysdeps/x86/dl-procinfo.h> instead of
<sysdeps/generic/dl-procinfo.h>.
* sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>.
(init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask.
* sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New.
(bit_cpu_MOVBE): Likewise.
(bit_cpu_BMI1): Likewise.
(bit_cpu_BMI2): Likewise.
(index_cpu_BMI1): Likewise.
(index_cpu_BMI2): Likewise.
(index_cpu_LZCNT): Likewise.
(index_cpu_MOVBE): Likewise.
(index_cpu_POPCNT): Likewise.
(reg_BMI1): Likewise.
(reg_BMI2): Likewise.
(reg_LZCNT): Likewise.
(reg_MOVBE): Likewise.
(reg_POPCNT): Likewise.
* sysdeps/x86/dl-hwcap.h: New file.
* sysdeps/x86/dl-procinfo.h: Likewise.
* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New.
(_dl_x86_platforms): Likewise.
IFUNC relocation against definition in unrelocated shared library
will lead to segfault when the IFUNC function is called. This
patch allows such IFUNC relocations with a warning. This isn't
a real fix for
https://sourceware.org/bugzilla/show_bug.cgi?id=21041
It simply allows the program to load. The program will segfault
when longjmp is called.
* sysdeps/i386/dl-machine.h (elf_machine_rel): Replace
_dl_fatal_printf with _dl_error_printf for IFUNC relocation
against unrelocated shared library.
* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
Calling an IFUNC function defined in unrelocated shared library may
lead to segfault. This patch issues an error message to request
relinking the shared library if it references IFUNC function defined
in the unrelocated shared library.
[BZ #20019]
* sysdeps/i386/dl-machine.h (elf_machine_rel): Check IFUNC
definition in unrelocated shared library.
* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
There is transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions. Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.
To avoid SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.
For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero. We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.
This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions. It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.
_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1. Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.
[BZ #20495]
[BZ #20508]
* sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
processors, set Use_dl_runtime_resolve_slow and set
Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
New.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_Use_dl_runtime_resolve_opt): Likewise.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
if Use_dl_runtime_resolve_opt is set. Use
_dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
(_dl_runtime_resolve_opt): New. Defined for AVX and AVX512.
(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
New.
(_dl_runtime_resolve_opt): Likewise.
(_dl_runtime_profile): Define only if _dl_runtime_profile is
defined.
In static executable, since init_cpu_features is called early from
__libc_start_main, there is no need to call it again in dl_platform_init.
[BZ #20072]
* sysdeps/i386/dl-machine.h (dl_platform_init): Call
init_cpu_features only if SHARED is defined.
* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve
and _dl_runtime_profile, which save and restore the first 8 vector
registers used for parameter passing. elf_machine_runtime_setup
selects the proper _dl_runtime_resolve or _dl_runtime_profile based
on _dl_x86_cpu_features. It avoids race condition caused by
FOREIGN_CALL macros, which are only used for x86-64.
Performance impact of saving and restoring 8 vector registers are
negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when
ld.so is optimized with SSE2.
[BZ #15128]
* sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add
ifuncmain8.
(modules-names): Add ifuncmod8.
($(objpfx)ifuncmain8): New rule.
* sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and
<cpuid.h>.
(elf_machine_runtime_setup): Use _dl_runtime_resolve_sse,
_dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512,
_dl_runtime_profile_sse, _dl_runtime_profile_avx, or
_dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
* sysdeps/x86_64/dl-trampoline.S: Rewrite.
* sysdeps/x86_64/dl-trampoline.h: Likewise.
* sysdeps/x86_64/ifuncmain8.c: New file.
* sysdeps/x86_64/ifuncmod8.c: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE):
Removed.
* sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
(tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1.
Change rtld_savespace_sse to __glibc_unused2.
(RTLD_CHECK_FOREIGN_CALL): Removed.
(RTLD_ENABLE_FOREIGN_CALL): Likewise.
(RTLD_PREPARE_FOREIGN_CALL): Likewise.
(RTLD_FINALIZE_FOREIGN_CALL): Likewise.
With copy relocation, address of protected data defined in the shared
library may be external. When there is a relocation against the
protected data symbol within the shared library, we need to check if we
should skip the definition in the executable copied from the protected
data. This patch adds ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA and defines
it for x86. If ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA isn't 0, do_lookup_x
will skip the data definition in the executable from copy reloc.
[BZ #17711]
* elf/dl-lookup.c (do_lookup_x): When UNDEF_MAP is NULL, which
indicates it is called from do_lookup_x on relocation against
protected data, skip the data definion in the executable from
copy reloc.
(_dl_lookup_symbol_x): Pass ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA,
instead of ELF_RTYPE_CLASS_PLT, to do_lookup_x for
EXTERN_PROTECTED_DATA relocation against STT_OBJECT symbol.
* sysdeps/generic/ldsodefs.h * (ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA):
New. Defined to 4 if DL_EXTERN_PROTECTED_DATA is defined,
otherwise to 0.
* sysdeps/i386/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
* sysdeps/i386/dl-machine.h (elf_machine_type_class): Set class
to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_386_GLOB_DAT.
* sysdeps/x86_64/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
* sysdeps/x86_64/dl-machine.h (elf_machine_type_class): Set class
to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_X86_64_GLOB_DAT.
_dl_start_user in ld.so calls the local function _dl_init. There is no
need to go through PLT.
* sysdeps/i386/dl-machine.h (_dl_start_user): Remove @PLT
from "call _dl_init@PLT".
* sysdeps/x86_64/dl-machine.h (_dl_start_user): Likewise.
from "call _dl_init@PLT".
Continuing the removal of the obsolete INTDEF / INTUSE mechanism, this
patch eliminates its use for _dl_init. Since _dl_init was already
declared with hidden visibility, creating a second hidden alias for it
was completely pointless, so this patch replaces all uses of
_dl_init_internal with plain _dl_init instead of using hidden_proto /
hidden_def (which are only needed when you want a hidden alias for a
non-hidden symbol; it's quite possible there are cases where they are
used but don't need to be because the symbol in question is not part
of the public ABI and is only used within a single library, so using
attributes_hidden instead would suffice).
Tested for x86_64 that installed stripped shared libraries are
unchanged by the patch.
[BZ #14132]
* elf/dl-init.c (_dl_init): Don't use INTDEF.
* sysdeps/aarch64/dl-machine.h (RTLD_START): Use _dl_init instead
of _dl_init_internal.
* sysdeps/alpha/dl-machine.h (RTLD_START): Likewise.
* sysdeps/arm/dl-machine.h (RTLD_START): Likewise.
* sysdeps/hppa/dl-machine.h (RTLD_START): Likewise.
* sysdeps/i386/dl-machine.h (RTLD_START): Likewise.
* sysdeps/ia64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/m68k/dl-machine.h (RTLD_START): Likewise.
* sysdeps/microblaze/dl-machine.h (RTLD_START): Likewise.
* sysdeps/mips/dl-machine.h (RTLD_START): Likewise.
* sysdeps/powerpc/powerpc32/dl-start.S (_start): Likewise.
* sysdeps/s390/s390-32/dl-machine.h (RTLD_START): Likewise.
* sysdeps/s390/s390-64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/sh/dl-machine.h (RTLD_START): Likewise.
* sysdeps/sparc/sparc32/dl-machine.h (RTLD_START): Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/tile/dl-start.S (_start): Likewise.
* sysdeps/x86_64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/x86_64/x32/dl-machine.h (RTLD_START): Likewise.
This patch defines ELF_MACHINE_NO_RELA on all architectures. Tested
only on x86_64 to verify that the sources before and after are
identical except for two instructions that pass the current line
number in dl-machine.h to assert_fail.
Resolves: #15465
The program name may be unavailable if the user application tampers
with argc and argv[]. Some parts of the dynamic linker caters for
this while others don't, so this patch consolidates the check and
fallback into a single macro and updates all users.
[BZ #14538]
* sysdeps/x86_64/dl-machine.h (elf_machine_dynamic): Use the
first element of the GOT.
(elf_machine_load_address): Return the difference between
the runtime address of _DYNAMIC and elf_machine_dynamic ().
The test to call the indirect function now includes a subtest to
checked whether the symbol is defined. When coming to that point
this is almost always the case. The test for STT_GNU_IFUNC on the
other hand rarely is true. Move it to the front means we don't have
to perform the second test unless really necessary.
from definition.
* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Don't define
label if it is not used.
* elf/dl-profile.c (_dl_start_profile): Define real-type variant
of gmon_hist_hdr and gmon_hdr structures and use them.
* elf/dl-load.c (open_verify): Add temporary variable to avoid
warning.
* nscd/nscd_helper.c (get_mapping): Avoid casts to avoid warnings.
* sunrpc/clnt_raw.c (clntraw_private_s): Use union in definition
to avoid cast.
* inet/rexec.c (rexec_af): Make sa2 a union to avoid warnings.
* inet/rcmd.c (rcmd_af): Make from a union of the various needed types
to avoid warnings.
(iruserok_af): Use ss_family instead of casts.
* gmon/gmon.c (write_hist): Define real-type variant of
gmon_hist_hdr structure and use it.
(write_gmon): Likewise for gmon_hdr.
* sysdeps/unix/sysv/linux/readv.c: Avoid declaration of replacement
function if we are not going to define it.
* sysdeps/unix/sysv/linux/writev.c: Likewise.
* inet/inet6_option.c (optin_alloc): Add temporary variable to
avoid warning.
* libio/strfile.h (struct _IO_streambuf): Use correct type and
name of VTable element.
* libio/iovsprintf.c: Avoid casts to avoid warnings.
* libio/iovsscanf.c: Likewise.
* libio/vasprintf.c: Likewise.
* libio/vsnprintf.c: Likewise.
* stdio-common/isoc99_vsscanf.c: Likewise.
* stdlib/strfmon_l.c: Likewise.
* debug/vasprintf_chk.c: Likewise.
* debug/vsnprintf_chk.c: Likewise.
* debug/vsprintf_chk.c: Likewise.
2005-01-21 Jakub Jelinek <jakub@redhat.com>
* elf/Makefile: Add rules to build and run tst-align2.
* elf/tst-align2.c: New test.
* elf/tst-alignmod2.c: New file.
* sysdeps/powerpc/tst-stack-align.h: New file.
* sysdeps/i386/dl-machine.h (RTLD_START): Align stack and clear frame
pointer before calling _dl_init.
* sysdeps/x86_64/dl-machine.h (RTLD_START): Likewise.