Always do TLS descriptor initialization at load time during relocation
processing to avoid barriers at every TLS access. In non-dlopened shared
libraries the overhead of tls access vs static global access is > 3x
bigger when lazy initialization is used (_dl_tlsdesc_return_lazy)
compared to bind-now (_dl_tlsdesc_return) so the barriers dominate tls
access performance.
TLSDESC relocs are in DT_JMPREL which are processed at load time using
elf_machine_lazy_rel which is only supposed to do lightweight
initialization using the DT_TLSDESC_PLT trampoline (the trampoline code
jumps to the entry point in DT_TLSDESC_GOT which does the lazy tlsdesc
initialization at runtime). This patch changes elf_machine_lazy_rel
in aarch64 to do the symbol binding and initialization as if DF_BIND_NOW
was set, so the non-lazy code path of elf/do-rel.h was replicated.
The static linker could be changed to emit TLSDESC relocs in DT_REL*,
which are processed non-lazily, but the goal of this patch is to always
guarantee bind-now semantics, even if the binary was produced with an
old linker, so the barriers can be dropped in tls descriptor functions.
After this change the synchronizing ldar instructions can be dropped
as well as the lazy initialization machinery including the DT_TLSDESC_GOT
setup.
I believe this should be done on all targets, including ones where no
barrier is needed for lazy initialization. There is very little gain in
optimizing for large number of symbolic tlsdesc relocations which is an
extremely uncommon case. And currently the tlsdesc entries are only
readonly protected with -z now and some hardennings against writable
JUMPSLOT relocs don't work for TLSDESC so they are a security hazard.
(But to fix that the static linker has to be changed.)
* sysdeps/aarch64/dl-machine.h (elf_machine_lazy_rel): Do symbol
binding and initialization non-lazily for R_AARCH64_TLSDESC.
This patch rewrites aarch64 elf_machine_load_address to use special _DYNAMIC
symbol instead of _dl_start.
The static address of _DYNAMIC symbol is stored in the first GOT entry.
Here is the change which makes this solution work (part of binutils 2.24):
https://sourceware.org/ml/binutils/2013-06/msg00248.html
i386, x86_64 targets use the same method to do this as well.
The original implementation relies on a trick that R_AARCH64_ABS32 relocation
being resolved at link time and the static address fits in the 32bits.
However, in LP64, normally, the address is defined to be 64 bit.
Here is the C version one which should be portable in all cases.
* sysdeps/aarch64/dl-machine.h (elf_machine_load_address): Use
_DYNAMIC symbol to calculate load address.
ELFv2 functions with localentry:0 are those with a single entry point,
ie. global entry == local entry, that have no requirement on r2 or
r12 and guarantee r2 is unchanged on return. Such an external
function can be called via the PLT without saving r2 or restoring it
on return, avoiding a common load-hit-store for small functions.
This patch implements the ld.so changes necessary for this
optimization. ld.so needs to check that an optimized plt call
sequence is in fact calling a function implemented with localentry:0,
end emit a fatal error otherwise.
The elf/testobj6.c change is to stop "error while loading shared
libraries: expected localentry:0 `preload'" when running
elf/preloadtest, which we'd get otherwise.
* elf/elf.h (PPC64_OPT_LOCALENTRY): Define.
* sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add
refsym and sym parameters. Adjust callers.
* sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt):
Likewise.
* sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New.
(_dl_reloc_overflow): Increase buffser size. Formatting.
* sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset):
Delete reloc param, add refsym and sym. Check optimized plt
call stubs for localentry:0 functions. Adjust callers.
(elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym
and sym parameters. Adjust callers.
(_dl_reloc_overflow): Move attribute.
(_dl_error_localentry): Declare.
* elf/dl-runtime.c (_dl_fixup): Save original sym. Pass
refsym and sym to elf_machine_fixup_plt.
* elf/testobj6.c (preload): Call printf.
This patch reformats the inline-asm in elf_machine_load_address so it is
easier to change only part of the inline-asm. That is using string
concatenating instead of string continuation.
Also document why this inline-asm works - it depends on the 32bit
relocation being resolved at link time.
ChangeLog:
2014-11-21 Will Newton <will.newton@linaro.org>
Andrew Pinski <andrew.pinski@caviumnetworks.com>
* sysdeps/aarch64/dl-machine.h (elf_machine_load_address):
Refactor inline-asm. Also add comment.
Using the macros for ELF types is required for adding ILP32 support.
In the standard AArch64 configuration this makes no difference to
the types used.
ChangeLog:
2014-11-21 Will Newton <will.newton@linaro.org>
Andrew Pinski <andrew.pinski@caviumnetworks.com>
* sysdeps/aarch64/bits/link.h (la_aarch64_gnu_pltenter): Use
ElfW macro instead of hardcoded Elf64 types.
(la_aarch64_gnu_pltenter): Likewise.
* sysdeps/aarch64/dl-machine.h
(elf_machine_runtime_setup): Use ElfW(Addr).
The latest version of the binutils ELF header defines a new set of
dynamic relocations for ILP32 and renames some to make the naming
more uniform.
ChangeLog:
2014-11-21 Will Newton <will.newton@linaro.org>
Andrew Pinski <andrew.pinski@caviumnetworks.com>
* elf/elf.h (R_AARCH64_P32_ABS32, R_AARCH64_P32_COPY,
R_AARCH64_P32_GLOB_DAT, R_AARCH64_P32_JUMP_SLOT,
R_AARCH64_P32_RELATIVE, R_AARCH64_P32_TLS_DTPMOD,
R_AARCH64_P32_TLS_DTPREL, R_AARCH64_P32_TLS_TPREL,
R_AARCH64_P32_TLSDESC, R_AARCH64_P32_IRELATIVE): Define.
(R_AARCH64_TLS_DTPMOD64): Rename to ..
(R_AARCH64_TLS_DTPMOD): This.
(R_AARCH64_TLS_DTPREL64): Rename to ...
(R_AARCH64_TLS_DTPREL): This.
(R_AARCH64_TLS_TPREL64): Rename to ...
(R_AARCH64_TLS_TPREL): This.
* sysdeps/aarch64/dl-machine.h (elf_machine_type_class): Update
R_AARCH64_TLS_DTPMOD64, R_AARCH64_TLS_DTPREL64, and
R_AARCH64_TLS_TPREL64.
(elf_machine_rela): Likewise.
Continuing the removal of the obsolete INTDEF / INTUSE mechanism, this
patch eliminates its use for _dl_init. Since _dl_init was already
declared with hidden visibility, creating a second hidden alias for it
was completely pointless, so this patch replaces all uses of
_dl_init_internal with plain _dl_init instead of using hidden_proto /
hidden_def (which are only needed when you want a hidden alias for a
non-hidden symbol; it's quite possible there are cases where they are
used but don't need to be because the symbol in question is not part
of the public ABI and is only used within a single library, so using
attributes_hidden instead would suffice).
Tested for x86_64 that installed stripped shared libraries are
unchanged by the patch.
[BZ #14132]
* elf/dl-init.c (_dl_init): Don't use INTDEF.
* sysdeps/aarch64/dl-machine.h (RTLD_START): Use _dl_init instead
of _dl_init_internal.
* sysdeps/alpha/dl-machine.h (RTLD_START): Likewise.
* sysdeps/arm/dl-machine.h (RTLD_START): Likewise.
* sysdeps/hppa/dl-machine.h (RTLD_START): Likewise.
* sysdeps/i386/dl-machine.h (RTLD_START): Likewise.
* sysdeps/ia64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/m68k/dl-machine.h (RTLD_START): Likewise.
* sysdeps/microblaze/dl-machine.h (RTLD_START): Likewise.
* sysdeps/mips/dl-machine.h (RTLD_START): Likewise.
* sysdeps/powerpc/powerpc32/dl-start.S (_start): Likewise.
* sysdeps/s390/s390-32/dl-machine.h (RTLD_START): Likewise.
* sysdeps/s390/s390-64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/sh/dl-machine.h (RTLD_START): Likewise.
* sysdeps/sparc/sparc32/dl-machine.h (RTLD_START): Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/tile/dl-start.S (_start): Likewise.
* sysdeps/x86_64/dl-machine.h (RTLD_START): Likewise.
* sysdeps/x86_64/x32/dl-machine.h (RTLD_START): Likewise.
This patch defines ELF_MACHINE_NO_RELA on all architectures. Tested
only on x86_64 to verify that the sources before and after are
identical except for two instructions that pass the current line
number in dl-machine.h to assert_fail.
This patch moves the AArch64 port to the main sysdeps hierarchy. The
move is essentially:
git mv ports/sysdeps/aarch64 sysdeps/aarch64
git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64
The README is updated and I've updated ChangeLog.aarch64 along the
lines of the ARM move. The AArch64 build has been tested to confirm
that there were no changes in objdump -dr output or the shared
objects.