Commit Graph

114 Commits

Author SHA1 Message Date
H.J. Lu
f2b65a4471 x86-64/cet: Make CET feature check specific to Linux/x86
CET feature bits in TCB, which are Linux specific, are used to check if
CET features are active.  Move CET feature check to Linux/x86 directory.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-01-11 20:35:24 -08:00
H.J. Lu
0f9afc265a x32: Handle displacement overflow in PLT rewrite [BZ #31218]
PLT rewrite calculated displacement with

ElfW(Addr) disp = value - branch_start - JMP32_INSN_SIZE;

On x32, displacement from 0xf7fbe060 to 0x401030 was calculated as

unsigned int disp = 0x401030 - 0xf7fbe060 - 5;

with disp == 0x8442fcb and caused displacement overflow. The PLT entry
was changed to:

0xf7fbe060 <+0>:	e9 cb 2f 44 08     	jmp    0x401030
0xf7fbe065 <+5>:	cc                 	int3
0xf7fbe066 <+6>:	cc                 	int3
0xf7fbe067 <+7>:	cc                 	int3
0xf7fbe068 <+8>:	cc                 	int3
0xf7fbe069 <+9>:	cc                 	int3
0xf7fbe06a <+10>:	cc                 	int3
0xf7fbe06b <+11>:	cc                 	int3
0xf7fbe06c <+12>:	cc                 	int3
0xf7fbe06d <+13>:	cc                 	int3
0xf7fbe06e <+14>:	cc                 	int3
0xf7fbe06f <+15>:	cc                 	int3

x32 has 32-bit address range, but it doesn't wrap address around at 4GB,
JMP target was changed to 0x100401030 (0xf7fbe060LL + 0x8442fcbLL + 5),
which is above 4GB.

Always use uint64_t to calculate displacement.  This fixes BZ #31218.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-01-06 14:25:49 -08:00
H.J. Lu
848746e88e elf: Add ELF_DYNAMIC_AFTER_RELOC to rewrite PLT
Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
relocation.

For x86-64, add

 #define DT_X86_64_PLT     (DT_LOPROC + 0)
 #define DT_X86_64_PLTSZ   (DT_LOPROC + 1)
 #define DT_X86_64_PLTENT  (DT_LOPROC + 3)

1. DT_X86_64_PLT: The address of the procedure linkage table.
2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
table.
3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
entry.

With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the
memory offset of the indirect branch instruction.

Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
with direct branch after relocation when the lazy binding is disabled.

PLT rewrite is disabled by default since SELinux may disallow modifying
code pages and ld.so can't detect it in all cases.  Use

$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1

to enable PLT rewrite with 32-bit direct jump at run-time or

$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2

to enable PLT rewrite with 32-bit direct jump and on APX processors with
64-bit absolute jump at run-time.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-01-05 05:49:49 -08:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
H.J. Lu
541641a3de x86/cet: Enable shadow stack during startup
Previously, CET was enabled by kernel before passing control to user
space and the startup code must disable CET if applications or shared
libraries aren't CET enabled.  Since the current kernel only supports
shadow stack and won't enable shadow stack before passing control to
user space, we need to enable shadow stack during startup if the
application and all shared library are shadow stack enabled.  There
is no need to disable shadow stack at startup.  Shadow stack can only
be enabled in a function which will never return.  Otherwise, shadow
stack will underflow at the function return.

1. GL(dl_x86_feature_1) is set to the CET features which are supported
by the processor and are not disabled by the tunable.  Only non-zero
features in GL(dl_x86_feature_1) should be enabled.  After enabling
shadow stack with ARCH_SHSTK_ENABLE, ARCH_SHSTK_STATUS is used to check
if shadow stack is really enabled.
2. Use ARCH_SHSTK_ENABLE in RTLD_START in dynamic executable.  It is
safe since RTLD_START never returns.
3. Call arch_prctl (ARCH_SHSTK_ENABLE) from ARCH_SETUP_TLS in static
executable.  Since the start function using ARCH_SETUP_TLS never returns,
it is safe to enable shadow stack in ARCH_SETUP_TLS.
2024-01-01 05:22:48 -08:00
Adhemerval Zanella
55f41ef8de elf: Remove LD_PROFILE for static binaries
The _dl_non_dynamic_init does not parse LD_PROFILE, which does not
enable profile for dlopen objects.  Since dlopen is deprecated for
static objects, it is better to remove the support.

It also allows to trim down libc.a of profile support.

Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-11-21 16:15:42 -03:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights 2023-01-06 21:14:39 +00:00
H.J. Lu
cfdc4df66c x86-64: Only define used SSE/AVX/AVX512 run-time resolvers
When glibc is built with x86-64 ISA level v3, SSE run-time resolvers
aren't used.  For x86-64 ISA level v4 build, both SSE and AVX resolvers
are unused.  Check the minimum x86-64 ISA level to exclude the unused
run-time resolvers.
2022-06-27 14:17:52 -07:00
Fangrui Song
4ef05df5ef x86-64: Handle fewer relocation types for RTLD_BOOTSTRAP
The RTLD_BOOTSTRAP branch is used to relocate ld.so itself.  It only
needs to handle RELATIVE, GLOB_DAT, and JUMP_SLOT.  RELATIVE has been
handled (by _ELF_DYNAMIC_DO_RELOC due to DT_RELACOUNT, or RELR), so the
switch statement only needs to handle GLOB_DAT and JUMP_SLOT.

We can drop these `#if[n]def RTLD_BOOTSTRAP` and add a large
`# ifndef RTLD_BOOTSTRAP` instead.
2022-06-16 11:48:15 -07:00
Fangrui Song
de38b2a343 elf: Remove ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
If an executable has copy relocations for extern protected data, that
can only work if the library containing the definition is built with
assumptions (a) the compiler emits GOT-generating relocations (b) the
linker produces R_*_GLOB_DAT instead of R_*_RELATIVE.  Otherwise the
library uses its own definition directly and the executable accesses a
stale copy.  Note: the GOT relocations defeat the purpose of protected
visibility as an optimization, but allow rtld to make the executable and
library use the same copy when copy relocations are present, but it
turns out this never worked perfectly.

ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA has strange semantics when both
a.so and b.so define protected var and the executable copy relocates
var: b.so accesses its own copy even with GLOB_DAT.  The behavior change
is from commit 62da1e3b00 (x86) and then
copied to nios2 (ae5eae7cfc) and arc
(0e7d930c4c).

Without ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA, b.so accesses the copy
relocated data like a.so.

There is now a warning for copy relocation on protected symbol since
commit 7374c02b68.  It's extremely
unlikely anyone relies on the ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA
behavior, so let's remove it: this removes a check in the symbol lookup
code.
2022-06-15 11:29:55 -07:00
Adhemerval Zanella
ec7bc492b6 x86_64: Remove _dl_skip_args usage
Since ad43cac44a the generic code already shuffles the argv/envp/auxv
on the stack to remove the ld.so own arguments and thus _dl_skip_args
is always 0.   So there is no need to adjust the argc or argv.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-05-30 16:33:34 -03:00
H.J. Lu
f8587a6189 x86-64: Ignore r_addend for R_X86_64_GLOB_DAT/R_X86_64_JUMP_SLOT
According to x86-64 psABI, r_addend should be ignored for R_X86_64_GLOB_DAT
and R_X86_64_JUMP_SLOT.  Since linkers always set their r_addends to 0, we
can ignore their r_addends.

Reviewed-by: Fangrui Song <maskray@google.com>
2022-05-26 14:00:25 -07:00
Fangrui Song
3ee318c923 Remove -z combreloc and HAVE_Z_COMBRELOC
-z combreloc has been the default regadless of the architecture since
binutils commit f4d733664aabd7bd78c82895e030ec9779a92809 (2002). The
configure check added in commit fdde83499a (2001) has long been
unneeded.

We can therefore treat HAVE_Z_COMBRELOC as always 1 and delete dead code
paths in dl-machine.h files (many were copied from commit a711b01d34
and ee0cb67ec2).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-04-04 17:19:07 -07:00
Adhemerval Zanella
6628c742b2 elf: Remove prelink support
Prelinked binaries and libraries still work, the dynamic tags
DT_GNU_PRELINKED, DT_GNU_LIBLIST, DT_GNU_CONFLICT just ignored
(meaning the process is reallocated as default).

The loader environment variable TRACE_PRELINKING is also removed,
since it used solely on prelink.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-02-10 09:16:12 -03:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Adhemerval Zanella
d6d89608ac elf: Fix dynamic-link.h usage on rtld.c
The 4af6982e4c fix does not fully handle RTLD_BOOTSTRAP usage on
rtld.c due two issues:

  1. RTLD_BOOTSTRAP is also used on dl-machine.h on various
     architectures and it changes the semantics of various machine
     relocation functions.

  2. The elf_get_dynamic_info() change was done sideways, previously
     to 490e6c62aa get-dynamic-info.h was included by the first
     dynamic-link.h include *without* RTLD_BOOTSTRAP being defined.
     It means that the code within elf_get_dynamic_info() that uses
     RTLD_BOOTSTRAP is in fact unused.

To fix 1. this patch now includes dynamic-link.h only once with
RTLD_BOOTSTRAP defined.  The ELF_DYNAMIC_RELOCATE call will now have
the relocation fnctions with the expected semantics for the loader.

And to fix 2. part of 4af6982e4c is reverted (the check argument
elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP
pieces are removed.

To reorganize the includes the static TLS definition is moved to
its own header to avoid a circular dependency (it is defined on
dynamic-link.h and dl-machine.h requires it at same time other
dynamic-link.h definition requires dl-machine.h defitions).

Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL
are moved to its own header.  Only ancient ABIs need special values
(arm, i386, and mips), so a generic one is used as default.

The powerpc Elf64_FuncDesc is also moved to its own header, since
csu code required its definition (which would require either include
elf/ folder or add a full path with elf/).

Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32,
and powerpc64le.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-10-14 14:52:07 -03:00
Adhemerval Zanella
4af6982e4c elf: Fix elf_get_dynamic_info definition
Before to 490e6c62aa ('elf: Avoid nested functions in the loader
[BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on
the first dynamic-link.h include and later within _dl_start().  The
former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used
on setup_vdso() (since it is a global definition), while the former does
define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation.

With the commit change, the function is now included and defined once
instead of defined as a nested function.  So rtld.c defines without
defining RTLD_BOOTSTRAP and it brokes at least powerpc32.

This patch fixes by moving the get-dynamic-info.h include out of
dynamic-link.h, which then the caller can corirectly set the expected
semantic by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or
RESOLVE_MAP.

It also required to enable some asserts only for the loader bootstrap
to avoid issues when called from setup_vdso().

As a side note, this is another issues with nested functions: it is
not clear from pre-processed output (-E -dD) how the function will
be build and its semantic (since nested function will be local and
extra C defines may change it).

I checked on x86_64-linux-gnu (w/o --enable-static-pie),
i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4,
aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and
s390x-linux-gnu.

Reviewed-by: Fangrui Song <maskray@google.com>
2021-10-12 13:25:43 -03:00
Fangrui Song
490e6c62aa elf: Avoid nested functions in the loader [BZ #27220]
dynamic-link.h is included more than once in some elf/ files (rtld.c,
dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested
functions. This harms readability and the nested functions usage
is the biggest obstacle prevents Clang build (Clang doesn't support GCC
nested functions).

The key idea for unnesting is to add extra parameters (struct link_map
*and struct r_scope_elm *[]) to RESOLVE_MAP,
ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a],
elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired
by Stan Shebs' ppc64/x86-64 implementation in the
google/grte/v5-2.27/master which uses mixed extra parameters and static
variables.)

Future simplification:
* If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM,
  elf_machine_runtime_setup can drop the `scope` parameter.
* If TLSDESC no longer need to be in elf_machine_lazy_rel,
  elf_machine_lazy_rel can drop the `scope` parameter.

Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32,
sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64.
In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu
and riscv64-linux-gnu-rv64imafdc-lp64d.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-07 11:55:02 -07:00
Siddhesh Poyarekar
30891f35fa Remove "Contributed by" lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date.  Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.

Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions.  These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.

The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively.  These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:

https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-09-03 22:06:44 +05:30
Fangrui Song
b37b75d269 x86_64: Simplify elf_machine_{load_address,dynamic}
and drop reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time
address of _DYNAMIC. &__ehdr_start is a better way to get the load address.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-08-17 10:45:57 -07:00
Szabolcs Nagy
55c9f32380 x86_64: Remove lazy tlsdesc relocation related code
_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for
lazy tlsdesc relocation processing which is no longer supported.
2021-04-15 09:47:47 +01:00
Szabolcs Nagy
8f7e09f4db x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This similar to the commit b7cf203b5c
for aarch64, but it fixes a different race: bug 27137.

Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
relocate tlsdesc lazily, but that does not work in a BIND_NOW module
due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
load time fixes this bug 27721 too.
2021-04-15 09:47:37 +01:00
H.J. Lu
6ea5b57afa x86: Check IFUNC definition in unrelocated executable [BZ #20019]
Calling an IFUNC function defined in unrelocated executable also leads to
segfault.  Issue a fatal error message when calling IFUNC function defined
in the unrelocated executable from a shared library.
2021-01-04 12:01:01 -08:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
H.J. Lu
0f09154c64 x86: Initialize CPU info via IFUNC relocation [BZ 26203]
X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.

Key points:

1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch.  Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocation
have been performed,
3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT
in dynamic executable and by IFUNC relocation in dlopen in static
executable.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
2020-10-16 16:17:53 -07:00
H.J. Lu
107e6a3c22 x86: Support usable check for all CPU features
Support usable check for all CPU features with the following changes:

1. Change struct cpu_features to

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};

so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits.  EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.

The results are

1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
2020-07-13 06:05:16 -07:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Florian Weimer
4db71d2f98 elf: Do not run IFUNC resolvers for LD_DEBUG=unused [BZ #24214]
This commit adds missing skip_ifunc checks to aarch64, arm, i386,
sparc, and x86_64.  A new test case ensures that IRELATIVE IFUNC
resolvers do not run in various diagnostic modes of the dynamic
loader.

Reviewed-By: Szabolcs Nagy <szabolcs.nagy@arm.com>
2019-12-02 14:55:22 +01:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Joseph Myers
32db86d558 Add fall-through comments.
This patch adds fall-through comments in some cases where -Wextra
produces implicit-fallthrough warnings.

The patch is non-exhaustive.  Apart from architecture-specific code
for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy
code, probably should have such changes, but left to be dealt with
separately), or places that already had comments about the
fall-through but not matching the form expected by
-Wimplicit-fallthrough=3 (the default level with -Wextra; my
inclination is to adjust those comments to match rather than
downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one
place where I thought the implicit fallthrough was not correct and so
should be handled separately as a bug fix.  I think the key thing to
consider in review of this patch is whether the fall-through is indeed
intended and correct in each place where such a comment is added.

Tested for x86_64.

	* elf/dl-exception.c (_dl_exception_create_format): Add
	fall-through comments.
	* elf/ldconfig.c (parse_conf_include): Likewise.
	* elf/rtld.c (print_statistics): Likewise.
	* locale/programs/charmap.c (parse_charmap): Likewise.
	* misc/mntent_r.c (__getmntent_r): Likewise.
	* posix/wordexp.c (parse_arith): Likewise.
	(parse_backtick): Likewise.
	* resolv/ns_ttl.c (ns_parse_ttl): Likewise.
	* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
2019-02-12 10:30:34 +00:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Maciej W. Rozycki
10a446ddcc elf: Unify symbol address run-time calculation [BZ #19818]
Wrap symbol address run-time calculation into a macro and use it
throughout, replacing inline calculations.

There are a couple of variants, most of them different in a functionally
insignificant way.  Most calculations are right following RESOLVE_MAP,
at which point either the map or the symbol returned can be checked for
validity as the macro sets either both or neither.  In some places both
the symbol and the map has to be checked however.

My initial implementation therefore always checked both, however that
resulted in code larger by as much as 0.3%, as many places know from
elsewhere that no check is needed.  I have decided the size growth was
unacceptable.

Having looked closer I realized that it's the map that is the culprit.
Therefore I have modified LOOKUP_VALUE_ADDRESS to accept an additional
boolean argument telling it to access the map without checking it for
validity.  This in turn has brought quite nice results, with new code
actually being smaller for i686, and MIPS o32, n32 and little-endian n64
targets, unchanged in size for x86-64 and, unusually, marginally larger
for big-endian MIPS n64, as follows:

i686:
   text    data     bss     dec     hex filename
 152255    4052     192  156499   26353 ld-2.27.9000-base.so
 152159    4052     192  156403   262f3 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/el:
   text    data     bss     dec     hex filename
 142906    4396     260  147562   2406a ld-2.27.9000-base.so
 142890    4396     260  147546   2405a ld-2.27.9000-elf-symbol-value.so
MIPS/n32/el:
   text    data     bss     dec     hex filename
 142267    4404     260  146931   23df3 ld-2.27.9000-base.so
 142171    4404     260  146835   23d93 ld-2.27.9000-elf-symbol-value.so
MIPS/n64/el:
   text    data     bss     dec     hex filename
 149835    7376     408  157619   267b3 ld-2.27.9000-base.so
 149787    7376     408  157571   26783 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/eb:
   text    data     bss     dec     hex filename
 142870    4396     260  147526   24046 ld-2.27.9000-base.so
 142854    4396     260  147510   24036 ld-2.27.9000-elf-symbol-value.so
MIPS/n32/eb:
   text    data     bss     dec     hex filename
 142019    4404     260  146683   23cfb ld-2.27.9000-base.so
 141923    4404     260  146587   23c9b ld-2.27.9000-elf-symbol-value.so
MIPS/n64/eb:
   text    data     bss     dec     hex filename
 149763    7376     408  157547   2676b ld-2.27.9000-base.so
 149779    7376     408  157563   2677b ld-2.27.9000-elf-symbol-value.so
x86-64:
   text    data     bss     dec     hex filename
 148462    6452     400  155314   25eb2 ld-2.27.9000-base.so
 148462    6452     400  155314   25eb2 ld-2.27.9000-elf-symbol-value.so

	[BZ #19818]
	* sysdeps/generic/ldsodefs.h (LOOKUP_VALUE_ADDRESS): Add `set'
	parameter.
	(SYMBOL_ADDRESS): New macro.
	[!ELF_FUNCTION_PTR_IS_SPECIAL] (DL_SYMBOL_ADDRESS): Use
	SYMBOL_ADDRESS for symbol address calculation.
	* elf/dl-runtime.c (_dl_fixup): Likewise.
	(_dl_profile_fixup): Likewise.
	* elf/dl-symaddr.c (_dl_symbol_address): Likewise.
	* elf/rtld.c (dl_main): Likewise.
	* sysdeps/aarch64/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/alpha/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/arm/dl-machine.h (elf_machine_rel): Likewise.
	(elf_machine_rela): Likewise.
	* sysdeps/hppa/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/hppa/dl-symaddr.c (_dl_symbol_address): Likewise.
	* sysdeps/i386/dl-machine.h (elf_machine_rel): Likewise.
	(elf_machine_rela): Likewise.
	* sysdeps/ia64/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/m68k/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/mips/dl-machine.h (ELF_MACHINE_BEFORE_RTLD_RELOC):
	Likewise.
	(elf_machine_reloc): Likewise.
	(elf_machine_got_rel): Likewise.
	* sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise.
	* sysdeps/nios2/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/riscv/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/s390/s390-64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/sh/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/tile/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2018-04-04 23:09:37 +01:00
H.J. Lu
06fbebfff7 x86-64: Use __glibc_likely/__glibc_likely in dl-machine.h
The differences in elf/dl-reloc.os are

--- before    	2018-02-05 03:52:32.803125207 -0800
+++ after     	2018-02-05 03:52:14.913711879 -0800
@@ -1129,7 +1129,7 @@ _dl_relocate_object:
 	leaq	__PRETTY_FUNCTION__.9767(%rip), %rcx
 	leaq	.LC11(%rip), %rsi
 	leaq	.LC12(%rip), %rdi
-	movl	$540, %edx
+	movl	$539, %edx
 	call	__GI___assert_fail
 	.p2align 4,,10
 	.p2align 3

	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Replace
	__builtin_expect with __glibc_likely and __glibc_likely.
	(elf_machine_lazy_rel): Likewise.
2018-02-05 06:08:07 -08:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
H.J. Lu
b52b0d793d x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265]
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers.  It simplifies _dl_runtime_resolve and supports
different calling conventions.  ld.so code size is reduced by more than
1 KB.  However, use fxsave/xsave/xsavec takes a little bit more cycles
than saving and restoring vector and bound registers individually.

Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:

                             Before    After     Change

Westmere (SSE)/fxsave         345      866       151%
IvyBridge (AVX)/xsave         420      643       53%
Haswell (AVX)/xsave           713      1252      75%
Skylake (AVX+MPX)/xsavec      559      719       28%
Skylake (AVX512+MPX)/xsavec   145      272       87%
Ryzen (AVX)/xsavec            280      553       97%

This is the worst case where portion of time spent for saving and
restoring registers is bigger than majority of cases.  With smaller
_dl_runtime_resolve code size, overall performance impact is negligible.

On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are noises.  On Westmere, differences in
bootstrap and "makc check" time of GCC 7 with lazy binding GCC and
binutils are also noises.

	[BZ #21265]
	* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
	New.
	* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
	(get_common_indeces): Set xsave_state_size, xsave_state_full_size
	and bit_arch_XSAVEC_Usable if needed.
	(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
	and bit_arch_Use_dl_runtime_resolve_opt.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	Removed.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(bit_arch_Prefer_No_AVX512): Updated.
	(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
	(bit_arch_XSAVEC_Usable): New.
	(STATE_SAVE_OFFSET): Likewise.
	(STATE_SAVE_MASK): Likewise.
	[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
	(cpu_features): Add xsave_state_size and xsave_state_full_size.
	(index_arch_Use_dl_runtime_resolve_opt): Removed.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_XSAVEC_Usable): New.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Support XSAVEC_Usable.  Remove Use_dl_runtime_resolve_slow.
	* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
	is enabled.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
	Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
	_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
	_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
	with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
	_dl_runtime_resolve_xsavec.
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
	Removed.
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
	instead of VEC_SIZE.
	(REGISTER_SAVE_BND0): Removed.
	(REGISTER_SAVE_BND1): Likewise.
	(REGISTER_SAVE_BND3): Likewise.
	(REGISTER_SAVE_RAX): Always defined to 0.
	(VMOV): Removed.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_slow): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_avx512): Likewise.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(USE_FXSAVE): New.
	(_dl_runtime_resolve_fxsave): Likewise.
	(USE_XSAVE): Likewise.
	(_dl_runtime_resolve_xsave): Likewise.
	(USE_XSAVEC): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
	Removed.
	(_dl_runtime_resolve_avx512_opt): Likewise.
	(_dl_runtime_resolve_avx): Likewise.
	(_dl_runtime_resolve_avx_opt): Likewise.
	(_dl_runtime_resolve_sse): Likewise.
	(_dl_runtime_resolve_sse_vex): Likewise.
	(_dl_runtime_resolve_fxsave): New.
	(_dl_runtime_resolve_xsave): Likewise.
	(_dl_runtime_resolve_xsavec): Likewise.
2017-10-20 11:00:34 -07:00
H.J. Lu
02d2d8927d Revert x86: Allow undefined _DYNAMIC in static executable
This code is used in non-PIE static executable and static PIE.  It checks
if _DYNAMIC is undefined before using it to compute load address.  But
not all targets can convert access _DYNAMIC via GOT, which needs dynamic
relocation, to PC-relative at link-time.

	* sysdeps/i386/dl-machine.h (elf_machine_load_address): Don't
	allow undefined _DYNAMIC in PIE libc.a.
	* sysdeps/x86_64/dl-machine.h (elf_machine_load_address):
	Likewse.
2017-10-03 17:49:09 -07:00
H.J. Lu
4088d8dd29 x86: Allow undefined _DYNAMIC in static executable
When --enable-static-pie is used to build static PIE, _DYNAMIC is used
to compute the load address of static PIE.  But _DYNAMIC is undefined
when creating static executable.  This patch makes _DYNAMIC weak in PIE
libc.a so that it can be undefined.

	* sysdeps/i386/dl-machine.h (elf_machine_load_address): Allow
	undefined _DYNAMIC in PIE libc.a.
	* sysdeps/x86_64/dl-machine.h (elf_machine_load_address):
	Likewse.
2017-09-28 15:28:12 -07:00
Alan Modra
0572433b5b PowerPC64 ELFv2 PPC64_OPT_LOCALENTRY
ELFv2 functions with localentry:0 are those with a single entry point,
ie. global entry == local entry, that have no requirement on r2 or
r12 and guarantee r2 is unchanged on return.  Such an external
function can be called via the PLT without saving r2 or restoring it
on return, avoiding a common load-hit-store for small functions.

This patch implements the ld.so changes necessary for this
optimization.  ld.so needs to check that an optimized plt call
sequence is in fact calling a function implemented with localentry:0,
end emit a fatal error otherwise.

The elf/testobj6.c change is to stop "error while loading shared
libraries: expected localentry:0 `preload'" when running
elf/preloadtest, which we'd get otherwise.

	* elf/elf.h (PPC64_OPT_LOCALENTRY): Define.
	* sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add
	refsym and sym parameters.  Adjust callers.
	* sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt):
	Likewise.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise.
	* sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New.
	(_dl_reloc_overflow): Increase buffser size.  Formatting.
	* sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset):
	Delete reloc param, add refsym and sym.  Check optimized plt
	call stubs for localentry:0 functions.  Adjust callers.
	(elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym
	and sym parameters.  Adjust callers.
	(_dl_reloc_overflow): Move attribute.
	(_dl_error_localentry): Declare.
	* elf/dl-runtime.c (_dl_fixup): Save original sym.  Pass
	refsym and sym to elf_machine_fixup_plt.
	* elf/testobj6.c (preload): Call printf.
2017-06-14 10:47:25 +09:30
H.J. Lu
1432d38ea0 x86: Set dl_platform and dl_hwcap from CPU features [BZ #21391]
dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very
early during startup.  They are used by dynamic linker to determine
platform and build an array of hardware capability names, which are
added to search path when loading shared object.  dl_platform and
dl_hwcap are unused on x86-64.  On i386, i386, i486, i586 and i686
platforms were supported and only SSE2 capability was used.

On x86, usage of AT_PLATFORM and AT_HWCAP to determine platform and
processor capabilities is obsolete since all information is available
in dl_x86_cpu_features.  This patch sets dl_platform and dl_hwcap from
dl_x86_cpu_features in dynamic linker.  On i386, the available plaforms
are changed to i586 and i686 since i386 has been deprecated.  On x86-64,
the available plaforms are haswell, which is for Haswell class processors
with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which
is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and
AVX512PF.  A capability, avx512_1, is also added to x86-64 for AVX512
ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL.

	[BZ #21391]
	* sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]:
	Only call init_cpu_features.
	[!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed.
	* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
	* sysdeps/i386/dl-procinfo.h: Removed.
	* sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include
	<sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>.  Include
	<sysdeps/x86/dl-procinfo.h>.
	(_dl_procinfo): Replace _DL_HWCAP_COUNT with 32.
	* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]:
	Include <sysdeps/x86/dl-procinfo.h> instead of
	 <sysdeps/generic/dl-procinfo.h>.
	* sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>.
	(init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask.
	* sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New.
	(bit_cpu_MOVBE): Likewise.
	(bit_cpu_BMI1): Likewise.
	(bit_cpu_BMI2): Likewise.
	(index_cpu_BMI1): Likewise.
	(index_cpu_BMI2): Likewise.
	(index_cpu_LZCNT): Likewise.
	(index_cpu_MOVBE): Likewise.
	(index_cpu_POPCNT): Likewise.
	(reg_BMI1): Likewise.
	(reg_BMI2): Likewise.
	(reg_LZCNT): Likewise.
	(reg_MOVBE): Likewise.
	(reg_POPCNT): Likewise.
	* sysdeps/x86/dl-hwcap.h: New file.
	* sysdeps/x86/dl-procinfo.h: Likewise.
	* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New.
	(_dl_x86_platforms): Likewise.
2017-05-03 13:44:35 -07:00
H.J. Lu
6fab532b47 Allow IFUNC relocation against unrelocated shared library
IFUNC relocation against definition in unrelocated shared library
will lead to segfault when the IFUNC function is called.  This
patch allows such IFUNC relocations with a warning.  This isn't
a real fix for

https://sourceware.org/bugzilla/show_bug.cgi?id=21041

It simply allows the program to load.  The program will segfault
when longjmp is called.

	* sysdeps/i386/dl-machine.h (elf_machine_rel): Replace
	_dl_fatal_printf with _dl_error_printf for IFUNC relocation
	against unrelocated shared library.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
2017-02-02 13:14:59 -08:00
Joseph Myers
bfff8b1bec Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
H.J. Lu
0e6d3adc60 Check IFUNC definition in unrelocated shared library [BZ #20019]
Calling an IFUNC function defined in unrelocated shared library may
lead to segfault.  This patch issues an error message to request
relinking the shared library if it references IFUNC function defined
in the unrelocated shared library.

	[BZ #20019]
	* sysdeps/i386/dl-machine.h (elf_machine_rel): Check IFUNC
	definition in unrelocated shared library.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
2016-10-28 09:12:15 -07:00
H.J. Lu
fb0f7a6755 X86-64: Add _dl_runtime_resolve_avx[512]_{opt|slow} [BZ #20508]
There is transition penalty when SSE instructions are mixed with 256-bit
AVX or 512-bit AVX512 load instructions.  Since _dl_runtime_resolve_avx
and _dl_runtime_profile_avx512 save/restore 256-bit YMM/512-bit ZMM
registers, there is transition penalty when SSE instructions are used
with lazy binding on AVX and AVX512 processors.

To avoid SSE transition penalty, if only the lower 128 bits of the first
8 vector registers are non-zero, we can preserve %xmm0 - %xmm7 registers
with the zero upper bits.

For AVX and AVX512 processors which support XGETBV with ECX == 1, we can
use XGETBV with ECX == 1 to check if the upper 128 bits of YMM registers
or the upper 256 bits of ZMM registers are zero.  We can restore only the
non-zero portion of vector registers with AVX/AVX512 load instructions
which will zero-extend upper bits of vector registers.

This patch adds _dl_runtime_resolve_sse_vex which saves and restores
XMM registers with 128-bit AVX store/load instructions.  It is used to
preserve YMM/ZMM registers when only the lower 128 bits are non-zero.
_dl_runtime_resolve_avx_opt and _dl_runtime_resolve_avx512_opt are added
and used on AVX/AVX512 processors supporting XGETBV with ECX == 1 so
that we store and load only the non-zero portion of vector registers.
This avoids SSE transition penalty caused by _dl_runtime_resolve_avx and
_dl_runtime_profile_avx512 when only the lower 128 bits of vector
registers are used.

_dl_runtime_resolve_avx_slow is added and used for AVX processors which
don't support XGETBV with ECX == 1.  Since there is no SSE transition
penalty on AVX512 processors which don't support XGETBV with ECX == 1,
_dl_runtime_resolve_avx512_slow isn't provided.

	[BZ #20495]
	[BZ #20508]
	* sysdeps/x86/cpu-features.c (init_cpu_features): For Intel
	processors, set Use_dl_runtime_resolve_slow and set
	Use_dl_runtime_resolve_opt if XGETBV suports ECX == 1.
	* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
	New.
	(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
	(index_arch_Use_dl_runtime_resolve_opt): Likewise.
	(index_arch_Use_dl_runtime_resolve_slow): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup): Use
	_dl_runtime_resolve_avx512_opt and _dl_runtime_resolve_avx_opt
	if Use_dl_runtime_resolve_opt is set.  Use
	_dl_runtime_resolve_slow if Use_dl_runtime_resolve_slow is set.
	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features.h>.
	(_dl_runtime_resolve_opt): New.  Defined for AVX and AVX512.
	(_dl_runtime_resolve): Add one for _dl_runtime_resolve_sse_vex.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx_slow):
	New.
	(_dl_runtime_resolve_opt): Likewise.
	(_dl_runtime_profile): Define only if _dl_runtime_profile is
	defined.
2016-09-06 08:51:07 -07:00
H.J. Lu
4facca0b0e Call init_cpu_features only if SHARED is defined
In static executable, since init_cpu_features is called early from
__libc_start_main, there is no need to call it again in dl_platform_init.

	[BZ #20072]
	* sysdeps/i386/dl-machine.h (dl_platform_init): Call
	init_cpu_features only if SHARED is defined.
	* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
2016-05-13 08:29:33 -07:00
Joseph Myers
f7a9f785e5 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
H.J. Lu
f3dcae82d5 Save and restore vector registers in x86-64 ld.so
This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve
and _dl_runtime_profile, which save and restore the first 8 vector
registers used for parameter passing.  elf_machine_runtime_setup
selects the proper _dl_runtime_resolve or _dl_runtime_profile based
on _dl_x86_cpu_features.  It avoids race condition caused by
FOREIGN_CALL macros, which are only used for x86-64.

Performance impact of saving and restoring 8 vector registers are
negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when
ld.so is optimized with SSE2.

	[BZ #15128]
	* sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add
	ifuncmain8.
	(modules-names): Add ifuncmod8.
	($(objpfx)ifuncmain8): New rule.
	* sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and
	<cpuid.h>.
	(elf_machine_runtime_setup): Use _dl_runtime_resolve_sse,
	_dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512,
	_dl_runtime_profile_sse, _dl_runtime_profile_avx, or
	_dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
	* sysdeps/x86_64/dl-trampoline.S: Rewrite.
	* sysdeps/x86_64/dl-trampoline.h: Likewise.
	* sysdeps/x86_64/ifuncmain8.c: New file.
	* sysdeps/x86_64/ifuncmod8.c: Likewise.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE):
	Removed.
	* sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
	(tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1.
	Change rtld_savespace_sse to __glibc_unused2.
	(RTLD_CHECK_FOREIGN_CALL): Removed.
	(RTLD_ENABLE_FOREIGN_CALL): Likewise.
	(RTLD_PREPARE_FOREIGN_CALL): Likewise.
	(RTLD_FINALIZE_FOREIGN_CALL): Likewise.
2015-08-25 04:34:13 -07:00
H.J. Lu
e2e4f56056 Add _dl_x86_cpu_features to rtld_global
This patch adds _dl_x86_cpu_features to rtld_global in x86 ld.so
and initializes it early before __libc_start_main is called so that
cpu_features is always available when it is used and we can avoid
calling __init_cpu_features in IFUNC selectors.

	* sysdeps/i386/dl-machine.h: Include <cpu-features.c>.
	(dl_platform_init): Call init_cpu_features.
	* sysdeps/i386/dl-procinfo.c (_dl_x86_cpu_features): New.
	* sysdeps/i386/i686/cacheinfo.c
	(DISABLE_PREFERRED_MEMORY_INSTRUCTION): Removed.
	* sysdeps/i386/i686/multiarch/Makefile (aux): Remove init-arch.
	* sysdeps/i386/i686/multiarch/Versions: Removed.
	* sysdeps/i386/i686/multiarch/ifunc-defines.sym (KIND_OFFSET):
	Removed.
	* sysdeps/i386/ldsodefs.h: Include <cpu-features.h>.
	* sysdeps/unix/sysv/linux/x86/Makefile
	(libpthread-sysdep_routines): Remove init-arch.
	* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.c: Include
	<sysdeps/x86_64/dl-procinfo.c> instead of
	sysdeps/generic/dl-procinfo.c>.
	* sysdeps/x86/Makefile [$(subdir) == csu] (gen-as-const-headers):
	Add cpu-features-offsets.sym and rtld-global-offsets.sym.
	[$(subdir) == elf] (sysdep-dl-routines): Add dl-get-cpu-features.
	[$(subdir) == elf] (tests): Add tst-get-cpu-features.
	[$(subdir) == elf] (tests-static): Add
	tst-get-cpu-features-static.
	* sysdeps/x86/Versions: New file.
	* sysdeps/x86/cpu-features-offsets.sym: Likewise.
	* sysdeps/x86/cpu-features.c: Likewise.
	* sysdeps/x86/cpu-features.h: Likewise.
	* sysdeps/x86/dl-get-cpu-features.c: Likewise.
	* sysdeps/x86/libc-start.c: Likewise.
	* sysdeps/x86/rtld-global-offsets.sym: Likewise.
	* sysdeps/x86/tst-get-cpu-features-static.c: Likewise.
	* sysdeps/x86/tst-get-cpu-features.c: Likewise.
	* sysdeps/x86_64/dl-procinfo.c: Likewise.
	* sysdeps/x86_64/cacheinfo.c (__cpuid_count): Removed.
	Assume USE_MULTIARCH is defined and don't check it.
	(is_intel): Replace __cpu_features with GLRO(dl_x86_cpu_features).
	(is_amd): Likewise.
	(max_cpuid): Likewise.
	(intel_check_word): Likewise.
	(__cache_sysconf): Don't call __init_cpu_features.
	(__x86_preferred_memory_instruction): Removed.
	(init_cacheinfo): Don't call __init_cpu_features. Replace
	__cpu_features with GLRO(dl_x86_cpu_features).
	* sysdeps/x86_64/dl-machine.h: <cpu-features.c>.
	(dl_platform_init): Call init_cpu_features.
	* sysdeps/x86_64/ldsodefs.h: Include <cpu-features.h>.
	* sysdeps/x86_64/multiarch/Makefile (aux): Remove init-arch.
	* sysdeps/x86_64/multiarch/Versions: Removed.
	* sysdeps/x86_64/multiarch/cacheinfo.c: Likewise.
	* sysdeps/x86_64/multiarch/init-arch.c: Likewise.
	* sysdeps/x86_64/multiarch/ifunc-defines.sym (KIND_OFFSET):
	Removed.
	* sysdeps/x86_64/multiarch/init-arch.h: Rewrite.
2015-08-13 03:41:22 -07:00
H.J. Lu
62da1e3b00 Add ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA to x86
With copy relocation, address of protected data defined in the shared
library may be external.   When there is a relocation against the
protected data symbol within the shared library, we need to check if we
should skip the definition in the executable copied from the protected
data.  This patch adds ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA and defines
it for x86.  If ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA isn't 0, do_lookup_x
will skip the data definition in the executable from copy reloc.

	[BZ #17711]
	* elf/dl-lookup.c (do_lookup_x): When UNDEF_MAP is NULL, which
	indicates it is called from do_lookup_x on relocation against
	protected data, skip the data definion in the executable from
	copy reloc.
	(_dl_lookup_symbol_x): Pass ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA,
	instead of ELF_RTYPE_CLASS_PLT, to do_lookup_x for
	EXTERN_PROTECTED_DATA relocation against STT_OBJECT symbol.
	* sysdeps/generic/ldsodefs.h * (ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA):
	New.  Defined to 4 if DL_EXTERN_PROTECTED_DATA is defined,
	otherwise to 0.
	* sysdeps/i386/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
	* sysdeps/i386/dl-machine.h (elf_machine_type_class): Set class
	to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_386_GLOB_DAT.
	* sysdeps/x86_64/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
	* sysdeps/x86_64/dl-machine.h (elf_machine_type_class): Set class
	to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_X86_64_GLOB_DAT.
2015-03-31 05:16:57 -07:00
H.J. Lu
209826bcf2 Replace ELF_RTYPE_CLASS_NOCOPY with ELF_RTYPE_CLASS_COPY
ELF_RTYPE_CLASS_NOCOPY in comments is a typo.  It should be
ELF_RTYPE_CLASS_COPY.

	[BZ #18082]
	* sysdeps/alpha/dl-machine.h (elf_machine_type_class): Replace
	ELF_RTYPE_CLASS_NOCOPY with ELF_RTYPE_CLASS_COPY in comments.
	* sysdeps/arm/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/hppa/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/i386/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/ia64/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/m68k/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_type_class):
	Likewise.
	* sysdeps/nios2/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_type_class):
	Likewise.
	* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_type_class):
	Likewise.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_type_class):
	Likewise.
	* sysdeps/s390/s390-64/dl-machine.h (elf_machine_type_class):
	Likewise.
	* sysdeps/sh/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_type_class):
	Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_type_class):
	Likewise.
	* sysdeps/tile/dl-machine.h (elf_machine_type_class): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_type_class): Likewise.
2015-03-05 08:40:41 -08:00
Joseph Myers
b168057aaa Update copyright dates with scripts/update-copyrights. 2015-01-02 16:29:47 +00:00