Commit Graph

37925 Commits

Author SHA1 Message Date
Noah Goldstein
c796418d00 x86: Optimize L(less_vec) case in memcmp-evex-movbe.S
No bug.
Optimizations are twofold.

1) Replace page cross and 0/1 checks with masked load instructions in
   L(less_vec). In applications this reduces branch-misses in the
   hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.

Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range.  From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit abddd61de0)
2022-04-26 18:18:16 -07:00
H.J. Lu
f3a99b2216 x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.

(cherry picked from commit ceeffe968c)
2022-04-26 18:18:16 -07:00
Noah Goldstein
4bbd0f866a x86-64: Use notl in EVEX strcmp [BZ #28646]
Must use notl %edi here as lower bits are for CHAR comparisons
potentially out of range thus can be 0 without indicating mismatch.
This fixes BZ #28646.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 4df1fa6ddc)
2022-04-26 18:18:16 -07:00
Noah Goldstein
7cb126e7e7 x86: Shrink memcmp-sse4.S code size
No bug.

This implementation refactors memcmp-sse4.S primarily with minimizing
code size in mind. It does this by removing the lookup table logic and
removing the unrolled check from (256, 512] bytes.

memcmp-sse4 code size reduction : -3487 bytes
wmemcmp-sse4 code size reduction: -1472 bytes

The current memcmp-sse4.S implementation has a large code size
cost. This has serious adverse affects on the ICache / ITLB. While
in micro-benchmarks the implementations appears fast, traces of
real-world code have shown that the speed in micro benchmarks does not
translate when the ICache/ITLB are not primed, and that the cost
of the code size has measurable negative affects on overall
application performance.

See https://research.google/pubs/pub48320/ for more details.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 2f9062d717)
2022-04-26 18:18:16 -07:00
Noah Goldstein
cecbac5212 x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h
No bug.

This patch doubles the rep_movsb_threshold when using ERMS. Based on
benchmarks the vector copy loop, especially now that it handles 4k
aliasing, is better for these medium ranged.

On Skylake with ERMS:

Size,   Align1, Align2, dst>src,(rep movsb) / (vec copy)
4096,   0,      0,      0,      0.975
4096,   0,      0,      1,      0.953
4096,   12,     0,      0,      0.969
4096,   12,     0,      1,      0.872
4096,   44,     0,      0,      0.979
4096,   44,     0,      1,      0.83
4096,   0,      12,     0,      1.006
4096,   0,      12,     1,      0.989
4096,   0,      44,     0,      0.739
4096,   0,      44,     1,      0.942
4096,   12,     12,     0,      1.009
4096,   12,     12,     1,      0.973
4096,   44,     44,     0,      0.791
4096,   44,     44,     1,      0.961
4096,   2048,   0,      0,      0.978
4096,   2048,   0,      1,      0.951
4096,   2060,   0,      0,      0.986
4096,   2060,   0,      1,      0.963
4096,   2048,   12,     0,      0.971
4096,   2048,   12,     1,      0.941
4096,   2060,   12,     0,      0.977
4096,   2060,   12,     1,      0.949
8192,   0,      0,      0,      0.85
8192,   0,      0,      1,      0.845
8192,   13,     0,      0,      0.937
8192,   13,     0,      1,      0.939
8192,   45,     0,      0,      0.932
8192,   45,     0,      1,      0.927
8192,   0,      13,     0,      0.621
8192,   0,      13,     1,      0.62
8192,   0,      45,     0,      0.53
8192,   0,      45,     1,      0.516
8192,   13,     13,     0,      0.664
8192,   13,     13,     1,      0.659
8192,   45,     45,     0,      0.593
8192,   45,     45,     1,      0.575
8192,   2048,   0,      0,      0.854
8192,   2048,   0,      1,      0.834
8192,   2061,   0,      0,      0.863
8192,   2061,   0,      1,      0.857
8192,   2048,   13,     0,      0.63
8192,   2048,   13,     1,      0.629
8192,   2061,   13,     0,      0.627
8192,   2061,   13,     1,      0.62

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 475b63702e)
2022-04-26 18:18:16 -07:00
Noah Goldstein
a7392db2ff x86: Optimize memmove-vec-unaligned-erms.S
No bug.

The optimizations are as follows:

1) Always align entry to 64 bytes. This makes behavior more
   predictable and makes other frontend optimizations easier.

2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have
   significant benefits in the case that:
        0 < (dst - src) < [256, 512]

3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%]
   improvement and for FSRM [-10%, 25%].

In addition to these primary changes there is general cleanup
throughout to optimize the aligning routines and control flow logic.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit a6b7502ec0)
2022-04-26 18:18:16 -07:00
Fangrui Song
2e64237a87 x86-64: Replace movzx with movzbl
Clang cannot assemble movzx in the AT&T dialect mode.

../sysdeps/x86_64/strcmp.S:2232:16: error: invalid operand for instruction
 movzx (%rsi), %ecx
               ^~~~

Change movzx to movzbl, which follows the AT&T dialect and is used
elsewhere in the file.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit 6720d36b66)
2022-04-26 18:18:16 -07:00
H.J. Lu
a182bb7a39 x86-64: Remove Prefer_AVX2_STRCMP
Remove Prefer_AVX2_STRCMP to enable EVEX strcmp.  When comparing 2 32-byte
strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1
VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD
and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1
VPMOVMSKB and 1 TESTL.  EVEX strcmp is now faster than AVX2 strcmp by up
to 40% on Tiger Lake and Ice Lake.

(cherry picked from commit 14dbbf46a0)
2022-04-26 18:18:16 -07:00
H.J. Lu
f35ad30da4 x86-64: Improve EVEX strcmp with masked load
In strcmp-evex.S, to compare 2 32-byte strings, replace

        VMOVU   (%rdi, %rdx), %YMM0
        VMOVU   (%rsi, %rdx), %YMM1
        /* Each bit in K0 represents a mismatch in YMM0 and YMM1.  */
        VPCMP   $4, %YMM0, %YMM1, %k0
        VPCMP   $0, %YMMZERO, %YMM0, %k1
        VPCMP   $0, %YMMZERO, %YMM1, %k2
        /* Each bit in K1 represents a NULL in YMM0 or YMM1.  */
        kord    %k1, %k2, %k1
        /* Each bit in K1 represents a NULL or a mismatch.  */
        kord    %k0, %k1, %k1
        kmovd   %k1, %ecx
        testl   %ecx, %ecx
        jne     L(last_vector)

with

        VMOVU   (%rdi, %rdx), %YMM0
        VPTESTM %YMM0, %YMM0, %k2
        /* Each bit cleared in K1 represents a mismatch or a null CHAR
           in YMM0 and 32 bytes at (%rsi, %rdx).  */
        VPCMP   $0, (%rsi, %rdx), %YMM0, %k1{%k2}
        kmovd   %k1, %ecx
        incl    %ecx
        jne     L(last_vector)

It makes EVEX strcmp faster than AVX2 strcmp by up to 40% on Tiger Lake
and Ice Lake.

Co-Authored-By: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit c46e9afb2d)
2022-04-26 18:18:16 -07:00
Noah Goldstein
baf3ece634 x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.

it could potentially be dangerous to use SSE2 if this function is ever
called without using 'vzeroupper' beforehand. While compilers appear
to use 'vzeroupper' before function calls if AVX2 has been used, using
SSE2 here is more brittle. Since it is not absolutely necessary it
should be avoided.

It costs 2-extra bytes but the extra bytes should only eat into
alignment padding.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit bad852b61b)
2022-04-26 18:18:16 -07:00
Noah Goldstein
6d18a93dbb x86: Optimize memset-vec-unaligned-erms.S
No bug.

Optimization are

1. change control flow for L(more_2x_vec) to fall through to loop and
   jump for L(less_4x_vec) and L(less_8x_vec). This uses less code
   size and saves jumps for length > 4x VEC_SIZE.

2. For EVEX/AVX512 move L(less_vec) closer to entry.

3. Avoid complex address mode for length > 2x VEC_SIZE

4. Slightly better aligning code for the loop from the perspective of
   code size and uops.

5. Align targets so they make full use of their fetch block and if
   possible cache line.

6. Try and reduce total number of icache lines that will need to be
   pulled in for a given length.

7. Include "local" version of stosb target. For AVX2/EVEX/AVX512
   jumping to the stosb target in the sse2 code section will almost
   certainly be to a new page. The new version does increase code size
   marginally by duplicating the target but should get better iTLB
   behavior as a result.

test-memset, test-wmemset, and test-bzero are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
(cherry picked from commit e59ced2384)
2022-04-26 18:18:16 -07:00
Noah Goldstein
5ec3416853 x86: Optimize memcmp-evex-movbe.S for frontend behavior and size
No bug.

The frontend optimizations are to:
1. Reorganize logically connected basic blocks so they are either in
   the same cache line or adjacent cache lines.
2. Avoid cases when basic blocks unnecissarily cross cache lines.
3. Try and 32 byte align any basic blocks possible without sacrificing
   code size. Smaller / Less hot basic blocks are used for this.

Overall code size shrunk by 168 bytes. This should make up for any
extra costs due to aligning to 64 bytes.

In general performance before deviated a great deal dependending on
whether entry alignment % 64 was 0, 16, 32, or 48. These changes
essentially make it so that the current implementation is at least
equal to the best alignment of the original for any arguments.

The only additional optimization is in the page cross case. Branch on
equals case was removed from the size == [4, 7] case. As well the [4,
7] and [2, 3] case where swapped as [4, 7] is likely a more hot
argument size.

test-memcmp and test-wmemcmp are both passing.

(cherry picked from commit 1bd8b8d58f)
2022-04-26 18:18:16 -07:00
Noah Goldstein
b5a44a6a47 x86: Modify ENTRY in sysdep.h so that p2align can be specified
No bug.

This change adds a new macro ENTRY_P2ALIGN which takes a second
argument, log2 of the desired function alignment.

The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this
doesn't affect any existing functionality.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit fc5bd179ef)
2022-04-26 18:18:16 -07:00
H.J. Lu
16245986fb x86-64: Optimize load of all bits set into ZMM register [BZ #28252]
Optimize loads of all bits set into ZMM register in AVX512 SVML codes
by replacing

	vpbroadcastq .L_2il0floatpacket.16(%rip), %zmmX

and

	vmovups   .L_2il0floatpacket.13(%rip), %zmmX

with
	vpternlogd $0xff, %zmmX, %zmmX, %zmmX

This fixes BZ #28252.

(cherry picked from commit 78c9ec9000)
2022-04-26 18:18:15 -07:00
Florian Weimer
83cc145830 scripts/glibcelf.py: Mark as UNSUPPORTED on Python 3.5 and earlier
enum.IntFlag and enum.EnumMeta._missing_ support are not part of
earlier Python versions.

(cherry picked from commit b571f3adff)
2022-04-26 15:59:12 +02:00
Florian Weimer
bc56ab1f4a dlfcn: Do not use rtld_active () to determine ld.so state (bug 29078)
When audit modules are loaded, ld.so initialization is not yet
complete, and rtld_active () returns false even though ld.so is
mostly working.  Instead, the static dlopen hook is used, but that
does not work at all because this is not a static dlopen situation.

Commit 466c1ea15f ("dlfcn: Rework
static dlopen hooks") moved the hook pointer into _rtld_global_ro,
which means that separate protection is not needed anymore and the
hook pointer can be checked directly.

The guard for disabling libio vtable hardening in _IO_vtable_check
should stay for now.

Fixes commit 8e1472d2c1 ("ld.so:
Examine GLRO to detect inactive loader [BZ #20204]").

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
(cherry picked from commit 8dcb6d0af0)
2022-04-26 15:28:39 +02:00
Florian Weimer
0d477e92c4 INSTALL: Rephrase -with-default-link documentation
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit c935789bdf)
2022-04-26 15:27:43 +02:00
Joan Bruguera
ca0faa140f misc: Fix rare fortify crash on wchar funcs. [BZ 29030]
If `__glibc_objsize (__o) == (size_t) -1` (i.e. `__o` is unknown size), fortify
checks should pass, and `__whatever_alias` should be called.

Previously, `__glibc_objsize (__o) == (size_t) -1` was explicitly checked, but
on commit a643f60c53, this was moved into `__glibc_safe_or_unknown_len`.

A comment says the -1 case should work as: "The -1 check is redundant because
since it implies that __glibc_safe_len_cond is true.". But this fails when:
* `__s > 1`
* `__osz == -1` (i.e. unknown size at compile time)
* `__l` is big enough
* `__l * __s <= __osz` can be folded to a constant
(I only found this to be true for `mbsrtowcs` and other functions in wchar2.h)

In this case `__l * __s <= __osz` is false, and `__whatever_chk_warn` will be
called by `__glibc_fortify` or `__glibc_fortify_n` and crash the program.

This commit adds the explicit `__osz == -1` check again.
moc crashes on startup due to this, see: https://bugs.archlinux.org/task/74041

Minimal test case (test.c):
    #include <wchar.h>

    int main (void)
    {
        const char *hw = "HelloWorld";
        mbsrtowcs (NULL, &hw, (size_t)-1, NULL);
        return 0;
    }

Build with:
    gcc -O2 -Wp,-D_FORTIFY_SOURCE=2 test.c -o test && ./test

Output:
    *** buffer overflow detected ***: terminated

Fixes: BZ #29030
Signed-off-by: Joan Bruguera <joanbrugueram@gmail.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 33e03f9cd2)
2022-04-25 18:44:27 +05:30
Florian Weimer
f0c71b34f9 Default to --with-default-link=no (bug 25812)
This is necessary to place the libio vtables into the RELRO segment.
New tests elf/tst-relro-ldso and elf/tst-relro-libc are added to
verify that this is what actually happens.

The new tests fail on ia64 due to lack of (default) RELRO support
inbutils, so they are XFAILed there.

(cherry picked from commit 198abcbb94)
2022-04-22 11:31:14 +02:00
Florian Weimer
3e0a91b79b scripts: Add glibcelf.py module
Hopefully, this will lead to tests that are easier to maintain.  The
current approach of parsing readelf -W output using regular expressions
is not necessarily easier than parsing the ELF data directly.

This module is still somewhat incomplete (e.g., coverage of relocation
types and versioning information is missing), but it is sufficient to
perform basic symbol analysis or program header analysis.

The EM_* mapping for architecture-specific constant classes (e.g.,
SttX86_64) is not yet implemented.  The classes are defined for the
benefit of elf/tst-glibcelf.py.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
(cherry picked from commit 30035d6772)
2022-04-22 11:28:57 +02:00
Adhemerval Zanella
71326f1f2f nptl: Fix pthread_cancel cancelhandling atomic operations
The 404656009b reversion did not setup the atomic loop to set the
cancel bits correctly.  The fix is essentially what pthread_cancel
did prior 26cfbb7162.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

(cherry picked from commit 62be968167)
2022-04-20 12:22:34 -03:00
=Joshua Kinard
b87b697f15 mips: Fix mips64n32 64 bit time_t stat support (BZ#29069)
Add missing support initially added by 4e8521333b
(which missed n32 stat).

(cherry picked from commit 78fb888273)
2022-04-18 13:16:21 -03:00
Samuel Thibault
5d8c777634 hurd: Fix arbitrary error code
ELIBBAD is Linux-specific.

(cherry picked from commit 67ab66541d)
2022-04-18 17:54:13 +02:00
Adhemerval Zanella
290db09546 nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029)
Some Linux interfaces never restart after being interrupted by a signal
handler, regardless of the use of SA_RESTART [1].  It means that for
pthread cancellation, if the target thread disables cancellation with
pthread_setcancelstate and calls such interfaces (like poll or select),
it should not see spurious EINTR failures due the internal SIGCANCEL.

However recent changes made pthread_cancel to always sent the internal
signal, regardless of the target thread cancellation status or type.
To fix it, the previous semantic is restored, where the cancel signal
is only sent if the target thread has cancelation enabled in
asynchronous mode.

The cancel state and cancel type is moved back to cancelhandling
and atomic operation are used to synchronize between threads.  The
patch essentially revert the following commits:

  8c1c0aae20 nptl: Move cancel type out of cancelhandling
  2b51742531 nptl: Move cancel state out of cancelhandling
  26cfbb7162 nptl: Remove CANCELING_BITMASK

However I changed the atomic operation to follow the internal C11
semantic and removed the MACRO usage, it simplifies a bit the
resulting code (and removes another usage of the old atomic macros).

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
and powerpc64-linux-gnu.

[1] https://man7.org/linux/man-pages/man7/signal.7.html

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>

(cherry-picked from commit 404656009b)
2022-04-15 09:52:54 -03:00
Stefan Liebler
0c03cb54c8 S390: Add new s390 platform z16.
The new IBM z16 is added to platform string array.
The macro _DL_PLATFORMS_COUNT is incremented.

_dl_hwcaps_subdir is extended by "z16" if HWCAP_S390_VXRS_PDE2
is set. HWCAP_S390_NNPA is not tested in _dl_hwcaps_subdirs_active
as those instructions may be replaced or removed in future.

tst-glibc-hwcaps.c is extended in order to test z16 via new marker5.

A fatal glibc error is dumped if glibc was build with architecture
level set for z16, but run on an older machine. (See dl-hwcap-check.h)

(cherry picked from commit 2376944b9e)
2022-04-14 14:21:57 +02:00
Carlos O'Donell
ceed89d089 NEWS: Update fixed bug list for LD_AUDIT backports. 2022-04-12 13:49:31 -04:00
Adhemerval Zanella
4dca2d3a7b hppa: Fix bind-now audit (BZ #28857)
On hppa, a function pointer returned by la_symbind is actually a function
descriptor has the plabel bit set (bit 30).  This must be cleared to get
the actual address of the descriptor.  If the descriptor has been bound,
the first word of the descriptor is the physical address of theA function,
otherwise, the first word of the descriptor points to a trampoline in the
PLT.

This patch also adds a workaround on tests because on hppa (and it seems
to be the only ABI I have see it), some shared library adds a dynamic PLT
relocation to am empty symbol name:

$ readelf -r elf/tst-audit25mod1.so
[...]
Relocation section '.rela.plt' at offset 0x464 contains 6 entries:
 Offset     Info    Type            Sym.Value  Sym. Name + Addend
00002008  00000081 R_PARISC_IPLT                508
[...]

It breaks some assumptions on the test, where a symbol with an empty
name ("") is passed on la_symbind.

Checked on x86_64-linux-gnu and hppa-linux-gnu.

(cherry picked from commit 9e94f57484)
2022-04-12 13:33:17 -04:00
H.J. Lu
aabdad371f elf: Replace tst-audit24bmod2.so with tst-audit24bmod2
Replace tst-audit24bmod2.so with tst-audit24bmod2 to silence:

make[2]: Entering directory '/export/gnu/import/git/gitlab/x86-glibc/elf'
Makefile:2201: warning: overriding recipe for target '/export/build/gnu/tools-build/glibc-gitlab/build-x86_64-linux/elf/tst-audit24bmod2.so'
../Makerules:765: warning: ignoring old recipe for target '/export/build/gnu/tools-build/glibc-gitlab/build-x86_64-linux/elf/tst-audit24bmod2.so'

(cherry picked from commit fa7ad1df19)
2022-04-12 13:33:17 -04:00
Szabolcs Nagy
165e7ad459 Fix elf/tst-audit25a with default bind now toolchains
This test relies on lazy binding for the executable so request that
explicitly in case the toolchain defaults to bind now.

(cherry picked from commit 80a08d0faa)
2022-04-12 13:33:17 -04:00
Ben Woodard
b118bce87a elf: Fix runtime linker auditing on aarch64 (BZ #26643)
The rtld audit support show two problems on aarch64:

  1. _dl_runtime_resolve does not preserve x8, the indirect result
      location register, which might generate wrong result calls
      depending of the function signature.

  2. The NEON Q registers pushed onto the stack by _dl_runtime_resolve
     were twice the size of D registers extracted from the stack frame by
     _dl_runtime_profile.

While 2. might result in wrong information passed on the PLT tracing,
1. generates wrong runtime behaviour.

The aarch64 rtld audit support is changed to:

  * Both La_aarch64_regs and La_aarch64_retval are expanded to include
    both x8 and the full sized NEON V registers, as defined by the
    ABI.

  * dl_runtime_profile needed to extract registers saved by
    _dl_runtime_resolve and put them into the new correctly sized
    La_aarch64_regs structure.

  * The LAV_CURRENT check is change to only accept new audit modules
    to avoid the undefined behavior of not save/restore x8.

  * Different than other architectures, audit modules older than
    LAV_CURRENT are rejected (both La_aarch64_regs and La_aarch64_retval
    changed their layout and there are no requirements to support multiple
    audit interface with the inherent aarch64 issues).

  * A new field is also reserved on both La_aarch64_regs and
    La_aarch64_retval to support variant pcs symbols.

Similar to x86, a new La_aarch64_vector type to represent the NEON
register is added on the La_aarch64_regs (so each type can be accessed
directly).

Since LAV_CURRENT was already bumped to support bind-now, there is
no need to increase it again.

Checked on aarch64-linux-gnu.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit ce9a68c57c)

Resolved conflicts:
	NEWS
	elf/rtld.c
2022-04-12 13:33:10 -04:00
Adhemerval Zanella
056fc1c0e3 elf: Issue la_symbind for bind-now (BZ #23734)
The audit symbind callback is not called for binaries built with
-Wl,-z,now or when LD_BIND_NOW=1 is used, nor the PLT tracking callbacks
(plt_enter and plt_exit) since this would change the expected
program semantics (where no PLT is expected) and would have performance
implications (such as for BZ#15533).

LAV_CURRENT is also bumped to indicate the audit ABI change (where
la_symbind flags are set by the loader to indicate no possible PLT
trace).

To handle powerpc64 ELFv1 function descriptor, _dl_audit_symbind
requires to know whether bind-now is used so the symbol value is
updated to function text segment instead of the OPD (for lazy binding
this is done by PPC64_LOAD_FUNCPTR on _dl_runtime_resolve).

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
powerpc64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 32612615c5)

Resolved conflicts:
	NEWS - Manual merge.
2022-04-12 13:32:59 -04:00
Adhemerval Zanella
efb21b5fb2 elf: Fix initial-exec TLS access on audit modules (BZ #28096)
For audit modules and dependencies with initial-exec TLS, we can not
set the initial TLS image on default loader initialization because it
would already be set by the audit setup.  However, subsequent thread
creation would need to follow the default behaviour.

This patch fixes it by setting l_auditing link_map field not only
for the audit modules, but also for all its dependencies.  This is
used on _dl_allocate_tls_init to avoid the static TLS initialization
at load time.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 254d3d5aef)
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
98047ba95c elf: Add la_activity during application exit
la_activity is not called during application exit, even though
la_objclose is.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
(cherry picked from commit 5fa11a2bc9)
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
2255621f0e elf: Do not fail for failed dlmopen on audit modules (BZ #28061)
The dl_main sets the LM_ID_BASE to RT_ADD just before starting to
add load new shared objects.  The state is set to RT_CONSISTENT just
after all objects are loaded.

However if a audit modules tries to dlmopen an inexistent module,
the _dl_open will assert that the namespace is in an inconsistent
state.

This is different than dlopen, since first it will not use
LM_ID_BASE and second _dl_map_object_from_fd is the sole responsible
to set and reset the r_state value.

So the assert on _dl_open can not really be seen if the state is
consistent, since _dt_main resets it.  This patch removes the assert.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 484e672dda)

Resolved conflicts:
	elf/Makefile
	elf/dl-open.c
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
d1b9bee29a elf: Issue audit la_objopen for vDSO
The vDSO is is listed in the link_map chain, but is never the subject of
an la_objopen call.  A new internal flag __RTLD_VDSO is added that
acts as __RTLD_OPENEXEC to allocate the required 'struct auditstate'
extra space for the 'struct link_map'.

The return value from the callback is currently ignored, since there
is no PLT call involved by glibc when using the vDSO, neither the vDSO
are exported directly.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit f0e23d34a7)

Resolved conflicts:
	elf/Makefile
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
02c6a3d353 elf: Add audit tests for modules with TLSDESC
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit d1b38173c9)
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
29496b3103 elf: Avoid unnecessary slowdown from profiling with audit (BZ#15533)
The rtld-audit interfaces introduces a slowdown due to enabling
profiling instrumentation (as if LD_AUDIT implied LD_PROFILE).
However, instrumenting is only necessary if one of audit libraries
provides PLT callbacks (la_pltenter or la_pltexit symbols).  Otherwise,
the slowdown can be avoided.

The following patch adjusts the logic that enables profiling to iterate
over all audit modules and check if any of those provides a PLT hook.
To keep la_symbind to work even without PLT callbacks, _dl_fixup now
calls the audit callback if the modules implements it.

Co-authored-by: Alexander Monakov <amonakov@ispras.ru>

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 063f9ba220)

Resolved conflicts:
	NEWS
	elf/Makefile
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
a8e211daea elf: Add _dl_audit_pltexit
It consolidates the code required to call la_pltexit audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 8c0664e2b8)

Resolved conflicts:
	sysdeps/hppa/dl-runtime.c
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
fd9c4e8a1b elf: Add _dl_audit_pltenter
It consolidates the code required to call la_pltenter audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit eff687e846)
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
31473c273b elf: Add _dl_audit_preinit
It consolidates the code required to call la_preinit audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 0b98a87487)
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
b2d99731b6 elf: Add _dl_audit_symbind_alt and _dl_audit_symbind
It consolidates the code required to call la_symbind{32,64} audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit cda4f265c6)
2022-04-08 14:18:12 -04:00
Adhemerval Zanella
198660741b elf: Add _dl_audit_objclose
It consolidates the code required to call la_objclose audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 311c9ee54e)
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
ec0fc2a153 elf: Add _dl_audit_objsearch
It consolidates the code required to call la_objsearch audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit c91008d349)
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
66e9d27a09 elf: Add _dl_audit_activity_map and _dl_audit_activity_nsid
It consolidates the code required to call la_activity audit
callback.

Also for a new Lmid_t the namespace link_map list are empty, so it
requires to check if before using it.  This can happen for when audit
module is used along with dlmopen.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit 3dac3959a5)
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
ce0cb6d1d2 elf: Add _dl_audit_objopen
It consolidates the code required to call la_objopen audit callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit aee6e90f93)

Resolved conflicts:
	elf/Makefile
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
e25fe99213 elf: Move la_activity (LA_ACT_ADD) after _dl_add_to_namespace_list() (BZ #28062)
It ensures that the the namespace is guaranteed to not be empty.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
(cherry picked from commit ed3ce71f5c)

Resolved conflicts:
	elf/Makefile
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
a31bbe3242 elf: Move LAV_CURRENT to link_lavcurrent.h
No functional change.

(cherry picked from commit 54816ae98d)

Resolved conflicts:
	elf/Makefile
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
f6a54a3042 elf: Fix elf_get_dynamic_info() for bootstrap
THe d6d89608ac broke powerpc for --enable-bind-now because it turned
out that different than patch assumption rtld elf_get_dynamic_info()
does require to handle RTLD_BOOTSTRAP to avoid DT_FLAGS and
DT_RUNPATH (more specially the GLRO usage which is not reallocate
yet).

This patch fixes by passing two arguments to elf_get_dynamic_info()
to inform that by rtld (bootstrap) or static pie initialization
(static_pie_bootstrap).  I think using explicit argument is way more
clear and burried C preprocessor, and compiler should remove the
dead code.

I checked on x86_64 and i686 with default options, --enable-bind-now,
and --enable-bind-now and --enable--static-pie.  I also check on
aarch64, armhf, powerpc64, and powerpc with default and
--enable-bind-now.

(cherry picked from commit 5118dcac68)

Resolved conflicts:
	elf/rtld.c - Manual merge.
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
b868b45f67 elf: Fix dynamic-link.h usage on rtld.c
The 4af6982e4c fix does not fully handle RTLD_BOOTSTRAP usage on
rtld.c due two issues:

  1. RTLD_BOOTSTRAP is also used on dl-machine.h on various
     architectures and it changes the semantics of various machine
     relocation functions.

  2. The elf_get_dynamic_info() change was done sideways, previously
     to 490e6c62aa get-dynamic-info.h was included by the first
     dynamic-link.h include *without* RTLD_BOOTSTRAP being defined.
     It means that the code within elf_get_dynamic_info() that uses
     RTLD_BOOTSTRAP is in fact unused.

To fix 1. this patch now includes dynamic-link.h only once with
RTLD_BOOTSTRAP defined.  The ELF_DYNAMIC_RELOCATE call will now have
the relocation fnctions with the expected semantics for the loader.

And to fix 2. part of 4af6982e4c is reverted (the check argument
elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP
pieces are removed.

To reorganize the includes the static TLS definition is moved to
its own header to avoid a circular dependency (it is defined on
dynamic-link.h and dl-machine.h requires it at same time other
dynamic-link.h definition requires dl-machine.h defitions).

Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL
are moved to its own header.  Only ancient ABIs need special values
(arm, i386, and mips), so a generic one is used as default.

The powerpc Elf64_FuncDesc is also moved to its own header, since
csu code required its definition (which would require either include
elf/ folder or add a full path with elf/).

Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32,
and powerpc64le.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
(cherry picked from commit d6d89608ac)

Resolved conflicts:
	elf/rtld.c
2022-04-08 14:18:11 -04:00
Adhemerval Zanella
c6df39a0bd elf: Fix elf_get_dynamic_info definition
Before to 490e6c62aa ('elf: Avoid nested functions in the loader
[BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on
the first dynamic-link.h include and later within _dl_start().  The
former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used
on setup_vdso() (since it is a global definition), while the former does
define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation.

With the commit change, the function is now included and defined once
instead of defined as a nested function.  So rtld.c defines without
defining RTLD_BOOTSTRAP and it brokes at least powerpc32.

This patch fixes by moving the get-dynamic-info.h include out of
dynamic-link.h, which then the caller can corirectly set the expected
semantic by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or
RESOLVE_MAP.

It also required to enable some asserts only for the loader bootstrap
to avoid issues when called from setup_vdso().

As a side note, this is another issues with nested functions: it is
not clear from pre-processed output (-E -dD) how the function will
be build and its semantic (since nested function will be local and
extra C defines may change it).

I checked on x86_64-linux-gnu (w/o --enable-static-pie),
i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4,
aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and
s390x-linux-gnu.

Reviewed-by: Fangrui Song <maskray@google.com>
(cherry picked from commit 4af6982e4c)

Resolved conflicts:
	elf/rtld.c
2022-04-08 14:18:11 -04:00