Before commit ee1ada1bdb
("elf: Rework exception handling in the dynamic loader
[BZ #25486]"), the previous order called the main calloc
to allocate a shadow GOT/PLT array for auditing support.
This happened before libc.so.6 ELF constructors were run, so
a user malloc could run without libc.so.6 having been
initialized fully. One observable effect was that
environ was NULL at this point.
It does not seem to be possible at present to trigger such
an allocation, but it seems more robust to delay switching
to main malloc after ld.so self-relocation is complete.
The elf/tst-rtld-no-malloc-audit test case fails with a
2.34-era glibc that does not have this fix.
Reviewed-by: DJ Delorie <dj@redhat.com>
Previously, a la_activity audit event was generated before
relocation processing completed. This does did not match what
happened during initial startup in elf/rtld.c (towards the end
of dl_main). It also caused various problems if an auditor
tried to open the same shared object again using dlmopen:
If it was the directly loaded object, it had a search scope
associated with it, so the early exit in dl_open_worker_begin
was taken even though the object was unrelocated. This caused
the r_state == RT_CONSISTENT assert to fail. Avoidance of the
assert also depends on reversing the order of r_state update
and auditor event (already implemented in a previous commit).
At the later point, args->map can be NULL due to failure,
so use the assigned namespace ID instead if that is available.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This is conceptually similar to the reported bug, but does not
depend on auditing. The fix is simple: just complete execution
of the constructors. This exposed the fact that the link map
for statically linked executables does not have l_init_called
set, even though constructors have run.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add new testcase elf/tst-startup-errno.c which tests that errno is set
to 0 at first ELF constructor execution and at the start of the
program's main function.
Tested for x86_64
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The current racy approach is to enable asynchronous cancellation
before making the syscall and restore the previous cancellation
type once the syscall returns, and check if cancellation has happen
during the cancellation entrypoint.
As described in BZ#12683, this approach shows 2 problems:
1. Cancellation can act after the syscall has returned from the
kernel, but before userspace saves the return value. It might
result in a resource leak if the syscall allocated a resource or a
side effect (partial read/write), and there is no way to program
handle it with cancellation handlers.
2. If a signal is handled while the thread is blocked at a cancellable
syscall, the entire signal handler runs with asynchronous
cancellation enabled. This can lead to issues if the signal
handler call functions which are async-signal-safe but not
async-cancel-safe.
For the cancellation to work correctly, there are 5 points at which the
cancellation signal could arrive:
[ ... )[ ... )[ syscall ]( ...
1 2 3 4 5
1. Before initial testcancel, e.g. [*... testcancel)
2. Between testcancel and syscall start, e.g. [testcancel...syscall start)
3. While syscall is blocked and no side effects have yet taken
place, e.g. [ syscall ]
4. Same as 3 but with side-effects having occurred (e.g. a partial
read or write).
5. After syscall end e.g. (syscall end...*]
And libc wants to act on cancellation in cases 1, 2, and 3 but not
in cases 4 or 5. For the 4 and 5 cases, the cancellation will eventually
happen in the next cancellable entrypoint without any further external
event.
The proposed solution for each case is:
1. Do a conditional branch based on whether the thread has received
a cancellation request;
2. It can be caught by the signal handler determining that the saved
program counter (from the ucontext_t) is in some address range
beginning just before the "testcancel" and ending with the
syscall instruction.
3. SIGCANCEL can be caught by the signal handler and determine that
the saved program counter (from the ucontext_t) is in the address
range beginning just before "testcancel" and ending with the first
uninterruptable (via a signal) syscall instruction that enters the
kernel.
4. In this case, except for certain syscalls that ALWAYS fail with
EINTR even for non-interrupting signals, the kernel will reset
the program counter to point at the syscall instruction during
signal handling, so that the syscall is restarted when the signal
handler returns. So, from the signal handler's standpoint, this
looks the same as case 2, and thus it's taken care of.
5. For syscalls with side-effects, the kernel cannot restart the
syscall; when it's interrupted by a signal, the kernel must cause
the syscall to return with whatever partial result is obtained
(e.g. partial read or write).
6. The saved program counter points just after the syscall
instruction, so the signal handler won't act on cancellation.
This is similar to 4. since the program counter is past the syscall
instruction.
So The proposed fixes are:
1. Remove the enable_asynccancel/disable_asynccancel function usage in
cancellable syscall definition and instead make them call a common
symbol that will check if cancellation is enabled (__syscall_cancel
at nptl/cancellation.c), call the arch-specific cancellable
entry-point (__syscall_cancel_arch), and cancel the thread when
required.
2. Provide an arch-specific generic system call wrapper function
that contains global markers. These markers will be used in
SIGCANCEL signal handler to check if the interruption has been
called in a valid syscall and if the syscalls has side-effects.
A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c
is provided. However, the markers may not be set on correct
expected places depending on how INTERNAL_SYSCALL_NCS is
implemented by the architecture. It is expected that all
architectures add an arch-specific implementation.
3. Rewrite SIGCANCEL asynchronous handler to check for both canceling
type and if current IP from signal handler falls between the global
markers and act accordingly.
4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to
use the appropriate cancelable syscalls.
5. Adjust 'lowlevellock-futex.h' arch-specific implementations to
provide cancelable futex calls.
Some architectures require specific support on syscall handling:
* On i386 the syscall cancel bridge needs to use the old int80
instruction because the optimized vDSO symbol the resulting PC value
for an interrupted syscall points to an address outside the expected
markers in __syscall_cancel_arch. It has been discussed in LKML [1]
on how kernel could help userland to accomplish it, but afaik
discussion has stalled.
Also, sysenter should not be used directly by libc since its calling
convention is set by the kernel depending of the underlying x86 chip
(check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60).
* mips o32 is the only kABI that requires 7 argument syscall, and to
avoid add a requirement on all architectures to support it, mips
support is added with extra internal defines.
Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu,
powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and
x86_64-linux-gnu.
[1] https://lkml.org/lkml/2016/3/8/1105
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The old code used l_init_called as an indicator for whether TLS
initialization was complete. However, it is possible that
TLS for an object is initialized, written to, and then dlopen
for this object is called again, and l_init_called is not true at
this point. Previously, this resulted in TLS being initialized
twice, discarding any interim writes (technically introducing a
use-after-free bug even).
This commit introduces an explicit per-object flag, l_tls_in_slotinfo.
It indicates whether _dl_add_to_slotinfo has been called for this
object. This flag is used to avoid double-initialization of TLS.
In update_tls_slotinfo, the first_static_tls micro-optimization
is removed because preserving the initalization flag for subsequent
use by the second loop for static TLS is a bit complicated, and
another per-object flag does not seem to be worth it. Furthermore,
the l_init_called flag is dropped from the second loop (for static
TLS initialization) because l_need_tls_init on its own prevents
double-initialization.
The remaining l_init_called usage in resize_scopes and update_scopes
is just an optimization due to the use of scope_has_map, so it is
not changed in this commit.
The isupper check ensures that libc.so.6 is TLS is not reverted.
Such a revert happens if l_need_tls_init is not cleared in
_dl_allocate_tls_init for the main_thread case, now that
l_init_called is not checked anymore in update_tls_slotinfo
in elf/dl-open.c.
Reported-by: Jonathon Anderson <janderson@rice.edu>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
While working on a patch to add support for the extensible rseq ABI, we
came across an issue where a new 'const' variable would be merged with
the existing '__rseq_size' variable. We tracked this to the use of
'-fmerge-all-constants' which allows the compiler to merge identical
constant variables. This means that all 'const' variables in a compile
unit that are of the same size and are initialized to the same value can
be merged.
In this specific case, on 32 bit systems 'unsigned int' and 'ptrdiff_t'
are both 4 bytes and initialized to 0 which should trigger the merge.
However for reasons we haven't delved into when the attribute 'section
(".data.rel.ro")' is added to the mix, only variables of the same exact
types are merged. As far as we know this behavior is not specified
anywhere and could change with a new compiler version, hence this patch.
Move the definitions of these variables into an assembler file and add
hidden writable aliases for internal use. This has the added bonus of
removing the asm workaround to set the values on rseq registration.
Tested on Debian 12 with GCC 12.2.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Starting with commit
59974938fe
elf/rtld: Count skipped environment variables for enable_secure
The new testcase elf/tst-tunables-enable_secure-env segfaults on s390 (31bit).
There _start parses the auxiliary vector for some additional checks.
Therefore it skips over the zeros after the environment variables ...
0x7fffac20: 0x7fffbd17 0x7fffbd32 0x7fffbd69 0x00000000
------------------------------------------------^^^last environment variable
... and then it parses the auxiliary vector and stops at AT_NULL.
0x7fffac30: 0x00000000 0x00000021 0x00000000 0x00000000
--------------------------------^^^AT_SYSINFO_EHDR--------------^^^AT_NULL
----------------^^^newp-----------------------------------------^^^oldp
Afterwards it tries to access AT_PHDR which points to somewhere and segfaults.
Due to not incorporating the skip_env variable in the computation of oldp
when shuffling down the auxv in rtld.c, it just copies one entry with AT_NULL
and value 0x00000021 and stops the loop. In reality we have skipped
GLIBC_TUNABLES environment variable (=> skip_env=1). Thus we should copy from
here:
0x7fffac40: 0x00000021 0x7ffff000 0x00000010 0x007fffff
----------------^^^fixed-oldp
This patch fixes the computation of oldp when shuffling down auxiliary vector.
It also adds some checks in the testcase. Those checks also fail on
s390x (64bit) and x86_64 without the fix.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It turns out that quite a few applications use bundled mallocs that
have been built to use global-dynamic TLS (instead of the recommended
initial-exec TLS). The previous workaround from
commit afe42e935b ("elf: Avoid some
free (NULL) calls in _dl_update_slotinfo") does not fix all
encountered cases unfortunatelly.
This change avoids the TLS generation update for recursive use
of TLS from a malloc that was called during a TLS update. This
is possible because an interposed malloc has a fixed module ID and
TLS slot. (It cannot be unloaded.) If an initially-loaded module ID
is encountered in __tls_get_addr and the dynamic linker is already
in the middle of a TLS update, use the outdated DTV, thus avoiding
another call into malloc. It's still necessary to update the
DTV to the most recent generation, to get out of the slow path,
which is why the check for recursion is needed.
The bookkeeping is done using a global counter instead of per-thread
flag because TLS access in the dynamic linker is tricky.
All this will go away once the dynamic linker stops using malloc
for TLS, likely as part of a change that pre-allocates all TLS
during pthread_create/dlopen.
Fixes commit d2123d6827 ("elf: Fix slow
tls access after dlopen [BZ #19924]").
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The conditionals for several mtrace-based tests in catgets, elf, libio,
malloc, misc, nptl, posix, and stdio-common were incorrect leading to
test failures when bootstrapping glibc without perl.
The correct conditional for mtrace-based tests requires three checks:
first checking for run-built-tests, then build-shared, and lastly that
PERL is not equal to "no" (missing perl).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Also compile dl-misc.os with $(rtld-early-cflags) to avoid
Program received signal SIGILL, Illegal instruction.
0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
156 bool positive = true;
(gdb) bt
#0 0x00007ffff7fd36ea in _dl_strtoul (nptr=nptr@entry=0x7fffffffe2c9 "2",
endptr=endptr@entry=0x7fffffffd728) at dl-misc.c:156
#1 0x00007ffff7fdb1a9 in tunable_initialize (
cur=cur@entry=0x7ffff7ffbc00 <tunable_list+2176>,
strval=strval@entry=0x7fffffffe2c9 "2", len=len@entry=1)
at dl-tunables.c:131
#2 0x00007ffff7fdb3a2 in parse_tunables (valstring=<optimized out>)
at dl-tunables.c:258
#3 0x00007ffff7fdb5d9 in __GI___tunables_init (envp=0x7fffffffdd58)
at dl-tunables.c:288
#4 0x00007ffff7fe44c3 in _dl_sysdep_start (
start_argptr=start_argptr@entry=0x7fffffffdcb0,
dl_main=dl_main@entry=0x7ffff7fe5f80 <dl_main>)
at ../sysdeps/unix/sysv/linux/dl-sysdep.c:110
#5 0x00007ffff7fe5cae in _dl_start_final (arg=0x7fffffffdcb0) at rtld.c:494
#6 _dl_start (arg=0x7fffffffdcb0) at rtld.c:581
#7 0x00007ffff7fe4b38 in _start ()
(gdb)
when setting GLIBC_TUNABLES in glibc compiled with APX.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
When using the glibc.rtld.enable_secure tunable we need to keep track of
the count of environment variables we skip due to __libc_enable_secure
being set and adjust the auxv section of the stack. This fixes an
assertion when running ld.so directly with glibc.rtld.enable_secure set.
Add a testcase that ensures the assert is not hit.
elf/rtld.c:1324 assert (auxv == sp + 1);
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This reverts commit a1735e0aa8.
The test failure is a real valgrind bug that needs to be fixed before
valgrind is usable with a glibc that has been built with
CC="gcc -march=x86-64-v3". The proposed valgrind patch teaches
valgrind to replace ld.so strcmp with an unoptimized scalar
implementation, thus avoiding any AVX2-related problems.
Valgrind bug: <https://bugs.kde.org/show_bug.cgi?id=485487>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
None of the existing tests seem to cover the case where
_dl_signal_error is called without an active error handler.
The new elf/tst-rtld-does-not-exist test triggers such a
_dl_signal_error call from _dl_map_object.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The aarch64 uses 'trad' for traditional tls and 'desc' for tls
descriptors, but unlike other targets it defaults to 'desc'. The
gnutls2 configure check does not set aarch64 as an ABI that uses
TLS descriptors, which then disable somes stests.
Also rename the internal machinery fron gnu2 to tls descriptors.
Checked on aarch64-linux-gnu.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
ARM _dl_tlsdesc_dynamic slow path has two issues:
* The ip/r12 is defined by AAPCS as a scratch register, and gcc is
used to save the stack pointer before on some function calls. So it
should also be saved/restored as well. It fixes the tst-gnu2-tls2.
* None of the possible VFP registers are saved/restored. ARM has the
additional complexity to have different VFP bank sizes (depending of
VFP support by the chip).
The tst-gnu2-tls2 test is extended to check for VFP registers, although
only for hardfp builds. Different than setcontext, _dl_tlsdesc_dynamic
does not have HWCAP_ARM_IWMMXT (I don't have a way to properly test
it and it is almost a decade since newer hardware was released).
With this patch there is no need to mark tst-gnu2-tls2 as XFAIL.
Checked on arm-linux-gnueabihf.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add a tunable for setting __libc_enable_secure to 1. Do not set
__libc_enable_secure to 0 if the tunable is set to 0. Ignore all
tunables if glib.rtld.enable_secure is set. One use-case for this
addition is to enable testing code paths that depend on
__libc_enable_secure being set without the need to use setxid binaries.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
When strcmp-avx2.S is used as the default, elf/tst-valgrind-smoke fails
with
==1272761== Conditional jump or move depends on uninitialised value(s)
==1272761== at 0x4022C98: strcmp (strcmp-avx2.S:462)
==1272761== by 0x400B05B: _dl_name_match_p (dl-misc.c:75)
==1272761== by 0x40085F3: _dl_map_object (dl-load.c:1966)
==1272761== by 0x401AEA4: map_doit (rtld.c:644)
==1272761== by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==1272761== by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==1272761== by 0x401B38F: do_preload (rtld.c:816)
==1272761== by 0x401C116: handle_preload_list (rtld.c:892)
==1272761== by 0x401EDF5: dl_main (rtld.c:1842)
==1272761== by 0x401A79E: _dl_sysdep_start (dl-sysdep.c:140)
==1272761== by 0x401BEEE: _dl_start_final (rtld.c:494)
==1272761== by 0x401BEEE: _dl_start (rtld.c:581)
==1272761== by 0x401AD87: ??? (in */elf/ld.so)
The assembly codes are:
0x0000000004022c80 <+144>: vmovdqu 0x20(%rdi),%ymm0
0x0000000004022c85 <+149>: vpcmpeqb 0x20(%rsi),%ymm0,%ymm1
0x0000000004022c8a <+154>: vpcmpeqb %ymm0,%ymm15,%ymm2
0x0000000004022c8e <+158>: vpandn %ymm1,%ymm2,%ymm1
0x0000000004022c92 <+162>: vpmovmskb %ymm1,%ecx
0x0000000004022c96 <+166>: inc %ecx
=> 0x0000000004022c98 <+168>: jne 0x4022c32 <strcmp+66>
strcmp-avx2.S has 32-byte vector loads of strings which are shorter than
32 bytes:
(gdb) p (char *) ($rdi + 0x20)
$6 = 0x1ffeffea20 "memcheck-amd64-linux.so"
(gdb) p (char *) ($rsi + 0x20)
$7 = 0x4832640 "core-amd64-linux.so"
(gdb) call (int) strlen ((char *) ($rsi + 0x20))
$8 = 19
(gdb) call (int) strlen ((char *) ($rdi + 0x20))
$9 = 23
(gdb)
It triggers the valgrind error. The above code is safe since the loads
don't cross the page boundary. Update tst-valgrind-smoke.sh to accept
an optional suppression file and pass a suppression file to valgrind when
strcmp-avx2.S is the default implementation of strcmp.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:
leaq tls_var@TLSDESC(%rip), %rax
call *tls_var@TLSCALL(%rax)
or
leal tls_var@TLSDESC(%ebx), %eax
call *tls_var@TLSCALL(%eax)
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL. When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path. __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers. This
fixes BZ #31372.
Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
After 78ca44da01
("elf: Relocate libc.so early during startup and dlmopen (bug 31083)")
we start seeing tst-nodeps2 failures when building the testsuite with
--enable-hard-coded-path-in-tests.
When building the testsuite with --enable-hard-coded-path-in-tests
the tst-nodeps2-mod.so is not built with the required DT_RUNPATH
values and the test escapes the test framework and loads the system
libraries and aborts. The fix is to use the existing
$(link-test-modules-rpath-link) variable to set DT_RUNPATH correctly.
No regressions on x86_64.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This is a minimal regression test for bug 29039 which only affects
targets with TLSDESC and a reproducer requires that
1) Have modid gaps (closed modules) with old generation.
2) Update a DTV to a newer generation (needs a newer dlopen).
3) But do not update the closed gap entry in that DTV.
4) Reuse the modid gap for a new module (another dlopen).
5) Use dynamic TLSDESC in that new module with old generation (bug).
6) Access TLS via this TLSDESC and the now outdated DTV.
However step (3) in practice rarely happens: during DTV update the
entries for closed modids are initialized to "unallocated" and then
dynamic TLSDESC calls __tls_get_addr independently of its generation.
The only exception to this is DTV setup at thread creation (gaps are
initialized to NULL instead of unallocated) or DTV resize where the
gap entries are outside the previous DTV array (again NULL instead
of unallocated, and this requires loading > DTV_SURPLUS modules).
So the bug can only cause NULL (+ offset) dereference, not use after
free. And the easiest way to get (3) is via thread creation.
Note that step (5) requires that the newly loaded module has larger
TLS than the remaining optional static TLS. And for (6) there cannot
be other TLS access or dlopen in the thread that updates the DTV.
Tested on aarch64-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
If /tmp is mounted nosuid and make xcheck is run,
then tst-env-setuid fails UNSUPPORTED with "SGID failed: GID and EGID match"
and /var/tmp/tst-sonamemove-runmod1.so.profile is created.
If you then try to rerun the test with a suid mounted test-dir
(the SGID binary is created in test-dir which defaults to /tmp)
with something like that:
make tst-env-setuid-ENV="TMPDIR=..." t=elf/tst-env-setuid test
the test fails as the LD_PROFILE output file is still available
from the previous run.
Thus this patch removes the LD_PROFILE output file in parent
before spawning the SGID binary.
Even if LD_PROFILE is not supported anymore in static binaries,
use a different library and thus output file for tst-env-setuid
and tst-env-setuid-static in order to not interfere if both
tests are run in parallel.
Furthermore the checks in test_child are now more verbose.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The _dl_non_dynamic_init does not parse LD_PROFILE, which does not
enable profile for dlopen objects. Since dlopen is deprecated for
static objects, it is better to remove the support.
It also allows to trim down libc.a of profile support.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Loader does not ignore LD_PROFILE in secure-execution mode (different
than man-page states [1]), rather it uses a different path
(/var/profile) and ignore LD_PROFILE_OUTPUT.
Allowing secure-execution profiling is already a non good security
boundary, since it enables different code paths and extra OS access by
the process. But by ignoring LD_PROFILE_OUTPUT, the resulting profile
file might also be acceded in a racy manner since the file name does not
use any process-specific information (such as pid, timing, etc.).
Another side-effect is it forces lazy binding even on libraries that
might be with DF_BIND_NOW.
[1] https://man7.org/linux/man-pages/man8/ld.so.8.html
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Some environment variables allow alteration of allocator behavior
across setuid boundaries, where a setuid program may ignore the
tunable, but its non-setuid child can read it and adjust the memory
allocator behavior accordingly.
Most library behavior tunings is limited to the current process and does
not bleed in scope; so it is unclear how pratical this misfeature is.
If behavior change across privilege boundaries is desirable, it would be
better done with a wrapper program around the non-setuid child that sets
these envvars, instead of using the setuid process as the messenger.
The patch as fixes tst-env-setuid, where it fail if any unsecvars is
set. It also adds a dynamic test, although it requires
--enable-hardcoded-path-in-tests so kernel correctly sets the setuid
bit (using the loader command directly would require to set the
setuid bit on the loader itself, which is not a usual deployment).
Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The tunable privilege levels were a retrofit to try and keep the malloc
tunable environment variables' behavior unchanged across security
boundaries. However, CVE-2023-4911 shows how tricky can be
tunable parsing in a security-sensitive environment.
Not only parsing, but the malloc tunable essentially changes some
semantics on setuid/setgid processes. Although it is not a direct
security issue, allowing users to change setuid/setgid semantics is not
a good security practice, and requires extra code and analysis to check
if each tunable is safe to use on all security boundaries.
It also means that security opt-in features, like aarch64 MTE, would
need to be explicit enabled by an administrator with a wrapper script
or with a possible future system-wide tunable setting.
Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
The open_path stops if a relative path in search path contains a
component that is a non directory (for instance, if the component
is an existing file).
For instance:
$ cat > lib.c <<EOF
> void foo (void) {}
> EOF
$ gcc -shared -fPIC -o lib.so lib.c
$ cat > main.c <<EOF
extern void foo ();
int main () { foo (); return 0; }
EOF
$ gcc -o main main.c lib.so
$ LD_LIBRARY_PATH=. ./main
$ LD_LIBRARY_PATH=non-existing/path:. ./main
$ LD_LIBRARY_PATH=$(pwd)/main:. ./main
$ LD_LIBRARY_PATH=./main:. ./main
./main: error while loading shared libraries: lib.so: cannot open shared object file: No such file or directory
The invalid './main' should be ignored as a non-existent one,
instead as a valid but non accessible file.
Absolute paths do not trigger this issue because their status are
initialized as 'unknown' and open_path check if this is a directory.
Checked on x86_64-linux-gnu.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The PR_SET_VMA_ANON_NAME support is only enabled through a configurable
kernel switch, mainly because assigning a name to a
anonymous virtual memory area might prevent that area from being
merged with adjacent virtual memory areas.
For instance, with the following code:
void *p1 = mmap (NULL,
1024 * 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1,
0);
void *p2 = mmap (p1 + (1024 * 4096),
1024 * 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1,
0);
The kernel will potentially merge both mappings resulting in only one
segment of size 0x800000. If the segment is names with
PR_SET_VMA_ANON_NAME with different names, it results in two mappings.
Although this will unlikely be an issue for pthread stacks and malloc
arenas (since for pthread stacks the guard page will result in
a PROT_NONE segment, similar to the alignment requirement for the arena
block), it still might prevent the mmap memory allocated for detail
malloc.
There is also another potential scalability issue, where the prctl
requires
to take the mmap global lock which is still not fully fixed in Linux
[1] (for pthread stacks and arenas, it is mitigated by the stack
cached and the arena reuse).
So this patch disables anonymous mapping annotations as default and
add a new tunable, glibc.mem.decorate_maps, can be used to enable
it.
[1] https://lwn.net/Articles/906852/
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Add anonymous mmap annotations on loader malloc, malloc when it
allocates memory with mmap, and on malloc arena. The /proc/self/maps
will now print:
[anon: glibc: malloc arena]
[anon: glibc: malloc]
[anon: glibc: loader malloc]
On arena allocation, glibc annotates only the read/write mapping.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Linux 4.5 removed thread stack annotations due to the complexity of
computing them [1], and Linux added PR_SET_VMA_ANON_NAME on 5.17
as a way to name anonymous virtual memory areas.
This patch adds decoration on the stack created and used by
pthread_create, for glibc crated thread stack the /proc/self/maps will
now show:
[anon: glibc: pthread stack: <tid>]
And for user-provided stacks:
[anon: glibc: pthread user stack: <tid>]
The guard page is not decorated, and the mapping name is cleared when
the thread finishes its execution (so the cached stack does not have any
name associated).
Checked on x86_64-linux-gnu aarch64 aarch64-linux-gnu.
[1] 65376df582
Co-authored-by: Ian Rogers <irogers@google.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
All the crypt related functions, cryptographic algorithms, and
make requirements are removed, with only the exception of md5
implementation which is moved to locale folder since it is
required by localedef for integrity protection (libc's
locale-reading code does not check these, but localedef does
generate them).
Besides thec code itself, both internal documentation and the
manual is also adjusted. This allows to remove both --enable-crypt
and --enable-nss-crypt configure options.
Checked with a build for all affected ABIs.
Co-authored-by: Zack Weinberg <zack@owlfolio.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Parts of elf/tst-rtld-list-diagnostics.py have been copied from
scripts/tst-ld-trace.py.
The abnf module is entirely optional and used to verify the
ABNF grammar as included in the manual.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Sort Makefile variables using scrips/sort-makefile-lines.py.
No code generation changes observed in non-test binary artifacts.
No regressions on x86_64 and i686.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Previously, after destructors for a DSO have been invoked, ld.so refused
to bind against that DSO in all cases. Relax this restriction somewhat
if the referencing object is itself a DSO that is being unloaded. This
assumes that the symbol reference is not going to be stored anywhere.
The situation in the test case can arise fairly easily with C++ and
objects that are built with different optimization levels and therefore
define different functions with vague linkage.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Applying this commit results in bit-identical libc.so.6.
The elf/ld-linux-x86-64.so.2 does change, but only in .note.gnu.build-id
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch checks _dl_debug_vdprintf, by passing various inputs to
_dl_dprintf and comparing the output with invocations of snprintf.
Signed-off-by: Roy Eldar <royeldar0@gmail.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
When dlopen is being called, efforts have been made to improve
future lookup performance. This includes marking a search path
as non-existent using `stat`. However, if the root directory
is given as a search path, there exists a bug which erroneously
marks it as non-existing.
The bug is reproduced under the following sequence:
1. dlopen is called to open a shared library, with at least:
1) a dependency 'A.so' not directly under the '/' directory
(e.g. /lib/A.so), and
2) another dependency 'B.so' resides in '/'.
2. for this bug to reproduce, 'A.so' should be searched *before* 'B.so'.
3. it first tries to find 'A.so' in /, (e.g. /A.so):
- this will (obviously) fail,
- since it's the first time we have seen the '/' directory,
its 'status' is 'unknown'.
4. `buf[buflen - namelen - 1] = '\0'` is executed:
- it intends to remove the leaf and its final slash,
- because of the speciality of '/', its buflen == namelen + 1,
- it erroneously clears the entire buffer.
6. it then calls 'stat' with the empty buffer:
- which will result in an error.
7. so it marks '/' as 'nonexisting', future lookups will not consider
this path.
8. while /B.so *does* exist, failure to look it up in the '/'
directory leads to a 'cannot open shared object file' error.
This patch fixes the bug by preventing 'buflen', an index to put '\0',
from being set to 0, so that the root '/' is always kept.
Relative search paths are always considered as 'existing' so this
wont be affected.
Writeup by Moody Liu <mooodyhunter@outlook.com>
Suggested-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Qixing ksyx Xue <qixingxue@outlook.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Sort tests against updated scripts/sort-makefile-lines.py.
No changes in generated code.
No regressions on x86_64 and i686.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Fix list terminator whitspace.
Sort using scripts/sort-makefile-lines.py.
No code generation changes observed in binary artifacts.
No regressions on x86_64 and i686.