After 78ca44da01
("elf: Relocate libc.so early during startup and dlmopen (bug 31083)")
we start seeing tst-nodeps2 failures when building the testsuite with
--enable-hard-coded-path-in-tests.
When building the testsuite with --enable-hard-coded-path-in-tests
the tst-nodeps2-mod.so is not built with the required DT_RUNPATH
values and the test escapes the test framework and loads the system
libraries and aborts. The fix is to use the existing
$(link-test-modules-rpath-link) variable to set DT_RUNPATH correctly.
No regressions on x86_64.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Add ELF_DYNAMIC_AFTER_RELOC to allow target specific processing after
relocation.
For x86-64, add
#define DT_X86_64_PLT (DT_LOPROC + 0)
#define DT_X86_64_PLTSZ (DT_LOPROC + 1)
#define DT_X86_64_PLTENT (DT_LOPROC + 3)
1. DT_X86_64_PLT: The address of the procedure linkage table.
2. DT_X86_64_PLTSZ: The total size, in bytes, of the procedure linkage
table.
3. DT_X86_64_PLTENT: The size, in bytes, of a procedure linkage table
entry.
With the r_addend field of the R_X86_64_JUMP_SLOT relocation set to the
memory offset of the indirect branch instruction.
Define ELF_DYNAMIC_AFTER_RELOC for x86-64 to rewrite the PLT section
with direct branch after relocation when the lazy binding is disabled.
PLT rewrite is disabled by default since SELinux may disallow modifying
code pages and ld.so can't detect it in all cases. Use
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=1
to enable PLT rewrite with 32-bit direct jump at run-time or
$ export GLIBC_TUNABLES=glibc.cpu.plt_rewrite=2
to enable PLT rewrite with 32-bit direct jump and on APX processors with
64-bit absolute jump at run-time.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
I've updated copyright dates in glibc for 2024. This is the patch for
the changes not generated by scripts/update-copyrights and subsequent
build / regeneration of generated files.
This is a minimal regression test for bug 29039 which only affects
targets with TLSDESC and a reproducer requires that
1) Have modid gaps (closed modules) with old generation.
2) Update a DTV to a newer generation (needs a newer dlopen).
3) But do not update the closed gap entry in that DTV.
4) Reuse the modid gap for a new module (another dlopen).
5) Use dynamic TLSDESC in that new module with old generation (bug).
6) Access TLS via this TLSDESC and the now outdated DTV.
However step (3) in practice rarely happens: during DTV update the
entries for closed modids are initialized to "unallocated" and then
dynamic TLSDESC calls __tls_get_addr independently of its generation.
The only exception to this is DTV setup at thread creation (gaps are
initialized to NULL instead of unallocated) or DTV resize where the
gap entries are outside the previous DTV array (again NULL instead
of unallocated, and this requires loading > DTV_SURPLUS modules).
So the bug can only cause NULL (+ offset) dereference, not use after
free. And the easiest way to get (3) is via thread creation.
Note that step (5) requires that the newly loaded module has larger
TLS than the remaining optional static TLS. And for (6) there cannot
be other TLS access or dlopen in the thread that updates the DTV.
Tested on aarch64-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
If /tmp is mounted nosuid and make xcheck is run,
then tst-env-setuid fails UNSUPPORTED with "SGID failed: GID and EGID match"
and /var/tmp/tst-sonamemove-runmod1.so.profile is created.
If you then try to rerun the test with a suid mounted test-dir
(the SGID binary is created in test-dir which defaults to /tmp)
with something like that:
make tst-env-setuid-ENV="TMPDIR=..." t=elf/tst-env-setuid test
the test fails as the LD_PROFILE output file is still available
from the previous run.
Thus this patch removes the LD_PROFILE output file in parent
before spawning the SGID binary.
Even if LD_PROFILE is not supported anymore in static binaries,
use a different library and thus output file for tst-env-setuid
and tst-env-setuid-static in order to not interfere if both
tests are run in parallel.
Furthermore the checks in test_child are now more verbose.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The loader now warns for invalid and out-of-range tunable values. The
patch also fixes the parsing of size_t maximum values, where
_dl_strtoul was failing for large values close to SIZE_MAX.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The tunable parsing duplicates the tunable environment variable so it
null-terminates each one since it simplifies the later parsing. It has
the drawback of adding another point of failure (__minimal_malloc
failing), and the memory copy requires tuning the compiler to avoid mem
operations calls.
The parsing now tracks the tunable start and its size. The
dl-tunable-parse.h adds helper functions to help parsing, like a strcmp
that also checks for size and an iterator for suboptions that are
comma-separated (used on hwcap parsing by x86, powerpc, and s390x).
Since the environment variable is allocated on the stack by the kernel,
it is safe to keep the references to the suboptions for later parsing
of string tunables (as done by set_hwcaps by multiple architectures).
Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, and
aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
GLRO(dl_lazy) is used to set the parameters for the early
_dl_relocate_object call, so the consider_profiling setting has to
be applied before the call.
Fixes commit 78ca44da01 ("elf: Relocate
libc.so early during startup and dlmopen (bug 31083)").
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It splits between process_envvars_secure and process_envvars_default,
with the former used to process arguments for __libc_enable_secure.
It does not have any semantic change, just simplify the code so there
is no need to handle __libc_enable_secure on each len switch.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
To avoid any environment variable to change setuid binaries
semantics.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Loader already ignores LD_DEBUG, LD_DEBUG_OUTPUT, and
LD_TRACE_LOADED_OBJECTS. Both LD_WARN and LD_VERBOSE are similar to
LD_DEBUG, in the sense they enable additional checks and debug
information, so it makes sense to disable them.
Also add both LD_VERBOSE and LD_WARN on filtered environment variables
for setuid binaries.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The patch adds two new macros, TUNABLE_GET_DEFAULT and TUNABLE_IS_INITIALIZED,
here the former get the default value with a signature similar to
TUNABLE_GET, while the later returns whether the tunable was set by
the environment variable.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Tested-by: Zhangfei Gao <zhangfei.gao@linaro.org>
_dl_assign_tls_modid() assigns a slotinfo entry for a new module, but
does *not* do anything to the generation counter. The first time this
happens, the generation is zero and map_generation() returns the current
generation to be used during relocation processing. However, if
a slotinfo entry is later reused, it will already have a generation
assigned. If this generation has fallen behind the current global max
generation, then this causes an obsolete generation to be assigned
during relocation processing, as map_generation() returns this
generation if nonzero. _dl_add_to_slotinfo() eventually resets the
generation, but by then it is too late. This causes DTV updates to be
skipped, leading to NULL or broken TLS slot pointers and segfaults.
Fix this by resetting the generation to zero in _dl_assign_tls_modid(),
so it behaves the same as the first time a slot is assigned.
_dl_add_to_slotinfo() will still assign the correct static generation
later during module load, but relocation processing will no longer use
an obsolete generation.
Note that slotinfo entry (aka modid) reuse typically happens after a
dlclose and only TLS access via dynamic tlsdesc is affected. Because
tlsdesc is optimized to use the optional part of static TLS, dynamic
tlsdesc can be avoided by increasing the glibc.rtld.optional_static_tls
tunable to a large enough value, or by LD_PRELOAD-ing the affected
modules.
Fixes bug 29039.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
This is just a minor optimization. It also makes it more obvious that
_dl_relocate_object can be called multiple times.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The _dl_non_dynamic_init does not parse LD_PROFILE, which does not
enable profile for dlopen objects. Since dlopen is deprecated for
static objects, it is better to remove the support.
It also allows to trim down libc.a of profile support.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Loader does not ignore LD_PROFILE in secure-execution mode (different
than man-page states [1]), rather it uses a different path
(/var/profile) and ignore LD_PROFILE_OUTPUT.
Allowing secure-execution profiling is already a non good security
boundary, since it enables different code paths and extra OS access by
the process. But by ignoring LD_PROFILE_OUTPUT, the resulting profile
file might also be acceded in a racy manner since the file name does not
use any process-specific information (such as pid, timing, etc.).
Another side-effect is it forces lazy binding even on libraries that
might be with DF_BIND_NOW.
[1] https://man7.org/linux/man-pages/man8/ld.so.8.html
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The strlen might trigger and invalid GOT entry if it used before
the process is self-relocated (for instance on dl-tunables if any
error occurs).
For i386, _dl_writev with PIE requires to use the old 'int $0x80'
syscall mode because the calling the TLS register (gs) is not yet
initialized.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Instead of ignoring ill-formatted tunable strings, first, check all the
tunable definitions are correct and then set each tunable value. It
means that partially invalid strings, like "key1=value1:key2=key2=value'
or 'key1=value':key2=value2=value2' do not enable 'key1=value1'. It
avoids possible user-defined errors in tunable definitions.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Tunable definitions with more than one '=' on are parsed and enabled,
and any subsequent '=' are ignored. It means that tunables in the form
'tunable=tunable=value' or 'tunable=value=value' are handled as
'tunable=value'. These inputs are likely user input errors, which
should not be accepted.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Some environment variables allow alteration of allocator behavior
across setuid boundaries, where a setuid program may ignore the
tunable, but its non-setuid child can read it and adjust the memory
allocator behavior accordingly.
Most library behavior tunings is limited to the current process and does
not bleed in scope; so it is unclear how pratical this misfeature is.
If behavior change across privilege boundaries is desirable, it would be
better done with a wrapper program around the non-setuid child that sets
these envvars, instead of using the setuid process as the messenger.
The patch as fixes tst-env-setuid, where it fail if any unsecvars is
set. It also adds a dynamic test, although it requires
--enable-hardcoded-path-in-tests so kernel correctly sets the setuid
bit (using the loader command directly would require to set the
setuid bit on the loader itself, which is not a usual deployment).
Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The tunable privilege levels were a retrofit to try and keep the malloc
tunable environment variables' behavior unchanged across security
boundaries. However, CVE-2023-4911 shows how tricky can be
tunable parsing in a security-sensitive environment.
Not only parsing, but the malloc tunable essentially changes some
semantics on setuid/setgid processes. Although it is not a direct
security issue, allowing users to change setuid/setgid semantics is not
a good security practice, and requires extra code and analysis to check
if each tunable is safe to use on all security boundaries.
It also means that security opt-in features, like aarch64 MTE, would
need to be explicit enabled by an administrator with a wrapper script
or with a possible future system-wide tunable setting.
Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
setuid/setgid process now ignores any glibc tunables, and filters out
all environment variables that might changes its behavior. This patch
also adds GLIBC_TUNABLES, so any spawned process by setuid/setgid
processes should set tunable explicitly.
Checked on x86_64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Since malloc debug support moved to a different library
(libc_malloc_debug.so), the glibc.malloc.check requires preloading the
debug library to enable it. It means that suid-debug support has not
been working since 2.34.
To restore its support, it would require to add additional information
and parsing to where to find libc_malloc_debug.so.
It is one thing less that might change AT_SECURE binaries' behavior
due to environment configurations.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Arguments to a memchr call were swapped, causing incorrect skipping
of files.
Files related to dpkg have different names: they actually end in
.dpkg-new and .dpkg-tmp, not .tmp as I mistakenly assumed.
Fixes commit 2aa0974d25 ("elf: ldconfig should skip
temporary files created by package managers").
So that the test is harder to confuse with elf/tst-execstack
(although the tests are supposed to be the same).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The force_first parameter was ineffective because the dlclose'd
object was not necessarily the first in the maps array. Also
enable force_first handling unconditionally, regardless of namespace.
The initial object in a namespace should be destructed first, too.
The _dl_sort_maps_dfs function had early returns for relocation
dependency processing which broke force_first handling, too, and
this is fixed in this change as well.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The open_path stops if a relative path in search path contains a
component that is a non directory (for instance, if the component
is an existing file).
For instance:
$ cat > lib.c <<EOF
> void foo (void) {}
> EOF
$ gcc -shared -fPIC -o lib.so lib.c
$ cat > main.c <<EOF
extern void foo ();
int main () { foo (); return 0; }
EOF
$ gcc -o main main.c lib.so
$ LD_LIBRARY_PATH=. ./main
$ LD_LIBRARY_PATH=non-existing/path:. ./main
$ LD_LIBRARY_PATH=$(pwd)/main:. ./main
$ LD_LIBRARY_PATH=./main:. ./main
./main: error while loading shared libraries: lib.so: cannot open shared object file: No such file or directory
The invalid './main' should be ignored as a non-existent one,
instead as a valid but non accessible file.
Absolute paths do not trigger this issue because their status are
initialized as 'unknown' and open_path check if this is a directory.
Checked on x86_64-linux-gnu.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The PR_SET_VMA_ANON_NAME support is only enabled through a configurable
kernel switch, mainly because assigning a name to a
anonymous virtual memory area might prevent that area from being
merged with adjacent virtual memory areas.
For instance, with the following code:
void *p1 = mmap (NULL,
1024 * 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1,
0);
void *p2 = mmap (p1 + (1024 * 4096),
1024 * 4096,
PROT_READ | PROT_WRITE,
MAP_PRIVATE | MAP_ANONYMOUS,
-1,
0);
The kernel will potentially merge both mappings resulting in only one
segment of size 0x800000. If the segment is names with
PR_SET_VMA_ANON_NAME with different names, it results in two mappings.
Although this will unlikely be an issue for pthread stacks and malloc
arenas (since for pthread stacks the guard page will result in
a PROT_NONE segment, similar to the alignment requirement for the arena
block), it still might prevent the mmap memory allocated for detail
malloc.
There is also another potential scalability issue, where the prctl
requires
to take the mmap global lock which is still not fully fixed in Linux
[1] (for pthread stacks and arenas, it is mitigated by the stack
cached and the arena reuse).
So this patch disables anonymous mapping annotations as default and
add a new tunable, glibc.mem.decorate_maps, can be used to enable
it.
[1] https://lwn.net/Articles/906852/
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Add anonymous mmap annotations on loader malloc, malloc when it
allocates memory with mmap, and on malloc arena. The /proc/self/maps
will now print:
[anon: glibc: malloc arena]
[anon: glibc: malloc]
[anon: glibc: loader malloc]
On arena allocation, glibc annotates only the read/write mapping.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Linux 4.5 removed thread stack annotations due to the complexity of
computing them [1], and Linux added PR_SET_VMA_ANON_NAME on 5.17
as a way to name anonymous virtual memory areas.
This patch adds decoration on the stack created and used by
pthread_create, for glibc crated thread stack the /proc/self/maps will
now show:
[anon: glibc: pthread stack: <tid>]
And for user-provided stacks:
[anon: glibc: pthread user stack: <tid>]
The guard page is not decorated, and the mapping name is cleared when
the thread finishes its execution (so the cached stack does not have any
name associated).
Checked on x86_64-linux-gnu aarch64 aarch64-linux-gnu.
[1] 65376df582
Co-authored-by: Ian Rogers <irogers@google.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
It adds NT_X86_SHSTK (2fab02b25ae7cf5), NT_RISCV_CSR/NT_RISCV_VECTOR
(9300f00439743c4), and NT_LOONGARCH_HW_BREAK/NT_LOONGARCH_HW_WATCH
(1a69f7a161a78ae).
All the crypt related functions, cryptographic algorithms, and
make requirements are removed, with only the exception of md5
implementation which is moved to locale folder since it is
required by localedef for integrity protection (libc's
locale-reading code does not check these, but localedef does
generate them).
Besides thec code itself, both internal documentation and the
manual is also adjusted. This allows to remove both --enable-crypt
and --enable-nss-crypt configure options.
Checked with a build for all affected ABIs.
Co-authored-by: Zack Weinberg <zack@owlfolio.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This avoids crashes due to partially written files, after a package
update is interrupted.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This reverts commit 6985865bc3.
Reason for revert:
The commit changes the order of ELF destructor calls too much relative
to what applications expect or can handle. In particular, during
process exit and _dl_fini, after the revert commit, we no longer call
the destructors of the main program first; that only happens after
some dlopen'ed objects have been destructed. This robs applications
of an opportunity to influence destructor order by calling dlclose
explicitly from the main program's ELF destructors. A couple of
different approaches involving reverse constructor order were tried,
and none of them worked really well. It seems we need to keep the
dependency sorting in _dl_fini.
There is also an ambiguity regarding nested dlopen calls from ELF
constructors: Should those destructors run before or after the object
that called dlopen? Commit 6985865bc3 used reverse order
of the start of ELF constructor calls for destructors, but arguably
using completion of constructors is more correct. However, that alone
is not sufficient to address application compatibility issues (it
does not change _dl_fini ordering at all).
The string parsing routine may end up writing beyond bounds of tunestr
if the input tunable string is malformed, of the form name=name=val.
This gets processed twice, first as name=name=val and next as name=val,
resulting in tunestr being name=name=val:name=val, thus overflowing
tunestr.
Terminate the parsing loop at the first instance itself so that tunestr
does not overflow.
This also fixes up tst-env-setuid-tunables to actually handle failures
correct and add new tests to validate the fix for this CVE.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>