This restores the 2.33 semantics for arena_get2. They were changed by
11a02b035b to avoid arena_get2 calling malloc (back when __get_nproc
was refactored to use a scratch_buffer in 903bc7dcc2). __get_nproc
has since been refactored again and now also avoids calling
malloc.
Commit 11a02b035b did not take into consideration any performance
implications, which should have been discussed properly.
__get_nprocs_sched is still used as a fallback mechanism if procfs
and sysfs are not accessible.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
These were broken by the new atan2 functions, as they were only
set up for univariate functions. Arity is now detected from the
input file. This revealed a mistake where the double-precision
inputs were being used for both single- and double-precision
routines, which is now remedied.
_dl_non_dynamic_init does not parse LD_PROFILE, so profiling cannot
be enabled for dlopen'ed objects. Since dlopen is deprecated for
static binaries, it is better to remove the support.
This also allows trimming the profiling support out of libc.a.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Using the memcmp symbol directly allows the compiler to inline the
memcmp calls (especially because _dl_tunable_set_hwcaps uses constant
values), generating better code.
Checked with tst-tunables on s390x-linux-gnu (qemu system).
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
dl-symbol-redir-ifunc.h redirects compiler-generated libcalls to
arch-specific memory implementations to avoid ifunc calls where they are
not yet possible. memcmp-isa-default-impl.h aims to fix the same issue
by calling the specific memcmp implementation directly.
Using the memcmp symbol directly allows the compiler to inline the memcmp
calls (especially because _dl_tunable_set_hwcaps uses constant values),
generating better code.
Checked on x86_64-linux-gnu.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The strlen call might trigger an invalid GOT entry access if it is used
before the process is self-relocated (for instance, in dl-tunables if any
error occurs).
For i386, _dl_writev with PIE requires using the old 'int $0x80'
syscall mode, because syscalls that go through the TLS register (gs)
cannot be used while it is not yet initialized.
Checked on x86_64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Some environment variables allow alteration of allocator behavior
across setuid boundaries, where a setuid program may ignore the
tunable, but its non-setuid child can read it and adjust the memory
allocator behavior accordingly.
Most library behavior tunings are limited to the current process and do
not leak beyond its scope, so it is unclear how practical this misfeature is.
If behavior change across privilege boundaries is desirable, it would be
better done with a wrapper program around the non-setuid child that sets
these envvars, instead of using the setuid process as the messenger.
The patch also fixes tst-env-setuid, which fails if any of the unsecvars
is set. It also adds a dynamic test, although it requires
--enable-hardcoded-path-in-tests so that the kernel correctly honors the
setuid bit (using the loader command directly would require setting the
setuid bit on the loader itself, which is not a usual deployment).
Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The tunable privilege levels were a retrofit to try and keep the malloc
tunable environment variables' behavior unchanged across security
boundaries. However, CVE-2023-4911 shows how tricky tunable parsing
can be in a security-sensitive environment.
Beyond parsing, the malloc tunables essentially change some
semantics of setuid/setgid processes. Although it is not a direct
security issue, allowing users to change setuid/setgid semantics is not
a good security practice, and requires extra code and analysis to check
if each tunable is safe to use on all security boundaries.
It also means that security opt-in features, like aarch64 MTE, would
need to be explicitly enabled by an administrator with a wrapper script
or with a possible future system-wide tunable setting.
Co-authored-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: DJ Delorie <dj@redhat.com>
setuid/setgid processes now ignore any glibc tunables and filter out
all environment variables that might change their behavior. This patch
also adds GLIBC_TUNABLES to the filtered variables, so any process
spawned by setuid/setgid processes should set tunables explicitly.
Checked on x86_64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Linux 6.6 (09da082b07bbae1c) added support for fchmodat2, which has
similar semantics to fchmodat with an extra flag argument. This
allows fchmodat to implement AT_SYMLINK_NOFOLLOW and AT_EMPTY_PATH
without the need for procfs.
The syscall is registered on all architectures (with the value 452,
except on alpha where it is 562; commit 78252deb023cf087).
tst-lchmod.c requires a small fix where the fchmodat check asserts two
contradictory conditions ('(st.st_mode & 0777) == 2' and
'(st.st_mode & 0777) == 3').
Checked on x86_64-linux-gnu on a 6.6 kernel.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
strrchr-evex-base used `vpcompress{b|d}` in the page cross logic but
was missing the CPU_FEATURE checks for VBMI2 in the
ifunc/ifunc-impl-list.
The fix is either to add those checks or change the logic to not use
`vpcompress{b|d}`. Choosing the latter here so that the strrchr-evex
implementation is usable on SKX.
The new implementation is a bit slower, but this is in a cold path, so
it's probably okay.
Fixes commit a61933fe27 ("sparc: Remove bzero optimization"), which,
after moving code, jumped to the wrong label 4.
Verified by successfully running string/test-memset on sparc32.
Signed-off-by: Andreas Larsson <andreas@gaisler.com>
Signed-off-by: Ludwig Rydberg <ludwig.rydberg@gaisler.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The latest implementations of memcpy are actually faster than the Falkor
implementations [1], so remove the falkor/phecda ifuncs for memcpy and
the now unused IS_FALKOR/IS_PHECDA defines.
[1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add a specialized memset for the common ZVA size of 64 to avoid the
overhead of reading the ZVA size. Since the code is identical to
__memset_falkor, remove the latter.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Clean up emag memset: merge the memset_base64.S file and remove
the unused ZVA code (since it is disabled on emag).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The PR_SET_VMA_ANON_NAME support is only enabled through a configurable
kernel switch, mainly because assigning a name to an
anonymous virtual memory area might prevent that area from being
merged with adjacent virtual memory areas.
For instance, with the following code:
    #include <sys/mman.h>

    void *p1 = mmap (NULL,
                     1024 * 4096,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS,
                     -1,
                     0);
    /* Ask for the range right after P1 (a hint only, no MAP_FIXED).  */
    void *p2 = mmap ((char *) p1 + (1024 * 4096),
                     1024 * 4096,
                     PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS,
                     -1,
                     0);
The kernel will potentially merge both mappings, resulting in a single
segment of size 0x800000. If the two ranges are named with
PR_SET_VMA_ANON_NAME using different names, they stay as two separate mappings.
Although this is unlikely to be an issue for pthread stacks and malloc
arenas (since for pthread stacks the guard page will result in
a PROT_NONE segment, similar to the alignment requirement for the arena
block), it might still prevent merging of the memory that malloc
allocates directly with mmap.
There is also another potential scalability issue: the prctl requires
taking the mmap global lock, which is still not fully fixed in Linux
[1] (for pthread stacks and arenas, this is mitigated by the stack
cache and arena reuse).
So this patch disables anonymous mapping annotations by default and
adds a new tunable, glibc.mem.decorate_maps, which can be used to
enable them.
[1] https://lwn.net/Articles/906852/
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Linux 5.17 added support for naming anonymous virtual memory areas
through the prctl syscall. __set_vma_name is a wrapper that avoids
repeating the prctl call if the kernel does not support it.
If the kernel does not support PR_SET_VMA_ANON_NAME, prctl fails with
EINVAL; it also returns the same error for an invalid argument.
Since it is an internal-only API, it assumes well-formed input:
an aligned START, with (START, START+LEN) being a valid memory range,
and a NAME limited to 80 characters containing none of the invalid ones
("\\`$[]").
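A minimal sketch of such a wrapper, for illustration only: set_vma_name, vma_name_is_valid, VMA_NAME_MAX and the caching flag are hypothetical names, not glibc's internal code. It assumes the PR_SET_VMA and PR_SET_VMA_ANON_NAME values from the Linux 5.17 UAPI headers (defined as fallbacks when the installed headers are older), assumes the 80-character limit counts the terminating NUL, and, as an extra assumption, also rejects non-printable characters. Because EINVAL cannot distinguish missing support from bad input, the sketch stops calling prctl after the first EINVAL, relying on callers passing well-formed input.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks in case the installed headers predate Linux 5.17.  */
#ifndef PR_SET_VMA
# define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
# define PR_SET_VMA_ANON_NAME 0
#endif

/* Hypothetical limit; assumed here to include the terminating NUL.  */
#define VMA_NAME_MAX 80

/* Return true if NAME is short enough and contains only printable
   characters other than the ones in "\\`$[]".  */
bool
vma_name_is_valid (const char *name)
{
  size_t len = strlen (name);
  if (len >= VMA_NAME_MAX)
    return false;
  for (size_t i = 0; i < len; i++)
    {
      unsigned char ch = name[i];
      if (ch < 0x20 || ch > 0x7e || strchr ("\\`$[]", ch) != NULL)
        return false;
    }
  return true;
}

/* Cleared after the first EINVAL so later calls skip the syscall.
   This sketch ignores concurrency; a real version would need atomics.  */
static int vma_name_supported = 1;

void
set_vma_name (void *start, size_t len, const char *name)
{
  if (!vma_name_supported)
    return;
  if (prctl (PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long) start,
             (unsigned long) len, (unsigned long) name) == 0)
    return;
  if (errno == EINVAL)
    vma_name_supported = 0;
}
```

A caller would pass a page-aligned range obtained from mmap, for example set_vma_name (p1, 1024 * 4096, "example: heap").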
Reviewed-by: DJ Delorie <dj@redhat.com>
When invoking sem_open with O_CREAT as one of its flags, we'll end up
in the second part of sem_open's "if ((oflag & O_CREAT) == 0 || (oflag
& O_EXCL) == 0)", which means that we don't expect the semaphore file
to exist.
In that part, open_flags is initialized as "O_RDWR | O_CREAT | O_EXCL
| O_CLOEXEC" and there's an attempt to open(2) the file, which will
likely fail because it won't exist. After that first (expected)
failure, some cleanup is done and we go back to the label "try_again",
which lives in the first part of the aforementioned "if".
The problem is that, in that part of the code, we expect the semaphore
file to exist, and as such O_CREAT (this time the flag we pass to
open(2)) needs to be cleared from open_flags; otherwise we'll see
another failure (this time unexpected) when trying to open the file,
which will cause the call to sem_open to fail as well.
This can cause very strange bugs, especially with OpenMPI, which makes
extensive use of semaphores.
Fix the bug by simplifying the logic when choosing open(2) flags and
making sure O_CREAT is not set when the semaphore file is expected to
exist.
A regression test for this issue would require complex and
CPU-time-consuming logic, since triggering the wrong code path is not
straightforward due to the race condition. There is a somewhat reliable
reproducer in the bug, but it requires using OpenMPI.
This resolves BZ #30789.
See also: https://bugs.launchpad.net/ubuntu/+source/h5py/+bug/2031912
Signed-off-by: Sergio Durigan Junior <sergiodj@sergiodj.net>
Co-Authored-By: Simon Chopin <simon.chopin@canonical.com>
Co-Authored-By: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Fixes: 533deafbdf ("Use O_CLOEXEC in more places (BZ #15722)")
Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than
ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename
strlen_mte to strlen_generic. Remove rtld-memset.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Commit 7f602256ab moved the tst-rfc3484*
tests from posix/ to nss/, but didn't correct references to point to
their new subdir when building for mach and arm. This commit fixes
that.
Tested with build-many-glibcs.sh for i686-gnu.
The prototype is:
    void __memswap (void *restrict p1, void *restrict p2, size_t n)
The function swaps the contents of two memory blocks P1 and P2 of
length N. Overlapping blocks are NOT handled.
It will be used in the qsort optimization.
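A straightforward way to implement the described semantics, sketched with a small bounce buffer; memswap_sketch and the 128-byte chunk size are illustrative assumptions, not the actual __memswap implementation, which may use wider or vectorized copies:

```c
#include <stddef.h>
#include <string.h>

/* Swap N bytes between P1 and P2, chunk by chunk, through a stack
   buffer.  Like __memswap, the blocks must not overlap.  */
void
memswap_sketch (void *restrict p1, void *restrict p2, size_t n)
{
  unsigned char tmp[128];   /* Chunk size is an arbitrary choice.  */
  unsigned char *a = p1;
  unsigned char *b = p2;

  while (n > 0)
    {
      size_t chunk = n < sizeof tmp ? n : sizeof tmp;
      memcpy (tmp, a, chunk);
      memcpy (a, b, chunk);
      memcpy (b, tmp, chunk);
      a += chunk;
      b += chunk;
      n -= chunk;
    }
}
```

For qsort, swapping elements in place like this avoids allocating a temporary the size of an element.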
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
All the crypt-related functions, cryptographic algorithms, and
make requirements are removed, with the only exception being the md5
implementation, which is moved to the locale folder since it is
required by localedef for integrity protection (libc's
locale-reading code does not check these, but localedef does
generate them).
Besides the code itself, both the internal documentation and the
manual are also adjusted. This allows removing both the --enable-crypt
and --enable-nss-crypt configure options.
Checked with a build for all affected ABIs.
Co-authored-by: Zack Weinberg <zack@owlfolio.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
libcrypt was marked to be phased out in 2.38, and a better project
already exists that provides both compatibility and a better API
(libxcrypt). The sparc optimizations add the burden of extra
build-many-glibcs.py configurations.
Checked on sparc64 and sparcv9.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for
memcpy, memmove and memset (use .inst for now so it works with all binutils
versions without needing complex configure and conditional compilation).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
getaddrinfo is an entry point for nss functionality. This commit moves
it from 'sysdeps/posix' to 'nss', gets rid of the stub in 'posix', and
moves all associated tests as well.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The arguments for "expected" and "got" are mismatched. Furthermore,
this patch dumps both values as hex.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
If feenableexcept or fedisableexcept gets excepts=FE_INVALID=0x80
as input, we have a signed left shift: 0x80 << 24, which is not
representable as int and is thus undefined behaviour according to the
C standard.
This patch casts excepts to unsigned int before shifting, which is
defined.
For me, the observed undefined behaviour is that the shift is done
with unsigned instructions, which is exactly what we want.
Furthermore, I don't get any exception flags.
After the fix, the code is using the same instruction sequence as
before.
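The pattern and its fix, reduced to a standalone function for illustration; excepts_to_mask is a hypothetical name, not the glibc source, and it assumes a 32-bit unsigned int as on the usual Linux targets:

```c
/* Undefined: with int EXCEPTS == 0x80, "excepts << 24" would need the
   value 0x80000000, which does not fit in a 32-bit int.
   Defined: converting to unsigned int first gives a well-defined shift
   producing the same bit pattern.  */
unsigned int
excepts_to_mask (int excepts)
{
  return (unsigned int) excepts << 24;
}
```

This also explains why the generated code does not change: the compiler was already emitting the unsigned shift.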
This reverts commit 6985865bc3.
Reason for revert:
The commit changes the order of ELF destructor calls too much relative
to what applications expect or can handle. In particular, during
process exit and _dl_fini, after the reverted commit, we no longer call
the destructors of the main program first; that only happens after
some dlopen'ed objects have been destructed. This robs applications
of an opportunity to influence destructor order by calling dlclose
explicitly from the main program's ELF destructors. A couple of
different approaches involving reverse constructor order were tried,
and none of them worked really well. It seems we need to keep the
dependency sorting in _dl_fini.
There is also an ambiguity regarding nested dlopen calls from ELF
constructors: Should those destructors run before or after the object
that called dlopen? Commit 6985865bc3 used reverse order
of the start of ELF constructor calls for destructors, but arguably
using completion of constructors is more correct. However, that alone
is not sufficient to address application compatibility issues (it
does not change _dl_fini ordering at all).
Linux 6.5 adds a constant SCM_PIDFD (recall that the non-uapi
linux/socket.h, where this constant is added, is in fact a header
providing many constants that are part of the kernel/userspace
interface). This shows that SCM_SECURITY, from the same set of
definitions and added in Linux 2.6.17, is also missing from glibc,
although glibc has the first two constants from this set, SCM_RIGHTS
and SCM_CREDENTIALS; add both missing constants to glibc.
Tested for x86_64.
Linux 6.5 adds a constant AT_HANDLE_FID; add it to glibc. Because
this is a flag for the function name_to_handle_at declared in
bits/fcntl-linux.h, put the flag there rather than alongside other
AT_* flags in (OS-independent) fcntl.h.
Tested for x86_64.
With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized
warning:
../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2':
../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized]
364 | y[0] = fq[0]; y[1] = fq[1]; y[2] = fw;
| ~~^~~
This is similar to the warning that is suppressed in the other branch of
the switch. Help the compiler know that the variable is always
initialized, which also makes the suppression obsolete.
This commit refactors `strrchr-evex` and `strrchr-evex512` to use a
common implementation: `strrchr-evex-base.S`.
The motivation is that `strrchr-evex` needed to be refactored to not use
64-bit masked registers in preparation for AVX10.
Once vec-width masked register combining was removed, the EVEX and
EVEX512 implementations can easily be implemented in the same file
without any major overhead.
The net result is performance improvements (measured on TGL) for both
`strrchr-evex` and `strrchr-evex512`. Note, however, that there are some
regressions in the suite, and many of the cases that contribute to the
overall geometric mean of improvement/regression across bench-strrchr
may be cold. The point of the performance measurement is to show there
are no major regressions; the primary motivation is preparation
for AVX10.
Benchmarks were taken on TGL:
https://www.intel.com/content/www/us/en/products/sku/213799/intel-core-i711850h-processor-24m-cache-up-to-4-80-ghz/specifications.html
EVEX geometric_mean(N=5) of all benchmarks New / Original : 0.74
EVEX512 geometric_mean(N=5) of all benchmarks New / Original: 0.87
Full check passes on x86.
* Transpose table layout for improved memory access
* Use half-vector special comparisons for AdvSIMD
* Improve register use near special-case branches
- Due to the presence of a function call, the return value would get
mov-d out of x0 in order to facilitate PCS. By moving the final
computation after the branch, this can be avoided.
Also change SVE routines to use overloaded intrinsics for readability.
Use overloaded intrinsics for readability. Codegen does not
change; however, while we're bringing the routines up to date with
recent improvements to other routines in AOR, it is worth copying
this change over as well.
* Update ULP comment reflecting a new observed max in [-pi/2, pi/2]
* Use the same polynomial in AdvSIMD and SVE, rather than FTRIG instructions
* Improve register use near special-case branch
Also use overloaded intrinsics for SVE.
When -D_FORTIFY_SOURCE=2 was given during compilation,
sprintf and similar functions check whether their
format string is in read-only memory and otherwise exit with
*** %n in writable segment detected ***
To check if the memory is read-only, glibc
reads from the file "/proc/self/maps". If opening this
file fails due to too many open files (EMFILE), glibc
now ignores this error.
Fixes [BZ #30932]
Signed-off-by: Volker Weißmann <volker.weissmann@gmx.de>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
GLIBC_TUNABLES scrubbing happens earlier than envvar scrubbing, and some
tunables are required to propagate past the setxid boundary, like their
env_alias. Rely on tunable scrubbing to clean out GLIBC_TUNABLES like
before, restoring the behaviour of glibc 2.37 and earlier.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Linux v5.10 added a mount option MS_NOSYMFOLLOW, which was added to
glibc in commit 0ca21427d9.
Add the corresponding statfs/statvfs flag bit, ST_NOSYMFOLLOW.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Read directly into the mips_abiflags struct rather than reading the
entire segment and using alloca when the passed buffer is not big enough.
Checked with build-many-glibcs.py on mips-linux-gnu
Tested-by: Ying Huang <ying.huang@oss.cipunited.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This commit adds support for the new AVX10 cpu features:
https://cdrdv2-public.intel.com/784267/355989-intel-avx10-spec.pdf
We add checks for:
- `AVX10`: Check if AVX10 is present.
- `AVX10_{X,Y,Z}MM`: Check if a given vec class has AVX10 support.
`make check` passes and cpuid output was checked against GNR/DMR on an
emulator.
On powerpc, SET_RESTORE_ROUND uses inline assembly to optimize the
prologue get/save/set rounding mode operations for POWER9 and
later by using 'mffscrn' where possible, this was introduced by
commit f1c56cdff0.
From version 14 onwards, GCC's __builtin_set_fpscr_rn builtin returns
the FPSCR fields in a double. This feature is available on Power9 when
the __SET_FPSCR_RN_RETURNS_FPSCR__ macro is defined.
GCC commit ef3bbc69d15707e4db6e2f198c621effb636cc26 adds
this feature.
Changes are made to use __builtin_set_fpscr_rn instead of mffscrn
or mffscrni in __fe_mffscrn(rn).
Suggested-by: Carl Love <cel@us.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
AT_EMPTY_PATH is a requirement to implement fstat over fstatat;
however, it does not prevent the kernel from reading the path argument.
This is not a correctness issue, but on x86-64 with SMAP-capable CPUs the
kernel is forced to perform an expensive user memory access. After that,
a regular lookup is performed, which adds even more overhead.
Instead, issue the fstat syscall directly in the LFS fstat implementation
(32-bit architectures will still continue to use statx, which is
required for 64-bit time_t support). It should even be a small
performance gain on architectures other than x86_64, since there is no
need to handle the path argument.
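The two call shapes being compared, as a sketch with hypothetical helper names. Both fill in the same struct stat for an open descriptor, but the fstatat form still hands the kernel a path string to read. Note that the C-level fstat function may itself be built on fstatat or statx depending on the glibc version and architecture, which is exactly what this change adjusts, so the sketch shows the API forms rather than the syscalls actually issued.

```c
#define _GNU_SOURCE   /* For AT_EMPTY_PATH.  */
#include <fcntl.h>
#include <sys/stat.h>

/* Empty path plus AT_EMPTY_PATH: the kernel still copies "" in from
   user memory before resolving the descriptor itself.  */
int
stat_via_fstatat (int fd, struct stat *st)
{
  return fstatat (fd, "", st, AT_EMPTY_PATH);
}

/* Plain fstat: no path argument at all.  */
int
stat_via_fstat (int fd, struct stat *st)
{
  return fstat (fd, st);
}
```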
Checked on x86_64-linux-gnu.
Remove the unnecessary extra checks for sin (-0.0) from vector sin/sinf,
improving performance. Passes regress.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
This patch updates the kernel version in the tests tst-mman-consts.py
and tst-pidfd-consts.py to 6.5. (There are no new constants covered
by these tests in 6.5 that need any other header changes;
tst-mount-consts.py was updated separately along with a header
constant addition.)
Tested with build-many-glibcs.py.
When an NSS plugin only implements the _gethostbyname2_r and
_getcanonname_r callbacks, getaddrinfo could use memory that was freed
during tmpbuf resizing, through h_name in a previous query response.
The backing store for res->at->name when doing a query with
gethostbyname3_r or gethostbyname2_r is tmpbuf, which is reallocated in
gethosts during the query. For AF_INET6 lookup with AI_ALL |
AI_V4MAPPED, gethosts gets called twice, once for a v6 lookup and a
second time for a v4 lookup. In this case, if the first call reallocates
tmpbuf enough times that it ends up allocated with malloc, th->h_name (that
res->at->name refers to) ends up on heap-allocated storage in tmpbuf.
Now if the second call to gethosts also causes the plugin callback to
return NSS_STATUS_TRYAGAIN, tmpbuf will get freed, resulting in a UAF
reference in res->at->name. This then gets dereferenced in the
getcanonname_r plugin call, resulting in the use after free.
Fix this by copying h_name over and freeing it at the end. This
resolves BZ #30843, which is assigned CVE-2023-4806.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
According to glibc strrchr microbenchmark test results, this implementation
reduces the runtime as follows:
    Name                Percent of runtime reduced
    strrchr-lasx        10%-50%
    strrchr-lsx         0%-50%
    strrchr-aligned     5%-50%
Generic strrchr is implemented with strlen + memrchr. The lasx version
is compared with a generic strrchr built from strlen-lasx + memrchr-lasx,
the lsx version with one built from strlen-lsx + memrchr-lsx, and the
aligned version with one built from strlen-aligned + memrchr-generic.
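The generic decomposition described above can be sketched as follows, assuming the GNU memrchr extension is available; strrchr_generic is a hypothetical name, and glibc's own generic file may differ in detail. Searching strlen (s) + 1 bytes includes the terminating NUL, so a search for '\0' returns a pointer to the terminator, as strrchr must.

```c
#define _GNU_SOURCE   /* For memrchr.  */
#include <string.h>

/* strrchr as strlen + memrchr: find the length once, then scan
   backwards over the string plus its terminator.  */
char *
strrchr_generic (const char *s, int c)
{
  return memrchr (s, c, strlen (s) + 1);
}
```

The benchmark comparison above pairs each optimized strrchr with this kind of composition built from the matching strlen and memrchr variants.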
According to glibc strcpy and stpcpy microbenchmark test results (changed
to use generic_strcpy and generic_stpcpy instead of strlen + memcpy),
compared with the generic version, this implementation reduces the
runtime as follows:
    Name                Percent of runtime reduced
    strcpy-aligned      8%-45%
    strcpy-unaligned    8%-48%
    strcpy-lsx          20%-80%
    strcpy-lasx         15%-86%
    stpcpy-aligned      6%-43%
    stpcpy-unaligned    8%-48%
    stpcpy-lsx          10%-80%
    stpcpy-lasx         10%-87%
Compared with the aligned version, the unaligned version takes fewer
instructions to copy a tail of data shorter than 8 bytes. It also has
better performance when src and dest cannot both be aligned to 8 bytes.
This patch adds the MOVE_MOUNT_BENEATH constant from Linux 6.5 to
glibc's sys/mount.h and updates tst-mount-consts.py to reflect these
constants being up to date with that Linux kernel version.
Tested with build-many-glibcs.py.
Linux 6.5 has one new syscall, cachestat, and also enables the
cacheflush syscall for hppa. Update syscall-names.list and regenerate
the arch-syscall.h headers with build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py.
Needed since gcc-10 enabled -fno-common by default.
[In use in Gentoo since gcc-10, no problems observed.
Also discussed with and reviewed by Jessica Clarke from
Debian. Andreas]
Bug: https://bugs.gentoo.org/723268
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
Use a fixed size array instead. The maximum number of arguments
is set by macro tricks.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The current implementation of dlclose (and process exit) re-sorts the
link maps before calling ELF destructors. Destructor order is not the
reverse of the constructor order as a result: The second sort takes
relocation dependencies into account, and other differences can result
from ambiguous inputs, such as cycles. (The force_first handling in
_dl_sort_maps is not effective for dlclose.) After the changes in
this commit, there is still a required difference due to
dlopen/dlclose ordering by the application, but the previous
discrepancies went beyond that.
A new global (namespace-spanning) list of link maps,
_dl_init_called_list, is updated right before ELF constructors are
called from _dl_init.
In dl_close_worker, the maps variable, an on-stack variable length
array, is eliminated. (VLAs are problematic, and dlclose should not
call malloc because it cannot readily deal with malloc failure.)
Marking still-used objects uses the namespace list directly, with
next and next_idx replacing the done_index variable.
After marking, _dl_init_called_list is used to call the destructors
of now-unused maps in reverse constructor order. These destructors
can call dlopen. Previously, new objects did not have l_map_used set.
This had to change: There is no copy of the link map list anymore,
so processing would cover newly opened (and unmarked) mappings,
unloading them. Now, _dl_init (indirectly) sets l_map_used, too.
(dlclose is handled by the existing reentrancy guard.)
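A deliberately simplified model of the ordering rule, with hypothetical names and a fixed-size array in place of the loader's link-map list: objects are recorded right before their constructors run, and destructor order is just that record walked backwards, with no re-sorting step.

```c
#include <stddef.h>

/* Hypothetical capacity; the loader uses a linked list instead.  */
#define MAX_OBJECTS 16

/* Object names, in the order their constructors were called.  */
static const char *init_called[MAX_OBJECTS];
static size_t init_called_count;

/* Record NAME right before its constructor would run.
   Returns 0 on success, -1 if the table is full.  */
int
record_constructor (const char *name)
{
  if (init_called_count == MAX_OBJECTS)
    return -1;
  init_called[init_called_count++] = name;
  return 0;
}

/* Store into OUT (capacity MAX) the order in which destructors run:
   exactly the reverse of constructor order.  Returns the count.  */
size_t
destructor_order (const char **out, size_t max)
{
  size_t n = 0;
  for (size_t i = init_called_count; i > 0 && n < max; i--)
    out[n++] = init_called[i - 1];
  return n;
}
```

Because the order comes from what actually happened at constructor time, relocation dependencies or cycles cannot make destructor order diverge from it.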
After _dl_init_called_list traversal, two more loops follow. The
processing order changes to the original link map order in the
namespace. Previously, dependency order was used. The difference
should not matter because relocation dependencies could already
reorder link maps in the old code.
The changes to _dl_fini remove the sorting step and replace it with
a traversal of _dl_init_called_list. The l_direct_opencount
decrement outside the loader lock is removed because it appears
incorrect: the counter manipulation could race with other dynamic
loader operations.
tst-audit23 needs adjustments to the changes in LA_ACT_DELETE
notifications. The new approach for checking la_activity should
make it clearer that la_activity calls come in pairs around namespace
updates.
The dependency sorting test cases need updates because the destructor
order is always the opposite order of constructor order, even with
relocation dependencies or cycles present.
There is a future cleanup opportunity to remove the now-constant
force_first and for_fini arguments from the _dl_sort_maps function.
Fixes commit 1df71d32fe ("elf: Implement
force_first handling in _dl_sort_maps_dfs (bug 28937)").
Reviewed-by: DJ Delorie <dj@redhat.com>
Commit 5f828ff824 ("io: Fix F_GETLK, F_SETLK, and F_SETLKW for
powerpc64") fixed an issue with the value of the lock constants on
powerpc64 when not using __USE_FILE_OFFSET64, but it ended up also
changing the value when using __USE_FILE_OFFSET64, causing an API change.
Fix that by also checking that define, restoring the values from before
commit 4d0fe291ae:
Default values:
- F_GETLK: 5
- F_SETLK: 6
- F_SETLKW: 7
With -D_FILE_OFFSET_BITS=64:
- F_GETLK: 12
- F_SETLK: 13
- F_SETLKW: 14
It was also noticed that there was no test for io locks with
__USE_FILE_OFFSET64, so add one.
Tested on x86_64-linux-gnu, i686-linux-gnu and
powerpc64le-unknown-linux-gnu.
Resolves: BZ #30804.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
XTheadBb has instructions similar to those of Zbb, which allow optimized
string processing:
* th.ff0: find-first zero is a CLZ instruction.
* th.tstnbz: similar to orc.b, but with a bit-inverted result.
The instructions are documented here:
https://github.com/T-head-Semi/thead-extension-spec/tree/master/xtheadbb
These instructions can be found in the T-Head C906 and the C910.
Tested with the string tests.
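Portable C models of the two byte-wise operations, useful for seeing why they help string functions: a 64-bit word contains a NUL byte exactly when the th.tstnbz result is non-zero. These are illustrative emulations with hypothetical names, assuming RV64 (64-bit registers); they are not how the instructions are exposed to C.

```c
#include <stdint.h>

/* Model of Zbb orc.b: each result byte is 0xff if the corresponding
   input byte is non-zero, 0x00 otherwise.  */
uint64_t
orc_b (uint64_t x)
{
  uint64_t r = 0;
  for (int i = 0; i < 8; i++)
    if ((x >> (8 * i)) & 0xff)
      r |= (uint64_t) 0xff << (8 * i);
  return r;
}

/* Model of XTheadBb th.tstnbz: the bit-inverted orc.b result, i.e.
   0xff exactly in the byte positions that hold a NUL byte.  */
uint64_t
tstnbz (uint64_t x)
{
  return ~orc_b (x);
}
```

A strlen-style loop can then load a word at a time and stop at the first word for which tstnbz is non-zero.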
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This code is generally unused in practice since there don't seem to be
any NSS modules that only implement _nss_MOD_gethostbyname2_r and not
_nss_MOD_gethostbyname3_r.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This interface allows obtaining the associated process ID from a
process file descriptor. It is done by parsing the procfs fdinfo
information. Its prototype is:
    pid_t pidfd_getpid (int fd)
It returns the associated PID, or -1 on error with errno set
accordingly. The possible errno values are those from open, read,
and close (used for the procfs parsing), along with:
- EBADF if the FD is negative, does not have a PID associated, or if
the fdinfo fields contain a value larger than pid_t.
- EREMOTE if the PID is in a separate namespace.
- ESRCH if the process is already terminated.
Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
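A sketch of the fdinfo parsing step with the error mapping listed above. parse_pidfd_fdinfo is a hypothetical helper that works on already-read text; it assumes the kernel reports the PID on a "Pid:" line, with -1 for an exited process and 0 for a PID outside the caller's namespace, and that pid_t is int as on Linux. Opening and reading /proc/self/fdinfo/FD is omitted.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the PID from pidfd fdinfo TEXT, or -1
   with errno set as described above.  */
pid_t
parse_pidfd_fdinfo (const char *text)
{
  for (const char *line = text; line != NULL && *line != '\0'; )
    {
      if (strncmp (line, "Pid:", 4) == 0)
        {
          char *end;
          errno = 0;
          long v = strtol (line + 4, &end, 10);   /* Skips the tab.  */
          /* Unparsable, negative, or larger than pid_t (assumed int).  */
          if (end == line + 4 || errno == ERANGE || v < -1 || v > INT_MAX)
            {
              errno = EBADF;
              return -1;
            }
          if (v == -1)   /* Process already terminated.  */
            {
              errno = ESRCH;
              return -1;
            }
          if (v == 0)    /* PID not visible in our namespace.  */
            {
              errno = EREMOTE;
              return -1;
            }
          return (pid_t) v;
        }
      line = strchr (line, '\n');
      if (line != NULL)
        line++;
    }
  /* No "Pid:" field: the descriptor has no PID associated.  */
  errno = EBADF;
  return -1;
}
```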
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Returning a pidfd allows a process to keep a race-free handle to a
child process; otherwise, the caller would need to either use pidfd_open
(which might still be subject to TOCTOU) or keep using the old racy
interface based on pid_t.
To use pidfd_spawn correctly, the kernel must support not only returning
the pidfd with clone/clone3 but also waitid (P_PIDFD) (added in Linux
5.4). If the kernel does not support waitid with P_PIDFD, pidfd_spawn
returns ENOSYS.
It avoids the need for racy workarounds, such as reading the procfs
fdinfo to get the pid to use along with other wait interfaces.
These interfaces are similar to posix_spawn and posix_spawnp, the only
difference being that they return a process file descriptor (int)
instead of a process ID (pid_t). Their prototypes are:
    int pidfd_spawn (int *restrict pidfd,
                     const char *restrict path,
                     const posix_spawn_file_actions_t *restrict facts,
                     const posix_spawnattr_t *restrict attrp,
                     char *const argv[restrict],
                     char *const envp[restrict]);

    int pidfd_spawnp (int *restrict pidfd,
                      const char *restrict file,
                      const posix_spawn_file_actions_t *restrict facts,
                      const posix_spawnattr_t *restrict attrp,
                      char *const argv[restrict],
                      char *const envp[restrict]);
A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the return
argument lifetime. Although on Linux pid_t and int are interchangeable,
POSIX only states that pid_t should be a signed integer.
Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid rehashing the posix_spawn API or adding a
new one. It also means that both interfaces support the same attributes
and file actions, and any new flag or file action added to posix_spawn
is automatically available to pidfd_spawn.
Also, using the posix_spawn plumbing allows reusing most of the
current tests, with some changes:
- waitid is used instead of waitpid since it is a more generic
interface.
- tst-posix_spawn-setsid.c is adapted to take into consideration that
the caller can check for session id directly. The test now spawns
itself and writes the session id as a file instead.
- tst-spawn3.c needs to know when pidfd_spawn is used, so that it keeps
an extra file descriptor unused.
Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid
support), Linux 5.4 (full support), and Linux 6.2.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
These functions allow posix_spawn and posix_spawnp to use
CLONE_INTO_CGROUP with clone3, allowing the child process to
be created in a different cgroup v2. These are GNU
extensions that are available only on Linux, and only
for the architectures that implement the clone3 wrapper
(HAVE_CLONE3_WRAPPER).
To create a process in a different cgroup v2, one can use:
    posix_spawnattr_t attr;
    posix_spawnattr_init (&attr);
    posix_spawnattr_setflags (&attr, POSIX_SPAWN_SETCGROUP);
    posix_spawnattr_setcgroup_np (&attr, cgroup);
    posix_spawn (...);
Similar to other posix_spawn flags, POSIX_SPAWN_SETCGROUP controls
whether the cgroup file descriptor will be used with clone3.
There is no fallback if clone3 does not support the flag
or if the architecture does not provide the clone3 wrapper; in
that case posix_spawn returns EOPNOTSUPP.
Checked on x86_64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It follows the internal signature:
extern int clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
Checked on mips64el-linux-gnueabihf, mips64el-n32-linux-gnu, and
mipsel-linux-gnu.
It follows the internal signature:
extern int clone3 (struct clone_args *__cl_args, size_t __size,
int (*__func) (void *__arg), void *__arg);
Checked on arm-linux-gnueabihf.