The implementation is based on scalar Chacha20 with per-thread cache.
It uses getrandom or /dev/urandom as fallback to get the initial entropy,
and reseeds the internal state on every 16MB of consumed buffer.
To improve performance and lower memory consumption the per-thread cache
is allocated lazily on first arc4random functions call, and if the
memory allocation fails getentropy or /dev/urandom is used as fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).
Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).
The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros. This strategy is similar to what OpenBSD arc4random
does.
The arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], who
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.
The main advantage of this method is the that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit. It optimizes the
internal buffer sampling by initially consuming a 32-bit random variable
and then sampling byte per byte. Depending of the upper bound requested,
it might lead to better CPU utilization.
Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
And also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL,
since header inclusion single-thread.h is in the wrong order, the define
needs to come before including sysdeps/unix/sysdep.h. The macro
is now moved to a per-arch single-threade.h header.
The SINGLE_THREAD_P is used on some more places.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
The main drive is to optimize the internal usage and required size
when sigset_t is embedded in other data structures. On Linux, the
current supported signal set requires up to 8 bytes (16 on mips),
was lower than the user defined sigset_t (128 bytes).
A new internal type internal_sigset_t is added, along with the
functions to operate on it similar to the ones for sigset_t. The
internal-signals.h is also refactored to remove unused functions
Besides small stack usage on some functions (posix_spawn, abort)
it lower the struct pthread by about 120 bytes (112 on mips).
Checked on x86_64-linux-gnu.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
By adding an internal alias to avoid the GOT indirection.
On some architecture, __libc_single_thread may be accessed through
copy relocations and thus it requires to update also the copies
default copy.
This is done by adding a new internal macro,
libc_hidden_data_{proto,def}, which has an addition argument that
specifies the alias name (instead of default __GI_ one).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
The 404656009b reversion did not setup the atomic loop to set the
cancel bits correctly. The fix is essentially what pthread_cancel
did prior 26cfbb7162.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Some Linux interfaces never restart after being interrupted by a signal
handler, regardless of the use of SA_RESTART [1]. It means that for
pthread cancellation, if the target thread disables cancellation with
pthread_setcancelstate and calls such interfaces (like poll or select),
it should not see spurious EINTR failures due the internal SIGCANCEL.
However recent changes made pthread_cancel to always sent the internal
signal, regardless of the target thread cancellation status or type.
To fix it, the previous semantic is restored, where the cancel signal
is only sent if the target thread has cancelation enabled in
asynchronous mode.
The cancel state and cancel type is moved back to cancelhandling
and atomic operation are used to synchronize between threads. The
patch essentially revert the following commits:
8c1c0aae20 nptl: Move cancel type out of cancelhandling
2b51742531 nptl: Move cancel state out of cancelhandling
26cfbb7162 nptl: Remove CANCELING_BITMASK
However I changed the atomic operation to follow the internal C11
semantic and removed the MACRO usage, it simplifies a bit the
resulting code (and removes another usage of the old atomic macros).
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
and powerpc64-linux-gnu.
[1] https://man7.org/linux/man-pages/man7/signal.7.html
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
If the build itself is run in a container, we may not be able to
fully set up a nested container for test-container testing.
Notably is the mounting of /proc, since it's critical that it
be mounted from within the same PID namespace as its users, and
thus cannot be bind mounted from outside the container like other
mounts.
This patch defaults to using the parent's PID namespace instead of
creating a new one, as this is more likely to be allowed.
If the test needs an isolated PID namespace, it should add the "pidns"
command to its init script.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
_STACK_GROWS_DOWN is defined to 0 when the stack grows up. The
code in unwind.c used `#ifdef _STACK_GROWS_DOWN' to selct the
stack grows down define for FRAME_LEFT. As a result, the
_STACK_GROWS_DOWN define was always selected and cleanups were
incorrectly sequenced when the stack grows up.
For audit modules and dependencies with initial-exec TLS, we can not
set the initial TLS image on default loader initialization because it
would already be set by the audit setup. However, subsequent thread
creation would need to follow the default behaviour.
This patch fixes it by setting l_auditing link_map field not only
for the audit modules, but also for all its dependencies. This is
used on _dl_allocate_tls_init to avoid the static TLS initialization
at load time.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The commit:
"Add LLL_MUTEX_READ_LOCK [BZ #28537]"
SHA1: d672a98a1a
introduced LLL_MUTEX_READ_LOCK, to skip CAS in spinlock loop
if atomic load fails. But, "continue" inside of do-while loop
does not skip the evaluation of escape expression, thus CAS
is not skipped.
Replace do-while with while and skip LLL_MUTEX_TRYLOCK if
LLL_MUTEX_READ_LOCK fails.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
This tunable allows applications to register the rseq area instead
of glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The rseq area is placed directly into struct pthread. rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.
<sys/rseq.h> is not yet installed as a public header.
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The function was renamed to __atomic_wide_counter_load_relaxed
in commit 8bd336a00a ("nptl: Extract
<bits/atomic_wide_counter.h> from pthread_cond_common.c").
rseq support will use a 32-byte aligned field in struct pthread,
so the whole struct needs to have at least that alignment.
nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h>
to obtain the fallback definition.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
__libc_signal_restore_set was in the wrong place: It also ran
when setjmp returned the second time (after pthread_exit or
pthread_cancel). This is observable with blocked pending
signals during thread exit.
Fixes commit b3cae39dcb
("nptl: Start new threads with all signals blocked [BZ #25098]").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
And make it an installed header. This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.
The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Update
commit 49302b8fdf
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Nov 11 06:54:01 2021 -0800
Avoid extra load with CAS in __pthread_mutex_clocklock_common [BZ #28537]
Replace boolean CAS with value CAS to avoid the extra load.
and
commit 0b82747dc4
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Nov 11 06:31:51 2021 -0800
Avoid extra load with CAS in __pthread_mutex_lock_full [BZ #28537]
Replace boolean CAS with value CAS to avoid the extra load.
by moving assignment out of the CAS condition.
CAS instruction is expensive. From the x86 CPU's point of view, getting
a cache line for writing is more expensive than reading. See Appendix
A.2 Spinlock in:
https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-lock-scaling-analysis-paper.pdf
The full compare and swap will grab the cache line exclusive and cause
excessive cache line bouncing.
Add LLL_MUTEX_READ_LOCK to do an atomic load and skip CAS in spinlock
loop if compare may fail to reduce cache line bouncing on contended locks.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The check for waiting for the pidfile to be created looks wrong. At the
point when ACCESS is run the pid file will always be created and
accessible as it is created during DO_PREPARE. This means that thread
cancellation may be performed before the pid is written to the pidfile.
This was found to be flaky when testing on my OpenRISC platform.
Fix this by using the semaphore to wait for pidfile pid write
completion.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The choice between the kill vs tgkill system calls is not just about
the TID reuse race, but also about whether the signal is sent to the
whole process (and any thread in it) or to a specific thread.
This was caught by the openposix test suite:
LTP: openposix test suite - FAIL: SIGUSR1 is member of new thread pendingset.
<https://gitlab.com/cki-project/kernel-tests/-/issues/764>
Fixes commit 526c3cf11e ("nptl: Fix race
between pthread_kill and thread exit (bug 12889)").
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Linux added FUTEX_LOCK_PI2 to support clock selection
(commit bf22a6976897977b0a3f1aeba6823c959fc4fdae). With the new
flag we can now proper support CLOCK_MONOTONIC for
pthread_mutex_clocklock with Priority Inheritance. If kernel
does not support, EINVAL is returned instead.
The difference is the futex operation will be issued and the kernel
will advertise the missing support (instead of hard-code error
return).
Checked on x86_64-linux-gnu and i686-linux-gnu on Linux 5.14, 5.11,
and 4.15.
This patch uses the new futex PI operation provided by Linux v5.14
when it is required.
The futex_lock_pi64() is moved to futex-internal.c (since it used on
two different places and its code size might be large depending of the
kernel configuration) and clockid is added as an argument.
Co-authored-by: Kurt Kanzenbach <kurt@linutronix.de>
As part of the fix for bug 12889, signals are blocked during
thread exit, so that application code cannot run on the thread that
is about to exit. This would cause problems if the application
expected signals to be delivered after the signal handler revealed
the thread to still exist, despite pthread_kill can no longer be used
to send signals to it. However, glibc internally uses the SIGSETXID
signal in a way that is incompatible with signal blocking, due to the
way the setxid handshake delays thread exit until the setxid operation
has completed. With a blocked SIGSETXID, the handshake can never
complete, causing a deadlock.
As a band-aid, restore the previous handshake protocol by not blocking
SIGSETXID during thread exit.
The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on
a downstream test by Martin Osvald.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The fix for bug 19193 breaks some old applications which appear
to use pthread_kill to probe if a thread is still running, something
that is not supported by POSIX.
A new thread exit lock and flag are introduced. They are used to
detect that the thread is about to exit or has exited in
__pthread_kill_internal, and the signal is not sent in this case.
The test sysdeps/pthread/tst-pthread_cancel-select-loop.c is derived
from a downstream test originally written by Marek Polacek.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This closes one remaining race condition related to bug 12889: if
the thread already exited on the kernel side, returning ESRCH
is not correct because that error is reserved for the thread IDs
(pthread_t values) whose lifetime has ended. In case of a
kernel-side exit and a valid thread ID, no signal needs to be sent
and cancellation does not have an effect, so just return 0.
sysdeps/pthread/tst-kill4.c triggers undefined behavior and is
removed with this commit.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dchttps://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
A mapped temporary file and a semaphore is used to synchronize the
pid information on the created file, the semaphore is updated once
the file contents is flushed.
Checked on x86_64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The test nptl/tst-thread_local1.cc fails to build with GCC mainline
because of changes to what libstdc++ headers implicitly include what
other headers:
tst-thread_local1.cc: In function 'int do_test()':
tst-thread_local1.cc:177:5: error: variable 'std::array<std::pair<const char*, std::function<void(void* (*)(void*))> >, 2> do_thread_X' has initializer but incomplete type
177 | do_thread_X
| ^~~~~~~~~~~
Fix this by adding an explicit include of <array>.
Tested with build-many-glibcs.py for aarch64-linux-gnu.
Remove all malloc hook uses from core malloc functions and move it
into a new library libc_malloc_debug.so. With this, the hooks now no
longer have any effect on the core library.
libc_malloc_debug.so is a malloc interposer that needs to be preloaded
to get hooks functionality back so that the debugging features that
depend on the hooks, i.e. malloc-check, mcheck and mtrace work again.
Without the preloaded DSO these debugging features will be nops.
These features will be ported away from hooks in subsequent patches.
Similarly, legacy applications that need hooks functionality need to
preload libc_malloc_debug.so.
The symbols exported by libc_malloc_debug.so are maintained at exactly
the same version as libc.so.
Finally, static binaries will no longer be able to use malloc
debugging features since they cannot preload the debugging DSO.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The clone3 system call (since Linux 5.3) provides a superset of the
functionality of clone and clone2. It also provides a number of API
improvements, including the ability to specify the size of the child's
stack area which can be used by kernel to compute the shadow stack size
when allocating the shadow stack. Add:
extern int __clone_internal (struct clone_args *__cl_args,
int (*__func) (void *__arg), void *__arg);
to provide an abstract interface for clone, clone2 and clone3.
1. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
2. Consolidate clone vs clone2 differences into a single file.
3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
4. Use only __clone_internal to clone a thread. Since the stack size
argument for create_thread is now unconditional, always pass stack size
to create_thread.
5. Enable the public clone3 wrapper in the future after it has been
added to all targets.
NB: Sandbox will return ENOSYS on clone3 in both Chromium:
The following revision refers to this bug:
218438259d
commit 218438259dd795456f0a48f67cbe5b4e520db88b
Author: Matthew Denton <mpdenton@chromium.org>
Date: Thu Jun 03 20:06:13 2021
Linux sandbox: return ENOSYS for clone3
Because clone3 uses a pointer argument rather than a flags argument, we
cannot examine the contents with seccomp, which is essential to
preventing sandboxed processes from starting other processes. So, we
won't be able to support clone3 in Chromium. This CL modifies the
BPF policy to return ENOSYS for clone3 so glibc always uses the fallback
to clone.
Bug: 1213452
Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Commit-Queue: Matthew Denton <mpdenton@chromium.org>
Cr-Commit-Position: refs/heads/master@{#888980}
[modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc
and Firefox:
https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
<limits.h> used to be a header file with no declarations.
GCC's libgomp includes it in a #pragma GCC visibility hidden block.
Including <unistd.h> from <limits.h> (indirectly) declares everything
in <unistd.h> with hidden visibility, resulting in linker failures.
This commit avoids C declarations in assembler mode and only declares
__sysconf in <limits.h> (and not the entire contents of <unistd.h>).
The __sysconf symbol is already part of the ABI. PTHREAD_STACK_MIN
is no longer defined for __USE_DYNAMIC_STACK_SIZE && __ASSEMBLER__
because there is no possible definition.
Additionally, PTHREAD_STACK_MIN is now defined by <pthread.h> for
__USE_MISC because this is what developers expect based on the macro
name. It also helps to avoid libgomp linker failures in GCC because
libgomp includes <pthread.h> before its visibility hacks.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The constant PTHREAD_STACK_MIN may be too small for some processors.
Rename _SC_SIGSTKSZ_SOURCE to _DYNAMIC_STACK_SIZE_SOURCE. When
_DYNAMIC_STACK_SIZE_SOURCE or _GNU_SOURCE are defined, define
PTHREAD_STACK_MIN to sysconf(_SC_THREAD_STACK_MIN) which is changed
to MIN (PTHREAD_STACK_MIN, sysconf(_SC_MINSIGSTKSZ)).
Consolidate <bits/local_lim.h> with <bits/pthread_stack_min.h> to
provide a constant target specific PTHREAD_STACK_MIN value.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
As a result, is not necessary to specify __attribute__ ((nocommon))
on individual definitions.
GCC 10 defaults to -fno-common on all architectures except ARC,
but this change is compatible with older GCC versions and ARC, too.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This slightly reduces code size, as can be seen below.
__libc_lock_unlock is usually used along with __libc_lock_lock in
the same function. __libc_lock_lock already has an out-of-line
slow path, so this change should not introduce many additional
non-leaf functions.
This change also fixes a link failure in 32-bit Arm thumb mode
because commit 1f9c804fbd
("nptl: Use internal low-level lock type for !IS_IN (libc)")
introduced __libc_do_syscall calls outside of libc.
Before x86-64:
text data bss dec hex filename
1937748 20456 54896 2013100 1eb7ac libc.so.6
25601 856 12768 39225 9939 nss/libnss_db.so.2
40310 952 25144 66406 10366 nss/libnss_files.so.2
After x86-64:
text data bss dec hex filename
1935312 20456 54896 2010664 1eae28 libc.so.6
25559 864 12768 39191 9917 nss/libnss_db.so.2
39764 960 25144 65868 1014c nss/libnss_files.so.2
Before i686:
2110961 11272 39144 2161377 20fae1 libc.so.6
27243 428 12652 40323 9d83 nss/libnss_db.so.2
43062 476 25028 68566 10bd6 nss/libnss_files.so.2
After i686:
2107347 11272 39144 2157763 20ecc3 libc.so.6
26929 432 12652 40013 9c4d nss/libnss_db.so.2
43132 480 25028 68640 10c20 nss/libnss_files.so.2
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The remaining symbols are mostly used by libthread_db.
__pthread_get_minstack has to remain exported even though unused.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Now that there are no internal users anymore, these new symbol
versions can be removed from the public ABI. The compatibility
symbols remain.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The valgrind/helgrind test suite needs a way to make stack dealloction
more prompt, and this feature seems to be generally useful.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This allows distributions to strip debugging information from
libc.so.6 without impacting the debugging experience.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>