Created tunable glibc.pthread.stack_hugetlb to control when hugepages
can be used for stack allocation.
If THP is enabled and glibc.pthread.stack_hugetlb is set to
0, glibc will madvise the kernel not to use hugepages for stack
allocations.
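As a rough sketch of the underlying mechanism (not the actual allocatestack.c
code; the mapping flags and error handling are illustrative only), the tunable
boils down to an madvise call on the newly mapped stack:

  #define _GNU_SOURCE
  #include <stddef.h>
  #include <sys/mman.h>

  /* Map a thread stack and, as glibc.pthread.stack_hugetlb=0 would do,
     ask the kernel not to back it with transparent hugepages.  */
  static void *
  map_stack_without_thp (size_t size)
  {
    void *mem = mmap (NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (mem == MAP_FAILED)
      return NULL;
    /* Best effort: the stack is still usable if the advice is ignored.  */
    (void) madvise (mem, size, MADV_NOHUGEPAGE);
    return mem;
  }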
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
And make tunables always supported. The configure option was added in glibc 2.25
and some features require it (such as the hwcap mask, huge pages support, and
lock elision tuning). It also simplifies the build permutations.
Changes from v1:
* Remove the glibc.rtld.dynamic_sort changes; they are orthogonal and need
more discussion.
* Clean up more code.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire
since these map to the standard C11 atomic builtins.
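For illustration (a sketch of the equivalence, not glibc's internal macro
definitions), the new names spell out the C11 memory orders:

  #include <stdatomic.h>

  /* atomic_exchange_acq (old name)  ->  atomic_exchange_acquire  */
  static int
  take_flag (atomic_int *flag)
  {
    return atomic_exchange_explicit (flag, 1, memory_order_acquire);
  }

  /* atomic_exchange_rel (old name)  ->  atomic_exchange_release  */
  static int
  drop_flag (atomic_int *flag)
  {
    return atomic_exchange_explicit (flag, 0, memory_order_release);
  }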
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Replace atomic_decrement_and_test with atomic_fetch_add_relaxed.
These are simple counters which do not protect any shared data from
concurrent accesses. Also remove the unused file cond-perf.c.
Passes regress on AArch64.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
A new internal definition, __LIBC_LOCK_ALIGNMENT, is used to force
4-byte alignment only for m68k; other architectures keep the
natural alignment of the type used internally (and hppa does not
require 16-byte alignment for its kernel-assisted CAS).
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Replace the 3 uses of atomic_bit_set and atomic_bit_test_set with
atomic_fetch_or_relaxed. Using relaxed MO is correct since the
atomics are used to ensure memory is released only once.
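A hedged sketch of the pattern (field and bit names are hypothetical, not the
glibc code); the relaxed fetch-or is enough to guarantee that exactly one
caller observes the bit transition and performs the free:

  #include <stdatomic.h>
  #include <stdlib.h>

  #define DEALLOC_BIT 1u          /* hypothetical flag bit */

  struct shared
  {
    atomic_uint flags;
    void *buffer;
  };

  /* Two parties may race to release; the fetch-or is atomic, so exactly
     one of them sees the bit still clear and frees the buffer.  */
  static void
  release_once (struct shared *s)
  {
    unsigned int old = atomic_fetch_or_explicit (&s->flags, DEALLOC_BIT,
                                                 memory_order_relaxed);
    if ((old & DEALLOC_BIT) == 0)
      free (s->buffer);
  }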
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Improve performance of recursive IO locks by adding a fast path for
the single-threaded case. To reduce the number of memory accesses for
locking/unlocking, only increment the recursion counter if the lock
is already taken.
On Neoverse V1, a microbenchmark with many small freads improved by
2.9x. Multithreaded performance improved by 2%.
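A simplified sketch of the idea (not the actual libio lock code;
single_thread_p() stands in for glibc's internal SINGLE_THREAD_P check and the
contended path is left out):

  /* Hypothetical recursive lock layout: lock word, owner, recursion count.  */
  typedef struct
  {
    int lock;        /* 0 = free, 1 = taken */
    int cnt;         /* extra recursion depth, untouched on first acquisition */
    void *owner;
  } io_lock_t;

  extern int single_thread_p (void);
  extern void io_lock_contended (io_lock_t *lock, void *self);

  static inline void
  io_lock_acquire (io_lock_t *l, void *self)
  {
    if (single_thread_p () || l->owner == self)
      {
        if (l->lock == 0)
          {
            /* First acquisition: no atomics and no write to cnt.  */
            l->lock = 1;
            l->owner = self;
          }
        else
          /* Already held by this thread: only now touch the counter.  */
          l->cnt++;
        return;
      }
    io_lock_contended (l, self);   /* multi-threaded, possibly contended path */
  }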
Reviewed-by: Cristian Rodríguez <crrodriguez@opensuse.org>
__pthread_sigmask cannot actually fail with valid pointer arguments
(it would need a really broken seccomp filter), and we do not check
for errors elsewhere.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.
This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later the kernel and libc can work
together on optimizing that.
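A minimal sketch of such a loop wrapper (the function name is hypothetical,
and the /dev/urandom fallback described below is elided):

  #include <errno.h>
  #include <stdlib.h>
  #include <sys/random.h>
  #include <sys/types.h>

  /* Fill BUF with LEN bytes from getrandom(0), which blocks until the
     kernel RNG is initialized, retrying on short reads and EINTR.  */
  static void
  fill_random (unsigned char *buf, size_t len)
  {
    while (len > 0)
      {
        ssize_t ret = getrandom (buf, len, 0);
        if (ret < 0)
          {
            if (errno == EINTR)
              continue;
            abort ();   /* a real implementation falls back to /dev/urandom */
          }
        buf += ret;
        len -= (size_t) ret;
      }
  }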
This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering
altogether. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.
This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.
getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fall back to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties as getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" was initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.
The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The implementation is based on scalar ChaCha20 with a per-thread cache.
It uses getrandom or /dev/urandom as a fallback to get the initial entropy,
and reseeds the internal state on every 16 MB of consumed buffer.
To improve performance and lower memory consumption, the per-thread cache
is allocated lazily on the first call to an arc4random function, and if the
memory allocation fails, getentropy or /dev/urandom is used as a fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).
Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).
The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros. This strategy is similar to what OpenBSD arc4random
does.
arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm follows Jérémie Lumbroso's paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], which
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.
The main advantage of this method is that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit. It optimizes the
internal buffer sampling by initially consuming a 32-bit random variable
and then sampling byte by byte. Depending on the upper bound requested,
it might lead to better CPU utilization.
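For reference, a sketch of the base algorithm from [2] (Lumbroso's "Fast Dice
Roller"), without the 32-bit/byte-wise buffering optimization described above;
random_bit() is a hypothetical uniformly random bit source:

  #include <stdint.h>

  extern unsigned int random_bit (void);  /* returns 0 or 1, each with probability 1/2 */

  /* Return a uniformly distributed value in [0, n), n >= 1, consuming one
     random bit per loop iteration.  */
  static uint32_t
  uniform_below (uint32_t n)
  {
    uint64_t v = 1, c = 0;
    for (;;)
      {
        v <<= 1;
        c = (c << 1) | random_bit ();
        if (v >= n)
          {
            if (c < n)
              return (uint32_t) c;
            v -= n;
            c -= n;
          }
      }
  }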
Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
And also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL:
the single-thread.h header was included in the wrong order, since the define
needs to come before including sysdeps/unix/sysdep.h. The macro
is now moved to a per-arch single-thread.h header.
SINGLE_THREAD_P is now used in some more places.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
By adding an internal alias to avoid the GOT indirection.
On some architectures, __libc_single_threaded may be accessed through
copy relocations, and thus the copies also need to be updated.
This is done by adding a new internal macro,
libc_hidden_data_{proto,def}, which has an additional argument that
specifies the alias name (instead of the default __GI_ one).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Fangrui Song <maskray@google.com>
Some Linux interfaces never restart after being interrupted by a signal
handler, regardless of the use of SA_RESTART [1]. It means that for
pthread cancellation, if the target thread disables cancellation with
pthread_setcancelstate and calls such interfaces (like poll or select),
it should not see spurious EINTR failures due to the internal SIGCANCEL.
However, recent changes made pthread_cancel always send the internal
signal, regardless of the target thread's cancellation status or type.
To fix it, the previous semantics are restored: the cancel signal
is only sent if the target thread has cancellation enabled in
asynchronous mode.
The cancel state and cancel type are moved back to cancelhandling,
and atomic operations are used to synchronize between threads. The
patch essentially reverts the following commits:
8c1c0aae20 nptl: Move cancel type out of cancelhandling
2b51742531 nptl: Move cancel state out of cancelhandling
26cfbb7162 nptl: Remove CANCELING_BITMASK
However, I changed the atomic operations to follow the internal C11
semantics and removed the macro usage, which simplifies the resulting
code a bit (and removes another usage of the old atomic macros).
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
and powerpc64-linux-gnu.
[1] https://man7.org/linux/man-pages/man7/signal.7.html
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
This matches the data size initial-exec relocations use on most
targets.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
The relationship between the thread pointer and the rseq area
is made explicit. The constant offset can be used by JIT compilers
to optimize rseq access (e.g., for really fast sched_getcpu).
Extensibility is provided through __rseq_size and __rseq_flags.
(In the future, the kernel could request a different rseq size
via the auxiliary vector.)
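A hedged sketch of the kind of fast path this enables (assuming the exported
__rseq_offset/__rseq_size symbols, the kernel's struct rseq layout from
<sys/rseq.h>, and a compiler providing __builtin_thread_pointer; this is not
glibc's sched_getcpu implementation):

  #include <sys/rseq.h>    /* __rseq_offset, __rseq_size, struct rseq */

  static inline int
  fast_getcpu (void)
  {
    if (__rseq_size == 0)
      return -1;           /* rseq registration is not active */
    /* The rseq area sits at a constant offset from the thread pointer.  */
    struct rseq *rs = (struct rseq *) ((char *) __builtin_thread_pointer ()
                                       + __rseq_offset);
    /* cpu_id is kept up to date by the kernel.  */
    return (int) __atomic_load_n (&rs->cpu_id, __ATOMIC_RELAXED);
  }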
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
This tunable allows applications to register the rseq area instead
of glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The rseq area is placed directly into struct pthread. rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.
<sys/rseq.h> is not yet installed as a public header.
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.
Only architectures for which rseq support is added are covered.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
And make it an installed header. This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.
The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch uses the new futex PI operation provided by Linux v5.14
when it is required.
The futex_lock_pi64() is moved to futex-internal.c (since it is used in
two different places and its code size might be large depending on the
kernel configuration), and clockid is added as an argument.
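A hedged sketch of the raw operation this wraps, on a 64-bit ABI (constants
come from <linux/futex.h> on a v5.14+ kernel; the internal glibc helper has a
different shape):

  #include <linux/futex.h>
  #include <stdint.h>
  #include <sys/syscall.h>
  #include <time.h>
  #include <unistd.h>

  /* Acquire a PI futex with an absolute timeout.  FUTEX_LOCK_PI2 measures
     the timeout against CLOCK_MONOTONIC by default; FUTEX_CLOCK_REALTIME
     selects CLOCK_REALTIME, which is all the older FUTEX_LOCK_PI supports.  */
  static long
  futex_lock_pi2 (uint32_t *futex_word, clockid_t clockid,
                  const struct timespec *abstime)
  {
    int op = FUTEX_LOCK_PI2 | FUTEX_PRIVATE_FLAG;
    if (clockid == CLOCK_REALTIME)
      op |= FUTEX_CLOCK_REALTIME;
    return syscall (SYS_futex, futex_word, op, 0, abstime);
  }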
Co-authored-by: Kurt Kanzenbach <kurt@linutronix.de>
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain a record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in the future, given that this is a one-time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
<limits.h> used to be a header file with no declarations.
GCC's libgomp includes it in a #pragma GCC visibility hidden block.
Including <unistd.h> from <limits.h> (indirectly) declares everything
in <unistd.h> with hidden visibility, resulting in linker failures.
This commit avoids C declarations in assembler mode and only declares
__sysconf in <limits.h> (and not the entire contents of <unistd.h>).
The __sysconf symbol is already part of the ABI. PTHREAD_STACK_MIN
is no longer defined for __USE_DYNAMIC_STACK_SIZE && __ASSEMBLER__
because there is no possible definition.
Additionally, PTHREAD_STACK_MIN is now defined by <pthread.h> for
__USE_MISC because this is what developers expect based on the macro
name. It also helps to avoid libgomp linker failures in GCC because
libgomp includes <pthread.h> before its visibility hacks.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
As a result, it is not necessary to specify __attribute__ ((nocommon))
on individual definitions.
GCC 10 defaults to -fno-common on all architectures except ARC,
but this change is compatible with older GCC versions and ARC, too.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This slightly reduces code size, as can be seen below.
__libc_lock_unlock is usually used along with __libc_lock_lock in
the same function. __libc_lock_lock already has an out-of-line
slow path, so this change should not introduce many additional
non-leaf functions.
This change also fixes a link failure in 32-bit Arm thumb mode
because commit 1f9c804fbd
("nptl: Use internal low-level lock type for !IS_IN (libc)")
introduced __libc_do_syscall calls outside of libc.
Before x86-64:
text data bss dec hex filename
1937748 20456 54896 2013100 1eb7ac libc.so.6
25601 856 12768 39225 9939 nss/libnss_db.so.2
40310 952 25144 66406 10366 nss/libnss_files.so.2
After x86-64:
text data bss dec hex filename
1935312 20456 54896 2010664 1eae28 libc.so.6
25559 864 12768 39191 9917 nss/libnss_db.so.2
39764 960 25144 65868 1014c nss/libnss_files.so.2
Before i686:
text data bss dec hex filename
2110961 11272 39144 2161377 20fae1 libc.so.6
27243 428 12652 40323 9d83 nss/libnss_db.so.2
43062 476 25028 68566 10bd6 nss/libnss_files.so.2
After i686:
text data bss dec hex filename
2107347 11272 39144 2157763 20ecc3 libc.so.6
26929 432 12652 40013 9c4d nss/libnss_db.so.2
43132 480 25028 68640 10c20 nss/libnss_files.so.2
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This avoids an ABI hazard (types changing between different modules
of glibc) without introducing linknamespace issues. In particular,
NSS modules now call __lll_lock_wait_private@@GLIBC_PRIVATE to wait
on internal locks (the unlock path is inlined and performs a direct
system call).
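As a hedged sketch of the split this describes (a simplified three-state futex
lock, not the exact glibc lll code; futex_wake_one is a hypothetical direct
syscall wrapper): the contended wait goes through the exported
__lll_lock_wait_private, while unlock stays inline and ends in a futex wake.

  /* 0 = unlocked, 1 = locked, 2 = locked with waiters.  */
  extern void __lll_lock_wait_private (int *futex);   /* GLIBC_PRIVATE, out of line */
  extern void futex_wake_one (int *futex);            /* hypothetical wake helper */

  static inline void
  lll_lock (int *futex)
  {
    int expected = 0;
    if (!__atomic_compare_exchange_n (futex, &expected, 1, 0,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
      __lll_lock_wait_private (futex);   /* slow path: marks waiters and sleeps */
  }

  static inline void
  lll_unlock (int *futex)
  {
    if (__atomic_exchange_n (futex, 0, __ATOMIC_RELEASE) > 1)
      futex_wake_one (futex);            /* someone was waiting: wake one thread */
  }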
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbols gai_cancel, gai_error, gai_suspend, getaddrinfo_a,
__gai_suspend_time64 were moved using scripts/move-symbol-to-libc.py.
For Hurd (which remains !PTHREAD_IN_LIBC), a few #define redirects
had to be added because several pthread functions are not available
under the __ prefix. (Linux uses __ prefixes for most hidden aliases, and
has to in some cases to avoid linknamespace issues.)
The valgrind/helgrind test suite needs a way to make stack deallocation
more prompt, and this feature seems to be generally useful.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
librt.so is no longer installed for PTHREAD_IN_LIBC, and tests
are not linked against it. $(librt) is introduced globally for
shared tests that need to be linked for both PTHREAD_IN_LIBC
and !PTHREAD_IN_LIBC.
GLIBC_PRIVATE symbols that were needed during the transition are
removed again.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This adds several temporary GLIBC_PRIVATE exports. The symbol names
are changed so that they all start with __timer_.
It is now possible to invoke the fork handler directly, so
pthread_atfork is no longer necessary. The associated error cannot
happen anymore, and cancellation handling can be removed from
the helper thread routine.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol was moved using scripts/move-symbol-to-libc.py.
An explicit call from fork into the mq_notify implementation replaces
the previous use of pthread_atfork.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This commit also moves the aio_misc and aio_sigqueue helpers,
so GLIBC_PRIVATE exports need to be added.
The symbol was moved using scripts/move-symbol-to-libc.py.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The pthread_atfork implementation is similar between Linux and Hurd; only
the compat version bits differ. The generic version is placed at
sysdeps/pthread with a common name.
It also fixes an issue with the Hurd license, where the static-only object
did not use LGPL + exception.
Checked on x86_64-linux-gnu, i686-linux-gnu, and with a build for
i686-gnu.
The Linux nptl implementation is used as the base for the generic fork
implementation to handle the internal locks and mutexes. The
system-specific bits are moved to a new internal _Fork symbol.
(This new implementation will be used to provide an async-signal-safe
_Fork, now that POSIX has clarified that fork might not be
async-signal-safe [1].)
For Hurd it means that __nss_database_fork_prepare_parent and
__nss_database_fork_subprocess will be run in a slightly different
order.
[1] https://austingroupbugs.net/view.php?id=62
For !__ASSUME_TIME64_SYSCALLS there is no need to issue a 64-bit syscall
if the provided timeout fits in a 32-bit one. The 64-bit usage should
be rare since the timeout is a relative one.
Checked on i686-linux-gnu on a 4.15 kernel and on a 5.11 kernel
(with and without --enable-kernel=5.1) and on x86_64-linux-gnu.
Reviewed-by: Lukasz Majewski <lukma@denx.de>
This mirrors the situation on Hurd. These directories are on
the include search path, so #include <pthreadP.h> works after this
change on both Hurd and nptl.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
A new build flag, _TIME_BITS, enables the usage of the newer 64-bit
time symbols for legacy ABIs (where 32-bit time_t is the default). The
64-bit time support is only enabled if LFS (_FILE_OFFSET_BITS=64) is
also used.
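As a usage sketch (assuming a 32-bit legacy ABI where the flag takes effect),
a program opts in by defining both macros before any include:

  /* Both must be set; _TIME_BITS=64 without _FILE_OFFSET_BITS=64 is an error.  */
  #define _FILE_OFFSET_BITS 64
  #define _TIME_BITS 64
  #include <stdio.h>
  #include <time.h>

  int
  main (void)
  {
    /* With the macros above, time_t is 64 bits even on e.g. armhf or i386.  */
    printf ("sizeof (time_t) = %zu\n", sizeof (time_t));
    return 0;
  }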
Unlike LFS support, the y2038 symbols are added only for the
required ABIs (armhf, csky, hppa, i386, m68k, microblaze, mips32,
mips64-n32, nios2, powerpc32, sparc32, s390-32, and sh). The ABIs with
64-bit time support are unchanged, both for symbol and type
redirection.
On Linux, full 64-bit time support requires a minimum kernel
version of v5.1. Otherwise, the 32-bit fallbacks are used and might
result in an overflow error (EOVERFLOW).
The i686-gnu does not yet support 64-bit time.
This patch exports the following redirections to support 64-bit time:
* libc:
adjtime
adjtimex
clock_adjtime
clock_getres
clock_gettime
clock_nanosleep
clock_settime
cnd_timedwait
ctime
ctime_r
difftime
fstat
fstatat
futimens
futimes
futimesat
getitimer
getrusage
gettimeofday
gmtime
gmtime_r
localtime
localtime_r
lstat
lutimes
mktime
msgctl
mtx_timedlock
nanosleep
ntp_gettime
ntp_gettimex
ppoll
pselect
pthread_clockjoin_np
pthread_cond_clockwait
pthread_cond_timedwait
pthread_mutex_clocklock
pthread_mutex_timedlock
pthread_rwlock_clockrdlock
pthread_rwlock_clockwrlock
pthread_rwlock_timedrdlock
pthread_rwlock_timedwrlock
pthread_timedjoin_np
recvmmsg
sched_rr_get_interval
select
sem_clockwait
semctl
semtimedop
sem_timedwait
setitimer
settimeofday
shmctl
sigtimedwait
stat
thrd_sleep
time
timegm
timerfd_gettime
timerfd_settime
timespec_get
utime
utimensat
utimes
wait3
wait4
* librt:
aio_suspend
mq_timedreceive
mq_timedsend
timer_gettime
timer_settime
* libanl:
gai_suspend
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Now that the thread cancellation type is not accessed concurrently
anymore, it is possible to move it out of cancelhandling.
By removing the cancel state from the internal thread cancel handling
state, there is no need to check whether the cancelled bit was set in
the CAS operation.
This allows simplifying the cancellation wrappers, and
CANCEL_CANCELED_AND_ASYNCHRONOUS is removed.
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Now that the thread cancellation state is not accessed concurrently
anymore, it is possible to move it out of 'cancelhandling'.
The code is also simplified: CANCELLATION_P is replaced with an
internal pthread_testcancel call, and the CANCELSTATE_BIT{MASK} is
removed.
With this behavior, pthread_setcancelstate does not need to act on
cancellation if the cancel type is asynchronous (it is already handled
either by pthread_setcanceltype or by the signal handler).
Checked on x86_64-linux-gnu and aarch64-linux-gnu.