Commit Graph

2821 Commits

Adhemerval Zanella
89b53077d2 nptl: Fix Race conditions in pthread cancellation [BZ#12683]
The current racy approach is to enable asynchronous cancellation
before making the syscall, restore the previous cancellation type once
the syscall returns, and check whether cancellation has happened
during the cancellation entrypoint.
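
For illustration, a minimal sketch of that pattern (not the actual
macro expansion; LIBC_CANCEL_ASYNC, LIBC_CANCEL_RESET, and
INLINE_SYSCALL_CALL are the glibc-internal names, used here loosely):

  /* Old, racy shape of a cancellable syscall wrapper: asynchronous
     cancellation is enabled around the kernel entry, so a cancellation
     signal that lands after the syscall has returned but before
     LIBC_CANCEL_RESET runs can discard the result (problem 1), and any
     signal handler that runs while blocked executes with asynchronous
     cancellation enabled (problem 2).  */
  ssize_t
  __read_cancellable (int fd, void *buf, size_t nbytes)
  {
    int oldtype = LIBC_CANCEL_ASYNC ();
    ssize_t result = INLINE_SYSCALL_CALL (read, fd, buf, nbytes);
    LIBC_CANCEL_RESET (oldtype);
    return result;
  }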

As described in BZ#12683, this approach shows 2 problems:

  1. Cancellation can act after the syscall has returned from the
     kernel, but before userspace saves the return value.  This might
     result in a resource leak if the syscall allocated a resource or
     had a side effect (partial read/write), and there is no way for
     the program to handle it with cancellation handlers.

  2. If a signal is handled while the thread is blocked at a cancellable
     syscall, the entire signal handler runs with asynchronous
     cancellation enabled.  This can lead to issues if the signal
     handler calls functions which are async-signal-safe but not
     async-cancel-safe.

For the cancellation to work correctly, there are 5 points at which the
cancellation signal could arrive:

	[ ... )[ ... )[ syscall ]( ...
	   1      2        3    4   5

  1. Before initial testcancel, e.g. [*... testcancel)
  2. Between testcancel and syscall start, e.g. [testcancel...syscall start)
  3. While syscall is blocked and no side effects have yet taken
     place, e.g. [ syscall ]
  4. Same as 3 but with side-effects having occurred (e.g. a partial
     read or write).
  5. After the syscall ends, e.g. (syscall end...*]

libc wants to act on cancellation in cases 1, 2, and 3, but not in
cases 4 or 5.  For cases 4 and 5, the cancellation will eventually
happen at the next cancellable entrypoint without any further external
event.

The proposed solution for each case is:

  1. Do a conditional branch based on whether the thread has received
     a cancellation request;

  2. It can be caught by the signal handler determining that the saved
     program counter (from the ucontext_t) is in some address range
     beginning just before the "testcancel" and ending with the
     syscall instruction (see the handler sketch after this list).

  3. SIGCANCEL can be caught by the signal handler and determine that
     the saved program counter (from the ucontext_t) is in the address
     range beginning just before "testcancel" and ending with the first
     uninterruptible (via a signal) syscall instruction that enters the
     kernel.

  4. In this case, except for certain syscalls that ALWAYS fail with
     EINTR even for non-interrupting signals, the kernel will reset
     the program counter to point at the syscall instruction during
     signal handling, so that the syscall is restarted when the signal
     handler returns.  So, from the signal handler's standpoint, this
     looks the same as case 2, and thus it's taken care of.

  5. For syscalls with side-effects, the kernel cannot restart the
     syscall; when it's interrupted by a signal, the kernel must cause
     the syscall to return with whatever partial result is obtained
     (e.g. partial read or write).

  6. The saved program counter points just after the syscall
     instruction, so the signal handler won't act on cancellation.
     This is similar to case 4, since the program counter is past the
     syscall instruction.
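
A rough sketch of the program-counter test described in items 2-4
(names such as __syscall_cancel_arch_start/_end, ucontext_get_pc, and
cancel_enabled_and_canceled are placeholders following this
description; the real handler is part of the fixes listed next):

  #include <signal.h>
  #include <stdint.h>
  #include <ucontext.h>

  /* Markers delimiting the arch syscall-cancel window (assumed names).  */
  extern const char __syscall_cancel_arch_start[];
  extern const char __syscall_cancel_arch_end[];

  static void
  sigcancel_handler (int sig, siginfo_t *si, void *ctx)
  {
    ucontext_t *uc = ctx;
    /* ucontext_get_pc () is a placeholder for the arch-specific way of
       reading the saved program counter from the ucontext_t.  */
    uintptr_t pc = ucontext_get_pc (uc);

    /* Act on the cancellation only if the thread has cancellation
       enabled and pending, and it was interrupted before the syscall
       instruction could commit any work (cases 1-3).  For cases 4 and 5
       the PC is already past the marked window, so nothing is done here
       and the cancellation is left for the next cancellable
       entrypoint.  */
    if (cancel_enabled_and_canceled (THREAD_SELF)  /* cancelhandling bits */
        && pc >= (uintptr_t) __syscall_cancel_arch_start
        && pc < (uintptr_t) __syscall_cancel_arch_end)
      __syscall_do_cancel ();   /* acts on the cancellation; does not return */
  }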

So the proposed fixes are:

  1. Remove the enable_asynccancel/disable_asynccancel function usage
     from the cancellable syscall definitions and instead make them call
     a common symbol that will check whether cancellation is enabled
     (__syscall_cancel at nptl/cancellation.c), call the arch-specific
     cancellable entry-point (__syscall_cancel_arch), and cancel the
     thread when required.

  2. Provide an arch-specific generic system call wrapper function
     that contains global markers.  These markers will be used by the
     SIGCANCEL signal handler to check whether the interruption
     happened inside a valid syscall and whether the syscall has
     side-effects.

     A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c
     is provided (a sketch modeled on it follows this list).  However,
     the markers may not end up at the expected places depending on how
     INTERNAL_SYSCALL_NCS is implemented by the architecture.  It is
     expected that all architectures add an arch-specific
     implementation.

  3. Rewrite the SIGCANCEL asynchronous handler to check both the
     cancellation type and whether the current IP from the signal
     handler's context falls between the global markers, and act
     accordingly.

  4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET
     with calls to the appropriate cancellable syscalls.

  5. Adjust 'lowlevellock-futex.h' arch-specific implementations to
     provide cancelable futex calls.
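
A condensed C sketch of the generic wrapper from fix 2, modeled on the
sysdeps/unix/sysv/linux/syscall_cancel.c reference implementation
(marker emission, the cancelhandling bit test, and the helper names
are simplified and partly assumed here):

  long int
  __syscall_cancel_arch (volatile int *cancelhandling, long int nr,
                         long int a1, long int a2, long int a3,
                         long int a4, long int a5, long int a6)
  {
    /* Global marker: the SIGCANCEL handler may act on cancellation
       from here on.  */
    asm volatile (".globl __syscall_cancel_arch_start\n\t"
                  "__syscall_cancel_arch_start:");

    /* Cases 1/2: the cancellation bit is already set, so cancel
       before entering the kernel.  */
    if (*cancelhandling & CANCELED_BITMASK)
      __syscall_do_cancel ();

    long int result = INTERNAL_SYSCALL_NCS_CALL (nr, a1, a2, a3,
                                                 a4, a5, a6);

    /* Global marker: past this point the syscall has returned
       (cases 4/5) and the handler must no longer act.  As noted in
       fix 2, the compiler may not place the markers exactly where
       expected depending on how INTERNAL_SYSCALL_NCS is implemented,
       which is why arch-specific assembly versions are expected.  */
    asm volatile (".globl __syscall_cancel_arch_end\n\t"
                  "__syscall_cancel_arch_end:");
    return result;
  }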

Some architectures require specific support on syscall handling:

  * On i386 the syscall cancel bridge needs to use the old int80
    instruction because, with the optimized vDSO symbol, the resulting
    PC value for an interrupted syscall points to an address outside
    the expected markers in __syscall_cancel_arch.  It has been
    discussed on LKML [1] how the kernel could help userland accomplish
    this, but as far as I know the discussion has stalled.

    Also, sysenter should not be used directly by libc since its calling
    convention is set by the kernel depending on the underlying x86 chip
    (check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60).

  * mips o32 is the only kABI that requires 7-argument syscalls, and to
    avoid adding a requirement on all architectures to support them,
    mips support is added with extra internal defines.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu,
powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and
x86_64-linux-gnu.

[1] https://lkml.org/lkml/2016/3/8/1105
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-08-23 14:27:43 -03:00
Maciej W. Rozycki
bea2ad022d nptl: Fix stray process left by tst-cancel7 blocking testing
Fix an issue with commit b74121ae4b ("Update.") and prevent a stray
process from being left behind by tst-cancel7 (and also tst-cancelx7,
which is the same test built with '-fexceptions' additionally supplied
to the compiler), which then blocks remote testing until the process has
been killed by hand.

This test case creates a thread that runs an extra copy of the test via
system(3) and using the '--direct' option so that the test wrapper does
not interfere with this instance.  This extra copy executes its business
and calls sigsuspend(2) and then never terminates by itself.  Instead it
relies on being killed by the main test process directly via a thread
cancellation request or, should that fail, by issuing SIGKILL either at
the conclusion of 'do_test' or by the test driver via 'do_cleanup' where
the test timeout has been hit or the test driver interrupted.

However if the main test process has been instead killed by a signal,
such as due to incorrect execution, before it had a chance to kill the
extra copy of the test case, then the test wrapper will terminate
without running 'do_cleanup' and consequently the extra copy of the test
case will remain forever in its suspended state, and in the remote case
in particular it means that the remote test wrapper will wait forever
for the SSH command to complete.

This has been observed with the 'alpha-linux-gnu' target, where the main
test process triggers SIGSEGV and the test wrapper correctly records:

Didn't expect signal from child: got `Segmentation fault'

in nptl/tst-cancel7.out and terminates, but then the calling SSH command
continues waiting for the remaining process started in the same session
on the remote target to complete.

Address this problem by also registering 'do_cleanup' via atexit(3),
observing that 'support_delete_temp_files' is registered by the test
wrapper before the test initializing function 'do_prepare' is called and
that exit(3) calls the functions registered with atexit(3) in the
reverse of the order in which they were registered, so it is safe to
refer to 'pidfilename' in 'do_cleanup' invoked by exit(3) because by
that time the temporary files have not yet been deleted.

A minor inconvenience is that if 'signal_handler' is invoked in the test
wrapper as a result of SIGALRM rather than SIGINT, then 'do_cleanup'
will be called twice, once as a cleanup handler and again by exit(3).
In reality it is harmless though, because issuing SIGKILL is guarded by
a record lock, so if the first call has succeeded in killing the extra
copy of the test case, then the subsequent call will do nothing.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-08-07 19:46:21 +01:00
Maciej W. Rozycki
934ba77add nptl: Reorder semaphore release in tst-cancel7
Move the release of the semaphore used to synchronize between an extra
copy of the test run as a separate process and the main test process
until after the PID file has been locked.  It is so that if the cleanup
function gets called by the test driver due to premature termination of
the main test process, then the function does not get at the PID file
before it has been locked and conclude that the extra copy of the test
has already terminated.  This does not usually happen, because a
relatively long time has to elapse before the timeout triggers in the
test driver, but it will change with the next commit.

There is still a small time window remaining with this change in place
where the main test process gets killed for some reason between the
point where the extra copy of the test has already been started by
pthread_create(3) and a successful return from the call to sem_wait(3),
in which case the
cleanup function can be reached before PID has been written to the PID
file and the file locked.  It seems that with the test case structured
as it is now and PID-based process management we have no means to avoid
it.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-08-07 19:46:21 +01:00
Florian Weimer
fe06fb313b elf: Clarify and invert second argument of _dl_allocate_tls_init
Also remove an outdated comment: _dl_allocate_tls_init is
called as part of pthread_create.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-08-05 18:26:42 +02:00
Maciej W. Rozycki
4b2a1b602f
nptl: Convert tst-sem11 and tst-sem12 tests to use the test driver
Fix an issue with commit 2af4e3e566 ("Test of semaphores.") by making
the tst-sem11 and tst-sem12 tests use the test driver, preventing them
from ever causing testing to hang forever and never complete, such as
currently happening with the 'mips-linux-gnu' (o32 ABI) target.  Adjust
the name of the PREPARE macro, which clashes with the interpretation of
its presence by the test driver, by using a TF_ prefix in reference to
the name of the 'tf' function.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-07-12 20:41:08 +02:00
Maciej W. Rozycki
9d8995833e
nptl: Add copyright notice tst-sem11 and tst-sem12 tests
Add a copyright notice to the tst-sem11 and tst-sem12 tests, observing
that they have been originally contributed back in 2007, with commit
2af4e3e566 ("Test of semaphores.").
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-07-12 20:40:36 +02:00
H.J. Lu
ba144c179e Add --disable-static-c++-tests option [BZ #31797]
By default, if the C++ toolchain lacks support for static linking,
configure fails to find the C++ header files and the glibc build fails.
The --disable-static-c++-link-check option allows the glibc build to
finish, but static C++ tests will fail if the C++ toolchain doesn't
have the necessary static C++ libraries which may not be easily installed.
Add --disable-static-c++-tests option to skip the static C++ link check
and tests.  This fixes BZ #31797.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
2024-07-02 00:51:34 -07:00
Carlos O'Donell
a7fe3e805d
Fix conditionals on mtrace-based tests (bug 31892)
The conditionals for several mtrace-based tests in catgets, elf, libio,
malloc, misc, nptl, posix, and stdio-common were incorrect leading to
test failures when bootstrapping glibc without perl.

The correct conditional for mtrace-based tests requires three checks:
first checking for run-built-tests, then build-shared, and lastly that
PERL is not equal to "no" (missing perl).
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-07-01 17:20:30 +02:00
H.J. Lu
42e48e720c nptl: Add tst-pthread-key1-static for BZ #21777
Add a static pthread test to verify that BZ #21777 is fixed.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-04-09 05:27:03 -07:00
Konstantin Kharlamov
fe00366b63 treewide: python-scripts: use is None for none-equality
Testing for `None`-ness with the `==` operator is frowned upon and
causes warnings in at least the "LGTM" python linter.  Fix that.

Signed-off-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-02-23 08:50:00 -03:00
Adhemerval Zanella
460860f457 Remove ia64-linux-gnu
Linux 6.7 removed ia64 from the official tree [1], following the general
principle that a glibc port needs upstream support for the architecture
in all the components it depends on (binutils, GCC, and the Linux
kernel).

Apart from the removal of sysdeps/ia64 and sysdeps/unix/sysv/linux/ia64,
there are updates to various comments referencing ia64 for which removal
of those references seemed appropriate. The configuration is removed
from README and build-many-glibcs.py.

The CONTRIBUTED-BY, elf/elf.h, manual/contrib.texi (the porting
mention), *.po files, config.guess, and longlong.h are not changed.

For Linux, it allows cleaning up some clone2 support in multiple files.

The following bugs can be closed as WONTFIX: BZ 22634 [2], BZ 14250 [3],
BZ 21634 [4], BZ 10163 [5], BZ 16401 [6], and BZ 11585 [7].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=43ff221426d33db909f7159fdf620c3b052e2d1c
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=22634
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=14250
[4] https://sourceware.org/bugzilla/show_bug.cgi?id=21634
[5] https://sourceware.org/bugzilla/show_bug.cgi?id=10163
[6] https://sourceware.org/bugzilla/show_bug.cgi?id=16401
[7] https://sourceware.org/bugzilla/show_bug.cgi?id=11585
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-01-08 17:09:36 -03:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
Florian Weimer
e21aa9b9cc nptl: Link tst-execstack-threads-mod.so with -z execstack
This ensures that the test still links with a linker that refuses
to create an executable stack marker automatically.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-11-20 09:22:25 +01:00
Florian Weimer
8c8eff33e4 nptl: Rename tst-execstack to tst-execstack-threads
So that the test is harder to confuse with elf/tst-execstack
(although the tests are supposed to be the same).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-11-20 09:22:21 +01:00
Adhemerval Zanella
fee9e40a8d malloc: Decorate malloc maps
Add anonymous mmap annotations to the loader malloc, to malloc when it
allocates memory with mmap, and to the malloc arenas.  /proc/self/maps
will now print:

   [anon: glibc: malloc arena]
   [anon: glibc: malloc]
   [anon: glibc: loader malloc]

On arena allocation, glibc annotates only the read/write mapping.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
2023-11-07 10:27:20 -03:00
Adhemerval Zanella
6afce56c19 nptl: Decorate thread stack on pthread_create
Linux 4.5 removed thread stack annotations due to the complexity of
computing them [1], and Linux added PR_SET_VMA_ANON_NAME on 5.17
as a way to name anonymous virtual memory areas.

This patch adds decoration to the stack created and used by
pthread_create; for a glibc-created thread stack, /proc/self/maps will
now show:

  [anon: glibc: pthread stack: <tid>]

And for user-provided stacks:

  [anon: glibc: pthread user stack: <tid>]

The guard page is not decorated, and the mapping name is cleared when
the thread finishes its execution (so the cached stack does not have any
name associated).
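
For reference, the kernel interface used for the decoration looks
roughly like this (standalone illustrative snippet, not the
glibc-internal code; it needs Linux 5.17+ and a kernel built with
anonymous-VMA naming):

  #include <sys/mman.h>
  #include <sys/prctl.h>
  #include <linux/prctl.h>   /* PR_SET_VMA, PR_SET_VMA_ANON_NAME */

  /* Attach a name to an anonymous mapping so it shows up in
     /proc/self/maps, e.g. as "[anon: glibc: pthread stack: <tid>]".
     On kernels without the feature the call fails and can be
     ignored.  */
  static void
  name_anonymous_mapping (void *addr, size_t len, const char *name)
  {
    (void) prctl (PR_SET_VMA, PR_SET_VMA_ANON_NAME,
                  (unsigned long) addr, len, (unsigned long) name);
  }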

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] 65376df582

Co-authored-by: Ian Rogers <irogers@google.com>
Reviewed-by: DJ Delorie <dj@redhat.com>
2023-11-07 10:27:20 -03:00
Samuel Thibault
6333a6014f __call_tls_dtors: Use call_function_static_weak 2023-09-04 20:03:37 +02:00
Florian Weimer
2c6b4b272e nptl: Unconditionally use a 32-byte rseq area
If the kernel headers provide a larger struct rseq, we used that
size as the argument to the rseq system call.  As a result,
rseq registration would fail on older kernels which only accept
size 32.
2023-07-21 16:18:18 +02:00
Arsen Arsenović
3edca7f545
nptl: Make tst-tls3mod.so explicitly lazy
Fixes the following test-time errors, which lead to FAILs, on toolchains
that set -z now out of the box, such as the one used on Gentoo Hardened:

  .../build-x86-x86_64-pc-linux-gnu-nptl $ grep '' nptl/tst-tls3*.out
  nptl/tst-tls3.out:dlopen failed
  nptl/tst-tls3-malloc.out:dlopen failed

Reviewed-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2023-07-20 12:24:28 +02:00
Paul Eggert
3edc4ff2ce make ‘struct pthread’ a complete type
* nptl/descr.h (struct pthread): Remove end_padding member, which
made this type incomplete.
(PTHREAD_STRUCT_END_PADDING): Stop using end_padding.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-07-19 14:16:04 -07:00
Frédéric Bérat
8022fc7d51 tests: replace system by xsystem
With fortification enabled, the return value of system() calls needs to
be checked, as it gets the __wur macro applied.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-19 09:15:05 -04:00
Frédéric Bérat
20b6b8e8a5 tests: replace read by xread
With fortification enabled, the return value of read() calls needs to
be checked, as it gets the __wur macro applied.

Note on read call removal from  sysdeps/pthread/tst-cancel20.c and
sysdeps/pthread/tst-cancel21.c:
It is assumed that this second read call was there to overcome the race
condition between pipe closure and thread cancellation that could happen
in the original code. Since this race condition got fixed by
d0e3ffb7a5 the second call seems
superfluous. Hence, instead of checking for the return value of read, it
looks reasonable to simply remove it.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-19 09:14:56 -04:00
Paul Pluzhnikov
7f0d9e61f4 Fix all the remaining misspellings -- BZ 25337 2023-06-02 01:39:48 +00:00
Frédéric Bérat
026a84a54d tests: replace write by xwrite
Using write without checks leads to a warn-unused-result warning when
__wur is enabled.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-06-01 12:40:05 -04:00
Carlos O'Donell
b600f47758 nptl: Reformat Makefile.
Reflow all long lines adding comment terminators.
Rename files that cause inconsistent ordering.
Sort all reflowed text using scripts/sort-makefile-lines.py.

No code generation changes observed in binary artifacts.
No regressions on x86_64 and i686.
2023-05-18 12:39:47 -04:00
Cupertino Miranda
b630be0922 Created tunable to force small pages on stack allocation.
Created tunable glibc.pthread.stack_hugetlb to control when hugepages
can be used for stack allocation.
In case THP is enabled and glibc.pthread.stack_hugetlb is set to 0,
glibc will madvise the kernel not to use hugepages for stack
allocations.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-04-20 13:54:24 -03:00
Adhemerval Zanella Netto
33237fe83d Remove --enable-tunables configure option
And make tunables always supported.  The configure option was added in
glibc 2.25 and some features require it (such as the hwcap mask, huge
pages support, and lock elision tuning).  It also simplifies the build
permutations.

Changes from v1:
 * Remove glibc.rtld.dynamic_sort changes, it is orthogonal and needs
   more discussion.
 * Cleanup more code.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2023-03-29 14:33:06 -03:00
Adhemerval Zanella Netto
88677348b4 Move libc_freeres_ptrs and libc_subfreeres to hidden/weak functions
They are both used by __libc_freeres to free all library malloc
allocated resources to help tooling like mtrace or valgrind with
memory leak tracking.

The current scheme uses assembly markers and linker script entries
to consolidate the free routine function pointers in the RELRO segment
and the to-be-freed buffers in BSS.

This patch changes it to use specific free functions for
libc_freeres_ptrs buffers and call the function pointer array directly
with call_function_static_weak.

It allows the removal of both the internal macros and the linker
script sections.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2023-03-27 13:57:55 -03:00
Andreas Schwab
359a0b9dbc Remove pthread-pi-defines.sym
It became unused with the removal of the assembler implementation of the
pthread functions.
2023-02-03 17:59:55 +01:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights 2023-01-06 21:14:39 +00:00
YunQiang Su
a9acb7b39e Define in_int32_t_range to check if the 64 bit time_t syscall should be used
Currently glibc uses in_time_t_range to detect time_t overflow,
and if it occurs, falls back to the 64-bit syscall version.

The function name is confusing because internally time_t might be
either 32 bits or 64 bits (depending on __TIMESIZE).

This patch refactors this usage, replacing in_time_t_range with
in_int32_t_range for the case of checking whether the 64-bit time_t
syscall should be used.

The in_time_t_range check is still used to detect overflow of the
syscall return value.
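
A sketch of the renamed check (illustrative; the real helper operates
on the internal __time64_t type):

  #include <stdbool.h>
  #include <stdint.h>

  /* Return true if the 64-bit time value fits in a 32-bit time_t,
     i.e. the legacy (non-time64) syscall can be used for it.  */
  static inline bool
  in_int32_t_range (int64_t t)
  {
    int32_t s = t;
    return s == t;
  }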

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-11-17 14:35:13 -03:00
Florian Weimer
ee1ada1bdb elf: Rework exception handling in the dynamic loader [BZ #25486]
The old exception handling implementation used function interposition
to replace the dynamic loader implementation (no TLS support) with the
libc implementation (TLS support).  This results in problems if the
link order between the dynamic loader and libc is reversed (bug 25486).

The new implementation moves the entire implementation of the
exception handling functions back into the dynamic loader, using
THREAD_GETMEM and THREAD_SETMEM for thread-local data support.
This depends on Hurd support for these macros, added in commit
b65a82e4e7 ("hurd: Add THREAD_GET/SETMEM/_NC").

One small obstacle is that the exception handling facilities are used
before the TCB has been set up, so a check is needed if the TCB is
available.  If not, a regular global variable is used to store the
exception handling information.

Also rename dl-error.c to dl-catch.c, to avoid confusion with the
dlerror function.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-11-03 09:39:31 +01:00
Adhemerval Zanella
3d8b5dde87 nptl: Fix pthread_create.c build with clang
clang complains that libc_hidden_data_def (__nptl_threads_events)
creates an invalid alias:

  pthread_create.c:50:1: error: alias must point to a defined variable or function
  libc_hidden_data_def (__nptl_threads_events)
  ^
  ../include/libc-symbols.h:621:37: note: expanded from macro
  'libc_hidden_data_def'

It seems that clang requires a proper prototype to be defined prior to
the hidden alias creation.

Reviewed-by: Fangrui Song <maskray@google.com>
2022-11-01 09:51:10 -03:00
Adhemerval Zanella
9b5e138f2b linux: Avoid shifting a negative signed on POSIX timer interface
The current macros use pid as a signed value, which triggers a compiler
warning for process and thread timers.  Replace MAKE_PROCESS_CPUCLOCK
with a static inline function that expects the pid as unsigned.  This
is similar to what Linux does internally.
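
A sketch of the replacement (illustrative; the ((~pid) << 3) | clock
encoding mirrors the kernel's internal CPU-clock id layout):

  #include <time.h>

  /* Build a POSIX CPU clock id without shifting a negative signed
     value: the pid is taken as unsigned, so (~pid) << 3 is well
     defined.  */
  static inline clockid_t
  make_process_cpuclock (unsigned int pid, clockid_t clock)
  {
    return ((~pid) << 3) | clock;
  }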

Checked on x86_64-linux-gnu.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2022-10-20 10:19:08 -03:00
Yu Chien Peter Lin
365b3af67e nptl: Convert tst-setuid2 to test-driver
Use <support/test-driver.c> and replace pthread calls with their
xpthread equivalents.

Signed-off-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-10-03 11:19:36 -03:00
Wilco Dijkstra
22f4ab2d20 Use atomic_exchange_release/acquire
Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire
since these map to the standard C11 atomic builtins.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-26 16:58:08 +01:00
Wilco Dijkstra
4a07fbb689 Use C11 atomics instead of atomic_decrement_and_test
Replace atomic_decrement_and_test with atomic_fetch_add_relaxed.
These are simple counters which do not protect any shared data from
concurrent accesses. Also remove the unused file cond-perf.c.

Passes regress on AArch64.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-23 15:59:56 +01:00
Wilco Dijkstra
d1babeb32d Use C11 atomics instead of atomic_increment(_val)
Replace atomic_increment and atomic_increment_val with atomic_fetch_add_relaxed.
One case in sem_post.c uses release semantics (see comment above it).
The others are simple counters and do not protect any shared data from
concurrent accesses.

Passes regress on AArch64.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-23 15:59:56 +01:00
Wilco Dijkstra
8114b95cef Use C11 atomics instead of atomic_and/or
Replace the 4 uses of atomic_and and atomic_or with
atomic_fetch_and_acquire and atomic_fetch_or_acquire.  This preserves
the existing implied semantics; however, relaxed MO on FUTEX_OWNER_DIED
accesses may be correct.

Passes regress on AArch64.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-23 15:59:56 +01:00
Adhemerval Zanella Netto
de477abcaa Use '%z' instead of '%Z' on printf functions
The Z modifier is a nonstandard synonym for z (and predates z itself),
and the compiler might issue a warning for an invalid conversion
specifier.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-09-22 08:48:04 -03:00
Wilco Dijkstra
a30e960328 Use relaxed atomics since there is no MO dependence
Replace the 3 uses of atomic_bit_set and atomic_bit_test_set with
atomic_fetch_or_relaxed.  Using relaxed MO is correct since the
atomics are used to ensure memory is released only once.
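
An illustration of the idiom in standard C11 atomics (made-up names;
the relaxed order is enough because the fetch-or only decides which
thread performs the one-time release):

  #include <stdatomic.h>
  #include <stdlib.h>

  #define FREED_BITMASK 0x1u

  static _Atomic unsigned int state;

  /* Whichever caller observes the bit still clear in the old value is
     the one (and only one) that frees the resource.  */
  static void
  release_once (void *resource)
  {
    unsigned int old = atomic_fetch_or_explicit (&state, FREED_BITMASK,
                                                 memory_order_relaxed);
    if ((old & FREED_BITMASK) == 0)
      free (resource);
  }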

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-09-13 11:58:07 +01:00
Wilco Dijkstra
a364a3a709 Use C11 atomics instead of atomic_decrement(_val)
Replace atomic_decrement and atomic_decrement_val with
atomic_fetch_add_relaxed.

Reviewed-by: DJ Delorie <dj@redhat.com>
2022-09-09 14:22:26 +01:00
Adhemerval Zanella Netto
6f4e0fcfa2 stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
The implementation is based on scalar Chacha20 with per-thread cache.
It uses getrandom or /dev/urandom as fallback to get the initial entropy,
and reseeds the internal state on every 16MB of consumed buffer.

To improve performance and lower memory consumption, the per-thread
cache is allocated lazily on the first call to an arc4random function,
and if the memory allocation fails, getentropy or /dev/urandom is used
as a fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).

Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).

The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros.  This strategy is similar to what OpenBSD arc4random
does.

The arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], who
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.

The main advantage of this method is that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit.  It optimizes
the internal buffer sampling by initially consuming a 32-bit random
variable and then sampling byte by byte.  Depending on the upper bound
requested, it might lead to better CPU utilization.
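
A short usage example of the new interfaces (illustrative;
arc4random_uniform returns a value in [0, upper_bound) without modulo
bias):

  #include <inttypes.h>
  #include <stdio.h>
  #include <stdlib.h>

  int
  main (void)
  {
    unsigned char key[16];
    arc4random_buf (key, sizeof key);            /* fill a buffer with random bytes */

    uint32_t r = arc4random ();                  /* uniform 32-bit value */
    uint32_t die = arc4random_uniform (6) + 1;   /* unbiased result in 1..6 */

    printf ("r=%" PRIu32 " die=%" PRIu32 "\n", r, die);
    return 0;
  }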

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>

[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
2022-07-22 11:58:27 -03:00
Adhemerval Zanella
f27e5e2178 nptl: Fix ___pthread_unregister_cancel_restore asynchronous restore
This was due to a wrong revert done in 404656009b.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2022-07-13 10:44:13 -03:00
Adhemerval Zanella
e070501d12 Replace __libc_multiple_threads with __libc_single_threaded
This also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL:
since the inclusion of single-thread.h is in the wrong order, the
define needs to come before including sysdeps/unix/sysdep.h.  The macro
is now moved to a per-arch single-thread.h header.

SINGLE_THREAD_P is now used in some more places.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.
2022-07-05 10:14:47 -03:00
Adhemerval Zanella
a1bdd81664 Refactor internal-signals.h
The main driver is to optimize the internal usage and the required
size when a sigset_t is embedded in other data structures.  On Linux,
the currently supported signal set requires up to 8 bytes (16 on mips),
which is lower than the user-visible sigset_t (128 bytes).

A new internal type internal_sigset_t is added, along with the
functions to operate on it similar to the ones for sigset_t.  The
internal-signals.h header is also refactored to remove unused
functions.
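
A sketch of the idea (illustrative; the real type and its helpers live
in the internal-signals.h header):

  #include <signal.h>   /* _NSIG */

  #define INTERNAL_NSIG_WORDS (_NSIG / (8 * sizeof (unsigned long int)))

  /* Signal set sized for the kernel's signal mask (8 bytes on most
     ABIs, 16 on mips) instead of the 128-byte userspace sigset_t.  */
  typedef struct
  {
    unsigned long int __val[INTERNAL_NSIG_WORDS];
  } internal_sigset_t;

  static inline void
  internal_sigaddset (internal_sigset_t *set, int sig)
  {
    unsigned int w = (sig - 1) / (8 * sizeof (unsigned long int));
    unsigned int b = (sig - 1) % (8 * sizeof (unsigned long int));
    set->__val[w] |= 1UL << b;
  }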

Besides smaller stack usage in some functions (posix_spawn, abort), it
lowers the size of struct pthread by about 120 bytes (112 on mips).

Checked on x86_64-linux-gnu.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
2022-06-30 14:56:21 -03:00
Adhemerval Zanella
d55df811e9 nptl: Remove unused members from struct pthread
It removes both pid_ununsed and cpuclock_offset_ununsed, saving about
12 bytes from struct pthread.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
2022-06-29 16:58:26 -03:00
Adhemerval Zanella
baf2a265c7 misc: Optimize internal usage of __libc_single_threaded
By adding an internal alias to avoid the GOT indirection.  On some
architectures, __libc_single_threaded may be accessed through copy
relocations, and thus the copies also need to be updated.

This is done by adding a new internal macro,
libc_hidden_data_{proto,def}, which has an additional argument that
specifies the alias name (instead of the default __GI_ one).

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Fangrui Song <maskray@google.com>
2022-06-24 17:45:58 -03:00
Adhemerval Zanella
c7d36dcecc nptl: Fix __libc_cleanup_pop_restore asynchronous restore (BZ#29214)
This was due to a wrong revert done in 404656009b.

Checked on x86_64-linux-gnu.
2022-06-08 09:23:02 -03:00
Wangyang Guo
8162147872 nptl: Add backoff mechanism to spinlock loop
When multiple threads are waiting for a lock at the same time, then
once the lock owner releases it, all waiters see the lock as available
and try to acquire it, which may cause an expensive CAS storm.

Binary exponential backoff with random jitter is introduced.  As the
number of try-lock attempts increases, it becomes more likely that many
threads are competing for the adaptive mutex lock, so the wait time
grows exponentially.  A random jitter is also added to avoid
synchronized try-lock attempts from other threads.
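
A standalone sketch of the scheme (illustrative; the glibc version is
per-arch in pthread_mutex_backoff.h and derives its jitter
differently):

  #include <stdint.h>

  /* Seed must be nonzero; one state per thread.  */
  static inline uint32_t
  xorshift32 (uint32_t *state)
  {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
  }

  /* Binary exponential backoff with random jitter: every failed round
     of try-locks doubles the spin cap (up to a limit), and a random
     count below the cap desynchronizes competing threads.  */
  static inline void
  backoff_spin (uint32_t *prng, unsigned int *rounds)
  {
    unsigned int cap = 1u << (*rounds < 10 ? *rounds : 10);
    unsigned int spins = xorshift32 (prng) % cap;
    for (unsigned int i = 0; i < spins; i++)
      __asm__ __volatile__ ("" ::: "memory");   /* stand-in for cpu_relax/pause */
    if (*rounds < 10)
      ++*rounds;
  }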

v2: Remove read-check before try-lock for performance.

v3:
1. Restore read-check since it works well in some platform.
2. Make backoff arch dependent, and enable it for x86_64.
3. Limit max backoff to reduce latency in large critical section.

v4: Fix strict-prototypes error in sysdeps/nptl/pthread_mutex_backoff.h

v5: Commit log updated for regression in large critical section.

Result of pthread-mutex-locks bench

Test Platform: Xeon 8280L (2 socket, 112 CPUs in total)
First Row: thread number
First Col: critical section length
Values: backoff vs upstream, time based, low is better

non-critical-length: 1
	1	2	4	8	16	32	64	112	140
0	0.99	0.58	0.52	0.49	0.43	0.44	0.46	0.52	0.54
1	0.98	0.43	0.56	0.50	0.44	0.45	0.50	0.56	0.57
2	0.99	0.41	0.57	0.51	0.45	0.47	0.48	0.60	0.61
4	0.99	0.45	0.59	0.53	0.48	0.49	0.52	0.64	0.65
8	1.00	0.66	0.71	0.63	0.56	0.59	0.66	0.72	0.71
16	0.97	0.78	0.91	0.73	0.67	0.70	0.79	0.80	0.80
32	0.95	1.17	0.98	0.87	0.82	0.86	0.89	0.90	0.90
64	0.96	0.95	1.01	1.01	0.98	1.00	1.03	0.99	0.99
128	0.99	1.01	1.01	1.17	1.08	1.12	1.02	0.97	1.02

non-critical-length: 32
	1	2	4	8	16	32	64	112	140
0	1.03	0.97	0.75	0.65	0.58	0.58	0.56	0.70	0.70
1	0.94	0.95	0.76	0.65	0.58	0.58	0.61	0.71	0.72
2	0.97	0.96	0.77	0.66	0.58	0.59	0.62	0.74	0.74
4	0.99	0.96	0.78	0.66	0.60	0.61	0.66	0.76	0.77
8	0.99	0.99	0.84	0.70	0.64	0.66	0.71	0.80	0.80
16	0.98	0.97	0.95	0.76	0.70	0.73	0.81	0.85	0.84
32	1.04	1.12	1.04	0.89	0.82	0.86	0.93	0.91	0.91
64	0.99	1.15	1.07	1.00	0.99	1.01	1.05	0.99	0.99
128	1.00	1.21	1.20	1.22	1.25	1.31	1.12	1.10	0.99

non-critical-length: 128
	1	2	4	8	16	32	64	112	140
0	1.02	1.00	0.99	0.67	0.61	0.61	0.61	0.74	0.73
1	0.95	0.99	1.00	0.68	0.61	0.60	0.60	0.74	0.74
2	1.00	1.04	1.00	0.68	0.59	0.61	0.65	0.76	0.76
4	1.00	0.96	0.98	0.70	0.63	0.63	0.67	0.78	0.77
8	1.01	1.02	0.89	0.73	0.65	0.67	0.71	0.81	0.80
16	0.99	0.96	0.96	0.79	0.71	0.73	0.80	0.84	0.84
32	0.99	0.95	1.05	0.89	0.84	0.85	0.94	0.92	0.91
64	1.00	0.99	1.16	1.04	1.00	1.02	1.06	0.99	0.99
128	1.00	1.06	0.98	1.14	1.39	1.26	1.08	1.02	0.98

There is a regression for large critical sections, but the adaptive
mutex is aimed at "quick" locks.  Small critical sections are more
common when users choose to use an adaptive pthread_mutex.

Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2022-05-09 14:38:40 -07:00