Commit 27761a1042 ("Refactor atfork
handlers") introduced a lock, atfork_lock, around fork handler list
accesses. It turns out that this lock occasionally results in
self-deadlocks in malloc/tst-mallocfork2:
(gdb) bt
#0 __lll_lock_wait_private ()
at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:63
#1 0x00007f160c6f927a in __run_fork_handlers (who=(unknown: 209394016),
who@entry=atfork_run_prepare) at register-atfork.c:116
#2 0x00007f160c6b7897 in __libc_fork () at ../sysdeps/nptl/fork.c:58
#3 0x00000000004027d6 in sigusr1_handler (signo=<optimized out>)
at tst-mallocfork2.c:80
#4 sigusr1_handler (signo=<optimized out>) at tst-mallocfork2.c:64
#5 <signal handler called>
#6 0x00007f160c6f92e4 in __run_fork_handlers (who=who@entry=atfork_run_parent)
at register-atfork.c:136
#7 0x00007f160c6b79a2 in __libc_fork () at ../sysdeps/nptl/fork.c:152
#8 0x0000000000402567 in do_test () at tst-mallocfork2.c:156
#9 0x0000000000402dd2 in support_test_main (argc=1, argv=0x7ffc81ef1ab0,
config=config@entry=0x7ffc81ef1970) at support_test_main.c:350
#10 0x0000000000402362 in main (argc=<optimized out>, argv=<optimized out>)
at ../support/test-driver.c:168
If no locking happens in the single-threaded case (where fork is
expected to be async-signal-safe), this deadlock is avoided.
(pthread_atfork is not required to be async-signal-safe, so a fork
call from a signal handler interrupting pthread_atfork is not
a problem.)
Current implementation (sysdeps/nptl/fork.c) replicates the atfork
handlers list backward to invoke the child handlers after fork/clone
syscall.
The internal atfork handlers is implemented as a single-linked list
so a lock-free algorithm can be used, trading fork mulithread call
performance for some code complexity and dynamic stack allocation
(since the backwards list should not fail).
This patch refactor it to use a dynarary instead of a linked list.
It simplifies the external variables need to be exported and also
the internal atfork handler member definition.
The downside is a serialization of fork call in multithread, since to
operate on the dynarray the internal lock should be used. However
as noted by Florian, it already acquires external locks for malloc
and libio so it is already hitting some lock contention. Besides,
posix_spawn should be faster and more scalable to run external programs
in multithread environments.
Checked on x86_64-linux-gnu.
* nptl/Makefile (routines): Remove unregister-atfork.
* nptl/register-atfork.c (fork_handler_pool): Remove variable.
(fork_handler_alloc): Remove function.
(fork_handlers, fork_handler_init): New variables.
(__fork_lock): Rename to atfork_lock.
(__register_atfork, __unregister_atfork, libc_freeres_fn): Rewrite
to use a dynamic array to add/remove atfork handlers.
* sysdeps/nptl/fork.c (__libc_fork): Likewise.
* sysdeps/nptl/fork.h (__fork_lock, __fork_handlers, __linkin_atfork):
Remove declaration.
(fork_handler): Remove next, refcntr, and need_signal member.
(__run_fork_handler_type): New enum.
(__run_fork_handlers): New prototype.
* sysdeps/nptl/libc-lockP.h (__libc_atfork): Remove declaration.