On s390x syscalls are triggered by svc instruction. One can
pass the syscall number encoded in the instruction "svc 123"
or by storing it in r1:
lghi r1,123
svc 0
If the syscall number is encoded in the instruction, this can
cause broken syscall restarts. Therefore this patch is now just
passing the syscall number in r1.
See also kernel-commit:
"s390/signal: switch to using vdso for sigreturn and syscall restart"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/s390/[%e2%80%a6]call.c?h=v6.0-rc1&id=df29a7440c4b5c65765c8f60396b3b13063e24e9
As information, the "svc 0" feature was introduced in kernel 2.5.62:
commit b5aad611393ef2e132e3648fa4c6e56a9cfa8708
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dchttps://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Starting with recent commit 92a7d13439
"x86-64: Align child stack to 16 bytes [BZ #27902]"
the new test misc/tst-misalign-clone has failed on s390x/s390.
This patch is now aligning the stack to a double
word boundary as also done in start.S files.
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
GDB failed to detect the outermost frame while showing the backtrace
within a thread:
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Before this patch, the start routines like thread_start had no cfi information.
GDB is then using the prologue unwinder if no cfi information is available.
This unwinder tries to unwind r15 and stops e.g. if r15 was updated or
on some jump-instructions.
On older glibc-versions (before commit "Remove cached PID/TID in clone"
c579f48edb), the thread_start function used
such a jump-instruction and GDB did not fail with an error.
This patch adds cfi information for _start, thread_start and __makecontext_ret
and marks r14 as undefined which marks the frame as outermost frame and GDB
stops the backtrace. Also tested different gcc versions in order to test
_Unwind_Backtrace() in libgcc as this is used by backtrace() in glibc.
ChangeLog:
* sysdeps/s390/s390-64/start.S (_start): Add cfi information for r14.
* sysdeps/s390/s390-32/start.S: (_start): Likewise
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S
(thread_start): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S
(thread_start): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/__makecontext_ret.S
(__makecontext_ret): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/__makecontext_ret.S
(__makecontext_ret): Likewise.
This patch remove the PID cache and usage in current GLIBC code. Current
usage is mainly used a performance optimization to avoid the syscall,
however it adds some issues:
- The exposed clone syscall will try to set pid/tid to make the new
thread somewhat compatible with current GLIBC assumptions. This cause
a set of issue with new workloads and usecases (such as BZ#17214 and
[1]) as well for new internal usage of clone to optimize other algorithms
(such as clone plus CLONE_VM for posix_spawn, BZ#19957).
- The caching complexity also added some bugs in the past [2] [3] and
requires more effort of each port to handle such requirements (for
both clone and vfork implementation).
- Caching performance gain in mainly on getpid and some specific
code paths. The getpid performance leverage is questionable [4],
either by the idea of getpid being a hotspot as for the getpid
implementation itself (if it is indeed a justifiable hotspot a
vDSO symbol could let to a much more simpler solution).
Other usage is mainly for non usual code paths, such as pthread
cancellation signal and handling.
For thread creation (on stack allocation) the code simplification in fact
adds some performance gain due the no need of transverse the stack cache
and invalidate each element pid.
Other thread usages will require a direct getpid syscall, such as
cancellation/setxid signal, thread cancellation, thread fail path (at
create_thread), and thread signal (pthread_kill and pthread_sigqueue).
However these are hardly usual hotspots and I think adding a syscall is
justifiable.
It also simplifies both the clone and vfork arch-specific implementation.
And by review each fork implementation there are some discrepancies that
this patch also solves:
- microblaze clone/vfork does not set/reset the pid/tid field
- hppa uses the default vfork implementation that fallback to fork.
Since vfork is deprecated I do not think we should bother with it.
The patch also removes the TID caching in clone. My understanding for
such semantic is try provide some pthread usage after a user program
issue clone directly (as done by thread creation with CLONE_PARENT_SETTID
and pthread tid member). However, as stated before in multiple discussions
threads, GLIBC provides clone syscalls without further supporting all this
semantics.
I ran a full make check on x86_64, x32, i686, armhf, aarch64, and powerpc64le.
For sparc32, sparc64, and mips I ran the basic fork and vfork tests from
posix/ folder (on a qemu system). So it would require further testing
on alpha, hppa, ia64, m68k, nios2, s390, sh, and tile (I excluded microblaze
because it is already implementing the patch semantic regarding clone/vfork).
[1] https://codereview.chromium.org/800183004/
[2] https://sourceware.org/ml/libc-alpha/2006-07/msg00123.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15368
[4] http://yarchive.net/comp/linux/getpid_caching.html
* sysdeps/nptl/fork.c (__libc_fork): Remove pid cache setting.
* nptl/allocatestack.c (allocate_stack): Likewise.
(__reclaim_stacks): Likewise.
(setxid_signal_thread): Obtain pid through syscall.
* nptl/nptl-init.c (sigcancel_handler): Likewise.
(sighandle_setxid): Likewise.
* nptl/pthread_cancel.c (pthread_cancel): Likewise.
* sysdeps/unix/sysv/linux/pthread_kill.c (__pthread_kill): Likewise.
* sysdeps/unix/sysv/linux/pthread_sigqueue.c (pthread_sigqueue):
Likewise.
* sysdeps/unix/sysv/linux/createthread.c (create_thread): Likewise.
* sysdeps/unix/sysv/linux/getpid.c: Remove file.
* nptl/descr.h (struct pthread): Change comment about pid value.
* nptl/pthread_getattr_np.c (pthread_getattr_np): Remove thread
pid assert.
* sysdeps/unix/sysv/linux/pthread-pids.h (__pthread_initialize_pids):
Do not set pid value.
* nptl_db/td_ta_thr_iter.c (iterate_thread_list): Remove thread
pid cache check.
* nptl_db/td_thr_validate.c (td_thr_validate): Likewise.
* sysdeps/aarch64/nptl/tcb-offsets.sym: Remove pid offset.
* sysdeps/alpha/nptl/tcb-offsets.sym: Likewise.
* sysdeps/arm/nptl/tcb-offsets.sym: Likewise.
* sysdeps/hppa/nptl/tcb-offsets.sym: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym: Likewise.
* sysdeps/ia64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/m68k/nptl/tcb-offsets.sym: Likewise.
* sysdeps/microblaze/nptl/tcb-offsets.sym: Likewise.
* sysdeps/mips/nptl/tcb-offsets.sym: Likewise.
* sysdeps/nios2/nptl/tcb-offsets.sym: Likewise.
* sysdeps/powerpc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/s390/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sh/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sparc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/tile/nptl/tcb-offsets.sym: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/unix/sysv/linux/aarch64/clone.S: Remove pid and tid caching.
* sysdeps/unix/sysv/linux/alpha/clone.S: Likewise.
* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
* sysdeps/unix/sysv/linux/hppa/clone.S: Likewise.
* sysdeps/unix/sysv/linux/i386/clone.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/clone2.S: Likewise.
* sysdeps/unix/sysv/linux/mips/clone.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sh/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/tile/clone.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/aarch64/vfork.S: Remove pid set and reset.
* sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/i386/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/clone.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/mips/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sh/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tile/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tst-clone2.c (f): Remove direct pthread
struct access.
(clone_test): Remove function.
(do_test): Rewrite to take in consideration pid is not cached anymore.
As discussed in libc-alpha [1] current clone with CLONE_VM (without
CLONE_THREAD set) will reset the pthread pid/tid fields to -1. The
issue is since memory is shared between the parent and child it will
clobber parent's cached pid/tid leading to internal inconsistencies
if the value is not restored.
And even it is restored it may lead to racy conditions when between
set/restore a thread might invoke pthread function that validate the
pthread with INVALID_TD_P/INVALID_NOT_TERMINATED_TD_P and thus get
wrong results.
As stated in BZ19957, previously reports of this behaviour was close
with EWONTFIX due the fact usage of clone outside glibc is tricky
since glibc requires consistent internal pthread, while using clone
directly may not provide it. However since now posix_spawn uses
clone (CLONE_VM) to fixes various issues related to previous vfork
usage this issue requires fixing.
The vfork implementation also does something similar, but instead
it negates and restores only the *pid* field and functions that
might access its value know to handle such case (getpid, raise
and pthread ones that uses INVALID_TD_P/INVALID_NOT_TERMINATED_TD_P
macros that check only *tid* field). Also vfork does not call
__clone directly, instead calling either __NR_vfork or __NR_clone
directly.
So this patch removes this clone behavior by avoiding setting
the pthread pid/tid field for CLONE_VM. There is no need to
check for CLONE_THREAD, since the minimum supported kernel in all
architecture implies that CLONE_VM must be used with CLONE_THREAD,
otherwise clone returns EINVAL.
Instead of current approach of:
int clone(int (*fn)(void *), void *child_stack, int flags, ...)
[...]
if (flags & CLONE_THREAD)
goto do_syscall;
pid_t new_value;
if (flags & CLONE_VM)
new_value = -1;
else
new_value = getpid ();
THREAD_SETMEM (THREAD_SELF, pid, new_value);
THREAD_SETMEM (THREAD_SELF, tid, new_value);
do_syscall:
[...]
The new approach uses:
int clone(int (*fn)(void *), void *child_stack, int flags, ...)
[...]
if (flags & CLONE_VM)
goto do_syscall;
pid_t new_value = getpid ();
THREAD_SETMEM (THREAD_SELF, pid, new_value);
THREAD_SETMEM (THREAD_SELF, tid, new_value);
do_syscall:
[...]
It also removes the linux tst-getpid2.c test which expects the previous
behavior and instead add another clone test.
Tested on x86_64, i686, x32, powerpc64le, aarch64, armhf, s390, and
s390x. I also did limited check on mips32 and sparc64 (using the new
added test).
I also got reviews from both m68k, hppa, and tile. So I presume for
these architecture the patch works.
The fixes for alpha, microblaze, sh, ia64, and nio2 have not been
tested.
[1] https://sourceware.org/ml/libc-alpha/2016-04/msg00307.html
* sysdeps/unix/sysv/linux/Makefile [$(subdir) == nptl] (test): Remove
tst-getpid2.
(test): Add tst-clone2.
* sysdeps/unix/sysv/linux/tst-clone2.c: New file.
* sysdeps/unix/sysv/linux/aarch64/clone.S (__clone): Do not change
pid/tid fields for CLONE_VM.
* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
* sysdeps/unix/sysv/linux/i386/clone.S: Likewise.
* sysdeps/unix/sysv/linux/mips/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/tst-getpid2.c: Remove file.
This patch implements a new posix_spawn{p} implementation for Linux. The main
difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK
flags and a direct allocated stack. The new stack and start function solves
most the vfork limitation (possible parent clobber due stack spilling). The
remaning issue are related to signal handling:
1. That no signal handlers must run in child context, to avoid corrupt
parent's state.
2. Child must synchronize with parent to enforce stack deallocation and
to possible return execv issues.
The first one is solved by blocking all signals in child, even NPTL-internal
ones (SIGCANCEL and SIGSETXID). The second issue is done by a stack allocation
in parent and a synchronization with using a pipe or waitpid (in case or error).
The pipe has the advantage of allowing the child signal an exec error (checked
with new tst-spawn2 test).
There is an inherent race condition in pipe2 usage for architectures that do not
support the syscall directly. In such cases the a pipe plus fctnl is used
instead and it may lead to file descriptor leak in parent (as decribed by fcntl
documentation).
The child process stack is allocate with a mmap with MAP_STACK flag using
default architecture stack size. Although it is slower than use a stack buffer
from parent, it allows some slack for the compatibility code to run scripts
with no shebang (which may use a buffer with size depending of argument list
count).
Performance should be similar to the vfork default posix implementation and
way faster than fork path (vfork on mostly linux ports are basically
clone with CLONE_VM plus CLONE_VFORK). The only difference is the syscalls
required for the stack allocation/deallocation.
It fixes BZ#10354, BZ#14750, and BZ#18433.
Tested on i386, x86_64, powerpc64le, and aarch64.
[BZ #14750]
[BZ #10354]
[BZ #18433]
* include/sched.h (__clone): Add hidden prototype.
(__clone2): Likewise.
* include/unistd.h (__dup): Likewise.
* posix/Makefile (tests): Add tst-spawn2.
* posix/tst-spawn2.c: New file.
* sysdeps/posix/dup.c (__dup): Add hidden definition.
* sysdeps/unix/sysv/linux/aarch64/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/alpha/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/hppa/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/i386/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/ia64/clone2.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/m68k/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/microblaze/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/mips/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/nios2/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S (__clone):
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S (__clone):
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/sh/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/tile/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/x86_64/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/nptl-signals.h
(____nptl_is_internal_signal): New function.
* sysdeps/unix/sysv/linux/spawni.c: New file.
2004-12-15 Jakub Jelinek <jakub@redhat.com>
* nis/nis_domain_of_r.c (nis_domain_of_r): Use libnsl_hidden_def,
not libnsl_hidden_proto.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S (__clone): Add support
for NPTL where the PID is stored at userlevel and needs to be reset
when CLONE_THREAD is not used.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S (__clone): Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S (__clone): Save
and restore r2 around call to fn.
2003-01-28 Martin Schwidefsky <schwidefsky@de.ibm.com>
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Reorder additional
clone parameters to match the order used on ia32.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise.
2002-02-07 Andreas Schwab <schwab@suse.de>
* configure.in: Fix check for -zcombreloc.
2002-02-06 H.J. Lu <hjl@gnu.org>
* config.h.in (HAVE_BUILTIN_MEMSET): New.
* configure.in: Check if __builtin_memset really works.
* elf/rtld.c (_dl_start): Check HAVE_BUILTIN_MEMSET instead of
__GNUC_PREREQ (2, 96) before using __builtin_memset.
2002-02-06 Jakub Jelinek <jakub@redhat.com>
* io/bug-ftw3.c (main): Don't try the test if root.
2002-02-06 Martin Schwidefsky <schwidefsky@de.ibm.com>
* sysdeps/unix/sysv/linux/s390/brk.c (__brk): Correct inline assembly
constraints.
* sysdeps/unix/sysv/linux/s390/s390-32/bits/resource.h (RLIMIT_LOCKS):
Add RLIMIT_LOCKS and adjust RLIMIT_NLIMITS.
* sysdeps/unix/sysv/linux/s390/s390-64/bits/resource.h (RLIMIT_LOCKS):
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S (clone): Make clone
a weak alias for __clone.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S (clone): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/profil-counter.h: Fix typo.
* sysdeps/unix/sysv/linux/s390/s390-64/Makefile: Add framestate.
* sysdeps/unix/sysv/linux/s390/s390-64/Versions: New file.
* sysdeps/unix/sysv/linux/s390/s390-64/mmap.S (__mmap64): Make __mmap
a weak alias for __mmap64.
* sysdeps/mips/atomicity.h (exchange_and_add): Not use branch likely.
* sysdeps/unix/sysv/linux/mips/sys/tas.h (_test_and_set): Likewise.
* sysdeps/generic/dl-tls.c: Don't read TLS header if TLS is not needed.
2001-07-06 Paul Eggert <eggert@twinsun.com>
* manual/argp.texi: Remove ignored LGPL copyright notice; it's
not appropriate for documentation anyway.
* manual/libc-texinfo.sh: "Library General Public License" ->
"Lesser General Public License".
2001-07-06 Andreas Jaeger <aj@suse.de>
* All files under GPL/LGPL version 2: Place under LGPL version
2.1.