The clone3 system call (since Linux 5.3) provides a superset of the
functionality of clone and clone2. It also provides a number of API
improvements, including the ability to specify the size of the child's
stack area which can be used by kernel to compute the shadow stack size
when allocating the shadow stack. Add:
extern int __clone_internal (struct clone_args *__cl_args,
int (*__func) (void *__arg), void *__arg);
to provide an abstract interface for clone, clone2 and clone3.
1. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
2. Consolidate clone vs clone2 differences into a single file.
3. Call __clone3 if HAVE_CLONE3_WAPPER is defined. If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
4. Use only __clone_internal to clone a thread. Since the stack size
argument for create_thread is now unconditional, always pass stack size
to create_thread.
5. Enable the public clone3 wrapper in the future after it has been
added to all targets.
NB: Sandbox will return ENOSYS on clone3 in both Chromium:
The following revision refers to this bug:
218438259d
commit 218438259dd795456f0a48f67cbe5b4e520db88b
Author: Matthew Denton <mpdenton@chromium.org>
Date: Thu Jun 03 20:06:13 2021
Linux sandbox: return ENOSYS for clone3
Because clone3 uses a pointer argument rather than a flags argument, we
cannot examine the contents with seccomp, which is essential to
preventing sandboxed processes from starting other processes. So, we
won't be able to support clone3 in Chromium. This CL modifies the
BPF policy to return ENOSYS for clone3 so glibc always uses the fallback
to clone.
Bug: 1213452
Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Commit-Queue: Matthew Denton <mpdenton@chromium.org>
Cr-Commit-Position: refs/heads/master@{#888980}
[modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc
and Firefox:
https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The error is as follows:
nss_module.c: In function 'module_load_nss_files':
nss_module.c:117:7: error: 'is_nscd' undeclared (first use in this function)
117 | if (is_nscd)
| ^~~~~~~
nss_module.c:117:7: note: each undeclared identifier is reported only once for each function it appears in
nss_module.c:119:51: error: 'nscd_init_cb' undeclared (first use in this function); did you mean 'nscd_init'?
119 | void (*cb) (size_t, struct traced_file *) = nscd_init_cb;
| ^~~~~~~~~~~~
| nscd_init
The make program might open a pipe for its job server, which triggers
an invalid check on the spawned process. This patch now passes the
lowest file descriptor as ithe first argument, so only the range
that was actually opened is checked.
Checked on x86_64-linux-gnu and i686-linux-gnu and centos7 (which
triggers the issue).
1. Align struct hdr to MALLOC_ALIGNMENT bytes so that malloc hooks in
libmcheck align memory to MALLOC_ALIGNMENT bytes.
2. Remove tst-mallocalign1 from tests-exclude-mcheck for i386 and x32.
3. Add tst-pvalloc-fortify and tst-reallocarray to tests-exclude-mcheck
since they use malloc_usable_size (see BZ #22057).
This fixed BZ #28068.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The previous approach defeats the vDSO optimization on older kernels
because a failing clock_gettime64 system call is performed on every
function call. It also results in a clobbered errno value, exposing
an OpenJDK bug (JDK-8270244).
This patch fixes by open-code INLINE_VSYSCALL macro and replace all
INLINE_SYSCALL_CALL with INTERNAL_SYSCALL_CALLS. Now for
__clock_gettime64x, the 64-bit vDSO is used and the 32-bit vDSO is
tried before falling back to 64-bit syscalls.
The previous code preferred 64-bit syscall for the case where the kernel
provides 64-bit time_t syscalls *and* also a 32-bit vDSO (in this case
the *64-bit* syscall should be preferable over the vDSO). All
architectures that provides 32-bit vDSO (i386, mips, powerpc, s390)
modulo sparc; but I am not sure if some kernels versions do provide
only 32-bit vDSO while still providing 64-bit time_t syscall.
Regardless, for such cases the 64-bit time_t syscall is used if the
vDSO returns overflowed 32-bit time_t.
Tested on i686-linux-gnu (with a time64 and non-time64 kernel),
x86_64-linux-gnu. Built with build-many-glibcs.py.
Co-authored-by: Florian Weimer <fweimer@redhat.com>
<limits.h> used to be a header file with no declarations.
GCC's libgomp includes it in a #pragma GCC visibility hidden block.
Including <unistd.h> from <limits.h> (indirectly) declares everything
in <unistd.h> with hidden visibility, resulting in linker failures.
This commit avoids C declarations in assembler mode and only declares
__sysconf in <limits.h> (and not the entire contents of <unistd.h>).
The __sysconf symbol is already part of the ABI. PTHREAD_STACK_MIN
is no longer defined for __USE_DYNAMIC_STACK_SIZE && __ASSEMBLER__
because there is no possible definition.
Additionally, PTHREAD_STACK_MIN is now defined by <pthread.h> for
__USE_MISC because this is what developers expect based on the macro
name. It also helps to avoid libgomp linker failures in GCC because
libgomp includes <pthread.h> before its visibility hacks.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Sometimes the test nss/tst-nss-files-hosts-long is failing as getent
fails with exit-code 2.
This happens if tst-reload1 was run just before this test:
make t=nss/tst-reload1 test
make t=nss/tst-nss-files-hosts-long test
Then the test fails as /etc/nsswitch.conf contains "hosts: test2"
and the hosts are not searched in /etc/hosts at all.
Thus this patch just requests a post cleanup after nss/tst-reload1
has run.
This was put in __libc_fork by c32c868ab8 ("posix: Add _Fork [BZ #4737]")
so we need to avoid locking them again in _Fork called by __libc_lock, otherwise
we deadlock.
Replace _SC_MINSIGSTKSZ with _SC_SIGSTKSZ since sysconf (_SC_MINSIGSTKSZ)
returns the minimum number of bytes of free stack space required in order
to guarantee successful, non-nested handling of a single signal whose
handler is an empty function while sysconf (_SC_SIGSTKSZ) returns the
suggested minimum number of bytes of stack space required for a signal
stack.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Replace MINSIGSTKSZ with sysconf (_SC_MINSIGSTKSZ) since the constant
MINSIGSTKSZ used in glibc build may be too small.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The constant PTHREAD_STACK_MIN may be too small for some processors.
Rename _SC_SIGSTKSZ_SOURCE to _DYNAMIC_STACK_SIZE_SOURCE. When
_DYNAMIC_STACK_SIZE_SOURCE or _GNU_SOURCE are defined, define
PTHREAD_STACK_MIN to sysconf(_SC_THREAD_STACK_MIN) which is changed
to MIN (PTHREAD_STACK_MIN, sysconf(_SC_MINSIGSTKSZ)).
Consolidate <bits/local_lim.h> with <bits/pthread_stack_min.h> to
provide a constant target specific PTHREAD_STACK_MIN value.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
As a result, is not necessary to specify __attribute__ ((nocommon))
on individual definitions.
GCC 10 defaults to -fno-common on all architectures except ARC,
but this change is compatible with older GCC versions and ARC, too.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
1. Add sysdeps/generic/malloc-size.h to define size related macros for
malloc.
2. Move x86_64/tst-mallocalign1.c to malloc and replace ALIGN_MASK with
MALLOC_ALIGN_MASK.
3. Add tst-mallocalign1 to tests-exclude-mcheck for i386 and x32 since
mcheck doesn't honor MALLOC_ALIGNMENT.
Change tst-spawn5.c to handle tst-spawn5 without optional path to ld.so,
--library-path nor the library path when glibc is configured with
--enable-hardcoded-path-in-tests. This fixes BZ #28067.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This slightly reduces code size, as can be seen below.
__libc_lock_unlock is usually used along with __libc_lock_lock in
the same function. __libc_lock_lock already has an out-of-line
slow path, so this change should not introduce many additional
non-leaf functions.
This change also fixes a link failure in 32-bit Arm thumb mode
because commit 1f9c804fbd
("nptl: Use internal low-level lock type for !IS_IN (libc)")
introduced __libc_do_syscall calls outside of libc.
Before x86-64:
text data bss dec hex filename
1937748 20456 54896 2013100 1eb7ac libc.so.6
25601 856 12768 39225 9939 nss/libnss_db.so.2
40310 952 25144 66406 10366 nss/libnss_files.so.2
After x86-64:
text data bss dec hex filename
1935312 20456 54896 2010664 1eae28 libc.so.6
25559 864 12768 39191 9917 nss/libnss_db.so.2
39764 960 25144 65868 1014c nss/libnss_files.so.2
Before i686:
2110961 11272 39144 2161377 20fae1 libc.so.6
27243 428 12652 40323 9d83 nss/libnss_db.so.2
43062 476 25028 68566 10bd6 nss/libnss_files.so.2
After i686:
2107347 11272 39144 2157763 20ecc3 libc.so.6
26929 432 12652 40013 9c4d nss/libnss_db.so.2
43132 480 25028 68640 10c20 nss/libnss_files.so.2
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The configure script checks for -mlong-double-128 but mentions -mlongdouble
when it fails.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
5 years ago, commit 8f1b841e45
unintentionally added an ifunc to the loader.
That modification has not caused any harm so far, but it doesn't add any
value either, because the hwcap information is available later during
libc initialization.
Suggested-by: Anton Blanchard <anton@ozlabs.org>
The following commit
commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Jun 23 01:19:34 2021 -0400
x86-64: Add wcslen optimize for sse4.1
Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did
not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit
fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc
implementation list and adding wcslen-sse4.1 to the ifunc
implementation list.
Testing:
test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as
well as all other tests in wcsmbs and string.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Wed Jun 23 01:19:34 2021 -0400
x86-64: Add wcslen optimize for sse4.1
added wcsnlen-sse4.1 to the wcslen ifunc implementation list. Since the
random value in the the RSI register is larger than the wide-character
string length in the existing wcslen test, it didn't trigger the wcslen
test failure. Add a test to force 0 into the RSI register before calling
wcslen.
https://sourceware.org/bugzilla/show_bug.cgi?id=21782 dropped an ld
diagnostic for R_X86_64_PC32 referencing an undefined weak symbol in
-pie links. Arguably keeping the diagnostic like other ports is more
correct, since statically resolving movl foo(%rip), %eax to the
link-time zero address produces a corrupted output.
It turns out that --enable-static-pie builds do not depend on the ld
behavior. GCC generates GOT indirection for weak declarations for
-fPIE/-fPIC, so what ld does with the PC-relative relocation doesn't
really matter.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This patch adds a way to close a range of file descriptors on
posix_spawn as a new file action. The API is similar to the one
provided by Solaris 11 [1], where the file action causes the all open
file descriptors greater than or equal to input on to be closed when
the new process is spawned.
The function posix_spawn_file_actions_addclosefrom_np is safe to be
implemented by iterating over /proc/self/fd, since the Linux spawni.c
helper process does not use CLONE_FILES, so its has own file descriptor
table and any failure (in /proc operation) aborts the process creation
and returns an error to the caller.
I am aware that this file action might be redundant to the current
approach of POSIX in promoting O_CLOEXEC in more interfaces. However
O_CLOEXEC is still not the default and for some specific usages, the
caller needs to close all possible file descriptors to avoid them
leaking. Some examples are CPython (discussed in BZ#10353) and OpenJDK
jspawnhelper [2] (where OpenJDK spawns a helper process to exactly
closes all file descriptors). Most likely any environment which calls
functions that might open file descriptor under the hood and aim to use
posix_spawn might face the same requirement.
Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.
[1] https://docs.oracle.com/cd/E36784_01/html/E36874/posix-spawn-file-actions-addclosefrom-np-3c.html
[2] https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libjava/childproc.c#L82
The function closes all open file descriptors greater than or equal to
input argument. Negative values are clamped to 0, i.e, it will close
all file descriptors.
As indicated by the bug report, this is a common symbol provided by
different systems (Solaris, OpenBSD, NetBSD, FreeBSD) and, although
its has inherent issues with not taking in consideration internal libc
file descriptors (such as syslog), this is also a common feature used
in multiple projects [1][2][3][4][5].
The Linux fallback implementation iterates over /proc and close all
file descriptors sequentially. Although it was raised the questioning
whether getdents on /proc/self/fd might return disjointed entries
when file descriptor are closed; it does not seems the case on my
testing on multiple kernel (v4.18, v5.4, v5.9) and the same strategy
is used on different projects [1][2][3][5].
Also, the interface is set a fail-safe meaning that a failure in the
fallback results in a process abort.
Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.
[1] 5238e95759/src/basic/fd-util.c (L217)
[2] ddf4b77e11/src/lxc/start.c (L236)
[3] 9e4f2f3a6b/Modules/_posixsubprocess.c (L220)
[4] 5f47c0613e/src/libstd/sys/unix/process2.rs (L303-L308)
[5] https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libjava/childproc.c#L82
It was added on Linux 5.9 (278a5fbaed89) with CLOSE_RANGE_CLOEXEC
added on 5.11 (582f1fb6b721f). Although FreeBSD has added the same
syscall, this only adds the symbol on Linux ports. This syscall is
required to provided a fail-safe way to implement the closefrom
symbol (BZ #10353).
Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.
The code to allocate a stack from xsigstack is refactored so it can
be more generic. The new support_stack_alloc() also set PROT_EXEC
if DEFAULT_STACK_PERMS has PF_X. This is required on some
architectures (hppa for instance) and trying to access the rtld
global from testsuite will require more intrusive refactoring
in the ldsodefs.h header.
Checked on x86_64-linux-gnu and i686-linux-gnu. I also ran
tst-xsigstack on both hppa and ia64.
_int_realloc is correctly declared at the top to be static, but
incorrectly defined without the static keyword. Fix that. The
generated binaries have identical code.
Both tests try to dlopen libm.so at runtime, so make them depend on it
so that they're executed if libm.so has been updated.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The tcache allocator layer uses the tcache pointer as a key to
identify a block that may be freed twice. Since this is in the
application data area, an attacker exploiting a use-after-free could
potentially get access to the entire tcache structure through this
key. A detailed write-up was provided by Awarau here:
https://awaraucom.wordpress.com/2020/07/19/house-of-io-remastered/
Replace this static pointer use for key checking with one that is
generated at malloc initialization. The first attempt is through
getrandom with a fallback to random_bits(), which is a simple
pseudo-random number generator based on the clock. The fallback ought
to be sufficient since the goal of the randomness is only to make the
key arbitrary enough that it is very unlikely to collide with user
data.
Co-authored-by: Eyal Itkin <eyalit@checkpoint.com>
This partially fixes static-only NSS support (bug 27959): The files
module no longer needs dlopen. Support for the dns module remains
to be added, and also support for disabling dlopen altogether.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This is only needed if nss_files is loaded by nscd.
Before:
text data bss dec hex filename
767 0 24952 25719 6477 nss/files-init.os
After:
text data bss dec hex filename
666 0 0 666 29a nss/files-init.os
Using PATH_MAX bytes unconditionally for the directory name
is wasteful, but fixing that would constitute another break
of this semi-public ABI. (The other issue is that with
symbolic links, an arbitrary set of parent directories may need
watching, not just a single one.)
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch updates the kernel version in the test tst-mman-consts.py
to 5.13. (There are no new MAP_* constants covered by this test in
5.13 that need any other header changes.)
Tested with build-many-glibcs.py.
It's tst-realloc, not tst-posix-realloc. Verified this time to ensure
that the total number of tests reduced by 1.
Reported-by: Stefan Liebler <stli@linux.ibm.com>
They are no longer needed after everything has been moved into
libc. The _dl_vsym test has to be removed because the symbol
cannot be used outside libc anymore.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>