Commit Graph

6968 Commits

Author SHA1 Message Date
Yury Khrustalev
f4d00dd60d AArch64: Add support for memory protection keys
This patch adds support for memory protection keys on AArch64 systems with
enabled Stage 1 permission overlays feature introduced in Armv8.9 / 9.4
(FEAT_S1POE) [1].

 1. Internal functions "pkey_read" and "pkey_write" to access data
    associated with memory protection keys.
 2. Implementation of API functions "pkey_get" and "pkey_set" for
    the AArch64 target.
 3. AArch64-specific PKEY flags for READ and EXECUTE (see below).
 4. New target-specific test that checks behaviour of pkeys on
    AArch64 targets.
 5. This patch also extends existing generic test for pkeys.
 6. HWCAP constant for Permission Overlay Extension feature.

To support more accurate mapping of underlying permissions to the
PKEY flags, we introduce additional AArch64-specific flags. The full
list of flags is:

 - PKEY_UNRESTRICTED: 0x0 (for completeness)
 - PKEY_DISABLE_ACCESS: 0x1 (existing flag)
 - PKEY_DISABLE_WRITE: 0x2 (existing flag)
 - PKEY_DISABLE_EXECUTE: 0x4 (new flag, AArch64 specific)
 - PKEY_DISABLE_READ: 0x8 (new flag, AArch64 specific)

The problem here is that PKEY_DISABLE_ACCESS has unusual semantics as
it overlaps with existing PKEY_DISABLE_WRITE and new PKEY_DISABLE_READ.
For this reason mapping between permission bits RWX and "restrictions"
bits awxr (a for disable access, etc) becomes complicated:

 - PKEY_DISABLE_ACCESS disables both R and W
 - PKEY_DISABLE_{WRITE,READ} disables W and R respectively
 - PKEY_DISABLE_EXECUTE disables X

Combinations like the one below are accepted although they are redundant:

 - PKEY_DISABLE_ACCESS | PKEY_DISABLE_READ | PKEY_DISABLE_WRITE

Reverse mapping tries to retain backward compatibility and ORs
PKEY_DISABLE_ACCESS whenever both flags PKEY_DISABLE_READ and
PKEY_DISABLE_WRITE would be present.

This will break code that compares pkey_get output with == instead
of using bitwise operations. The latter is more correct since PKEY_*
constants are essentially bit flags.

It should be noted that PKEY_DISABLE_ACCESS does not prevent execution.

[1] https://developer.arm.com/documentation/ddi0487/ka/ section D8.4.1.4

Co-authored-by: Szabolcs Nagy <szabolcs.nagy@arm.com>

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-20 11:30:58 +00:00
Peter Bergner
229265cc2c powerpc: Improve the inline asm for syscall wrappers
Update the inline asm syscall wrappers to match the newer register constraint
usage in INTERNAL_VSYSCALL_CALL_TYPE.  Use the faster mfocrf instruction when
available, rather than the slower mfcr microcoded instruction.
2024-11-19 12:43:57 -05:00
Adhemerval Zanella
461cab1de7 linux: Add support for getrandom vDSO
Linux 6.11 has getrandom() in vDSO. It operates on a thread-local opaque
state allocated with mmap using flags specified by the vDSO.

Multiple states are allocated at once, as many as fit into a page, and
these are held in an array of available states to be doled out to each
thread upon first use, and recycled when a thread terminates. As these
states run low, more are allocated.

To make this procedure async-signal-safe, a simple guard is used in the
LSB of the opaque state address, falling back to the syscall if there's
reentrancy contention.

Also, _Fork() is handled by blocking signals on opaque state allocation
(so _Fork() always sees a consistent state even if it interrupts a
getrandom() call) and by iterating over the thread stack cache on
reclaim_stack. Each opaque state will be in the free states list
(grnd_alloc.states) or allocated to a running thread.

The cancellation is handled by always using GRND_NONBLOCK flags while
calling the vDSO, and falling back to the cancellable syscall if the
kernel returns EAGAIN (would block). Since getrandom is not defined by
POSIX and cancellation is supported as an extension, the cancellation is
handled as 'may occur' instead of 'shall occur' [1], meaning that if
vDSO does not block (the expected behavior) getrandom will not act as a
cancellation entrypoint. It avoids a pthread_testcancel call on the fast
path (different than 'shall occur' functions, like sem_wait()).

It is currently enabled for x86_64, which is available in Linux 6.11,
and aarch64, powerpc32, powerpc64, loongarch64, and s390x, which are
available in Linux 6.12.

Link: https://pubs.opengroup.org/onlinepubs/9799919799/nframe.html [1]
Co-developed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Tested-by: Jason A. Donenfeld <Jason@zx2c4.com> # x86_64
Tested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> # x86_64, aarch64
Tested-by: Xi Ruoyao <xry111@xry111.site> # x86_64, aarch64, loongarch64
Tested-by: Stefan Liebler <stli@linux.ibm.com> # s390x
2024-11-12 14:42:12 -03:00
Michael Jeanson
97f60abd25 nptl: initialize rseq area prior to registration
Per the rseq syscall documentation, 3 fields are required to be
initialized by userspace prior to registration, they are 'cpu_id',
'rseq_cs' and 'flags'. Since we have no guarantee that 'struct pthread'
is cleared on all architectures, explicitly set those 3 fields prior to
registration.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-11-07 22:23:49 +01:00
Yury Khrustalev
ff254cabd6 misc: Align argument name for pkey_*() functions with the manual
Change name of the access_rights argument to access_restrictions
of the following functions:

 - pkey_alloc()
 - pkey_set()

as this argument refers to access restrictions rather than access
rights and previous name might have been misleading.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-11-06 13:11:33 +00:00
Aurelien Jarno
273694cd78 Add Arm HWCAP2_* constants from Linux 3.15 and 6.2 to <bits/hwcap.h>
Linux 3.15 and 6.2 added HWCAP2_* values for Arm. These bits have
already been added to dl-procinfo.{c,h} in commits 9aea0cb842 and
8ebe9c0b38. Also add them to <bits/hwcap.h> so that they can be used
in user code. For example, for checking bits in the value returned by
getauxval(AT_HWCAP2).

Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Yury Khrustalev <yury.khrustalev@arm.com>
2024-11-05 21:03:37 +01:00
caiyinyu
93ced0e1b8 LoongArch: Add RSEQ_SIG in rseq.h.
Signed-off-by: caiyinyu <caiyinyu@loongson.cn>
2024-11-01 10:41:20 +08:00
Sachin Monga
383e4f53cb powerpc64: Obviate the need for ROP protection in clone/clone3
Save lr in a non-volatile register before scv in clone/clone3.
For clone, the non-volatile register was unused and already
saved/restored.  Remove the dead code from clone.

Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Reviewed-by: Peter Bergner <bergner@linux.ibm.com>
2024-10-30 16:50:04 -04:00
Florian Weimer
4f5f8343c3 Linux: Match kernel text for SCHED_ macros
This avoids -Werror build issues in strace, which bundles UAPI
headers, but does not include them as system headers.

Fixes commit c444cc1d83
("Linux: Add missing scheduler constants to <sched.h>").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-10-25 16:46:30 +02:00
DJ Delorie
81439a116c configure: default to --prefix=/usr on GNU/Linux
I'm getting tired of always typing --prefix=/usr
so making it the default.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-10-22 18:11:49 -04:00
Adhemerval Zanella
ab564362d0 linux: Fix tst-syscall-restart.c on old gcc (BZ 32283)
To avoid a parameter name omitted error.
2024-10-18 08:48:22 -03:00
Adhemerval Zanella
2c1903cbba sparc: Fix restartable syscalls (BZ 32173)
The commit 'sparc: Use Linux kABI for syscall return'
(86c5d2cf0c) did not take into account
a subtle sparc syscall kABI constraint.  For syscalls that might block
indefinitely, on an interrupt (like SIGCONT) the kernel will set the
instruction pointer to just before the syscall:

arch/sparc/kernel/signal_64.c
476 static void do_signal(struct pt_regs *regs, unsigned long orig_i0)
477 {
[...]
525                 if (restart_syscall) {
526                         switch (regs->u_regs[UREG_I0]) {
527                         case ERESTARTNOHAND:
528                         case ERESTARTSYS:
529                         case ERESTARTNOINTR:
530                                 /* replay the system call when we are done */
531                                 regs->u_regs[UREG_I0] = orig_i0;
532                                 regs->tpc -= 4;
533                                 regs->tnpc -= 4;
534                                 pt_regs_clear_syscall(regs);
535                                 fallthrough;
536                         case ERESTART_RESTARTBLOCK:
537                                 regs->u_regs[UREG_G1] = __NR_restart_syscall;
538                                 regs->tpc -= 4;
539                                 regs->tnpc -= 4;
540                                 pt_regs_clear_syscall(regs);
541                         }

However, on a SIGCONT it seems that 'g1' register is being clobbered after the
syscall returns.  Before 86c5d2cf0c, the 'g1' was always placed jus
before the 'ta' instruction which then reloads the syscall number and restarts
the syscall.

On master, where 'g1' might be placed before 'ta':

  $ cat test.c
  #include <unistd.h>

  int main ()
  {
    pause ();
  }
  $ gcc test.c -o test
  $ strace -f ./t
  [...]
  ppoll(NULL, 0, NULL, NULL, 0

On another terminal

  $ kill -STOP 2262828

  $ strace -f ./t
  [...]
  --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  --- stopped by SIGSTOP ---

And then

  $ kill -CONT 2262828

Results in:

  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  restart_syscall(<... resuming interrupted ppoll ...>) = -1 EINTR (Interrupted system call)

Where the expected behaviour would be:

  $ strace -f ./t
  [...]
  ppoll(NULL, 0, NULL, NULL, 0)           = ? ERESTARTNOHAND (To be restarted if no handler)
  --- SIGSTOP {si_signo=SIGSTOP, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  --- stopped by SIGSTOP ---
  --- SIGCONT {si_signo=SIGCONT, si_code=SI_USER, si_pid=2521813, si_uid=8289} ---
  ppoll(NULL, 0, NULL, NULL, 0

Just moving the 'g1' setting near the syscall asm is not suffice,
the compiler might optimize it away (as I saw on cancellation.c by
trying this fix).  Instead, I have change the inline asm to put the
'g1' setup in ithe asm block.  This would require to change the asm
constraint for INTERNAL_SYSCALL_NCS, since the syscall number is not
constant.

Checked on sparc64-linux-gnu.

Reported-by: René Rebe <rene@exactcode.de>
Tested-by: Sam James <sam@gentoo.org>
Reviewed-by: Sam James <sam@gentoo.org>
2024-10-16 14:54:24 -03:00
caiyinyu
2fffaffde8 LoongArch: Regenerate loongarch/arch-syscall.h by build-many-glibcs.py update-syscalls. 2024-10-12 15:50:11 +08:00
Adhemerval Zanella
5ffc903216 misc: Add support for Linux uio.h RWF_ATOMIC flag
Linux 6.11 adds the new flag for pwritev2 (commit
c34fc6f26ab86d03a2d47446f42b6cd492dfdc56).

Checked on x86_64-linux-gnu on 6.11 kernel.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-10-10 10:28:01 -03:00
Adhemerval Zanella
934d0bf426 Update kernel version to 6.11 in header constant tests
This patch updates the kernel version in the tests tst-mount-consts.py,
and tst-sched-consts.py to 6.11.

There are no new constants covered by these tests in 6.11.

Tested with build-many-glibcs.py.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-10-10 10:27:55 -03:00
Adhemerval Zanella
f6e849fd7c linux: Add MAP_DROPPABLE from Linux 6.11
This request the page to be never written out to swap, it will be zeroed
under memory pressure (so kernel can just drop the page), it is inherited
by fork, it is not counted against @code{mlock} budget, and if there is
no enough memory to service a page faults there is no fatal error (so not
signal is sent).

Tested with build-many-glibcs.py.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-10-10 10:27:53 -03:00
Adhemerval Zanella
86f06282cc Update PIDFD_* constants for Linux 6.11
Linux 6.11 adds some more PIDFD_* constants for 'pidfs: allow retrieval
of namespace file descriptors'
(5b08bd408534bfb3a7cf5778da5b27d4e4fffe12).

Tested with build-many-glibcs.py.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-10-10 10:27:51 -03:00
Adhemerval Zanella
02de16df48 Update syscall lists for Linux 6.11
Linux 6.11 changes for syscall are:

  * fstat/newfstatat for loongarch (it should be safe to add since
    255dc1e4ed that undefine them).
  * clone3 for nios2, which only adds the entry point but defined
    __ARCH_BROKEN_SYS_CLONE3 (the syscall will always return ENOSYS).
  * uretprobe for x86_64 and x32.

Update syscall-names.list and regenerate the arch-syscall.h headers
with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-10-10 10:27:49 -03:00
Adhemerval Zanella
d40ac01cbb stdlib: Make abort/_Exit AS-safe (BZ 26275)
The recursive lock used on abort does not synchronize with a new process
creation (either by fork-like interfaces or posix_spawn ones), nor it
is reinitialized after fork().

Also, the SIGABRT unblock before raise() shows another race condition,
where a fork or posix_spawn() call by another thread, just after the
recursive lock release and before the SIGABRT signal, might create
programs with a non-expected signal mask.  With the default option
(without POSIX_SPAWN_SETSIGDEF), the process can see SIG_DFL for
SIGABRT, where it should be SIG_IGN.

To fix the AS-safe, raise() does not change the process signal mask,
and an AS-safe lock is used if a SIGABRT is installed or the process
is blocked or ignored.  With the signal mask change removal,
there is no need to use a recursive loc.  The lock is also taken on
both _Fork() and posix_spawn(), to avoid the spawn process to see the
abort handler as SIG_DFL.

A read-write lock is used to avoid serialize _Fork and posix_spawn
execution.  Both sigaction (SIGABRT) and abort() requires to lock
as writer (since both change the disposition).

The fallback is also simplified: there is no need to use a loop of
ABORT_INSTRUCTION after _exit() (if the syscall does not terminate the
process, the system is broken).

The proposed fix changes how setjmp works on a SIGABRT handler, where
glibc does not save the signal mask.  So usage like the below will now
always abort.

  static volatile int chk_fail_ok;
  static jmp_buf chk_fail_buf;

  static void
  handler (int sig)
  {
    if (chk_fail_ok)
      {
        chk_fail_ok = 0;
        longjmp (chk_fail_buf, 1);
      }
    else
      _exit (127);
  }
  [...]
  signal (SIGABRT, handler);
  [....]
  chk_fail_ok = 1;
  if (! setjmp (chk_fail_buf))
    {
      // Something that can calls abort, like a failed fortify function.
      chk_fail_ok = 0;
      printf ("FAIL\n");
    }

Such cases will need to use sigsetjmp instead.

The _dl_start_profile calls sigaction through _profil, and to avoid
pulling abort() on loader the call is replaced with __libc_sigaction.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-10-08 14:40:12 -03:00
Adhemerval Zanella
55d33108c7 linux: Use GLRO(dl_vdso_time) on time
The BZ#24967 fix (1bdda52fe9) missed the time for
architectures that define USE_IFUNC_TIME.  Although it is not
an issue, since there is no pointer mangling, there is also no need
to call dl_vdso_vsym since the vDSO setup was already done by the
loader.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2024-10-08 13:28:21 -03:00
Adhemerval Zanella
02b195d30f linux: Use GLRO(dl_vdso_gettimeofday) on gettimeofday
The BZ#24967 fix (1bdda52fe9) missed the gettimeofday for
architectures that define USE_IFUNC_GETTIMEOFDAY.  Although it is not
an issue, since there is no pointer mangling, there is also no need
to call dl_vdso_vsym since the vDSO setup was already done by the
loader.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2024-10-08 13:28:21 -03:00
Adhemerval Zanella
5e8cfc5d62 linux: sparc: Fix clone for LEON/sparcv8 (BZ 31394)
The sparc clone mitigation (faeaa3bc9f) added the use of
flushw, which is not support by LEON/sparcv8.  As discussed on
the libc-alpha, 'ta 3' is a working alternative [1].

[1] https://sourceware.org/pipermail/libc-alpha/2024-August/158905.html

Checked with a build for sparcv8-linux-gnu targetting leon.

Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-10-01 10:37:21 -03:00
Adhemerval Zanella
49c3682ce1 linux: sparc: Fix syscall_cancel for LEON
LEON2/LEON3 are both sparcv8, which does not support branch hints
(bne,pn) nor the return instruction.

Checked with a build for sparcv8-linux-gnu targetting leon. I also
checked some cancellation tests with qemu-system (targeting LEON3).

Acked-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
2024-10-01 10:37:21 -03:00
Pavel Kozlov
cc84cd389c arc: Cleanup arcbe
Remove the mention of arcbe ABI to avoid any mislead.
ARC big endian ABI is no longer supported.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-09-25 15:54:07 +01:00
Florian Weimer
4ff55d08df arc: Remove HAVE_ARC_BE macro and disable big-endian port
It is no longer needed, now that ARC is always little endian.
2024-09-25 11:25:22 +02:00
caiyinyu
255dc1e4ed LoongArch: Undef __NR_fstat and __NR_newfstatat.
In Linux 6.11, fstat and newfstatat are added back. To avoid the messy
usage of the fstat, newfstatat, and statx system calls, we will continue
using statx only in glibc, maintaining consistency with previous versions of
the LoongArch-specific glibc implementation.

Signed-off-by: caiyinyu <caiyinyu@loongson.cn>
Reviewed-by: Xi Ruoyao <xry111@xry111.site>
Suggested-by: Florian Weimer <fweimer@redhat.com>
2024-09-25 10:00:42 +08:00
Florian Weimer
7e21a65c58 misc: Enable internal use of memory protection keys
This adds the necessary hidden prototypes.
2024-09-24 13:23:10 +02:00
Florian Weimer
6f3f6c506c Linux: readdir64_r should not skip d_ino == 0 entries (bug 32126)
This is the same bug as bug 12165, but for readdir_r.  The
regression test covers both bug 12165 and bug 32126.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-09-21 19:32:34 +02:00
Florian Weimer
e92718552e Linux: Use readdir64_r for compat __old_readdir64_r (bug 32128)
It is not necessary to do the conversion at the getdents64
layer for readdir64_r.  Doing it piecewise for readdir64
is slightly simpler and allows deleting __old_getdents64.

This fixes bug 32128 because readdir64_r handles the length
check correctly.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-09-21 19:32:34 +02:00
Joe Ramsay
751a5502be AArch64: Add vector logp1 alias for log1p
This enables vectorisation of C23 logp1, which is an alias for log1p.
There are no new tests or ulp entries because the new symbols are simply
aliases.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2024-09-19 17:53:34 +01:00
Florian Weimer
c444cc1d83 Linux: Add missing scheduler constants to <sched.h>
And add a test, misc/tst-sched-consts, that checks
consistency with <sched.h>.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-09-11 10:05:08 +02:00
Florian Weimer
21571ca0d7 Linux: Add the sched_setattr and sched_getattr functions
And struct sched_attr.

In sysdeps/unix/sysv/linux/bits/sched.h, the hack that defines
sched_param around the inclusion of <linux/sched/types.h> is quite
ugly, but the definition of struct sched_param has already been
dropped by the kernel, so there is nothing else we can do and maintain
compatibility of <sched.h> with a wide range of kernel header
versions.  (An alternative would involve introducing a separate header
for this functionality, but this seems unnecessary.)

The existing sched_* functions that change scheduler parameters
are already incompatible with PTHREAD_PRIO_PROTECT mutexes, so
there is no harm in adding more functionality in this area.

The documentation mostly defers to the Linux manual pages.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-09-11 10:05:08 +02:00
Florian Weimer
61f2c2e1d1 Linux: readdir_r needs to report getdents failures (bug 32124)
Upon error, return the errno value set by the __getdents call
in __readdir_unlocked.  Previously, kernel-reported errors
were ignored.

Reviewed-by: DJ Delorie <dj@redhat.com>
2024-09-05 12:05:32 +02:00
Adhemerval Zanella
1927f718fc linux: mips: Fix syscall_cancell build for __mips_isa_rev >= 6
Use beqzc instead of bnel.

Checked with a mipsisa64r6el-n64-linux-gnu build and some nptl
cancellation tests on qemu.
2024-09-02 12:30:45 -03:00
Adhemerval Zanella
89b53077d2 nptl: Fix Race conditions in pthread cancellation [BZ#12683]
The current racy approach is to enable asynchronous cancellation
before making the syscall and restore the previous cancellation
type once the syscall returns, and check if cancellation has happen
during the cancellation entrypoint.

As described in BZ#12683, this approach shows 2 problems:

  1. Cancellation can act after the syscall has returned from the
     kernel, but before userspace saves the return value.  It might
     result in a resource leak if the syscall allocated a resource or a
     side effect (partial read/write), and there is no way to program
     handle it with cancellation handlers.

  2. If a signal is handled while the thread is blocked at a cancellable
     syscall, the entire signal handler runs with asynchronous
     cancellation enabled.  This can lead to issues if the signal
     handler call functions which are async-signal-safe but not
     async-cancel-safe.

For the cancellation to work correctly, there are 5 points at which the
cancellation signal could arrive:

	[ ... )[ ... )[ syscall ]( ...
	   1      2        3    4   5

  1. Before initial testcancel, e.g. [*... testcancel)
  2. Between testcancel and syscall start, e.g. [testcancel...syscall start)
  3. While syscall is blocked and no side effects have yet taken
     place, e.g. [ syscall ]
  4. Same as 3 but with side-effects having occurred (e.g. a partial
     read or write).
  5. After syscall end e.g. (syscall end...*]

And libc wants to act on cancellation in cases 1, 2, and 3 but not
in cases 4 or 5.  For the 4 and 5 cases, the cancellation will eventually
happen in the next cancellable entrypoint without any further external
event.

The proposed solution for each case is:

  1. Do a conditional branch based on whether the thread has received
     a cancellation request;

  2. It can be caught by the signal handler determining that the saved
     program counter (from the ucontext_t) is in some address range
     beginning just before the "testcancel" and ending with the
     syscall instruction.

  3. SIGCANCEL can be caught by the signal handler and determine that
     the saved program counter (from the ucontext_t) is in the address
     range beginning just before "testcancel" and ending with the first
     uninterruptable (via a signal) syscall instruction that enters the
      kernel.

  4. In this case, except for certain syscalls that ALWAYS fail with
     EINTR even for non-interrupting signals, the kernel will reset
     the program counter to point at the syscall instruction during
     signal handling, so that the syscall is restarted when the signal
     handler returns.  So, from the signal handler's standpoint, this
     looks the same as case 2, and thus it's taken care of.

  5. For syscalls with side-effects, the kernel cannot restart the
     syscall; when it's interrupted by a signal, the kernel must cause
     the syscall to return with whatever partial result is obtained
     (e.g. partial read or write).

  6. The saved program counter points just after the syscall
     instruction, so the signal handler won't act on cancellation.
     This is similar to 4. since the program counter is past the syscall
     instruction.

So The proposed fixes are:

  1. Remove the enable_asynccancel/disable_asynccancel function usage in
     cancellable syscall definition and instead make them call a common
     symbol that will check if cancellation is enabled (__syscall_cancel
     at nptl/cancellation.c), call the arch-specific cancellable
     entry-point (__syscall_cancel_arch), and cancel the thread when
     required.

  2. Provide an arch-specific generic system call wrapper function
     that contains global markers.  These markers will be used in
     SIGCANCEL signal handler to check if the interruption has been
     called in a valid syscall and if the syscalls has side-effects.

     A reference implementation sysdeps/unix/sysv/linux/syscall_cancel.c
     is provided.  However, the markers may not be set on correct
     expected places depending on how INTERNAL_SYSCALL_NCS is
     implemented by the architecture.  It is expected that all
     architectures add an arch-specific implementation.

  3. Rewrite SIGCANCEL asynchronous handler to check for both canceling
     type and if current IP from signal handler falls between the global
     markers and act accordingly.

  4. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to
     use the appropriate cancelable syscalls.

  5. Adjust 'lowlevellock-futex.h' arch-specific implementations to
     provide cancelable futex calls.

Some architectures require specific support on syscall handling:

  * On i386 the syscall cancel bridge needs to use the old int80
    instruction because the optimized vDSO symbol the resulting PC value
    for an interrupted syscall points to an address outside the expected
    markers in __syscall_cancel_arch.  It has been discussed in LKML [1]
    on how kernel could help userland to accomplish it, but afaik
    discussion has stalled.

    Also, sysenter should not be used directly by libc since its calling
    convention is set by the kernel depending of the underlying x86 chip
    (check kernel commit 30bfa7b3488bfb1bb75c9f50a5fcac1832970c60).

  * mips o32 is the only kABI that requires 7 argument syscall, and to
    avoid add a requirement on all architectures to support it, mips
    support is added with extra internal defines.

Checked on aarch64-linux-gnu, arm-linux-gnueabihf, powerpc-linux-gnu,
powerpc64-linux-gnu, powerpc64le-linux-gnu, i686-linux-gnu, and
x86_64-linux-gnu.

[1] https://lkml.org/lkml/2016/3/8/1105
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2024-08-23 14:27:43 -03:00
Maciej W. Rozycki
9fb237a1c8 nptl: Fix extraneous testing run by tst-rseq-nptl in the test driver
Fix an issue with commit 8f4632deb3 ("Linux: rseq registration tests")
and prevent testing from being run in the process of the test driver
itself rather than just the test child where one has been forked.  The
problem here is the unguarded use of a destructor to call a part of the
testing.  The destructor function, 'do_rseq_destructor_test' is called
implicitly at program completion, however because it is associated with
the executable itself rather than an individual process, it is called
both in the test child *and* in the test driver itself.

Prevent this from happening by providing a guard variable that only
enables test invocation from 'do_rseq_destructor_test' in the process
that has first run 'do_test'.  Consequently extra testing is invoked
from 'do_rseq_destructor_test' only once and in the correct process,
regardless of the use or the lack of of the '--direct' option.  Where
called in the controlling test driver process that has neved called
'do_test' the destructor function silently returns right away without
taking any further actions, letting the test driver fail gracefully
where applicable.

This arrangement prevents 'tst-rseq-nptl' from ever causing testing to
hang forever and never complete, such as currently happening with the
'mips-linux-gnu' (o32 ABI) target.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2024-08-16 14:38:33 +01:00
Carlos O'Donell
b22923abb0 Report error if setaffinity wrapper fails (Bug 32040)
Previously if the setaffinity wrapper failed the rest of the subtest
would not execute and the current subtest would be reported as passing.
Now if the setaffinity wrapper fails the subtest is correctly reported
as faling. Tested manually by changing the conditions of the affinity
call including setting size to zero, or checking the wrong condition.

No regressions on x86_64.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-08-15 15:28:48 -04:00
H.J. Lu
ff0320bec2 Add mremap tests
Add tests for MREMAP_MAYMOVE and MREMAP_FIXED.  On Linux, also test
MREMAP_DONTUNMAP.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-08-01 05:06:12 -07:00
H.J. Lu
6c40cb0e9f linux: Update the mremap C implementation [BZ #31968]
Update the mremap C implementation to support the optional argument for
MREMAP_DONTUNMAP added in Linux 5.7 since it may not always be correct
to implement a variadic function as a non-variadic function on all Linux
targets.  Return MAP_FAILED and set errno to EINVAL for unknown flag bits.
This fixes BZ #31968.

Note: A test must be added when a new flag bit is introduced.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-08-01 05:06:12 -07:00
Adhemerval Zanella
28f8cee64a Add F_DUPFD_QUERY from Linux 6.10 to bits/fcntl-linux.h
It was added by commit c62b758bae6af16 as a way for userspace to
check if two file descriptors refer to the same struct file.

Checked on aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-07-30 08:52:52 -03:00
Adhemerval Zanella
e433cdec9b Update kernel version to 6.10 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py, and tst-pidfd-consts.py to 6.9.

There are no new constants covered by these tests in 6.10.

Tested with build-many-glibcs.py.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-07-30 08:48:51 -03:00
Adhemerval Zanella
eb0776d4e1 Update syscall lists for Linux 6.10
Linux 6.10 changes for syscall are:

  * mseal for all architectures.
  * map_shadow_stack for x32.
  * Replace sync_file_range with sync_file_range2 for csky (which
    fixes a broken sync_file_range usage).

Update syscall-names.list and regenerate the arch-syscall.h headers
with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2024-07-30 08:48:51 -03:00
Michael Karcher
faeaa3bc9f
Mitigation for "clone on sparc might fail with -EFAULT for no valid reason" (bz 31394)
It seems the kernel can not deal with uncommitted stack space in the area intended
for the register window when executing the clone() system call. So create a nested
frame (proxy for the kernel frame) and flush it from the processor to memory to
force committing pages to the stack before invoking the system call.

Bug: https://www.mail-archive.com/debian-glibc@lists.debian.org/msg62592.html
Bug: https://sourceware.org/bugzilla/show_bug.cgi?id=31394
See-also: https://lore.kernel.org/sparclinux/62f9be9d-a086-4134-9a9f-5df8822708af@mkarcher.dialup.fu-berlin.de/
Signed-off-by: Michael Karcher <sourceware-bugzilla@mkarcher.dialup.fu-berlin.de>
Reviewed-by: DJ Delorie <dj@redhat.com>
2024-07-29 23:00:39 +02:00
H.J. Lu
8344c1f551 x32/cet: Support shadow stack during startup for Linux 6.10
Use RXX_LP in RTLD_START_ENABLE_X86_FEATURES.  Support shadow stack during
startup for Linux 6.10:

commit 2883f01ec37dd8668e7222dfdb5980c86fdfe277
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 15 07:04:33 2024 -0700

    x86/shstk: Enable shadow stacks for x32

    1. Add shadow stack support to x32 signal.
    2. Use the 64-bit map_shadow_stack syscall for x32.
    3. Set up shadow stack for x32.

Add the map_shadow_stack system call to <fixup-asm-unistd.h> and regenerate
arch-syscall.h.  Tested on Intel Tiger Lake with CET enabled x32.  There
are no regressions with CET enabled x86-64.  There are no changes in CET
enabled x86-64 _dl_start_user.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-07-25 00:17:21 -07:00
Andreas K. Hüttel
ab5748118f
linux: Trivial test output fix in tst-pkey
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-07-19 22:57:23 +02:00
Adhemerval Zanella
6b7e2e1d61
linux: Also check pkey_get for ENOSYS on tst-pkey (BZ 31996)
The powerpc pkey_get/pkey_set support was only added for 64-bit [1],
and tst-pkey only checks if the support was present with pkey_alloc
(which does not fail on powerpc32, at least running a 64-bit kernel).

Checked on powerpc-linux-gnu.

[1] https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a803367bab167f5ec4fde1f0d0ec447707c29520
Reviewed-By: Andreas K. Huettel <dilfridge@gentoo.org>
2024-07-19 22:39:44 +02:00
John David Anglin
8cfa4ecff2 Fix usage of _STACK_GROWS_DOWN and _STACK_GROWS_UP defines [BZ 31989]
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Reviewed-By: Andreas K. Hüttel <dilfridge@gentoo.org>
2024-07-19 10:10:17 -04:00
Florian Weimer
2e456ccf0c Linux: Make __rseq_size useful for feature detection (bug 31965)
The __rseq_size value is now the active area of struct rseq
(so 20 initially), not the full struct size including padding
at the end (32 initially).

Update misc/tst-rseq to print some additional diagnostics.

Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
2024-07-09 19:33:37 +02:00
Adhemerval Zanella
9fc639f654 elf: Make dl-rseq-symbols Linux only
And avoid a Hurd build failures.

Checked on x86_64-linux-gnu.
2024-07-04 10:09:07 -03:00
John David Anglin
4737e6a7a3 hppa/vdso: Provide 64-bit clock_gettime() vDSO only
Adhemerval noticed that the gettimeofday() and 32-bit clock_gettime()
vDSO calls won't be used by glibc on hppa, so there is no need to
declare them.  Both syscalls will be emulated by utilizing return values
of the 64-bit clock_gettime() vDSO instead.

Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
2024-07-02 16:26:32 -04:00