glibc/sysdeps/unix/sysv/linux/clone-pidfd-support.c

posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)

Returning a pidfd gives a process a race-free handle for a child process.
Otherwise, the caller needs to either use pidfd_open (which might still be
subject to TOCTOU) or keep the old racy interface based on pid_t.

To use pidfd_spawn correctly, the kernel must support not only returning the
pidfd with clone/clone3 but also waitid (P_PIDFD) (added in Linux 5.4). If
the kernel does not support waitid (P_PIDFD), pidfd_spawn returns ENOSYS.
This avoids the need for racy workarounds, such as reading the procfs fdinfo
to get the pid to use along with other wait interfaces.

These interfaces are similar to posix_spawn and posix_spawnp; the only
difference is that they return a process file descriptor (int) instead of a
process ID (pid_t). Their prototypes are:

  int pidfd_spawn (int *restrict pidfd,
                   const char *restrict file,
                   const posix_spawn_file_actions_t *restrict facts,
                   const posix_spawnattr_t *restrict attrp,
                   char *const argv[restrict],
                   char *const envp[restrict]);

  int pidfd_spawnp (int *restrict pidfd,
                    const char *restrict path,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict_arr],
                    char *const envp[restrict_arr]);

A new symbol is used instead of a posix_spawn extension to avoid possible
issues with language bindings that might track the return argument lifetime.
Although on Linux pid_t and int are interchangeable, POSIX only states that
pid_t should be a signed integer.

Both symbols reuse the posix_spawn types posix_spawn_file_actions_t and
posix_spawnattr_t, to avoid rehashing the posix_spawn API or adding a new
one. It also means that both interfaces support the same attributes and file
actions, and a new flag or file action added to posix_spawn is automatically
available to pidfd_spawn as well.

Also, using the posix_spawn plumbing allows reusing most of the current
tests, with some changes:

  - waitid is used instead of waitpid since it is a more generic interface.

  - tst-posix_spawn-setsid.c is adapted to take into consideration that the
    caller cannot check the session id directly. The test now spawns itself
    and writes the session id to a file instead.

  - tst-spawn3.c needs to know when pidfd_spawn is used, so that it keeps an
    extra file descriptor unused.

Checked on x86_64-linux-gnu on Linux 4.15 (no CLONE_PIDFD or waitid support),
Linux 5.4 (full support), and Linux 6.2.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2023-08-24 16:42:18 +00:00
/* Check if kernel supports PID file descriptors.
   Copyright (C) 2023 Free Software Foundation, Inc.
   This file is part of the GNU C Library.

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Lesser General Public
   License as published by the Free Software Foundation; either
   version 2.1 of the License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Lesser General Public License for more details.

   You should have received a copy of the GNU Lesser General Public
   License along with the GNU C Library; if not, see
   <https://www.gnu.org/licenses/>.  */

#include <atomic.h>
#include <limits.h>
#include <stdbool.h>
#include <sys/wait.h>
#include <sysdep.h>

/* PID file descriptor support was added over multiple releases:

   - Linux 5.2 added CLONE_PIDFD support for clone.
   - Linux 5.3 added support for poll and CLONE_PIDFD for clone3.
   - Linux 5.4 added P_PIDFD support on waitid.

   For internal usage on spawn and fork, it only makes sense to return a file
   descriptor if the caller can actually waitid on it.  */
static int __waitid_pidfd_supported = 0;

bool
__clone_pidfd_supported (void)
{
  /* 0 means not probed yet, 1 supported, -1 unsupported.  */
  int state = atomic_load_relaxed (&__waitid_pidfd_supported);
  if (state == 0)
    {
      /* Linux defines the maximum allocated file descriptor value as
         0x7fffffc0 on 64-bit kernels (from fs/file.c):

         #define __const_min(x, y) ((x) < (y) ? (x) : (y))
         unsigned int sysctl_nr_open_max =
           __const_min(INT_MAX, ~(size_t)0/sizeof(void *)) & -BITS_PER_LONG;

         So we can detect whether the kernel supports all pidfd interfaces
         by using a valid but never allocated file descriptor: if P_PIDFD is
         not supported, waitid will return EINVAL, otherwise EBADF.

         Also, waitid is a cancellation entry point, so issue the syscall
         directly.  The raw syscall takes a fifth argument (struct rusage *),
         so pass NULL explicitly.  */
      int r = INTERNAL_SYSCALL_CALL (waitid, P_PIDFD, INT_MAX, NULL,
                                     WEXITED | WNOHANG, NULL);
      state = r == -EBADF ? 1 : -1;
      atomic_store_relaxed (&__waitid_pidfd_supported, state);
    }

  return state > 0;
}