Now that kernel exports linux/mount.h and includes it on linux/fs.h,
its definitions might clash with glibc exports sys/mount.h. To avoid
the need to rearrange the Linux header to be always after glibc one,
the glibc sys/mount.h is changed to:
1. Undefine the macros also used as enum constants. This covers prior
inclusion of <linux/mount.h> (for instance MS_RDONLY).
2. Include <linux/mount.h> based on the usual __has_include check
(needs to use __has_include ("linux/mount.h") to paper over GCC
bugs.
3. Define enum fsconfig_command only if FSOPEN_CLOEXEC is not defined.
(FSOPEN_CLOEXEC should be a very close proxy.)
4. Define struct mount_attr if MOUNT_ATTR_SIZE_VER0 is not defined.
(Added in the same commit on the Linux side.)
This patch also adds some tests to check if including linux/fs.h and
linux/mount.h after and before sys/mount.h does work.
Checked on x86_64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Improve performance of recursive IO locks by adding a fast path for
the single-threaded case. To reduce the number of memory accesses for
locking/unlocking, only increment the recursion counter if the lock
is already taken.
On Neoverse V1, a microbenchmark with many small freads improved by
2.9x. Multithreaded performance improved by 2%.
Reviewed-by: Cristian Rodríguez <crrodriguez@opensuse.org>
So far this test checks if pidfd_open-syscall is supported,
which was introduced with linux 5.3.
The process_madvise-syscall was introduced with linux 5.10.
Thus you'll get FAILs if you are running a kernel in between.
This patch adds a check if the first process_madvise-syscall
returns ENOSYS and in this case will fail with UNSUPPORTED.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
`#ifndef STPCPY` is incorrect for checking if `STRCPY` is already
defined. It doesn't end up mattering as the whole check is
guarded by `#if IS_IN (libc)` but is incorrect none the less.
Similar to 6720d36b66 for x86-64.
Clang cannot assemble movzx in the AT&T dialect mode. Change movzx to
movzbl, which follows the AT&T dialect and is used elsewhere in the
file.
The older libc versions are obsolete for over twenty years now.
This patch removes the special flags for libc5 and libc4 and assumes
that all libraries cached are libc6 compatible and use FLAG_ELF_LIBC6.
Checked with a build for all affected architectures.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
exit only terminates the current thread, not the whole process, so it
is the wrong fallback system call in this context. All supported
Linux versions implement the exit_group system call anyway.
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 5.18. (There are no
new constants covered by these tests in 5.19, or in 5.17 or 5.18 in
the case of tst-mount-consts.py that previously used version 5.16,
that need any other header changes.)
Tested with build-many-glibcs.py.
__pthread_sigmask cannot actually fail with valid pointer arguments
(it would need a really broken seccomp filter), and we do not check
for errors elsewhere.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Since commit ec2c1fcefb ("malloc:
Abort on heap corruption, without a backtrace [BZ #21754]"),
__libc_message always terminates the process. Since commit
a289ea09ea ("Do not print backtraces
on fatal glibc errors"), the backtrace facility has been removed.
Therefore, remove enum __libc_message_action and the action
argument of __libc_message, and mark __libc_message as _No_return.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi
headers for RISC-V. Update the version number in syscall-names.list
to reflect that it is still current for 5.19 and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py.
The inline and library functions that the CMSG_NXTHDR macro may expand
to increment the pointer to the header before checking the stride of
the increment against available space. Since C only allows incrementing
pointers to one past the end of an array, the increment must be done
after a length check. This commit fixes that and includes a regression
test for CMSG_FIRSTHDR and CMSG_NXTHDR.
The Linux, Hurd, and generic headers are all changed.
Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x.
[BZ #28846]
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
pidfd_getfd can fail for a valid pidfd with errno EPERM for various
reasons in a restricted environment. Use FAIL_UNSUPPORTED in that case.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.
This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.
This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.
This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.
getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.
The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.
Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Commit a06b40cdf5 updated stat.h to use
__USE_XOPEN2K8 instead of __USE_MISC to add the st_atim, st_mtim and
st_ctim members to struct stat. However, for microblaze, there are two
definitions of struct stat, depending on the __USE_FILE_OFFSET64 macro.
The second one was not updated.
Change __USE_MISC to __USE_XOPEN2K8 in the __USE_FILE_OFFSET64 version
of struct stat for microblaze.
The hppa port starts libc at GLIBC_2.2, but has earlier symbol
versions in other shared objects. This means that the compat
symbol for readdir64 is not actually present in libc even though
have-GLIBC_2.1.3 is defined as yes at the make level.
Fixes commit 15e50e6c96 ("Linux:
dirent/tst-readdir64-compat can be a regular test") by mostly
reverting it.
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-avx2.S. It is used only if AVX2 is supported
and enabled by the architecture.
As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.
On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):
SSE MB/s
-----------------------------------------------
arc4random [single-thread] 704.25
arc4random_buf(16) [single-thread] 1018.17
arc4random_buf(32) [single-thread] 1315.27
arc4random_buf(48) [single-thread] 1449.36
arc4random_buf(64) [single-thread] 1511.16
arc4random_buf(80) [single-thread] 1539.48
arc4random_buf(96) [single-thread] 1571.06
arc4random_buf(112) [single-thread] 1596.16
arc4random_buf(128) [single-thread] 1613.48
-----------------------------------------------
AVX2 MB/s
-----------------------------------------------
arc4random [single-thread] 922.61
arc4random_buf(16) [single-thread] 1478.70
arc4random_buf(32) [single-thread] 2241.80
arc4random_buf(48) [single-thread] 2681.28
arc4random_buf(64) [single-thread] 2913.43
arc4random_buf(80) [single-thread] 3009.73
arc4random_buf(96) [single-thread] 3141.16
arc4random_buf(112) [single-thread] 3254.46
arc4random_buf(128) [single-thread] 3305.02
-----------------------------------------------
Checked on x86_64-linux-gnu.
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-ssse3.S. It replaces the ROTATE_SHUF_2 (which
uses pshufb) by ROTATE2 and thus making the original implementation
SSE2.
As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.
On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 443.11
arc4random_buf(16) [single-thread] 552.27
arc4random_buf(32) [single-thread] 626.86
arc4random_buf(48) [single-thread] 649.81
arc4random_buf(64) [single-thread] 663.95
arc4random_buf(80) [single-thread] 674.78
arc4random_buf(96) [single-thread] 675.17
arc4random_buf(112) [single-thread] 680.69
arc4random_buf(128) [single-thread] 683.20
-----------------------------------------------
SSE MB/s
-----------------------------------------------
arc4random [single-thread] 704.25
arc4random_buf(16) [single-thread] 1018.17
arc4random_buf(32) [single-thread] 1315.27
arc4random_buf(48) [single-thread] 1449.36
arc4random_buf(64) [single-thread] 1511.16
arc4random_buf(80) [single-thread] 1539.48
arc4random_buf(96) [single-thread] 1571.06
arc4random_buf(112) [single-thread] 1596.16
arc4random_buf(128) [single-thread] 1613.48
-----------------------------------------------
Checked on x86_64-linux-gnu.
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S. It is used as default and only
little-endian is supported (BE uses generic code).
As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.
On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):
GENERIC MB/s
-----------------------------------------------
arc4random [single-thread] 380.89
arc4random_buf(16) [single-thread] 500.73
arc4random_buf(32) [single-thread] 552.61
arc4random_buf(48) [single-thread] 566.82
arc4random_buf(64) [single-thread] 574.01
arc4random_buf(80) [single-thread] 581.02
arc4random_buf(96) [single-thread] 591.19
arc4random_buf(112) [single-thread] 592.29
arc4random_buf(128) [single-thread] 596.43
-----------------------------------------------
OPTIMIZED MB/s
-----------------------------------------------
arc4random [single-thread] 569.60
arc4random_buf(16) [single-thread] 825.78
arc4random_buf(32) [single-thread] 987.03
arc4random_buf(48) [single-thread] 1042.39
arc4random_buf(64) [single-thread] 1075.50
arc4random_buf(80) [single-thread] 1094.68
arc4random_buf(96) [single-thread] 1130.16
arc4random_buf(112) [single-thread] 1129.58
arc4random_buf(128) [single-thread] 1137.91
-----------------------------------------------
Checked on aarch64-linux-gnu.
The implementation is based on scalar Chacha20 with per-thread cache.
It uses getrandom or /dev/urandom as fallback to get the initial entropy,
and reseeds the internal state on every 16MB of consumed buffer.
To improve performance and lower memory consumption the per-thread cache
is allocated lazily on first arc4random functions call, and if the
memory allocation fails getentropy or /dev/urandom is used as fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).
Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).
The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros. This strategy is similar to what OpenBSD arc4random
does.
The arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], who
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.
The main advantage of this method is the that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit. It optimizes the
internal buffer sampling by initially consuming a 32-bit random variable
and then sampling byte per byte. Depending of the upper bound requested,
it might lead to better CPU utilization.
Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.
Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>
[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
Before this the test fails if run in a chroot by a non-root user:
warning: could not become root outside namespace (Operation not permitted)
../sysdeps/unix/sysv/linux/tst-mount.c:36: numeric comparison failure
left: 1 (0x1); from: errno
right: 19 (0x13); from: ENODEV
error: ../sysdeps/unix/sysv/linux/tst-mount.c:39: not true: fd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:46: not true: r != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:48: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:52: numeric comparison failure
left: 1 (0x1); from: errno
right: 9 (0x9); from: EBADF
error: ../sysdeps/unix/sysv/linux/tst-mount.c:55: not true: mfd != -1
../sysdeps/unix/sysv/linux/tst-mount.c:58: numeric comparison failure
left: 1 (0x1); from: errno
right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:61: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:65: numeric comparison failure
left: 1 (0x1); from: errno
right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:68: not true: pfd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:75: not true: fd_tree != -1
../sysdeps/unix/sysv/linux/tst-mount.c:88: numeric comparison failure
left: 1 (0x1); from: errno
right: 38 (0x26); from: ENOSYS
error: 12 test failures
Checking that the test can enter a new mount namespace is more correct
than just checking the return value of support_become_root() as the test
code changes the mount namespace it runs in so running it as root on a
system that does not support mount namespaces should still skip.
Also change the test to remove the unnecessary fork.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
1. Add default ISA level selection in non-multiarch/rtld
implementations.
2. Add ISA level build guards to different implementations.
- I.e strcpy-avx2.S which is ISA level 3 will only build if
compiled ISA level <= 3. Otherwise there is no reason to
include it as we will always use one of the ISA level 4
implementations (strcpy-evex.S).
3. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
1. Add ISA level build guards to different implementations.
- wcscpy-ssse3.S is used as ISA level 2/3/4.
- wcscpy-generic.c is only used at ISA level 1 and will
only build if compiled with ISA level == 1. Otherwise
there is no reason to include it as we will always use
wcscpy-ssse3.S
2. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
1. Add default ISA level selection in non-multiarch/rtld
implementations.
2. Add ISA level build guards to different implementations.
- I.e strcmp-avx2.S which is ISA level 3 will only build if
compiled ISA level <= 3. Otherwise there is no reason to
include it as we will always use one of the ISA level 4
implementations (strcmp-evex.S).
3. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
Starting with commit e070501d12
"Replace __libc_multiple_threads with __libc_single_threaded"
the testcases nptl/tst-cancel-self and
nptl/tst-cancel-self-cancelstate are failing.
This is fixed by only defining SINGLE_THREAD_BY_GLOBAL on s390x,
but not on s390.
Starting with commit 09c76a7409
"Linux: Consolidate {RTLD_}SINGLE_THREAD_P definition",
SINGLE_THREAD_BY_GLOBAL was defined in
sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h.
Lateron the commit 9a973da617
"s390: Consolidate Linux syscall definition" consolidates the sysdep.h files
from s390-32/s390-64 subdirectories. Unfortunately the macro is now always
defined instead of only on s390-64.
As information:
TLS_MULTIPLE_THREADS_IN_TCB is also only defined for s390.
See: sysdeps/s390/nptl/tls.h
wmemcmp isn't used by the dynamic loader so their no need to add an
RTLD stub for it.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Tested build on x86_64 and x86_32 with/without multiarch.
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.
Because strcmp-sse2.S implements so many functions (more from
avx2/evex/sse42) add a new file 'strcmp-naming.h' to assist in
getting the correct symbol name for all the function across
multiarch/non-multiarch builds.
Tested build on x86_64 and x86_32 with/without multiarch.
The previous macro name can be confusing given that both
`__strcasecmp_l_nonascii` and `__strcasecmp_nonascii` are
functions and we use the `_l` version.
The intrinsics are not available before GCC7 and using standard
operators generates code of equivalent or better quality.
Removed:
_cvtmask64_u64
_kshiftri_mask64
_kand_mask64
Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958
These functions all have optimized versions:
__strncat_sse2_unaligned, __strncpy_sse2_unaligned, and
stpncpy_sse2_unaligned which are faster than their respective generic
implementations. Since the sse2 versions can run on baseline x86_64,
we should use these as the baseline implementation and can remove the
generic implementations.
Geometric mean of N=20 runs of the entire benchmark suite on:
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz (Tigerlake)
__strncat_sse2_unaligned / __strncat_generic: .944
__strncpy_sse2_unaligned / __strncpy_generic: .726
__stpncpy_sse2_unaligned / __stpncpy_generic: .650
Tested build with and without multiarch and full check with multiarch.
gas -mtune= may change NOP generating patterns but -mtune=i686 has no
difference from the default by inspecting .o and .os files.
Note: Clang doesn't support -Wa,-mtune=i686.
Remove redundant strcspn-generic, strpbrk-generic and strspn-generic
from sysdep_routines in sysdeps/x86_64/multiarch/Makefile added by
commit c69f960b01
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Sun Jul 3 21:28:07 2022 -0700
x86: Add support for building str{c|p}{brk|spn} with explicit ISA level
since they have been added to sysdep_routines in sysdeps/x86_64/Makefile.
This change provides implementations for the mbrtoc8 and c8rtomb
functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14
N2653. It also provides the char8_t typedef from WG14 N2653.
The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X
mode or when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature
test macro is defined.
The char8_t typedef is declared in uchar.h in C2X mode or when the
_GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature
test macro is not defined (if __cpp_char8_t is defined, then char8_t
is a builtin type).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
We found that string functions were using AND+ADDP
to find the nibble/syndrome mask but there is an easier
opportunity through `SHRN dst.8b, src.8h, 4` (shift
right every 2 bytes by 4 and narrow to 1 byte) and has
same latency on all SIMD ARMv8 targets as ADDP. There
are also possible gaps for memcmp but that's for
another patch.
We see 10-20% savings for small-mid size cases (<=128)
which are primary cases for general workloads.
1. Refactor files so that all implementations are in the multiarch
directory
- Moved the implementation portion of memcmp sse2 from memcmp.S to
multiarch/memcmp-sse2.S
- The non-multiarch file now only includes one of the
implementations in the multiarch directory based on the compiled
ISA level (only used for non-multiarch builds. Otherwise we go
through the ifunc selector).
2. Add ISA level build guards to different implementations.
- I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if
compiled ISA level <= 3. Otherwise there is no reason to include
it as we will always use one of the ISA level 4
implementations (memcmp-evex-movbe.S).
3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the
non-multiarch {w}memcmp{eq}.S which will in turn select the best
implementation based on the compiled ISA level.
4. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
1. Refactor files so that all implementations are in the multiarch
directory
- Moved the implementation portion of memset sse2 from memset.S to
multiarch/memset-sse2.S
- The non-multiarch file now only includes one of the
implementations in the multiarch directory based on the compiled
ISA level (only used for non-multiarch builds. Otherwise we go
through the ifunc selector).
2. Add ISA level build guards to different implementations.
- I.e memset-avx2-unaligned-erms.S which is ISA level 3 will only
build if compiled ISA level <= 3. Otherwise there is no reason
to include it as we will always use one of the ISA level 4
implementations (memset-evex-unaligned-erms.S).
3. Add new multiarch/rtld-memset.S that just include the
non-multiarch memset.S which will in turn select the best
implementation based on the compiled ISA level.
4. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
1. Refactor files so that all implementations are in the multiarch
directory
- Moved the implementation portion of memmove sse2 from memmove.S
to multiarch/memmove-sse2.S
- The non-multiarch file now only includes one of the
implementations in the multiarch directory based on the compiled
ISA level (only used for non-multiarch builds. Otherwise we go
through the ifunc selector).
2. Add ISA level build guards to different implementations.
- I.e memmove-avx2-unaligned-erms.S which is ISA level 3 will only
build if compiled ISA level <= 3. Otherwise there is no reason
to include it as we will always use one of the ISA level 4
implementations (memmove-evex-unaligned-erms.S).
3. Add new multiarch/rtld-memmove.S that just include the
non-multiarch memmove.S which will in turn select the best
implementation based on the compiled ISA level.
4. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
isa raising memmove
The changes for these functions are different than the others because
the best implementation (sse4_2) requires the generic
implementation as a fallback to be built as well.
Changes are:
1. Add non-multiarch functions for str{c|p}{brk|spn}.c to statically
select the best implementation based on the configured ISA build
level.
2. Add stubs for str{c|p}{brk|spn}-generic and varshift.c to in the
sysdeps/x86_64 directory so that the the sse4 implementation will
have all of its dependencies for the non-multiarch / rtld build
when ISA level >= 2.
3. Add new multiarch/rtld-strcspn.c that just include the
non-multiarch strcspn.c which will in turn select the best
implementation based on the compiled ISA level.
4. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
And also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL,
since header inclusion single-thread.h is in the wrong order, the define
needs to come before including sysdeps/unix/sysdep.h. The macro
is now moved to a per-arch single-threade.h header.
The SINGLE_THREAD_P is used on some more places.
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
It was added on Linux 5.12 (2a1867219c7b27f928e2545782b86daaf9ad50bd)
to allow change the properties of a mount or a mount tree using file
descriptors which the new mount api is based on.
Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The new mount API was added on Linux 5.2 with six new syscalls:
fsopen, fsconfig, fsmount, move_mount, fspick, and open_tree.
The new test verifies minimal functionality along with error paths
for specific arguments and their corner cases.
Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It was added on Linux 5.2 (a07b20004793d8926f78d63eb5980559f7813404)
to return a O_PATH-opened file descriptor to an existing mountpoint.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It was added on Linux 5.2 (cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb)
that can be used to pick an existing mountpoint into an filesystem
context which can thereafter be used to reconfigure a superblock
with fsconfig syscall.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It was added on Linux 5.2 (ecdab150fddb42fe6a739335257949220033b782)
as a way to a configure filesystem creation context and trigger
actions upon it, to be used in conjunction with fsopen, fspick and
fsmount.
The fsconfig_command commands are currently only defined as an enum,
so they can't be checked on tst-mount-consts.py with current test
support.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The AFP feature (Alternate floating-point behavior) was added in armv8.7 and
introduced new FPCR bits.
Currently, HWCAP2_AFP bits (bit 0, 1, 2) in FPCR are preserved when fenv is
set to default environment. This is a deviation from standard behaviour.
Clear these bits when setting the fenv to default.
There is no libc API to modify the new FPCR bits. Restoring those bits matters
if the user changed them directly.
The main drive is to optimize the internal usage and required size
when sigset_t is embedded in other data structures. On Linux, the
current supported signal set requires up to 8 bytes (16 on mips),
was lower than the user defined sigset_t (128 bytes).
A new internal type internal_sigset_t is added, along with the
functions to operate on it similar to the ones for sigset_t. The
internal-signals.h is also refactored to remove unused functions
Besides small stack usage on some functions (posix_spawn, abort)
it lower the struct pthread by about 120 bytes (112 on mips).
Checked on x86_64-linux-gnu.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
The new asymmetric mode is available when HWCAP2_MTE3 is set (support is
available), bit2 is set in the tunable (user request per application),
and the system is configured such that the asymmetric mode is preferred over
sync or async (per-cpu system-wide setting).
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
On success, mq_receive() and mq_timedreceive() return the number of
bytes in the received message, so it requires to check if the value
is larger than 0.
Checked on i686-linux-gnu.
Was missing to for the multiarch build rtld-strncmp-sse4_2.os was
being built and exporting symbols:
build/glibc/string/rtld-strncmp-sse4_2.os:
0000000000000000 T __strncmp_sse42
Introduced in:
commit 11ffcacb64
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jun 21 12:10:50 2017 -0700
x86-64: Implement strcmp family IFUNC selectors in C
Was missing to for the multiarch build rtld-strcspn-sse4.os was
being built and exporting symbols:
build/glibc/string/rtld-strcspn-sse4.os:
U ___m128i_shift_right
U __strcspn_generic
0000000000000000 T __strcspn_sse42
U strlen
build/glibc/string/rtld-varshift.os:
0000000000000000 R ___m128i_shift_right
Introduced in:
commit 06e51c8f3d
Author: H.J. Lu <hongjiu.lu@intel.com>
Date: Fri Jul 3 02:48:56 2009 -0700
Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.
Was missing to for the multiarch build rtld-memmove-ssse3.os was
being built and exporting symbols:
>$ nm string/rtld-memmove-ssse3.os
U __GI___chk_fail
0000000000000020 T __memcpy_chk_ssse3
0000000000000040 T __memcpy_ssse3
0000000000000020 T __memmove_chk_ssse3
0000000000000040 T __memmove_ssse3
0000000000000000 T __mempcpy_chk_ssse3
0000000000000010 T __mempcpy_ssse3
U __x86_shared_cache_size_half
Introduced after 2.35 in:
commit 26b2478322
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Thu Apr 14 11:47:40 2022 -0500
x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
1. Remove sse2 instructions when using the avx512 or avx version.
2. Fixup some format nits in how the address offsets where aligned.
3. Use more space efficient instructions in the conditional AVX
restoral.
- vpcmpeqq -> vpcmpeqb
- cmp imm32, r; jz -> inc r; jz
4. Use `rep movsb` instead of `rep movsq`. The former is guranteed to
be fast with the ERMS flags, the latter is not. The latter also
wastes an instruction in size setup.
The primary memmove_{impl}_unaligned_erms implementations don't
interact with this function. Putting them in same file both
wastes space and unnecessarily bloats a hot code section.
Implementation wise:
1. Remove the VZEROUPPER as memset_{impl}_unaligned_erms does not
use the L(stosb) label that was previously defined.
2. Don't give the hotpath (fallthrough) to zero size.
Code positioning wise:
Move memset_{chk}_erms to its own file. Leaving it in between the
memset_{impl}_unaligned both adds unnecessary complexity to the
file and wastes space in a relatively hot cache section.
These files simply include the sysdeps/posix implementations which would
be used even in the absence of the files. They have been unnecessary
since 7b17aeda0c when nice and signal were removed from the
syscalls.list file.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This maintains compatibility between <sys/mman.h> and <linux/uio.h>.
Before that, the addition of process_madvise made those two header
files incompatible. This has been observed resulting in a build
failure in LLDB's Process/Linux/NativeRegisterContextLinux_s390x.cpp
source file.
Fixes commit d19ee3473d
("linux: Add process_madvise").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>