Commit Graph

15191 Commits

Author SHA1 Message Date
Wilco Dijkstra
12182ba18d AArch64: Fix typo in sve configure check (BZ# 29394)
Fix a typo in the SVE configure check. This fixes [BZ# 29394].
2022-08-11 17:52:00 +01:00
Wilco Dijkstra
c51c483d2b libio: Improve performance of IO locks
Improve performance of recursive IO locks by adding a fast path for
the single-threaded case. To reduce the number of memory accesses for
locking/unlocking, only increment the recursion counter if the lock
is already taken.

On Neoverse V1, a microbenchmark with many small freads improved by
2.9x. Multithreaded performance improved by 2%.

Reviewed-by: Cristian Rodríguez  <crrodriguez@opensuse.org>
2022-08-11 16:47:45 +01:00
Stefan Liebler
11f09947f3 tst-process_madvise: Check process_madvise-syscall support.
So far this test checks if pidfd_open-syscall is supported,
which was introduced with linux 5.3.

The process_madvise-syscall was introduced with linux 5.10.
Thus you'll get FAILs if you are running a kernel in between.

This patch adds a check if the first process_madvise-syscall
returns ENOSYS and in this case will fail with UNSUPPORTED.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-11 12:21:05 +02:00
Noah Goldstein
312ded0d63 x86: Fix #define STRCPY guard in strcpy-sse2.S
`#ifndef STPCPY` is incorrect for checking if `STRCPY` is already
defined.  It doesn't end up mattering as the whole check is
guarded by `#if IS_IN (libc)` but is incorrect none the less.
2022-08-09 17:00:03 +08:00
Adhemerval Zanella
26a3499cdb i386: Use cmpl instead of cmp
Clang cannot assemble cmp in the AT&T dialect mode.
2022-08-05 09:28:39 -03:00
Adhemerval Zanella
1ed5869c4c i386: Use fldt instead of fld on e_logl.S
Clang cannot assemble fldt in the AT&T dialect mode.
2022-08-05 09:28:33 -03:00
Fangrui Song
525ca33a61 i386: Replace movzx with movzbl
Similar to 6720d36b66 for x86-64.

Clang cannot assemble movzx in the AT&T dialect mode.  Change movzx to
movzbl, which follows the AT&T dialect and is used elsewhere in the
file.
2022-08-04 14:06:50 -07:00
Adhemerval Zanella
3698f5a9dd i386: Remove RELA support
Now that prelink is not support, there is no need to keep supporting
rela for non bootstrap.
2022-08-04 10:03:46 -03:00
Adhemerval Zanella
c3f5682215 arm: Remove RELA support
Now that prelink is not support, there is no need to keep supporting
rela for non bootstrap.
2022-08-04 10:03:46 -03:00
Adhemerval Zanella
36676f5e5d Remove ldd libc4 support
The older libc versions are obsolete for over twenty years now.
2022-08-04 10:03:45 -03:00
Lucas A. M. Magalhaes
8ee878592c Assume only FLAG_ELF_LIBC6 suport
The older libc versions are obsolete for over twenty years now.
This patch removes the special flags for libc5 and libc4 and assumes
that all libraries cached are libc6 compatible and use FLAG_ELF_LIBC6.

Checked with a build for all affected architectures.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-08-04 09:09:48 -03:00
Adhemerval Zanella
5a57ad23ba Remove left over LD_LIBRARY_VERSION usages
The environment variable was removed by
d2db60d8d8.
2022-08-04 09:09:48 -03:00
Florian Weimer
8fabe0e632 Linux: Remove exit system call from _exit
exit only terminates the current thread, not the whole process, so it
is the wrong fallback system call in this context.  All supported
Linux versions implement the exit_group system call anyway.
2022-08-04 06:17:50 +02:00
caiyinyu
3e83843637 LoongArch: Add vdso support for gettimeofday. 2022-08-04 09:19:36 +08:00
Joseph Myers
085030b957 Update kernel version to 5.19 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 5.18.  (There are no
new constants covered by these tests in 5.19, or in 5.17 or 5.18 in
the case of tst-mount-consts.py that previously used version 5.16,
that need any other header changes.)

Tested with build-many-glibcs.py.
2022-08-03 16:31:58 +00:00
Florian Weimer
68e036f27f nptl: Remove uses of assert_perror
__pthread_sigmask cannot actually fail with valid pointer arguments
(it would need a really broken seccomp filter), and we do not check
for errors elsewhere.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-08-03 11:42:49 +02:00
Florian Weimer
cca9684f2d stdio: Clean up __libc_message after unconditional abort
Since commit ec2c1fcefb ("malloc:
Abort on heap corruption, without a backtrace [BZ #21754]"),
__libc_message always terminates the process.  Since commit
a289ea09ea ("Do not print backtraces
on fatal glibc errors"), the backtrace facility has been removed.
Therefore, remove enum __libc_message_action and the action
argument of __libc_message, and mark __libc_message as _No_return.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-08-03 11:42:39 +02:00
Joseph Myers
fccadcdf5b Update syscall lists for Linux 5.19
Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi
headers for RISC-V.  Update the version number in syscall-names.list
to reflect that it is still current for 5.19 and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.
2022-08-02 21:05:07 +00:00
Arjun Shankar
9c443ac455 socket: Check lengths before advancing pointer in CMSG_NXTHDR
The inline and library functions that the CMSG_NXTHDR macro may expand
to increment the pointer to the header before checking the stride of
the increment against available space.  Since C only allows incrementing
pointers to one past the end of an array, the increment must be done
after a length check.  This commit fixes that and includes a regression
test for CMSG_FIRSTHDR and CMSG_NXTHDR.

The Linux, Hurd, and generic headers are all changed.

Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x.

[BZ #28846]

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-08-02 11:10:25 +02:00
Mark Wielaard
325ba824b0 tst-pidfd.c: UNSUPPORTED if we get EPERM on valid pidfd_getfd call
pidfd_getfd can fail for a valid pidfd with errno EPERM for various
reasons in a restricted environment. Use FAIL_UNSUPPORTED in that case.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-29 18:52:12 +02:00
caiyinyu
bce0218d9a LoongArch: Add greg_t and gregset_t. 2022-07-29 09:15:21 +08:00
caiyinyu
033e76ea9c LoongArch: Fix VDSO_HASH and VDSO_NAME. 2022-07-29 09:15:21 +08:00
Darius Rad
7c5db7931f riscv: Update rv64 libm test ulps
Generated on a Microsemi Polarfire Icicle Kit running Linux version
5.15.32.  Same ULPs were also produced on QEMU 5.2.0 running Linux
5.18.0.
2022-07-27 10:50:20 -03:00
Darius Rad
5b6d8a650d riscv: Update nofpu libm test ulps 2022-07-27 10:50:10 -03:00
Jason A. Donenfeld
eaad4f9e8f arc4random: simplify design for better safety
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-07-27 08:58:27 -03:00
caiyinyu
68d61026d5 LoongArch: Hard Float Support 2022-07-26 12:35:12 -03:00
caiyinyu
3d87c89815 LoongArch: Build Infrastructure 2022-07-26 12:35:12 -03:00
caiyinyu
0d4a891a7c LoongArch: Add ABI Lists 2022-07-26 12:35:12 -03:00
caiyinyu
f2037efbb3 LoongArch: Linux ABI 2022-07-26 12:35:12 -03:00
caiyinyu
45955fe618 LoongArch: Linux Syscall Interface 2022-07-26 12:35:12 -03:00
caiyinyu
3275882261 LoongArch: Atomic and Locking Routines 2022-07-26 12:35:12 -03:00
caiyinyu
c742795dce LoongArch: Generic <math.h> and soft-fp Routines 2022-07-26 12:35:12 -03:00
caiyinyu
619bfc6770 LoongArch: Thread-Local Storage Support 2022-07-26 12:35:12 -03:00
caiyinyu
a133942025 LoongArch: ABI Implementation 2022-07-26 12:35:12 -03:00
Arnout Vandecappelle (Essensium/Mind)
794c27446f struct stat is not posix conformant on microblaze with __USE_FILE_OFFSET64
Commit a06b40cdf5 updated stat.h to use
__USE_XOPEN2K8 instead of __USE_MISC to add the st_atim, st_mtim and
st_ctim members to struct stat. However, for microblaze, there are two
definitions of struct stat, depending on the __USE_FILE_OFFSET64 macro.
The second one was not updated.

Change __USE_MISC to __USE_XOPEN2K8 in the __USE_FILE_OFFSET64 version
of struct stat for microblaze.
2022-07-25 11:06:49 -03:00
Florian Weimer
0c5605989f Linux: dirent/tst-readdir64-compat needs to use TEST_COMPAT (bug 27654)
The hppa port starts libc at GLIBC_2.2, but has earlier symbol
versions in other shared objects.  This means that the compat
symbol for readdir64 is not actually present in libc even though
have-GLIBC_2.1.3 is defined as yes at the make level.

Fixes commit 15e50e6c96 ("Linux:
dirent/tst-readdir64-compat can be a regular test") by mostly
reverting it.
2022-07-25 11:39:03 +02:00
Adhemerval Zanella Netto
3b56f944c5 s390x: Add optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-s390x.S.  The final state register clearing is
omitted.

On a z15 it shows the following improvements (using formatted
bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               198.92
arc4random_buf(16) [single-thread]       244.49
arc4random_buf(32) [single-thread]       282.73
arc4random_buf(48) [single-thread]       286.64
arc4random_buf(64) [single-thread]       320.06
arc4random_buf(80) [single-thread]       297.43
arc4random_buf(96) [single-thread]       310.96
arc4random_buf(112) [single-thread]      308.10
arc4random_buf(128) [single-thread]      309.90
-----------------------------------------------

VX.                                        MB/s
-----------------------------------------------
arc4random [single-thread]               430.26
arc4random_buf(16) [single-thread]       735.14
arc4random_buf(32) [single-thread]      1029.99
arc4random_buf(48) [single-thread]      1206.76
arc4random_buf(64) [single-thread]      1311.92
arc4random_buf(80) [single-thread]      1378.74
arc4random_buf(96) [single-thread]      1445.06
arc4random_buf(112) [single-thread]     1484.32
arc4random_buf(128) [single-thread]     1517.30
-----------------------------------------------

Checked on s390x-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
b7060acfe8 powerpc64: Add optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-ppc.c.  It targets POWER8 and it is used on default
for LE.

On a POWER8 it shows the following improvements (using formatted
bench-arc4random data):

POWER8

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               138.77
arc4random_buf(16) [single-thread]       174.36
arc4random_buf(32) [single-thread]       228.11
arc4random_buf(48) [single-thread]       252.31
arc4random_buf(64) [single-thread]       270.11
arc4random_buf(80) [single-thread]       278.97
arc4random_buf(96) [single-thread]       287.78
arc4random_buf(112) [single-thread]      291.92
arc4random_buf(128) [single-thread]      295.25

POWER8                                     MB/s
-----------------------------------------------
arc4random [single-thread]               198.06
arc4random_buf(16) [single-thread]       278.79
arc4random_buf(32) [single-thread]       448.89
arc4random_buf(48) [single-thread]       551.09
arc4random_buf(64) [single-thread]       646.12
arc4random_buf(80) [single-thread]       698.04
arc4random_buf(96) [single-thread]       756.06
arc4random_buf(112) [single-thread]      784.12
arc4random_buf(128) [single-thread]      808.04
-----------------------------------------------

Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
84cfc6479b x86: Add AVX2 optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-avx2.S.  It is used only if AVX2 is supported
and enabled by the architecture.

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

AVX2                                       MB/s
-----------------------------------------------
arc4random [single-thread]               922.61
arc4random_buf(16) [single-thread]      1478.70
arc4random_buf(32) [single-thread]      2241.80
arc4random_buf(48) [single-thread]      2681.28
arc4random_buf(64) [single-thread]      2913.43
arc4random_buf(80) [single-thread]      3009.73
arc4random_buf(96) [single-thread]      3141.16
arc4random_buf(112) [single-thread]     3254.46
arc4random_buf(128) [single-thread]     3305.02
-----------------------------------------------

Checked on x86_64-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
e169aff0e9 x86: Add SSE2 optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-ssse3.S.  It replaces the ROTATE_SHUF_2 (which
uses pshufb) by ROTATE2 and thus making the original implementation
SSE2.

As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               443.11
arc4random_buf(16) [single-thread]       552.27
arc4random_buf(32) [single-thread]       626.86
arc4random_buf(48) [single-thread]       649.81
arc4random_buf(64) [single-thread]       663.95
arc4random_buf(80) [single-thread]       674.78
arc4random_buf(96) [single-thread]       675.17
arc4random_buf(112) [single-thread]      680.69
arc4random_buf(128) [single-thread]      683.20
-----------------------------------------------

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

Checked on x86_64-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
4c128c7823 aarch64: Add optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S.  It is used as default and only
little-endian is supported (BE uses generic code).

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               380.89
arc4random_buf(16) [single-thread]       500.73
arc4random_buf(32) [single-thread]       552.61
arc4random_buf(48) [single-thread]       566.82
arc4random_buf(64) [single-thread]       574.01
arc4random_buf(80) [single-thread]       581.02
arc4random_buf(96) [single-thread]       591.19
arc4random_buf(112) [single-thread]      592.29
arc4random_buf(128) [single-thread]      596.43
-----------------------------------------------

OPTIMIZED                                  MB/s
-----------------------------------------------
arc4random [single-thread]               569.60
arc4random_buf(16) [single-thread]       825.78
arc4random_buf(32) [single-thread]       987.03
arc4random_buf(48) [single-thread]      1042.39
arc4random_buf(64) [single-thread]      1075.50
arc4random_buf(80) [single-thread]      1094.68
arc4random_buf(96) [single-thread]      1130.16
arc4random_buf(112) [single-thread]     1129.58
arc4random_buf(128) [single-thread]     1137.91
-----------------------------------------------

Checked on aarch64-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
6f4e0fcfa2 stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
The implementation is based on scalar Chacha20 with per-thread cache.
It uses getrandom or /dev/urandom as fallback to get the initial entropy,
and reseeds the internal state on every 16MB of consumed buffer.

To improve performance and lower memory consumption the per-thread cache
is allocated lazily on first arc4random functions call, and if the
memory allocation fails getentropy or /dev/urandom is used as fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).

Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).

The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros.  This strategy is similar to what OpenBSD arc4random
does.

The arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], who
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.

The main advantage of this method is the that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit.  It optimizes the
internal buffer sampling by initially consuming a 32-bit random variable
and then sampling byte per byte.  Depending of the upper bound requested,
it might lead to better CPU utilization.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>

[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
2022-07-22 11:58:27 -03:00
Michael Hudson-Doyle
1f4e90d468 linux: return UNSUPPORTED from tst-mount if entering mount namespace fails
Before this the test fails if run in a chroot by a non-root user:

warning: could not become root outside namespace (Operation not permitted)
../sysdeps/unix/sysv/linux/tst-mount.c:36: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 19 (0x13); from: ENODEV
error: ../sysdeps/unix/sysv/linux/tst-mount.c:39: not true: fd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:46: not true: r != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:48: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:52: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 9 (0x9); from: EBADF
error: ../sysdeps/unix/sysv/linux/tst-mount.c:55: not true: mfd != -1
../sysdeps/unix/sysv/linux/tst-mount.c:58: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:61: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:65: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:68: not true: pfd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:75: not true: fd_tree != -1
../sysdeps/unix/sysv/linux/tst-mount.c:88: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 38 (0x26); from: ENOSYS
error: 12 test failures

Checking that the test can enter a new mount namespace is more correct
than just checking the return value of support_become_root() as the test
code changes the mount namespace it runs in so running it as root on a
system that does not support mount namespaces should still skip.

Also change the test to remove the unnecessary fork.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-19 06:55:49 +12:00
Noah Goldstein
49889fb256 x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level
1. Add default ISA level selection in non-multiarch/rtld
   implementations.

2. Add ISA level build guards to different implementations.
    - I.e strcpy-avx2.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to
      include it as we will always use one of the ISA level 4
      implementations (strcpy-evex.S).

3. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-16 03:07:59 -07:00
Noah Goldstein
192979ee35 x86: Add support to build wcscpy with explicit ISA level
1. Add ISA level build guards to different implementations.
    - wcscpy-ssse3.S is used as ISA level 2/3/4.
    - wcscpy-generic.c is only used at ISA level 1 and will
      only build if compiled with ISA level == 1. Otherwise
      there is no reason to include it as we will always use
      wcscpy-ssse3.S

2. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-16 03:07:59 -07:00
Noah Goldstein
ceabdcd130 x86: Add support to build strcmp/strlen/strchr with explicit ISA level
1. Add default ISA level selection in non-multiarch/rtld
   implementations.

2. Add ISA level build guards to different implementations.
    - I.e strcmp-avx2.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to
      include it as we will always use one of the ISA level 4
      implementations (strcmp-evex.S).

3. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-16 03:07:59 -07:00
Stefan Liebler
779aa039fc S390: Define SINGLE_THREAD_BY_GLOBAL only on s390x
Starting with commit e070501d12
"Replace __libc_multiple_threads with __libc_single_threaded"
the testcases nptl/tst-cancel-self and
nptl/tst-cancel-self-cancelstate are failing.

This is fixed by only defining SINGLE_THREAD_BY_GLOBAL on s390x,
but not on s390.

Starting with commit 09c76a7409
"Linux: Consolidate {RTLD_}SINGLE_THREAD_P definition",
SINGLE_THREAD_BY_GLOBAL was defined in
sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h.

Lateron the commit 9a973da617
"s390: Consolidate Linux syscall definition" consolidates the sysdep.h files
from s390-32/s390-64 subdirectories.  Unfortunately the macro is now always
defined instead of only on s390-64.

As information:
TLS_MULTIPLE_THREADS_IN_TCB is also only defined for s390.
See: sysdeps/s390/nptl/tls.h
2022-07-14 13:39:09 +02:00
Noah Goldstein
7c8ca17893 x86: Add missing rtm tests for strcmp family
Add new tests for:
    strcasecmp
    strncasecmp
    strcmp
    wcscmp

These functions all have avx2_rtm implementations so should be tested.
2022-07-13 14:55:31 -07:00
Noah Goldstein
42b014dd1b x86: Remove unneeded rtld-wmemcmp
wmemcmp isn't used by the dynamic loader so their no need to add an
RTLD stub for it.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
e19bb87c97 x86: Move wcslen SSE2 implementation to multiarch/wcslen-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00