Commit Graph

15452 Commits

Author SHA1 Message Date
Noah Goldstein
7775574ce0 x86: Use testb for case-locale check in str{n}casecmp-sse2
`testb` saves a bit of code size is the imm-operand can be encoded
1-bytes.

Tested on x86-64.
2022-10-20 11:29:05 -07:00
Noah Goldstein
b6d02d6457 x86: Use testb for case-locale check in str{n}casecmp-avx2
`testb` saves a bit of code size is the imm-operand can be encoded
1-bytes.

Tested on x86-64.
2022-10-20 11:29:05 -07:00
Noah Goldstein
5ce9766417 x86: Add support for VEC_SIZE == 64 in strcmp-evex.S impl
Unused at the moment, but evex512 strcmp, strncmp, strcasecmp{l}, and
strncasecmp{l} functions can be added by including strcmp-evex.S with
"x86-evex512-vecs.h" defined.

In addition save code size a bit in a few places.

1. tzcnt ...         -> bsf ...
2. vpcmp{b|d} $0 ... -> vpcmpeq{b|d}

This saves a touch of code size but has minimal net affect.

Full check passes on x86-64.
2022-10-20 11:29:05 -07:00
Noah Goldstein
c25eb94aed x86: Remove AVX512-BVMI2 instruction from strrchr-evex.S
commit b412213eee
    Author: Noah Goldstein <goldstein.w.n@gmail.com>
    Date:   Tue Oct 18 17:44:07 2022 -0700

        x86: Optimize strrchr-evex.S and implement with VMM headers

Added `vpcompress{b|d}` to the page-cross logic with is an
AVX512-VBMI2 instruction. This is not supported on SKX. Since the
page-cross logic is relatively cold and the benefit is minimal
revert the page-cross case back to the old logic which is supported
on SKX.

Tested on x86-64.
2022-10-20 11:29:05 -07:00
Felix Riemann
a885fc2d68 sysdeps: arm: Fix preconfigure script for ARMv8/v9 targets [BZ #29698]
The ARM preconfigure script tries to detect the capabilities of the
target platform by checking the compiler's predefined architecture
macros. However, if the compiler is tuning for AArch32 on ARMv8/v9 this
step fails:

checking for sysdeps preconfigure fragments... aarch64 alpha arc arm
WARNING: arm/preconfigure: Did not find ARM architecture type; using default

This is because preconfigure.ac doesn't escape the square brackets in
the glob for matching compilers targeting ARMv8. Adding another pair of
brackets to escape the first pair fixes this:

checking for sysdeps preconfigure fragments... aarch64 alpha arc arm
 Found compiler is configured for something newer than v7 - using v7

Signed-off-by: Felix Riemann <felix.riemann@sma.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-10-20 11:23:05 -03:00
Adhemerval Zanella
9b5e138f2b linux: Avoid shifting a negative signed on POSIX timer interface
The current macros uses pid as signed value, which triggers a compiler
warning for process and thread timers.  Replace MAKE_PROCESS_CPUCLOCK
with static inline function that expects the pid as unsigned.  These
are similar to what Linux does internally.

Checked on x86_64-linux-gnu.
Reviewed-by: Arjun Shankar <arjun@redhat.com>
2022-10-20 10:19:08 -03:00
Noah Goldstein
b412213eee x86: Optimize strrchr-evex.S and implement with VMM headers
Optimization is:
1. Cache latest result in "fast path" loop with `vmovdqu` instead of
  `kunpckdq`.  This helps if there are more than one matches.

Code Size Changes:
strrchr-evex.S       :  +30 bytes (Same number of cache lines)

Net perf changes:

Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value as New Time / Old Time so < 1.0 is
improvement and 1.0 is regression.

strrchr-evex.S       : 0.932 (From cases with higher match frequency)

Full results attached in email.

Full check passes on x86-64.
2022-10-19 17:31:03 -07:00
Noah Goldstein
4af6844aa5 x86: Optimize memrchr-evex.S
Optimizations are:
1. Use the fact that lzcnt(0) -> VEC_SIZE for memchr to save a branch
   in short string case.
2. Save several instructions in len = [VEC_SIZE, 4 * VEC_SIZE] case.
3. Use more code-size efficient instructions.
	- tzcnt ...     -> bsf ...
	- vpcmpb $0 ... -> vpcmpeq ...

Code Size Changes:
memrchr-evex.S      :  -29 bytes

Net perf changes:

Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value as New Time / Old Time so < 1.0 is
improvement and 1.0 is regression.

memrchr-evex.S      : 0.949 (Mostly from improvements in small strings)

Full results attached in email.

Full check passes on x86-64.
2022-10-19 17:31:03 -07:00
Noah Goldstein
b79f8ff26a x86: Optimize strnlen-evex.S and implement with VMM headers
Optimizations are:
1. Use the fact that bsf(0) leaves the destination unchanged to save a
   branch in short string case.
2. Restructure code so that small strings are given the hot path.
        - This is a net-zero on the benchmark suite but in general makes
      sense as smaller sizes are far more common.
3. Use more code-size efficient instructions.
	- tzcnt ...     -> bsf ...
	- vpcmpb $0 ... -> vpcmpeq ...
4. Align labels less aggressively, especially if it doesn't save fetch
   blocks / causes the basic-block to span extra cache-lines.

The optimizations (especially for point 2) make the strnlen and
strlen code essentially incompatible so split strnlen-evex
to a new file.

Code Size Changes:
strlen-evex.S       :  -23 bytes
strnlen-evex.S      : -167 bytes

Net perf changes:

Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value as New Time / Old Time so < 1.0 is
improvement and 1.0 is regression.

strlen-evex.S       : 0.992 (No real change)
strnlen-evex.S      : 0.947

Full results attached in email.

Full check passes on x86-64.
2022-10-19 17:31:03 -07:00
Noah Goldstein
69717709ec x86: Shrink / minorly optimize strchr-evex and implement with VMM headers
Size Optimizations:
1. Condence hot path for better cache-locality.
    - This is most impact for strchrnul where the logic strings with
      len <= VEC_SIZE or with a match in the first VEC no fits entirely
      in the first cache line.
2. Reuse common targets in first 4x VEC and after the loop.
3. Don't align targets so aggressively if it doesn't change the number
   of fetch blocks it will require and put more care in avoiding the
   case where targets unnecessarily split cache lines.
4. Align the loop better for DSB/LSD
5. Use more code-size efficient instructions.
	- tzcnt ...     -> bsf ...
	- vpcmpb $0 ... -> vpcmpeq ...
6. Align labels less aggressively, especially if it doesn't save fetch
   blocks / causes the basic-block to span extra cache-lines.

Code Size Changes:
strchr-evex.S	: -63 bytes
strchrnul-evex.S: -48 bytes

Net perf changes:
Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value as New Time / Old Time so < 1.0 is
improvement and 1.0 is regression.

strchr-evex.S (Fixed)   : 0.971
strchr-evex.S (Rand)    : 0.932
strchrnul-evex.S        : 0.965

Full results attached in email.

Full check passes on x86-64.
2022-10-19 17:31:03 -07:00
Noah Goldstein
330881763e x86: Optimize memchr-evex.S and implement with VMM headers
Optimizations are:

1. Use the fact that tzcnt(0) -> VEC_SIZE for memchr to save a branch
   in short string case.
2. Restructure code so that small strings are given the hot path.
	- This is a net-zero on the benchmark suite but in general makes
      sense as smaller sizes are far more common.
3. Use more code-size efficient instructions.
	- tzcnt ...     -> bsf ...
	- vpcmpb $0 ... -> vpcmpeq ...
4. Align labels less aggressively, especially if it doesn't save fetch
   blocks / causes the basic-block to span extra cache-lines.

The optimizations (especially for point 2) make the memchr and
rawmemchr code essentially incompatible so split rawmemchr-evex
to a new file.

Code Size Changes:
memchr-evex.S       : -107 bytes
rawmemchr-evex.S    :  -53 bytes

Net perf changes:

Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value as New Time / Old Time so < 1.0 is
improvement and 1.0 is regression.

memchr-evex.S       : 0.928
rawmemchr-evex.S    : 0.986 (Less targets cross cache lines)

Full results attached in email.

Full check passes on x86-64.
2022-10-19 17:31:03 -07:00
Sunil K Pandey
451c6e5854 x86_64: Implement evex512 version of memchr, rawmemchr and wmemchr
This patch implements following evex512 version of string functions.
evex512 version takes up to 30% less cycle as compared to evex,
depending on length and alignment.

- memchr function using 512 bit vectors.
- rawmemchr function using 512 bit vectors.
- wmemchr function using 512 bit vectors.

Code size data:

memchr-evex.o		762 byte
memchr-evex512.o	576 byte (-24%)

rawmemchr-evex.o	461 byte
rawmemchr-evex512.o	412 byte (-11%)

wmemchr-evex.o		794 byte
wmemchr-evex512.o	552 byte (-30%)

Placeholder function, not used by any processor at the moment.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2022-10-18 13:26:33 -07:00
Florian Weimer
58548b9d68 Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources
In the future, this will result in a compilation failure if the
macros are unexpectedly undefined (due to header inclusion ordering
or header inclusion missing altogether).

Assembler sources are more difficult to convert.  In many cases,
they are hand-optimized for the mangling and no-mangling variants,
which is why they are not converted.

sysdeps/s390/s390-32/__longjmp.c and sysdeps/s390/s390-64/__longjmp.c
are special: These are C sources, but most of the implementation is
in assembler, so the PTR_DEMANGLE macro has to be undefined in some
cases, to match the assembler style.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-10-18 17:04:10 +02:00
Florian Weimer
88f4b6929c Introduce <pointer_guard.h>, extracted from <sysdep.h>
This allows us to define a generic no-op version of PTR_MANGLE and
PTR_DEMANGLE.  In the future, we can use PTR_MANGLE and PTR_DEMANGLE
unconditionally in C sources, avoiding an unintended loss of hardening
due to missing include files or unlucky header inclusion ordering.

In i386 and x86_64, we can avoid a <tls.h> dependency in the C
code by using the computed constant from <tcb-offsets.h>.  <sysdep.h>
no longer includes these definitions, so there is no cyclic dependency
anymore when computing the <tcb-offsets.h> constants.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-10-18 17:03:55 +02:00
Florian Weimer
246f37d6b1 x86-64: Move LP_SIZE definition to its own header
This way, we can define the pointer guard macros without including
<sysdep.h> on x86-64.  Other architectures will not have such an
inclusion dependency, and the implied header file inclusion would
create a porting hazard.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-10-18 17:02:08 +02:00
Szabolcs Nagy
7363a9a9a0 math: Fix asin and acos invalid exception with old gcc
This works around a gcc issue where it const folded inf/inf into nan,
preventing the invalid exception to be signalled.

(x-x)/(x-x) is more robust against optimizations and works for all
out of bounds values including x==nan.

The gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115
should be fixed on release branches starting from gcc-10, but it is
better to change the code in case glibc is built with older gcc.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2022-10-17 08:18:52 +01:00
Noah Goldstein
be066536bd x86: Update strlen-evex-base to use new reg/vec macros.
To avoid duplicate the VMM / GPR / mask insn macros in all incoming
evex512 files use the macros defined in 'reg-macros.h' and
'{vec}-macros.h'

This commit does not change libc.so

Tested build on x86-64
2022-10-14 21:21:58 -07:00
Noah Goldstein
47f5d51461 x86: Remove now unused vec header macros.
This commit does not change libc.so

Tested build on x86-64
2022-10-14 21:21:58 -07:00
Noah Goldstein
a6784653f7 x86: Update memset to use new VEC macros
Replace %VEC(n) -> %VMM(n)

This commit does not change libc.so

Tested build on x86-64
2022-10-14 21:21:58 -07:00
Noah Goldstein
4fb7d8a938 x86: Update memmove to use new VEC macros
Replace %VEC(n) -> %VMM(n)

This commit does not change libc.so

Tested build on x86-64
2022-10-14 21:21:58 -07:00
Noah Goldstein
3088a66ff8 x86: Update memrchr to use new VEC macros
Replace %VEC(n) -> %VMM(n)

This commit does not change libc.so

Tested build on x86-64
2022-10-14 21:21:58 -07:00
Noah Goldstein
52ab7604db x86: Update VEC macros to complete API for evex/evex512 impls
1) Copy so that backport will be easier.
2) Make section only define if there is not a previous definition
3) Add `VEC_lo` definition for proper reg-width but in the
   ymm/zmm0-15 range.
4) Add macros for accessing GPRs based on VEC_SIZE
        This is to make it easier to do think like:
        ```
            vpcmpb %VEC(0), %VEC(1), %k0
            kmov{d|q} %k0, %{eax|rax}
            test %{eax|rax}
        ```
        It adds macro s.t any GPR can get the proper width with:
            `V{upcase_GPR_name}`

        and any mask insn can get the proper width with:
            `{upcase_mask_insn_without_postfix}`

This commit does not change libc.so

Tested build on x86-64
2022-10-14 21:21:58 -07:00
Joseph Myers
3bd18aa4d1 Add AArch64 HWCAP2_EBF16 from Linux 6.0 to bits/hwcap.h
Linux 6.0 adds a new AArch64 HWCAP2 bit, HWCAP2_EBF16.  Add this to
glibc's bits/hwcap.h.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2022-10-12 14:28:14 +00:00
Adhemerval Zanella
5355f9ca7b elf: Remove -fno-tree-loop-distribute-patterns usage on dl-support
Besides the option being gcc specific, this approach is still fragile
and not future proof since we do not know if this will be the only
optimization option gcc will add that transforms loops to memset
(or any libcall).

This patch adds a new header, dl-symbol-redir-ifunc.h, that can b
used to redirect the compiler generated libcalls to port the generic
memset implementation if required.

Checked on x86_64-linux-gnu and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-10-10 10:32:28 -03:00
Andreas Schwab
954b8f3895 Expose all MAP_ constants in <sys/mman.h> unconditionally (bug 29375)
POSIX reserves the MAP_ prefix for <sys/mman.h>, so there is no need to
conditionalize their definitions on feature test macros.
2022-10-10 09:30:24 +02:00
Xi Ruoyao
589eda82bb LoongArch: Fix the condition to use PC-relative addressing in start.S
A start.o compiled from start.S with -DPIC and no -DSHARED is used by
both crt1.o and rcrt1.o.  So the LoongArch static PIE patch
unintentionally introduced PC-relative addressing for main and
__libc_start_main into crt1.o.

While the latest Binutils (trunk, which will be released as 2.40)
supports the PC-relative relocs against an external function by creating
a PLT entry, the 2.39 release branch doesn't (and won't) support this.
An error is raised:

    "PLT stub does not represent and symbol not defined."

So, we need the following changes:

1. Check if ld supports the PC-relative relocs against an external
   function.  If it's not supported, we deem static PIE unsupported.
2. Change start.S.  If static PIE is supported, use PC-relative
   addressing for main and __libc_start_main and rely on the linker to
   create PLT entries.  Otherwise, restore the old behavior (using GOT
   to address these functions).

An alternative would be adding a new "static-pie-start.S", and some
custom logic into Makefile to build rcrt1.o with it.  And, restore
start.S to the state before static PIE change so crt1.o won't contain
PC-relative relocs against external symbols.  But I can't see any
benefit of this alternative, so I'd just keep it simple.

Tested by building glibc with the following configurations:

1. Binutils trunk + GCC trunk.  Static PIE enabled.  All tests
   passed.
2. Binutils 2.39 branch + GCC trunk.  Static PIE disabled.  Tests
   related to ifunc failed (it's a known issue).  All other tests
   passed.
3. Binutils 2.39 branch + GCC 12 branch, cross compilation with
   build-many-glibcs.py from x86_64-linux-gnu.  Static PIE disabled.
   Build succeeded.
2022-10-08 16:34:45 +08:00
Adhemerval Zanella
f9646d138f arm: Enable USE_ATOMIC_COMPILER_BUILTINS (BZ #24774)
As per other architectures.  I have checked on a armv8 hardware with
the following configurations:

  arm-linux-gnueabihf (gcc built with --with-float=hard --with-cpu=arm926ej-s)
  armv5-linux-gnueabihf (-march=armv5te -mfpu=vfpv3)
  armv7-linux-gnueabihf (-march=armv7-a -mfpu=vfpv3)
  armv7-thumb-linux-gnueabihf (-march=armv7-a -mfpu=vfpv3 -mthumb)
  armv7-neon-linux-gnueabihf (-march=armv7-a -mfpu=neon)
  armv7-neonhard-linux-gnueabihf (-march=armv7-a -mfpu=neon -mfloat-abi=hard)

Without any regression.

I haven't dig into the code, but since Linux atomic-machine.h handle
pre-ARMv6 and ARMv6 I expect the compiler might have some small room
to optimize.

The code size also improves is most of the configurations:

* master

   text    data     bss     dec     hex filename
1727801    9720   37928 1775449  1b1759	 arm-linux-gnueabihf/libc.so
1691729    9720   37928 1739377  1a8a71	 arm-linux-gnueabihf-armv7-disable-multi-arch/libc.so
1725509    9720   37928 1773157  1b0e65	 armv5-linux-gnueabihf/libc.so
1700757    9720   37928 1748405  1aadb5	 armv6-linux-gnueabihf/libc.so
1698973    9720   37928 1746621  1aa6bd	 armv6t2-linux-gnueabihf/libc.so
1695481    9752   37928 1743161  1a9939	 armv7-linux-gnueabihf/libc.so
1692917    9744   37928 1740589  1a8f2d	 armv7-neonhard-linux-gnueabihf/libc.so
1692917    9744   37928 1740589  1a8f2d	 armv7-neon-linux-gnueabihf/libc.so
1225353    9752   37928 1273033  136cc9	 armv7-thumb-linux-gnueabihf/libc.so

* patched

   text    data     bss     dec     hex filename
1726805    9720   37928 1774453  1b1375 arm-linux-gnueabihf/libc.so
1689321    9720   37928 1736969  1a8109	arm-linux-gnueabihf-armv7-disable-multi-arch/libc.so
1724433    9720   37928 1772081  1b0a31 armv5-linux-gnueabihf/libc.so
1698301    9720   37928 1745949  1aa41d armv6-linux-gnueabihf/libc.so
1696525    9720   37928 1744173  1a9d2d armv6t2-linux-gnueabihf/libc.so
1693009    9752   37928 1740689  1a8f91 armv7-linux-gnueabihf/libc.so
1690493    9744   37928 1738165  1a85b5 armv7-neonhard-linux-gnueabihf/libc.so
1690493    9744   37928 1738165  1a85b5 armv7-neon-linux-gnueabihf/libc.so
1223837    9752   37928 1271517  1366dd armv7-thumb-linux-gnueabihf/libc.so

The idea is eventually move all architectures to use compiler builtins.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
2022-10-07 16:19:20 -03:00
Javier Pello
ab40f20364 elf: Remove _dl_string_hwcap
Removal of legacy hwcaps support from the dynamic loader left
no users of _dl_string_hwcap.

Signed-off-by: Javier Pello <devel@otheo.eu>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-10-06 07:59:48 -03:00
Javier Pello
4a7094119c elf: Remove hwcap parameter from add_to_cache signature
Last commit made it so that the value passed for that parameter was
always 0 at its only call site.

Signed-off-by: Javier Pello <devel@otheo.eu>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-10-06 07:59:48 -03:00
Javier Pello
d178c67535 x86_64: Remove platform directory library loading test
This was to test loading of shared libraries from platform
subdirectories, but this functionality is going away in the
following commits.

Signed-off-by: Javier Pello <devel@otheo.eu>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-10-06 07:59:48 -03:00
Joseph Myers
27d67e974e Update kernel version to 6.0 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.0.  (There are no new
constants covered by these tests in 6.0 that need any other header
changes.)

Tested with build-many-glibcs.py.
2022-10-05 22:11:27 +00:00
Adhemerval Zanella Netto
9dc4e29f63 x86: Fix -Os build (BZ #29576)
The compiler might transform __stpcpy calls (which are routed to
__builtin_stpcpy as an optimization) to strcpy and x86_64 strcpy
multiarch implementation does not build any working symbol due
ISA_SHOULD_BUILD not being evaluated for IS_IN(rtld).

Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-10-05 18:04:13 -03:00
Joseph Myers
a878a1384c Regenerate sysdeps/mach/hurd/bits/errno.h
This addition to the list of source headers in
sysdeps/mach/hurd/bits/errno.h appears in the source tree after
build-many-glibcs.py runs, I'm guessing resulting from gnumach commit
c566ad85a2d6728ebc8ec0f461a3b35df300e96e.
2022-10-05 19:21:25 +00:00
Joseph Myers
919b9bfaa9 Update syscall lists for Linux 6.0
Linux 6.0 has no new syscalls.  Update the version number in
syscall-names.list to reflect that it is still current for 6.0.

Tested with build-many-glibcs.py.
2022-10-05 14:33:14 +00:00
Aurelien Jarno
7e8283170c x86-64: Require BMI1/BMI2 for AVX2 strrchr and wcsrchr implementations
The AVX2 strrchr and wcsrchr implementation uses the 'blsmsk'
instruction which belongs to the BMI1 CPU feature and the 'shrx'
instruction, which belongs to the BMI2 CPU feature.

Fixes: df7e295d18 ("x86: Optimize {str|wcs}rchr-avx2")
Partially resolves: BZ #29611

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Aurelien Jarno
3c0c78afab x86-64: Require BMI2 and LZCNT for AVX2 memrchr implementation
The AVX2 memrchr implementation uses the 'shlxl' instruction, which
belongs to the BMI2 CPU feature and uses the 'lzcnt' instruction, which
belongs to the LZCNT CPU feature.

Fixes: af5306a735 ("x86: Optimize memrchr-avx2.S")
Partially resolves: BZ #29611

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Aurelien Jarno
e3e7fab7fe x86-64: Require BMI2 for AVX2 (raw|w)memchr implementations
The AVX2 memchr, rawmemchr and wmemchr implementations use the 'bzhi'
and 'sarx' instructions, which belongs to the BMI2 CPU feature.

Fixes: acfd088a19 ("x86: Optimize memchr-avx2.S")
Partially resolves: BZ #29611

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Aurelien Jarno
f31a5a884e x86-64: Require BMI2 for AVX2 wcs(n)cmp implementations
The AVX2 wcs(n)cmp implementations use the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.

NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.

Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Aurelien Jarno
fc7de1d9b9 x86-64: Require BMI2 for AVX2 strncmp implementation
The AVX2 strncmp implementations uses the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.

NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.

Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Aurelien Jarno
4d64c64457 x86-64: Require BMI2 for AVX2 strcmp implementation
The AVX2 strcmp implementation uses the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.

NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.

Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Aurelien Jarno
10f79d3670 x86-64: Require BMI2 for AVX2 str(n)casecmp implementations
The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.

NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.

Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Aurelien Jarno
b80f16adbd x86: include BMI1 and BMI2 in x86-64-v3 level
The "System V Application Binary Interface AMD64 Architecture Processor
Supplement" mandates the BMI1 and BMI2 CPU features for the x86-64-v3
level.

Reviewed-by: Noah Goldstein  <goldstein.w.n@gmail.com>
2022-10-03 23:46:11 +02:00
Noah Goldstein
653c12c7d8 x86: Cleanup pthread_spin_{try}lock.S
Save a jmp on the lock path coming from an initial failure in
pthread_spin_lock.S.  This costs 4-bytes of code but since the
function still fits in the same number of 16-byte blocks (default
function alignment) it does not have affect on the total binary size
of libc.so (unchanged after this commit).

pthread_spin_trylock was using a CAS when a simple xchg works which
is often more expensive.

Full check passes on x86-64.
2022-10-03 14:13:49 -07:00
Adhemerval Zanella
114e299ca6 x86: Remove .tfloat usage
Some compiler does not support it (such as clang integrated assembler)
neither gcc emits it.
2022-10-03 14:03:21 -03:00
John David Anglin
b7bd94068e hppa: Fix initialization of dp register [BZ 29635]
After upgrading glibc to Debian 2.35-1, gdb faulted on
startup and dropped core in a function call in the main
application.  This was caused by not initializing the
global dp register for the main application early enough.

Restore the code to initialize dp in _dl_start_user.
It was removed when code was added to initialize dp in
elf_machine_runtime_setup.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2022-10-01 19:49:25 +00:00
Adhemerval Zanella
609c9d0951 malloc: Do not clobber errno on __getrandom_nocancel (BZ #29624)
Use INTERNAL_SYSCALL_CALL instead of INLINE_SYSCALL_CALL.  This
requires emulate the semantic for hurd call (so __arc4random_buf
uses the fallback).

Checked on x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2022-09-30 15:25:15 -03:00
Adhemerval Zanella
13db9ee2cb stdlib: Fix __getrandom_nocancel type and arc4random usage (BZ #29638)
Using an unsigned type prevents the fallback to be used if kernel
does not support getrandom syscall.

Checked on x86_64-linux-gnu.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2022-09-30 15:24:49 -03:00
Xi Ruoyao
8b10727a9a LoongArch: Add static PIE support
If the compiler is new enough, enable static PIE support.  In the static
PIE version of _start (in rcrt1.o), use la.pcrel instead of la.got
because in a static PIE we cannot use GOT entries until the dynamic
relocations for GOT are resolved.
2022-09-30 11:51:58 +08:00
Noah Goldstein
b0969fa53a x86: Fix wcsnlen-avx2 page cross length comparison [BZ #29591]
Previous implementation was adjusting length (rsi) to match
bytes (eax), but since there is no bound to length this can cause
overflow.

Fix is to just convert the byte-count (eax) to length by dividing by
sizeof (wchar_t) before the comparison.

Full check passes on x86-64 and build succeeds w/ and w/o multiarch.
2022-09-28 20:15:16 -07:00
Joseph Myers
3e5760fcb4 Update _FloatN header support for C++ in GCC 13
GCC 13 adds support for _FloatN and _FloatNx types in C++, so breaking
the installed glibc headers that assume such support is not present.
GCC mostly works around this with fixincludes, but that doesn't help
for building glibc and its tests (glibc doesn't itself contain C++
code, but there's C++ code built for tests).  Update glibc's
bits/floatn-common.h and bits/floatn.h headers to handle the GCC 13
support directly.

In general the changes match those made by fixincludes, though I think
the ones in sysdeps/powerpc/bits/floatn.h, where the header tests
__LDBL_MANT_DIG__ == 113 or uses #elif, wouldn't match the existing
fixincludes patterns.

Some places involving special C++ handling in relation to _FloatN
support are not changed.  There's no need to change the
__HAVE_FLOATN_NOT_TYPEDEF definition (also in a form that wouldn't be
matched by the fixincludes fixes) because it's only used in relation
to macro definitions using features not supported for C++
(__builtin_types_compatible_p and _Generic).  And there's no need to
change the inline function overloads for issignaling, iszero and
iscanonical in C++ because cases where types have the same format but
are no longer compatible types are handled automatically by the C++
overload resolution rules.

This patch also does not change the overload handling for iseqsig, and
there I think changes *are* needed, beyond those in this patch or made
by fixincludes.  The way that overload is defined, via a template
parameter to a structure type, requires overloads whenever the types
are incompatible, even if they have the same format.  So I think we
need to add overloads with GCC 13 for every supported _FloatN and
_FloatNx type, rather than just having one for _Float128 when it has a
different ABI to long double as at present (but for older GCC, such
overloads must not be defined for types that end up defined as
typedefs for another type).

Tested with build-many-glibcs.py: compilers build for
aarch64-linux-gnu ia64-linux-gnu mips64-linux-gnu powerpc-linux-gnu
powerpc64le-linux-gnu x86_64-linux-gnu; glibcs build for
aarch64-linux-gnu ia64-linux-gnu i686-linux-gnu mips-linux-gnu
mips64-linux-gnu-n32 powerpc-linux-gnu powerpc64le-linux-gnu
x86_64-linux-gnu.
2022-09-28 20:10:08 +00:00
Samuel Thibault
d7f32c9958 hurd: Fix typo 2022-09-28 19:21:44 +02:00
Jörg Sonnenberger
c9226c03da get_nscd_addresses: Fix subscript typos [BZ #29605]
Fix the subscript on air->family, which was accidentally set to COUNT
when it should have remained as I.

Resolves: BZ #29605

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-09-28 12:47:10 -04:00
Samuel Thibault
7de3f0a96c hurd: Increase SOMAXCONN to 4096
Notably fakeroot-tcp may introduce a lot of parallel connections.
2022-09-27 23:37:42 +02:00
Wilco Dijkstra
22f4ab2d20 Use atomic_exchange_release/acquire
Rename atomic_exchange_rel/acq to use atomic_exchange_release/acquire
since these map to the standard C11 atomic builtins.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-26 16:58:08 +01:00
Wilco Dijkstra
4a07fbb689 Use C11 atomics instead of atomic_decrement_and_test
Replace atomic_decrement_and_test with atomic_fetch_add_relaxed.
These are simple counters which do not protect any shared data from
concurrent accesses. Also remove the unused file cond-perf.c.

Passes regress on AArch64.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-23 15:59:56 +01:00
Wilco Dijkstra
d1babeb32d Use C11 atomics instead of atomic_increment(_val)
Replace atomic_increment and atomic_increment_val with atomic_fetch_add_relaxed.
One case in sem_post.c uses release semantics (see comment above it).
The others are simple counters and do not protect any shared data from
concurrent accesses.

Passes regress on AArch64.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-23 15:59:56 +01:00
Alistair Francis
2e81493fa6 riscv: Remove RV32 floating point functions
We don't need RV32 specific floating point functions, instead make them
generic for RISC-V.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-21 14:37:43 -04:00
Alistair Francis
73e9fe43ac riscv: Consolidate the libm-test-ulps
Both RV32 and RV64 should have the same libm-test-ulps, so consolidate
them into a single file.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-21 14:37:13 -04:00
Samuel Thibault
385f2ecda9 hurd: Fix SIOCADD/DELRT ioctls
The hurd network stack uses struct ifrtreq rather than ortentry.
2022-09-21 19:58:44 +02:00
Samuel Thibault
b84199eb18 hurd: Drop struct rtentry and in6_rtmsg
These were cargo-culted, they are not used at all in Hurd interfaces.
2022-09-21 19:58:44 +02:00
Damien Zammit
9ba0f010a6 hurd: Add _IOT_ifrtreq to <net/route.h>
So that we can use struct ifrtreq in ioctls.
2022-09-21 19:58:44 +02:00
Samuel Thibault
c0c9092f75 hurd: Use IF_NAMESIZE rather than IFNAMSIZ
The latter is not available without __USE_MISC.
2022-09-21 08:51:50 +02:00
Damien Zammit
ffd0b295d9 hurd: Add ifrtreq structure to net/route.h
As used by the hurdish route ioctls.
2022-09-21 00:42:13 +02:00
John David Anglin
fa47e8e6df hppa: undef __ASSUME_SET_ROBUST_LIST
QEMU does not support support set_robust_list. Thus, we need
to enable detection of set_robust_list system call.

Signed-off-by: John David Anglin <dave.anglin@bell.net>
2022-09-20 20:14:14 +00:00
Adhemerval Zanella
85a3228744 linux: Use same type for MMAP2_PAGE_UNIT
It avoid a possible compiler warning where right size of operator
is converted from a negative value to unsigned.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-09-20 10:57:40 -03:00
Adhemerval Zanella
aeb4d2e981 m68k: Enforce 4-byte alignment on internal locks (BZ #29537)
A new internal definition, __LIBC_LOCK_ALIGNMENT, is used to force
the 4-byte alignment only for m68k, other architecture keep the
natural alignment of the type used internally (and hppa does not
require 16-byte alignment for kernel-assisted CAS).

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-09-20 10:56:54 -03:00
Florian Weimer
766b73768b Linux: Do not skip d_ino == 0 entries in readdir, readdir64 (bug 12165)
POSIX does not say this value is special.  For example, old XFS file
systems may still use inode number zero.

Also update the comment regarding ENOENT.  Linux may return ENOENT
for some file systems.
2022-09-19 12:04:57 +02:00
Samuel Thibault
7ae60af75b hurd: Factorize at/non-at functions
Non-at functions can be implemented by just calling the corresponding at
function with AT_FDCWD and zero at_flags.

In the linkat case, the at behavior is different (O_NOLINK), so this introduces
__linkat_common to pass O_NOLINK as appropriate.

lstat functions can also be implemented with fstatat by adding
__fstatat64_common which takes a flags parameter in addition to the at_flags
parameter,

In the end this factorizes chmod, chown, link, lstat64, mkdir, readlink,
rename, stat64, symlink, unlink, utimes.

This also makes __lstat, __lxstat64, __stat and __xstat64 directly use
__fstatat64_common instead of __lstat64 or __stat64.
2022-09-17 19:58:30 +00:00
Łukasz Stelmach
22c96052ac RISC-V: Allow long jumps to __syscall_error
__syscall_error may end up farther than 1MiB away from a caller,
especially when linking statically large binaries. tail allows for
4GiB jumps and is reduced to j when a linked symbol is within range.

Fixes: 36960f0c76 ("RISC-V: Linux Syscall Interface")
Fixes: 7f33b09c65 ("RISC-V: Linux ABI")
Signed-off-by: Łukasz Stelmach <l.stelmach@samsung.com>
2022-09-16 23:25:45 -04:00
Samuel Thibault
5652e12cce hurd: Make readlink* just reopen the file used for stat
9e5c991106 ("hurd: Fix readlink() hanging on fifo") separated opening
the file for the stat call from opening the file for the read call. That
however opened a small window for the file to change. Better make this
atomic by reopening the file with O_READ.
2022-09-15 21:53:57 +02:00
Samuel Thibault
9e5c991106 hurd: Fix readlink() hanging on fifo
readlink() opens the target with O_READ to be able to read the symlink
content. When the target is actually a fifo, that would hang waiting for a
writer (caught in the coreutils testsuite). We thus have to first lookup the
target without O_READ to perform io_stat and lookout for fifos, and only
after checking the symlink type, we can re-lookup with O_READ.
2022-09-14 18:57:44 +02:00
Wilco Dijkstra
a30e960328 Use relaxed atomics since there is no MO dependence
Replace the 3 uses of atomic_bit_set and atomic_bit_test_set with
atomic_fetch_or_relaxed.  Using relaxed MO is correct since the
atomics are used to ensure memory is released only once.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-09-13 11:58:07 +01:00
Wilco Dijkstra
53b251c9ff Use C11 atomics instead atomic_add(_zero)
Replace atomic_add and atomic_add_zero with atomic_fetch_add_relaxed.

Reviewed-by: DJ Delorie <dj@redhat.com>
2022-09-09 14:11:23 +01:00
Andreas Schwab
3d7d5c10c8 errlist: add missing entry for EDEADLOCK (bug 29545)
Some architectures (mips, powerpc and sparc) define separate values for
EDEADLOCK and EDEADLK.  Readd the errlist entry for EDEADLOCK for those
configurations.  Also use the dependency files from generating the
auxiliary errlist and siglist files.
2022-09-08 11:40:24 +02:00
Joseph Myers
b8cc607f3c Do not define static_assert or thread_local in headers for C2x
C2x makes static_assert and thread_local into keywords, removing the
definitions as macros in assert.h and threads.h.  Thus, disable those
macros in those glibc headers for C2x.

The disabling is done based on a combination of language version and
__GNUC_PREREQ, *not* based on __GLIBC_USE (ISOC2X), on the principle
that users of the header (when requesting C11 or later APIs - not
assert.h for C99 and older API versions) should always have the names
static_assert or thread_local available after inclusion of the header,
whether as a keyword or as a macro.  Thus, when using a compiler
without the keywords (whether an older compiler, possibly in C2x mode,
or _GNU_SOURCE with any compiler but in an older language mode, for
example) the macros should be defined, even when C2x APIs have been
requested.  The __GNUC_PREREQ conditionals here may well need updating
with the versions of other compilers that gained support for these
keywords in C2x mode.

Tested for x86_64.
2022-09-07 18:39:28 +00:00
Florian Weimer
dbb75513f5 elf: Rename _dl_sort_maps parameter from skip to force_first
The new implementation will not be able to skip an arbitrary number
of objects.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-09-06 07:38:33 +02:00
Adhemerval Zanella
2fc7320668 math: x86: Use prefix for FP_INIT_ROUNDMODE
Not all compilers support the inline asm prefix '%v' to emit the avx
instruction if AVX is enable.  Use a prefix instead.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2022-09-05 10:54:41 -03:00
caiyinyu
930993921f LoongArch: Add soft float support. 2022-09-01 09:10:08 +08:00
Adhemerval Zanella
8cd559cf5a nptl: x86_64: Use same code for CURRENT_STACK_FRAME and stackinfo_get_sp
It avoids the possible warning of uninitialized 'frame' variable when
building with clang:

  ../sysdeps/nptl/jmp-unwind.c:27:42: error: variable 'frame' is
  uninitialized when used here [-Werror,-Wuninitialized]
    __pthread_cleanup_upto (env->__jmpbuf, CURRENT_STACK_FRAME);

The resulting code is similar to CURRENT_STACK_FRAME.

Checked on x86_64-linux-gnu.
2022-08-31 09:04:27 -03:00
Adhemerval Zanella
ddcf5a9170 posix: Fix macro expansion producing 'defined' has undefined behavior
The NEED_CHECK_SPEC is defined as:

  #define NEED_CHECK_SPEC \
    (!defined _XBS5_ILP32_OFF32 || !defined _XBS5_ILP32_OFFBIG \
     || !defined _XBS5_LP64_OFF64 || !defined _XBS5_LPBIG_OFFBIG \
     || !defined _POSIX_V6_ILP32_OFF32 || !defined _POSIX_V6_ILP32_OFFBIG \
     || !defined _POSIX_V6_LP64_OFF64 || !defined _POSIX_V6_LPBIG_OFFBIG \
     || !defined _POSIX_V7_ILP32_OFF32 || !defined _POSIX_V7_ILP32_OFFBIG \
     || !defined _POSIX_V7_LP64_OFF64 || !defined _POSIX_V7_LPBIG_OFFBIG)

Which is undefined behavior accordingly to C Standard (Preprocessing
directives, p4).

Checked on x86_64-linux-gnu.
2022-08-30 08:40:47 -03:00
Stefan Liebler
e57d8fc97b S390: Always use svc 0
On s390x syscalls are triggered by svc instruction. One can
pass the syscall number encoded in the instruction "svc 123"
or by storing it in r1:
lghi r1,123
svc 0

If the syscall number is encoded in the instruction, this can
cause broken syscall restarts.  Therefore this patch is now just
passing the syscall number in r1.

See also kernel-commit:
"s390/signal: switch to using vdso for sigreturn and syscall restart"
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/s390/[%e2%80%a6]call.c?h=v6.0-rc1&id=df29a7440c4b5c65765c8f60396b3b13063e24e9

As information, the "svc 0" feature was introduced in kernel 2.5.62:
commit b5aad611393ef2e132e3648fa4c6e56a9cfa8708
2022-08-30 10:54:46 +02:00
Xi Ruoyao
241603123c LoongArch: Use __builtin_{fmax,fmaxf,fmin,fminf} with GCC >= 13
GCC 13 compiles these built-ins to {fmax,fmin}.{s/d} instruction, use
them instead of the generic implementation.

Link: https://gcc.gnu.org/r13-2085
Signed-off-by: Xi Ruoyao <xry111@xry111.site>
2022-08-30 11:59:15 +08:00
caiyinyu
fa9e095bbe LoongArch: Fix ptr mangling/demangling features. 2022-08-30 11:45:22 +08:00
Richard Henderson
51231c469b Makeconfig: Set pie-ccflag to -fPIE by default [BZ# 29514]
We should default to the larger code model, in order to support
larger applications built with -static -pie.  This should be
consistent with pic-ccflag, which defaults to -fPIC.

Remove the now redundant override from sysdeps/sparc/Makefile.
Note that -fno-pie and -fno-PIE have the same effect.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2022-08-29 09:03:00 -04:00
Samuel Thibault
063f7462da hurd: Fix vm_size_t incoherencies
In gnumach, 3e1702a65fb3 ("add rpc_versions for vm types") changed the type
of vm_size_t, making it always a unsigned long. This made it incompatible on
x86 with size_t. Even if we may want to revert it to unsigned int, it's
better to fix the types of parameters according to the .defs files.
2022-08-29 01:42:47 +02:00
Samuel Thibault
cb033e6b0c mach: Make xpg_strerror_r set a message on error
posix advises to have strerror_r fill a message even when we are returning
an error.

This makes mach's xpg_strerror_r do this, like the generic version does.

Spotted by the libunistring testsuite test-strerror_r
2022-08-27 14:56:35 +02:00
Samuel Thibault
03ad444e8e mach: Fix incoherency between perror and strerror
08d2024b41 ("string: Simplify strerror_r") inadvertently made
__strerror_r print unknown error system in decimal while the original
code was printing it in hexadecimal. perror was kept printing in
hexadecimal in 725eeb4af1 ("string: Use tls-internal on strerror_l"),
let us keep both coherent.

This also fixes a duplicate ':'

Spotted by the libunistring testsuite test-perror2
2022-08-27 14:36:18 +02:00
Szabolcs Nagy
06d4381dd8 csu: Change start code license to have link exception
The start code can get linked into dynamic linked executables where
LGPL would require shipping the source or linkable binaries when the
executable is distributed.

On some targets the license exception was missing in start.S (which
is compiled into crt1.o and Scrt1.o which may end up linked into PDE
and PIE binaries).

I did not review what other code may end up in executables, just
fixed the start.S license inconsistency across targets.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-08-26 09:14:53 +01:00
Florian Weimer
5ecc982412 s390: Move hwcaps/platform names out of _rtld_global_ro
Changes to these arrays are often backported to stable releases,
but additions to these arrays shift the offsets of the following
_rltd_global_ro members, thus breaking the GLIBC_PRIVATE ABI.

Obviously, this change is itself an internal ABI break, but at least
it will avoid further ABI breaks going forward.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-08-25 21:33:12 +02:00
Florian Weimer
89baed0b93 Revert "Detect ld.so and libc.so version inconsistency during startup"
This reverts commit 6f85dbf102.

Once this change hits the release branches, it will require relinking
of all statically linked applications before static dlopen works
again, for the majority of updates on release branches: The NEWS file
is regularly updated with bug references, so the __libc_early_init
suffix changes, and static dlopen cannot find the function anymore.

While this ABI check is still technically correct (we do require
rebuilding & relinking after glibc updates to keep static dlopen
working), it is too drastic for stable release branches.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-08-25 18:46:43 +02:00
Florian Weimer
6f85dbf102 Detect ld.so and libc.so version inconsistency during startup
The files NEWS, include/link.h, and sysdeps/generic/ldsodefs.h
contribute to the version fingerprint used for detection.  The
fingerprint can be further refined using the --with-extra-version-id
configure argument.

_dl_call_libc_early_init is replaced with _dl_lookup_libc_early_init.
The new function is used store a pointer to libc.so's
__libc_early_init function in the libc_map_early_init member of the
ld.so namespace structure.  This function pointer can then be called
directly, so the separate invocation function is no longer needed.

The versioned symbol lookup needs the symbol versioning data
structures, so the initialization of libc_map and libc_map_early_init
is now done from _dl_check_map_versions, after this information
becomes available.  (_dl_map_object_from_fd does not set this up
in time, so the initialization code had to be moved from there.)
This means that the separate initialization code can be removed from
dl_main because _dl_check_map_versions covers all maps, including
the initial executable loaded by the kernel.  The lookup still happens
before relocation and the invocation of IFUNC resolvers, so IFUNC
resolvers are protected from ABI mismatch.

The __libc_early_init function pointer is not protected because
so little code runs between the pointer write and the invocation
(only dynamic linker code and IFUNC resolvers).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-08-24 17:35:57 +02:00
Paul Eggert
464138e904 Merge _GL_UNUSED C23 patch from Gnulib
* posix/getopt.c (_getopt_initialize):
* sysdeps/posix/tempname.c (try_dir, try_nocreate):
Put _GL_UNUSED before args instead of after.
This makes no difference for glibc.
It is needed for Gnulib when being compiled on
non-GCC C23 compilers.
2022-08-23 21:58:39 -07:00
Xi Ruoyao
8995b84c45 LoongArch: Fix dl-machine.h code formatting.
No functional change.
2022-08-24 10:06:41 +08:00
Samuel Thibault
af6b1cce98 hurd: Fix starting static binaries with stack protection enabled
gcc introduces gs:0x14 accesses in most functions, so we need some tcbhead
to be ready very early during initialization.  This configures a static area
which can be referenced by various protected functions, until proper TLS is
set up.
2022-08-22 22:34:31 +02:00
Samuel Thibault
4565083abc htl: Make pthread*_cond_timedwait register wref before releasing mutex
Otherwise another thread could be rightly trying to destroy the condition,
see e.g. tst-cond20.
2022-08-22 22:27:24 +02:00
Samuel Thibault
8bf0bc8350 htl: make __pthread_hurd_cond_timedwait_internal check mutex is held
Like __pthread_cond_timedwait_internal already does.
2022-08-22 22:25:27 +02:00
Joseph Myers
4c199499d6 Add AArch64 HWCAP2_* constants from Linux 5.19
Linux 5.19 adds more HWCAP2_* values for AArch64; add these to its
bits/hwcap.h header in glibc.

Tested with build-many-glibcs.py for aarch64-linux-gnu.
2022-08-22 14:59:39 +00:00
Joseph Myers
a727220b37 Add AGROUP from Linux 5.19 to sys/acct.h, remove Alpha version (bug 29502)
Linux 5.19 adds a new accounting flag AGROUP; add it to the
enumeration in sys/acct.h.

This shows up that the Alpha-specific variant of this header has a
different set of constants and struct acct, which appear to be the
constants and structure layout from Linux 2.0.  These were changed
some time between Linux 2.0 and Linux 2.2; I see no evidence of an
Alpha-specific layout or set of constants, but haven't checked the
detailed Linux kernel history between those versions.  Rather, it
looks like tha Alpha-specific header was originally needed because of
the use of types in the kernel structure (such as uid_t and gid_t)
that had different sizes on Alpha, and when glibc was updated for
changes to the structure and constants in the kernel

1998-10-02  Andreas Jaeger  <aj@arthur.rhein-neckar.de>

        * sysdeps/unix/sysv/linux/sys/acct.h: Bring in sync with current
        linux 2.1 version.

that simply omitted to do anything about the Alpha version.

Thus, remove the Alpha version in order to get the updated definitions
into use on Alpha, as I don't think the interfaces are actually
different for Alpha with any kernel version supported by glibc.

Tested for x86_64, and with build-many-glibcs.py for alpha-linux-gnu.
2022-08-22 14:16:57 +00:00
Florian Weimer
e7ad26ee3c alpha: Fix generic brk system call emulation in __brk_call (bug 29490)
The kernel special-cases the zero argument for alpha brk, and we can
use that to restore the generic Linux error handling behavior.

Fixes commit b57ab258c1 ("Linux:
Introduce __brk_call for invoking the brk system call").
2022-08-22 11:05:42 +02:00
Samuel Thibault
f7b0fc5cc6 hurd: Assume non-suid during bootstrap
We do not have a hurd data block only when bootstrapping the system, in
which case we don't have a notion of suid yet anyway.

This is needed, otherwise init_standard_fds would check that standard
file descriptors are allocated, which is meaningless during bootstrap.
2022-08-19 02:26:21 +02:00
Stefan Liebler
f465b21b06 S390: Fix werror=unused-variable in ifunc-impl-list.c.
If the architecture level set is high enough, no IFUNCs are used at
all and the variable i would be unused.  Then the build fails with:
../sysdeps/s390/multiarch/ifunc-impl-list.c: In function ‘__libc_ifunc_impl_list’:
../sysdeps/s390/multiarch/ifunc-impl-list.c:76:10: error: unused variable ‘i’ [-Werror=unused-variable]
   76 |   size_t i = max;
      |          ^
cc1: all warnings being treated as errors
2022-08-18 09:10:48 +02:00
Michael Hudson-Doyle
2b274fd8c9 Ensure calculations happen with desired rounding mode in y1lf128
math/test-float128-y1 fails on x86_64 and ppc64el with gcc 12 and -O3,
because code inside a block guarded by SET_RESTORE_ROUNDL is being moved
after the rounding mode has been restored. Use math_force_eval to
prevent this (and insert some math_opt_barrier calls to prevent code
from being moved before the rounding mode is set).

Fixes #29463

Reviewed-By: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2022-08-18 12:32:18 +12:00
Florian Weimer
2955ef4b7c Linux: Fix enum fsconfig_command detection in <sys/mount.h>
The #ifdef FSOPEN_CLOEXEC check did not work because the macro
was always defined in this header prior to the check, so that
the <linux/mount.h> contents did not matter.

Fixes commit 774058d729
("linux: Fix sys/mount.h usage with kernel headers").
2022-08-16 12:03:28 +02:00
Samuel Thibault
a2ee8c6500 Move ip_mreqn structure from Linux to generic
I.e. from sysdeps/unix/sysv/linux/bits/in.h to netinet/in.h

It is following both the BSD and Linux definitions.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-15 22:43:15 +02:00
Florian Weimer
f82e05ebb2 Linux: Terminate subprocess on late failure in tst-pidfd (bug 29485)
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-08-15 16:43:59 +02:00
Adhemerval Zanella
453b88efe6 arm: Remove nested functionf rom relocate_pc24
Checked on arm-linux-gnueabihf.
2022-08-12 09:46:22 -03:00
Adhemerval Zanella
774058d729 linux: Fix sys/mount.h usage with kernel headers
Now that kernel exports linux/mount.h and includes it on linux/fs.h,
its definitions might clash with glibc exports sys/mount.h.  To avoid
the need to rearrange the Linux header to be always after glibc one,
the glibc sys/mount.h is changed to:

  1. Undefine the macros also used as enum constants.  This covers prior
     inclusion of <linux/mount.h> (for instance MS_RDONLY).

  2. Include <linux/mount.h> based on the usual __has_include check
     (needs to use __has_include ("linux/mount.h") to paper over GCC
     bugs.

  3. Define enum fsconfig_command only if FSOPEN_CLOEXEC is not defined.
     (FSOPEN_CLOEXEC should be a very close proxy.)

  4. Define struct mount_attr if MOUNT_ATTR_SIZE_VER0 is not defined.
     (Added in the same commit on the Linux side.)

This patch also adds some tests to check if including linux/fs.h and
linux/mount.h after and before sys/mount.h does work.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-12 09:15:28 -03:00
Adhemerval Zanella
e1226cdc6b linux: Use compile_c_snippet to check linux/mount.h availability
Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-12 09:15:23 -03:00
Adhemerval Zanella
c68b6044bc linux: Mimic kernel defition for BLOCK_SIZE
To avoid possible warnings if the kernel header is included before
sys/mount.h.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-12 09:15:21 -03:00
Adhemerval Zanella
1542019b69 linux: Use compile_c_snippet to check linux/pidfd.h availability
Instead of tying to a specific kernel version.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-12 09:15:11 -03:00
caiyinyu
1c9bc1b6e5 LoongArch: Add pointer mangling support. 2022-08-12 09:30:56 +08:00
Wilco Dijkstra
12182ba18d AArch64: Fix typo in sve configure check (BZ# 29394)
Fix a typo in the SVE configure check. This fixes [BZ# 29394].
2022-08-11 17:52:00 +01:00
Wilco Dijkstra
c51c483d2b libio: Improve performance of IO locks
Improve performance of recursive IO locks by adding a fast path for
the single-threaded case. To reduce the number of memory accesses for
locking/unlocking, only increment the recursion counter if the lock
is already taken.

On Neoverse V1, a microbenchmark with many small freads improved by
2.9x. Multithreaded performance improved by 2%.

Reviewed-by: Cristian Rodríguez  <crrodriguez@opensuse.org>
2022-08-11 16:47:45 +01:00
Stefan Liebler
11f09947f3 tst-process_madvise: Check process_madvise-syscall support.
So far this test checks if pidfd_open-syscall is supported,
which was introduced with linux 5.3.

The process_madvise-syscall was introduced with linux 5.10.
Thus you'll get FAILs if you are running a kernel in between.

This patch adds a check if the first process_madvise-syscall
returns ENOSYS and in this case will fail with UNSUPPORTED.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2022-08-11 12:21:05 +02:00
Noah Goldstein
312ded0d63 x86: Fix #define STRCPY guard in strcpy-sse2.S
`#ifndef STPCPY` is incorrect for checking if `STRCPY` is already
defined.  It doesn't end up mattering as the whole check is
guarded by `#if IS_IN (libc)` but is incorrect none the less.
2022-08-09 17:00:03 +08:00
Adhemerval Zanella
26a3499cdb i386: Use cmpl instead of cmp
Clang cannot assemble cmp in the AT&T dialect mode.
2022-08-05 09:28:39 -03:00
Adhemerval Zanella
1ed5869c4c i386: Use fldt instead of fld on e_logl.S
Clang cannot assemble fldt in the AT&T dialect mode.
2022-08-05 09:28:33 -03:00
Fangrui Song
525ca33a61 i386: Replace movzx with movzbl
Similar to 6720d36b66 for x86-64.

Clang cannot assemble movzx in the AT&T dialect mode.  Change movzx to
movzbl, which follows the AT&T dialect and is used elsewhere in the
file.
2022-08-04 14:06:50 -07:00
Adhemerval Zanella
3698f5a9dd i386: Remove RELA support
Now that prelink is not support, there is no need to keep supporting
rela for non bootstrap.
2022-08-04 10:03:46 -03:00
Adhemerval Zanella
c3f5682215 arm: Remove RELA support
Now that prelink is not support, there is no need to keep supporting
rela for non bootstrap.
2022-08-04 10:03:46 -03:00
Adhemerval Zanella
36676f5e5d Remove ldd libc4 support
The older libc versions are obsolete for over twenty years now.
2022-08-04 10:03:45 -03:00
Lucas A. M. Magalhaes
8ee878592c Assume only FLAG_ELF_LIBC6 suport
The older libc versions are obsolete for over twenty years now.
This patch removes the special flags for libc5 and libc4 and assumes
that all libraries cached are libc6 compatible and use FLAG_ELF_LIBC6.

Checked with a build for all affected architectures.

Co-authored-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-08-04 09:09:48 -03:00
Adhemerval Zanella
5a57ad23ba Remove left over LD_LIBRARY_VERSION usages
The environment variable was removed by
d2db60d8d8.
2022-08-04 09:09:48 -03:00
Florian Weimer
8fabe0e632 Linux: Remove exit system call from _exit
exit only terminates the current thread, not the whole process, so it
is the wrong fallback system call in this context.  All supported
Linux versions implement the exit_group system call anyway.
2022-08-04 06:17:50 +02:00
caiyinyu
3e83843637 LoongArch: Add vdso support for gettimeofday. 2022-08-04 09:19:36 +08:00
Joseph Myers
085030b957 Update kernel version to 5.19 in header constant tests
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 5.18.  (There are no
new constants covered by these tests in 5.19, or in 5.17 or 5.18 in
the case of tst-mount-consts.py that previously used version 5.16,
that need any other header changes.)

Tested with build-many-glibcs.py.
2022-08-03 16:31:58 +00:00
Florian Weimer
68e036f27f nptl: Remove uses of assert_perror
__pthread_sigmask cannot actually fail with valid pointer arguments
(it would need a really broken seccomp filter), and we do not check
for errors elsewhere.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-08-03 11:42:49 +02:00
Florian Weimer
cca9684f2d stdio: Clean up __libc_message after unconditional abort
Since commit ec2c1fcefb ("malloc:
Abort on heap corruption, without a backtrace [BZ #21754]"),
__libc_message always terminates the process.  Since commit
a289ea09ea ("Do not print backtraces
on fatal glibc errors"), the backtrace facility has been removed.
Therefore, remove enum __libc_message_action and the action
argument of __libc_message, and mark __libc_message as _No_return.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-08-03 11:42:39 +02:00
Joseph Myers
fccadcdf5b Update syscall lists for Linux 5.19
Linux 5.19 has no new syscalls, but enables memfd_secret in the uapi
headers for RISC-V.  Update the version number in syscall-names.list
to reflect that it is still current for 5.19 and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.
2022-08-02 21:05:07 +00:00
Arjun Shankar
9c443ac455 socket: Check lengths before advancing pointer in CMSG_NXTHDR
The inline and library functions that the CMSG_NXTHDR macro may expand
to increment the pointer to the header before checking the stride of
the increment against available space.  Since C only allows incrementing
pointers to one past the end of an array, the increment must be done
after a length check.  This commit fixes that and includes a regression
test for CMSG_FIRSTHDR and CMSG_NXTHDR.

The Linux, Hurd, and generic headers are all changed.

Tested on Linux on armv7hl, i686, x86_64, aarch64, ppc64le, and s390x.

[BZ #28846]

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2022-08-02 11:10:25 +02:00
Mark Wielaard
325ba824b0 tst-pidfd.c: UNSUPPORTED if we get EPERM on valid pidfd_getfd call
pidfd_getfd can fail for a valid pidfd with errno EPERM for various
reasons in a restricted environment. Use FAIL_UNSUPPORTED in that case.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-29 18:52:12 +02:00
caiyinyu
bce0218d9a LoongArch: Add greg_t and gregset_t. 2022-07-29 09:15:21 +08:00
caiyinyu
033e76ea9c LoongArch: Fix VDSO_HASH and VDSO_NAME. 2022-07-29 09:15:21 +08:00
Darius Rad
7c5db7931f riscv: Update rv64 libm test ulps
Generated on a Microsemi Polarfire Icicle Kit running Linux version
5.15.32.  Same ULPs were also produced on QEMU 5.2.0 running Linux
5.18.0.
2022-07-27 10:50:20 -03:00
Darius Rad
5b6d8a650d riscv: Update nofpu libm test ulps 2022-07-27 10:50:10 -03:00
Jason A. Donenfeld
eaad4f9e8f arc4random: simplify design for better safety
Rather than buffering 16 MiB of entropy in userspace (by way of
chacha20), simply call getrandom() every time.

This approach is doubtlessly slower, for now, but trying to prematurely
optimize arc4random appears to be leading toward all sorts of nasty
properties and gotchas. Instead, this patch takes a much more
conservative approach. The interface is added as a basic loop wrapper
around getrandom(), and then later, the kernel and libc together can
work together on optimizing that.

This prevents numerous issues in which userspace is unaware of when it
really must throw away its buffer, since we avoid buffering all
together. Future improvements may include userspace learning more from
the kernel about when to do that, which might make these sorts of
chacha20-based optimizations more possible. The current heuristic of 16
MiB is meaningless garbage that doesn't correspond to anything the
kernel might know about. So for now, let's just do something
conservative that we know is correct and won't lead to cryptographic
issues for users of this function.

This patch might be considered along the lines of, "optimization is the
root of all evil," in that the much more complex implementation it
replaces moves too fast without considering security implications,
whereas the incremental approach done here is a much safer way of going
about things. Once this lands, we can take our time in optimizing this
properly using new interplay between the kernel and userspace.

getrandom(0) is used, since that's the one that ensures the bytes
returned are cryptographically secure. But on systems without it, we
fallback to using /dev/urandom. This is unfortunate because it means
opening a file descriptor, but there's not much of a choice. Secondly,
as part of the fallback, in order to get more or less the same
properties of getrandom(0), we poll on /dev/random, and if the poll
succeeds at least once, then we assume the RNG is initialized. This is a
rough approximation, as the ancient "non-blocking pool" initialized
after the "blocking pool", not before, and it may not port back to all
ancient kernels, though it does to all kernels supported by glibc
(≥3.2), so generally it's the best approximation we can do.

The motivation for including arc4random, in the first place, is to have
source-level compatibility with existing code. That means this patch
doesn't attempt to litigate the interface itself. It does, however,
choose a conservative approach for implementing it.

Cc: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Cristian Rodríguez <crrodriguez@opensuse.org>
Cc: Paul Eggert <eggert@cs.ucla.edu>
Cc: Mark Harris <mark.hsj@gmail.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-07-27 08:58:27 -03:00
caiyinyu
68d61026d5 LoongArch: Hard Float Support 2022-07-26 12:35:12 -03:00
caiyinyu
3d87c89815 LoongArch: Build Infrastructure 2022-07-26 12:35:12 -03:00
caiyinyu
0d4a891a7c LoongArch: Add ABI Lists 2022-07-26 12:35:12 -03:00
caiyinyu
f2037efbb3 LoongArch: Linux ABI 2022-07-26 12:35:12 -03:00
caiyinyu
45955fe618 LoongArch: Linux Syscall Interface 2022-07-26 12:35:12 -03:00
caiyinyu
3275882261 LoongArch: Atomic and Locking Routines 2022-07-26 12:35:12 -03:00
caiyinyu
c742795dce LoongArch: Generic <math.h> and soft-fp Routines 2022-07-26 12:35:12 -03:00
caiyinyu
619bfc6770 LoongArch: Thread-Local Storage Support 2022-07-26 12:35:12 -03:00
caiyinyu
a133942025 LoongArch: ABI Implementation 2022-07-26 12:35:12 -03:00
Arnout Vandecappelle (Essensium/Mind)
794c27446f struct stat is not posix conformant on microblaze with __USE_FILE_OFFSET64
Commit a06b40cdf5 updated stat.h to use
__USE_XOPEN2K8 instead of __USE_MISC to add the st_atim, st_mtim and
st_ctim members to struct stat. However, for microblaze, there are two
definitions of struct stat, depending on the __USE_FILE_OFFSET64 macro.
The second one was not updated.

Change __USE_MISC to __USE_XOPEN2K8 in the __USE_FILE_OFFSET64 version
of struct stat for microblaze.
2022-07-25 11:06:49 -03:00
Florian Weimer
0c5605989f Linux: dirent/tst-readdir64-compat needs to use TEST_COMPAT (bug 27654)
The hppa port starts libc at GLIBC_2.2, but has earlier symbol
versions in other shared objects.  This means that the compat
symbol for readdir64 is not actually present in libc even though
have-GLIBC_2.1.3 is defined as yes at the make level.

Fixes commit 15e50e6c96 ("Linux:
dirent/tst-readdir64-compat can be a regular test") by mostly
reverting it.
2022-07-25 11:39:03 +02:00
Adhemerval Zanella Netto
3b56f944c5 s390x: Add optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-s390x.S.  The final state register clearing is
omitted.

On a z15 it shows the following improvements (using formatted
bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               198.92
arc4random_buf(16) [single-thread]       244.49
arc4random_buf(32) [single-thread]       282.73
arc4random_buf(48) [single-thread]       286.64
arc4random_buf(64) [single-thread]       320.06
arc4random_buf(80) [single-thread]       297.43
arc4random_buf(96) [single-thread]       310.96
arc4random_buf(112) [single-thread]      308.10
arc4random_buf(128) [single-thread]      309.90
-----------------------------------------------

VX.                                        MB/s
-----------------------------------------------
arc4random [single-thread]               430.26
arc4random_buf(16) [single-thread]       735.14
arc4random_buf(32) [single-thread]      1029.99
arc4random_buf(48) [single-thread]      1206.76
arc4random_buf(64) [single-thread]      1311.92
arc4random_buf(80) [single-thread]      1378.74
arc4random_buf(96) [single-thread]      1445.06
arc4random_buf(112) [single-thread]     1484.32
arc4random_buf(128) [single-thread]     1517.30
-----------------------------------------------

Checked on s390x-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
b7060acfe8 powerpc64: Add optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-ppc.c.  It targets POWER8 and it is used on default
for LE.

On a POWER8 it shows the following improvements (using formatted
bench-arc4random data):

POWER8

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               138.77
arc4random_buf(16) [single-thread]       174.36
arc4random_buf(32) [single-thread]       228.11
arc4random_buf(48) [single-thread]       252.31
arc4random_buf(64) [single-thread]       270.11
arc4random_buf(80) [single-thread]       278.97
arc4random_buf(96) [single-thread]       287.78
arc4random_buf(112) [single-thread]      291.92
arc4random_buf(128) [single-thread]      295.25

POWER8                                     MB/s
-----------------------------------------------
arc4random [single-thread]               198.06
arc4random_buf(16) [single-thread]       278.79
arc4random_buf(32) [single-thread]       448.89
arc4random_buf(48) [single-thread]       551.09
arc4random_buf(64) [single-thread]       646.12
arc4random_buf(80) [single-thread]       698.04
arc4random_buf(96) [single-thread]       756.06
arc4random_buf(112) [single-thread]      784.12
arc4random_buf(128) [single-thread]      808.04
-----------------------------------------------

Checked on powerpc64-linux-gnu and powerpc64le-linux-gnu.
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
84cfc6479b x86: Add AVX2 optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-avx2.S.  It is used only if AVX2 is supported
and enabled by the architecture.

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

AVX2                                       MB/s
-----------------------------------------------
arc4random [single-thread]               922.61
arc4random_buf(16) [single-thread]      1478.70
arc4random_buf(32) [single-thread]      2241.80
arc4random_buf(48) [single-thread]      2681.28
arc4random_buf(64) [single-thread]      2913.43
arc4random_buf(80) [single-thread]      3009.73
arc4random_buf(96) [single-thread]      3141.16
arc4random_buf(112) [single-thread]     3254.46
arc4random_buf(128) [single-thread]     3305.02
-----------------------------------------------

Checked on x86_64-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
e169aff0e9 x86: Add SSE2 optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-amd64-ssse3.S.  It replaces the ROTATE_SHUF_2 (which
uses pshufb) by ROTATE2 and thus making the original implementation
SSE2.

As for generic implementation, the last step that XOR with the
input is omited. The final state register clearing is also
omitted.

On a Ryzen 9 5900X it shows the following improvements (using
formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               443.11
arc4random_buf(16) [single-thread]       552.27
arc4random_buf(32) [single-thread]       626.86
arc4random_buf(48) [single-thread]       649.81
arc4random_buf(64) [single-thread]       663.95
arc4random_buf(80) [single-thread]       674.78
arc4random_buf(96) [single-thread]       675.17
arc4random_buf(112) [single-thread]      680.69
arc4random_buf(128) [single-thread]      683.20
-----------------------------------------------

SSE                                        MB/s
-----------------------------------------------
arc4random [single-thread]               704.25
arc4random_buf(16) [single-thread]      1018.17
arc4random_buf(32) [single-thread]      1315.27
arc4random_buf(48) [single-thread]      1449.36
arc4random_buf(64) [single-thread]      1511.16
arc4random_buf(80) [single-thread]      1539.48
arc4random_buf(96) [single-thread]      1571.06
arc4random_buf(112) [single-thread]     1596.16
arc4random_buf(128) [single-thread]     1613.48
-----------------------------------------------

Checked on x86_64-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
4c128c7823 aarch64: Add optimized chacha20
It adds vectorized ChaCha20 implementation based on libgcrypt
cipher/chacha20-aarch64.S.  It is used as default and only
little-endian is supported (BE uses generic code).

As for generic implementation, the last step that XOR with the
input is omited.  The final state register clearing is also
omitted.

On a virtualized Linux on Apple M1 it shows the following
improvements (using formatted bench-arc4random data):

GENERIC                                    MB/s
-----------------------------------------------
arc4random [single-thread]               380.89
arc4random_buf(16) [single-thread]       500.73
arc4random_buf(32) [single-thread]       552.61
arc4random_buf(48) [single-thread]       566.82
arc4random_buf(64) [single-thread]       574.01
arc4random_buf(80) [single-thread]       581.02
arc4random_buf(96) [single-thread]       591.19
arc4random_buf(112) [single-thread]      592.29
arc4random_buf(128) [single-thread]      596.43
-----------------------------------------------

OPTIMIZED                                  MB/s
-----------------------------------------------
arc4random [single-thread]               569.60
arc4random_buf(16) [single-thread]       825.78
arc4random_buf(32) [single-thread]       987.03
arc4random_buf(48) [single-thread]      1042.39
arc4random_buf(64) [single-thread]      1075.50
arc4random_buf(80) [single-thread]      1094.68
arc4random_buf(96) [single-thread]      1130.16
arc4random_buf(112) [single-thread]     1129.58
arc4random_buf(128) [single-thread]     1137.91
-----------------------------------------------

Checked on aarch64-linux-gnu.
2022-07-22 11:58:27 -03:00
Adhemerval Zanella Netto
6f4e0fcfa2 stdlib: Add arc4random, arc4random_buf, and arc4random_uniform (BZ #4417)
The implementation is based on scalar Chacha20 with per-thread cache.
It uses getrandom or /dev/urandom as fallback to get the initial entropy,
and reseeds the internal state on every 16MB of consumed buffer.

To improve performance and lower memory consumption the per-thread cache
is allocated lazily on first arc4random functions call, and if the
memory allocation fails getentropy or /dev/urandom is used as fallback.
The cache is also cleared on thread exit iff it was initialized (so if
arc4random is not called it is not touched).

Although it is lock-free, arc4random is still not async-signal-safe
(the per thread state is not updated atomically).

The ChaCha20 implementation is based on RFC8439 [1], omitting the final
XOR of the keystream with the plaintext because the plaintext is a
stream of zeros.  This strategy is similar to what OpenBSD arc4random
does.

The arc4random_uniform is based on previous work by Florian Weimer,
where the algorithm is based on Jérémie Lumbroso paper Optimal Discrete
Uniform Generation from Coin Flips, and Applications (2013) [2], who
credits Donald E. Knuth and Andrew C. Yao, The complexity of nonuniform
random number generation (1976), for solving the general case.

The main advantage of this method is the that the unit of randomness is not
the uniform random variable (uint32_t), but a random bit.  It optimizes the
internal buffer sampling by initially consuming a 32-bit random variable
and then sampling byte per byte.  Depending of the upper bound requested,
it might lead to better CPU utilization.

Checked on x86_64-linux-gnu, aarch64-linux, and powerpc64le-linux-gnu.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
Reviewed-by: Yann Droneaud <ydroneaud@opteya.com>

[1] https://datatracker.ietf.org/doc/html/rfc8439
[2] https://arxiv.org/pdf/1304.1916.pdf
2022-07-22 11:58:27 -03:00
Michael Hudson-Doyle
1f4e90d468 linux: return UNSUPPORTED from tst-mount if entering mount namespace fails
Before this the test fails if run in a chroot by a non-root user:

warning: could not become root outside namespace (Operation not permitted)
../sysdeps/unix/sysv/linux/tst-mount.c:36: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 19 (0x13); from: ENODEV
error: ../sysdeps/unix/sysv/linux/tst-mount.c:39: not true: fd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:46: not true: r != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:48: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:52: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 9 (0x9); from: EBADF
error: ../sysdeps/unix/sysv/linux/tst-mount.c:55: not true: mfd != -1
../sysdeps/unix/sysv/linux/tst-mount.c:58: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:61: not true: r != -1
../sysdeps/unix/sysv/linux/tst-mount.c:65: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 2 (0x2); from: ENOENT
error: ../sysdeps/unix/sysv/linux/tst-mount.c:68: not true: pfd != -1
error: ../sysdeps/unix/sysv/linux/tst-mount.c:75: not true: fd_tree != -1
../sysdeps/unix/sysv/linux/tst-mount.c:88: numeric comparison failure
   left: 1 (0x1); from: errno
  right: 38 (0x26); from: ENOSYS
error: 12 test failures

Checking that the test can enter a new mount namespace is more correct
than just checking the return value of support_become_root() as the test
code changes the mount namespace it runs in so running it as root on a
system that does not support mount namespaces should still skip.

Also change the test to remove the unnecessary fork.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-19 06:55:49 +12:00
Noah Goldstein
49889fb256 x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level
1. Add default ISA level selection in non-multiarch/rtld
   implementations.

2. Add ISA level build guards to different implementations.
    - I.e strcpy-avx2.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to
      include it as we will always use one of the ISA level 4
      implementations (strcpy-evex.S).

3. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-16 03:07:59 -07:00
Noah Goldstein
192979ee35 x86: Add support to build wcscpy with explicit ISA level
1. Add ISA level build guards to different implementations.
    - wcscpy-ssse3.S is used as ISA level 2/3/4.
    - wcscpy-generic.c is only used at ISA level 1 and will
      only build if compiled with ISA level == 1. Otherwise
      there is no reason to include it as we will always use
      wcscpy-ssse3.S

2. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-16 03:07:59 -07:00
Noah Goldstein
ceabdcd130 x86: Add support to build strcmp/strlen/strchr with explicit ISA level
1. Add default ISA level selection in non-multiarch/rtld
   implementations.

2. Add ISA level build guards to different implementations.
    - I.e strcmp-avx2.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to
      include it as we will always use one of the ISA level 4
      implementations (strcmp-evex.S).

3. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-16 03:07:59 -07:00
Stefan Liebler
779aa039fc S390: Define SINGLE_THREAD_BY_GLOBAL only on s390x
Starting with commit e070501d12
"Replace __libc_multiple_threads with __libc_single_threaded"
the testcases nptl/tst-cancel-self and
nptl/tst-cancel-self-cancelstate are failing.

This is fixed by only defining SINGLE_THREAD_BY_GLOBAL on s390x,
but not on s390.

Starting with commit 09c76a7409
"Linux: Consolidate {RTLD_}SINGLE_THREAD_P definition",
SINGLE_THREAD_BY_GLOBAL was defined in
sysdeps/unix/sysv/linux/s390/s390-64/sysdep.h.

Lateron the commit 9a973da617
"s390: Consolidate Linux syscall definition" consolidates the sysdep.h files
from s390-32/s390-64 subdirectories.  Unfortunately the macro is now always
defined instead of only on s390-64.

As information:
TLS_MULTIPLE_THREADS_IN_TCB is also only defined for s390.
See: sysdeps/s390/nptl/tls.h
2022-07-14 13:39:09 +02:00
Noah Goldstein
7c8ca17893 x86: Add missing rtm tests for strcmp family
Add new tests for:
    strcasecmp
    strncasecmp
    strcmp
    wcscmp

These functions all have avx2_rtm implementations so should be tested.
2022-07-13 14:55:31 -07:00
Noah Goldstein
42b014dd1b x86: Remove unneeded rtld-wmemcmp
wmemcmp isn't used by the dynamic loader so their no need to add an
RTLD stub for it.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
e19bb87c97 x86: Move wcslen SSE2 implementation to multiarch/wcslen-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
64479f11b7 x86: Move wcschr SSE2 implementation to multiarch/wcschr-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
72a48ec0f7 x86: Move strcat SSE2 implementation to multiarch/strcat-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
cd080d0741 x86: Move strchr SSE2 implementation to multiarch/strchr-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
425647458b x86: Move strrchr SSE2 implementation to multiarch/strrchr-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
08af081ffd x86: Move memrchr SSE2 implementation to multiarch/memrchr-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
6b9006bfb0 x86: Move strcpy SSE2 implementation to multiarch/strcpy-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
58e6cd4bcb x86: Move strlen SSE2 implementation to multiarch/strlen-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
60a583ec60 x86: Move strcmp SSE42 implementation to multiarch/strcmp-sse4_2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
427eaa2c85 x86: Move wcscmp SSE2 implementation to multiarch/wcscmp-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
d561fbb041 x86: Move strcmp SSE2 implementation to multiarch/strcmp-sse2.S
This commit doesn't affect libc.so.6, its just housekeeping to prepare
for adding explicit ISA level support.

Because strcmp-sse2.S implements so many functions (more from
avx2/evex/sse42) add a new file 'strcmp-naming.h' to assist in
getting the correct symbol name for all the function across
multiarch/non-multiarch builds.

Tested build on x86_64 and x86_32 with/without multiarch.
2022-07-13 14:55:31 -07:00
Noah Goldstein
30e57e0a21 x86: Rename STRCASECMP_NONASCII macro to STRCASECMP_L_NONASCII
The previous macro name can be confusing given that both
`__strcasecmp_l_nonascii` and `__strcasecmp_nonascii` are
functions and we use the `_l` version.
2022-07-13 14:55:31 -07:00
Noah Goldstein
f2698954ff x86: Remove __mmask intrinsics in strstr-avx512.c
The intrinsics are not available before GCC7 and using standard
operators generates code of equivalent or better quality.

Removed:
    _cvtmask64_u64
    _kshiftri_mask64
    _kand_mask64

Geometric Mean of 5 Runs of Full Benchmark Suite New / Old: 0.958
2022-07-12 15:41:14 -07:00
Noah Goldstein
9c38deec96 x86: Remove generic strncat, strncpy, and stpncpy implementations
These functions all have optimized versions:
__strncat_sse2_unaligned, __strncpy_sse2_unaligned, and
stpncpy_sse2_unaligned which are faster than their respective generic
implementations.  Since the sse2 versions can run on baseline x86_64,
we should use these as the baseline implementation and can remove the
generic implementations.

Geometric mean of N=20 runs of the entire benchmark suite on:
11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz (Tigerlake)

__strncat_sse2_unaligned / __strncat_generic: .944
__strncpy_sse2_unaligned / __strncpy_generic: .726
__stpncpy_sse2_unaligned / __stpncpy_generic: .650

Tested build with and without multiarch and full check with multiarch.
2022-07-12 11:44:12 -07:00
Fangrui Song
c5bec9d491 i386: Remove -Wa,-mtune=i686
gas -mtune= may change NOP generating patterns but -mtune=i686 has no
difference from the default by inspecting .o and .os files.

Note: Clang doesn't support -Wa,-mtune=i686.
2022-07-12 11:14:32 -07:00
H.J. Lu
ec9013727d x86-64: Remove redundant strcspn-generic/strpbrk-generic/strspn-generic
Remove redundant strcspn-generic, strpbrk-generic and strspn-generic
from sysdep_routines in sysdeps/x86_64/multiarch/Makefile added by

commit c69f960b01
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Sun Jul 3 21:28:07 2022 -0700

    x86: Add support for building str{c|p}{brk|spn} with explicit ISA level

since they have been added to sysdep_routines in sysdeps/x86_64/Makefile.
2022-07-08 16:06:04 -07:00
H.J. Lu
eedf7886ed x86-64: Don't mark symbols as hidden in strcmp-XXX.S
Don't mark symbols as hidden in strcmp-avx2.S, strcmp-evex.S and
strcmp-sse42.S since they are marked as hidden in the IFUNC selectors.
2022-07-07 16:38:11 -07:00
Tom Honermann
8bcca1db3d stdlib: Implement mbrtoc8, c8rtomb, and the char8_t typedef.
This change provides implementations for the mbrtoc8 and c8rtomb
functions adopted for C++20 via WG21 P0482R6 and for C2X via WG14
N2653.  It also provides the char8_t typedef from WG14 N2653.

The mbrtoc8 and c8rtomb functions are declared in uchar.h in C2X
mode or when the _GNU_SOURCE macro or C++20 __cpp_char8_t feature
test macro is defined.

The char8_t typedef is declared in uchar.h in C2X mode or when the
_GNU_SOURCE macro is defined and the C++20 __cpp_char8_t feature
test macro is not defined (if __cpp_char8_t is defined, then char8_t
is a builtin type).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-07-06 09:29:42 -03:00
Danila Kutenin
3c99806989 aarch64: Optimize string functions with shrn instruction
We found that string functions were using AND+ADDP
to find the nibble/syndrome mask but there is an easier
opportunity through `SHRN dst.8b, src.8h, 4` (shift
right every 2 bytes by 4 and narrow to 1 byte) and has
same latency on all SIMD ARMv8 targets as ADDP. There
are also possible gaps for memcmp but that's for
another patch.

We see 10-20% savings for small-mid size cases (<=128)
which are primary cases for general workloads.
2022-07-06 09:26:20 +01:00
Noah Goldstein
ae308947ff x86: Add support for building {w}memcmp{eq} with explicit ISA level
1. Refactor files so that all implementations are in the multiarch
   directory
    - Moved the implementation portion of memcmp sse2 from memcmp.S to
      multiarch/memcmp-sse2.S

    - The non-multiarch file now only includes one of the
      implementations in the multiarch directory based on the compiled
      ISA level (only used for non-multiarch builds.  Otherwise we go
      through the ifunc selector).

2. Add ISA level build guards to different implementations.
    - I.e memcmp-avx2-movsb.S which is ISA level 3 will only build if
      compiled ISA level <= 3. Otherwise there is no reason to include
      it as we will always use one of the ISA level 4
      implementations (memcmp-evex-movbe.S).

3. Add new multiarch/rtld-{w}memcmp{eq}.S that just include the
   non-multiarch {w}memcmp{eq}.S which will in turn select the best
   implementation based on the compiled ISA level.

4. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-05 16:42:42 -07:00
Noah Goldstein
37ecc657b2 x86: Add support for building {w}memset{_chk} with explicit ISA level
1. Refactor files so that all implementations are in the multiarch
   directory
    - Moved the implementation portion of memset sse2 from memset.S to
      multiarch/memset-sse2.S

    - The non-multiarch file now only includes one of the
      implementations in the multiarch directory based on the compiled
      ISA level (only used for non-multiarch builds.  Otherwise we go
      through the ifunc selector).

2. Add ISA level build guards to different implementations.
    - I.e memset-avx2-unaligned-erms.S which is ISA level 3 will only
      build if compiled ISA level <= 3. Otherwise there is no reason
      to include it as we will always use one of the ISA level 4
      implementations (memset-evex-unaligned-erms.S).

3. Add new multiarch/rtld-memset.S that just include the
   non-multiarch memset.S which will in turn select the best
   implementation based on the compiled ISA level.

4. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-05 16:42:42 -07:00
Noah Goldstein
b6a02c3606 x86: Add support for building {w}memmove{_chk} with explicit ISA level
1. Refactor files so that all implementations are in the multiarch
   directory
    - Moved the implementation portion of memmove sse2 from memmove.S
      to multiarch/memmove-sse2.S

    - The non-multiarch file now only includes one of the
      implementations in the multiarch directory based on the compiled
      ISA level (only used for non-multiarch builds.  Otherwise we go
      through the ifunc selector).

2. Add ISA level build guards to different implementations.
    - I.e memmove-avx2-unaligned-erms.S which is ISA level 3 will only
      build if compiled ISA level <= 3. Otherwise there is no reason
      to include it as we will always use one of the ISA level 4
      implementations (memmove-evex-unaligned-erms.S).

3. Add new multiarch/rtld-memmove.S that just include the
   non-multiarch memmove.S which will in turn select the best
   implementation based on the compiled ISA level.

4. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
isa raising memmove
2022-07-05 16:42:42 -07:00
Noah Goldstein
c69f960b01 x86: Add support for building str{c|p}{brk|spn} with explicit ISA level
The changes for these functions are different than the others because
the best implementation (sse4_2) requires the generic
implementation as a fallback to be built as well.

Changes are:

1. Add non-multiarch functions for str{c|p}{brk|spn}.c to statically
   select the best implementation based on the configured ISA build
   level.

2. Add stubs for str{c|p}{brk|spn}-generic and varshift.c to in the
   sysdeps/x86_64 directory so that the the sse4 implementation will
   have all of its dependencies for the non-multiarch / rtld build
   when ISA level >= 2.

3. Add new multiarch/rtld-strcspn.c that just include the
   non-multiarch strcspn.c which will in turn select the best
   implementation based on the compiled ISA level.

4. Refactor the ifunc selector and ifunc implementation list to use
   the ISA level aware wrapper macros that allow functions below the
   compiled ISA level (with a guranteed replacement) to be skipped.

Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}

And m32 with and without multiarch.
2022-07-05 16:42:42 -07:00
Noah Goldstein
baeae86fb8 x86: Add comment explaining no Slow_SSE4_2 check in ifunc-sse4_2
Just for clarities sake and so that if a future implementation is
added we remember to add the check.
2022-07-05 16:42:42 -07:00
Adhemerval Zanella
e070501d12 Replace __libc_multiple_threads with __libc_single_threaded
And also fixes the SINGLE_THREAD_P macro for SINGLE_THREAD_BY_GLOBAL,
since header inclusion single-thread.h is in the wrong order, the define
needs to come before including sysdeps/unix/sysdep.h.  The macro
is now moved to a per-arch single-threade.h header.

The SINGLE_THREAD_P is used on some more places.

Checked on aarch64-linux-gnu and x86_64-linux-gnu.
2022-07-05 10:14:47 -03:00
Adhemerval Zanella
af1aa36c61 linux: Add mount_setattr
It was added on Linux 5.12 (2a1867219c7b27f928e2545782b86daaf9ad50bd)
to allow change the properties of a mount or a mount tree using file
descriptors which the new mount api is based on.

Checked on x86_64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05 10:08:48 -03:00
Adhemerval Zanella
c3b02b6567 linux: Add tst-mount to check for Linux new mount API
The new mount API was added on Linux 5.2 with six new syscalls:
fsopen, fsconfig, fsmount, move_mount, fspick, and open_tree.

The new test verifies minimal functionality along with error paths
for specific arguments and their corner cases.

Checked on x86_64-linux-gnu.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05 10:08:48 -03:00
Adhemerval Zanella
78a408ee7b linux: Add open_tree
It was added on Linux 5.2 (a07b20004793d8926f78d63eb5980559f7813404)
to return a O_PATH-opened file descriptor to an existing mountpoint.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05 10:08:48 -03:00
Adhemerval Zanella
60f574e140 linux: Add fspick
It was added on Linux 5.2 (cf3cba4a429be43e5527a3f78859b1bfd9ebc5fb)
that can be used to pick an existing mountpoint into an filesystem
context which can thereafter be used to reconfigure a superblock
with fsconfig syscall.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05 10:08:48 -03:00
Adhemerval Zanella
7eae6a91e9 linux: Add fsconfig
It was added on Linux 5.2 (ecdab150fddb42fe6a739335257949220033b782)
as a way to a configure filesystem creation context and trigger
actions upon it, to be used in conjunction with fsopen, fspick and
fsmount.

The fsconfig_command commands are currently only defined as an enum,
so they can't be checked on tst-mount-consts.py with current test
support.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2022-07-05 10:08:48 -03:00
Tejas Belagod
05844d18f7 AArch64: Reset HWCAP2_AFP bits in FPCR for default fenv
The AFP feature (Alternate floating-point behavior) was added in armv8.7 and
introduced new FPCR bits.

Currently, HWCAP2_AFP bits (bit 0, 1, 2) in FPCR are preserved when fenv is
set to default environment.  This is a deviation from standard behaviour.
Clear these bits when setting the fenv to default.

There is no libc API to modify the new FPCR bits.  Restoring those bits matters
if the user changed them directly.
2022-07-05 14:01:17 +01:00
Adhemerval Zanella
8ee2c043cf Fix hurd namespace issues for internal signal functions
It was introduced by "Refactor internal-signals.h
(a1bdd81664)".  Use the internal symbols instead.

Checked with a build for i686-gnu.
2022-07-04 11:10:06 -03:00
Adhemerval Zanella
a1bdd81664 Refactor internal-signals.h
The main drive is to optimize the internal usage and required size
when sigset_t is embedded in other data structures.  On Linux, the
current supported signal set requires up to 8 bytes (16 on mips),
was lower than the user defined sigset_t (128 bytes).

A new internal type internal_sigset_t is added, along with the
functions to operate on it similar to the ones for sigset_t.  The
internal-signals.h is also refactored to remove unused functions

Besides small stack usage on some functions (posix_spawn, abort)
it lower the struct pthread by about 120 bytes (112 on mips).

Checked on x86_64-linux-gnu.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
2022-06-30 14:56:21 -03:00
Kito Cheng
c22d2021a9
riscv: Use memcpy to handle unaligned access when fixing R_RISCV_RELATIVE
Although RISC-V Linux will enable the unaligned memory access handler by
default, that is quite expensive in general, using memcpy will be much cheaper
- just break down that into several load/store byte instructions.

ARM and MIPS has similar issue:

ARM: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51456
MIPS: https://gcc.gnu.org/legacy-ml/gcc-help/2005-07/msg00325.html

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2022-06-30 08:04:52 -07:00
Tejas Belagod
e9dd368296 AArch64: Add asymmetric faulting mode for tag violations in mem.tagging tunable
The new asymmetric mode is available when HWCAP2_MTE3 is set (support is
available), bit2 is set in the tunable (user request per application),
and the system is configured such that the asymmetric mode is preferred over
sync or async (per-cpu system-wide setting).

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2022-06-30 14:01:08 +01:00
Adhemerval Zanella
71d87d85bf linux: Fix mq_timereceive check for 32 bit fallback code (BZ 29304)
On  success,  mq_receive() and mq_timedreceive() return the number of
bytes in the received message, so it requires to check if the value
is larger than 0.

Checked on i686-linux-gnu.
2022-06-30 09:12:59 -03:00
Noah Goldstein
96ac447d91 x86: Add missing IS_IN (libc) check to strncmp-sse4_2.S
Was missing to for the multiarch build rtld-strncmp-sse4_2.os was
being built and exporting symbols:

build/glibc/string/rtld-strncmp-sse4_2.os:
0000000000000000 T __strncmp_sse42

Introduced in:

commit 11ffcacb64
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jun 21 12:10:50 2017 -0700

    x86-64: Implement strcmp family IFUNC selectors in C
2022-06-29 19:47:52 -07:00
Noah Goldstein
0aa294fb88 x86: Add missing IS_IN (libc) check to strcspn-sse4.c
Was missing to for the multiarch build rtld-strcspn-sse4.os was
being built and exporting symbols:

build/glibc/string/rtld-strcspn-sse4.os:
                 U ___m128i_shift_right
                 U __strcspn_generic
0000000000000000 T __strcspn_sse42
                 U strlen

build/glibc/string/rtld-varshift.os:
0000000000000000 R ___m128i_shift_right

Introduced in:

commit 06e51c8f3d
Author: H.J. Lu <hongjiu.lu@intel.com>
Date:   Fri Jul 3 02:48:56 2009 -0700

    Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.
2022-06-29 19:47:52 -07:00
Noah Goldstein
8cfbbbcdf9 x86: Add missing IS_IN (libc) check to memmove-ssse3.S
Was missing to for the multiarch build rtld-memmove-ssse3.os was
being built and exporting symbols:

>$ nm string/rtld-memmove-ssse3.os
                 U __GI___chk_fail
0000000000000020 T __memcpy_chk_ssse3
0000000000000040 T __memcpy_ssse3
0000000000000020 T __memmove_chk_ssse3
0000000000000040 T __memmove_ssse3
0000000000000000 T __mempcpy_chk_ssse3
0000000000000010 T __mempcpy_ssse3
                 U __x86_shared_cache_size_half

Introduced after 2.35 in:

commit 26b2478322
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Thu Apr 14 11:47:40 2022 -0500

    x86: Reduce code size of mem{move|pcpy|cpy}-ssse3
2022-06-29 19:47:52 -07:00
H.J. Lu
88070acdd0 x86-64: Properly indent X86_IFUNC_IMPL_ADD_VN arguments
Properly indent X86_IFUNC_IMPL_ADD_VN arguments for memchr, rawmemchr
and wmemchr.

Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
2022-06-29 19:47:52 -07:00