Commit Graph

14743 Commits

Author SHA1 Message Date
Adhemerval Zanella
5b3e31e312 linux: Implement mremap in C
Variadic function calls in syscalls.list does not work for all ABIs
(for instance where the argument are passed on the stack instead of
registers) and might have underlying issues depending of the variadic
type (for instance if a 64-bit argument is used).

Checked on x86_64-linux-gnu.
2021-11-30 13:13:03 -03:00
Adhemerval Zanella
83008fa495 linux: Add prlimit64 C implementation
The LFS prlimit64 requires a arch-specific implementation in
syscalls.list.  Instead add a generic one that handles the
required symbol alias for __RLIM_T_MATCHES_RLIM64_T.

HPPA is the only outlier which requires a different default
symbol.

Checked on x86_64-linux-gnu and with build for the affected ABIs.
2021-11-30 13:13:03 -03:00
Adhemerval Zanella
137ed5ac44 linux: Use /proc/stat fallback for __get_nprocs_conf (BZ #28624)
The /proc/statm fallback was removed by f13fb81ad3 if sysfs is
not available, reinstate it.

Checked on x86_64-linux-gnu.
2021-11-25 11:00:42 -03:00
Adhemerval Zanella
d150181d73 linux: Add fanotify_mark C implementation
Passing 64-bit arguments on syscalls.list is tricky: it requires
to reimplement the expected kernel abi in each architecture.  This
is way to better to represent in C code where we already have
macros for this (SYSCALL_LL64).

Checked on x86_64-linux-gnu.
2021-11-25 09:56:57 -03:00
Adhemerval Zanella
c3b023a782 linux: Only build fstatat fallback if required
For 32-bit architecture with __ASSUME_STATX there is no need to
build fstatat64_time64_stat.

Checked on i686-linux-gnu.
2021-11-25 09:28:27 -03:00
Sunil K Pandey
c58d3b7d00 x86-64: Add vector sin/sinf to libmvec microbenchmark
Add vector sin/sinf and input files to libmvec microbenchmark.

libmvec-sin-inputs:
  90% Normal random distribution
  range: (-DBL_MAX, DBL_MAX)
  mean: 0.0
  sigma: 5.0
  10% uniform random distribution in range (-1000.0, 1000.0)

libmvec-sinf-inputs:
  90% Normal random distribution
  range: (-FLT_MAX, FLT_MAX)
  mean: 0.0f
  sigma: 5.0f
  10% uniform random distribution in range (-1000.0f, 1000.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:50:23 -08:00
Sunil K Pandey
6a556bac81 x86-64: Add vector pow/powf to libmvec microbenchmark
Add vector pow/powf and input files to libmvec microbenchmark.

libmvec-pow-inputs:
  arg1:
    90% Normal random distribution
    range: (0.0, 256.0)
    mean: 0.0
    sigma: 32.0
    10% uniform random distribution in range (0.0, 256.0)
  arg2:
    90% Normal random distribution
    range: (-127.0, 127.0)
    mean: 0.0
    sigma: 16.0
    10% uniform random distribution in range (-127.0, 127.0)

libmvec-powf-inputs:
  arg1:
    90% Normal random distribution
    range: (0.0f, 100.0f)
    mean: 0.0f
    sigma: 16.0f
    10% uniform random distribution in range (0.0f, 100.0f)
  arg2:
    90% Normal random distribution
    range: (-10.0f, 10.0f)
    mean: 0.0f
    sigma: 8.0f
    10% uniform random distribution in range (-10.0f, 10.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:49:14 -08:00
Sunil K Pandey
8ab8afb336 x86-64: Add vector log/logf to libmvec microbenchmark
Add vector log/logf and input files to libmvec microbenchmark.

libmvec-log-inputs:
  70% Normal random distribution
  range: (0.0, DBL_MAX)
  mean: 1.0
  sigma: 50.0
  30% uniform random distribution in range (0.0, DBL_MAX)

libmvec-logf-inputs:
  70% Normal random distribution
  range: (0.0f, FLT_MAX)
  mean: 1.0f
  sigma: 50.0f
  30% uniform random distribution in range (0.0f, FLT_MAX)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:48:14 -08:00
Sunil K Pandey
37df38bd5f x86-64: Add vector exp/expf to libmvec microbenchmark
Add vector exp/expf and input files to libmvec microbenchmark.

libmvec-exp-inputs:
  90% Normal random distribution
  range: (-708.0, 709.0)
  mean: 0.0
  sigma: 16.0
  10% uniform random distribution in range (-500.0, 500.0)

libmvec-expf-inputs:
  90% Normal random distribution
  range: (-87.0f, 88.0f)
  mean: 0.0f
  sigma: 8.0f
  10% uniform random distribution in range (-50.0f, 50.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:46:59 -08:00
Sunil K Pandey
4443695598 x86-64: Add vector cos/cosf to libmvec microbenchmark
Add vector cos/cosf and input files to libmvec microbenchmark.

libmvec-cos-inputs:
  90% Normal random distribution
  range: (-DBL_MAX, DBL_MAX)
  mean: 0.0
  sigma: 5.0
  10% uniform random distribution in range (-1000.0, 1000.0)

libmvec-cosf-inputs:
  90% Normal random distribution
  range: (-FLT_MAX, FLT_MAX)
  mean: 0.0f
  sigma: 5.0f
  10% uniform random distribution in range (-1000.0f, 1000.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:45:20 -08:00
Adhemerval Zanella
456b3c08b6 io: Refactor close_range and closefrom
Now that Hurd implementis both close_range and closefrom (f2c996597d),
we can make close_range() a base ABI, and make the default closefrom()
implementation on top of close_range().

The generic closefrom() implementation based on __getdtablesize() is
moved to generic close_range().  On Linux it will be overriden by
the auto-generation syscall while on Hurd it will be a system specific
implementation.

The closefrom() now calls close_range() and __closefrom_fallback().
Since on Hurd close_range() does not fail, __closefrom_fallback() is an
empty static inline function set by__ASSUME_CLOSE_RANGE.

The __ASSUME_CLOSE_RANGE also allows optimize Linux
__closefrom_fallback() implementation when --enable-kernel=5.9 or
higher is used.

Finally the Linux specific tst-close_range.c is moved to io and
enabled as default.  The Linuxism and CLOSE_RANGE_UNSHARE are
guarded so it can be built for Hurd (I have not actually test it).

Checked on x86_64-linux-gnu, i686-linux-gnu, and with a i686-gnu
build.
2021-11-24 09:09:37 -03:00
Florian Weimer
e186fc5a31 nptl: Do not set signal mask on second setjmp return [BZ #28607]
__libc_signal_restore_set was in the wrong place: It also ran
when setjmp returned the second time (after pthread_exit or
pthread_cancel).  This is observable with blocked pending
signals during thread exit.

Fixes commit b3cae39dcb
("nptl: Start new threads with all signals blocked [BZ #25098]").

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-11-24 08:59:54 +01:00
Adhemerval Zanella
aac54dcd37 powerpc: Define USE_PPC64_NOTOC iff compiler supports it
The @notoc usage only yields an advantage on ISA 3.1+ machine (power10)
and for ld.bfd also when it sees pcrel relocations used on the code
(generated if compiler targets ISA 3.1+).  On bfd case ISA 3.1+
instruction on stubs are used iff linker also sees the new pc-relative
relocations (for instance R_PPC64_D34), otherwise it generates default
stubs (ppc64_elf_check_relocs:4700).

This patch also help on linkers that do not implement this optimization,
since building for older ISA (such as 3.0 / power9) will also trigger
power10 stubs generation in the assembly code uses the NOTOC imacro.

Checked on powerpc64le-linux-gnu.

Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-11-22 14:49:11 -03:00
Adhemerval Zanella
bc801b3a40 setjmp: Replace jmp_buf-macros.h with jmp_buf-macros.sym
It requires less boilerplate code for newer ports.  The _Static_assert
checks from internal setjmp are moved to its own internal test since
setjmp.h is included early by multiple headers (to generate
rtld-sizes.sym).

The riscv jmp_buf-macros.h check is also redundant, it is already
done by riscv configure.ac.

Checked with a build for the affected architectures.
2021-11-22 13:43:22 -03:00
Joseph Myers
5c3ece451d Update kernel version to 5.15 in tst-mman-consts.py
This patch updates the kernel version in the test tst-mman-consts.py
to 5.15.  (There are no new MAP_* constants covered by this test in
5.15 that need any other header changes.)

Tested with build-many-glibcs.py.
2021-11-22 15:30:12 +00:00
Joseph Myers
bdeb7a8fa9 Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h
Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add
these constants to bits/socket.h.

Tested for x86_64.
2021-11-17 14:25:16 +00:00
Florian Weimer
f1d333b5bf elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres
This will be used to deallocate memory allocated using the non-minimal
malloc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-11-17 12:20:29 +01:00
Florian Weimer
8bd336a00a nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c
And make it an installed header.  This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.

The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-11-17 12:20:13 +01:00
Sunil K Pandey
a43c0b5483 x86-64: Create microbenchmark infrastructure for libmvec
Add python script to generate libmvec microbenchmark from the input
values for each libmvec function using skeleton benchmark template.

Creates double and float benchmarks with vector length 1, 2, 4, 8,
and 16 for each libmvec function.  Vector length 1 corresponds to
scalar version of function and is included for vector function perf
comparison.

Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-16 11:37:39 -08:00
Noah Goldstein
2f9062d717 x86: Shrink memcmp-sse4.S code size
No bug.

This implementation refactors memcmp-sse4.S primarily with minimizing
code size in mind. It does this by removing the lookup table logic and
removing the unrolled check from (256, 512] bytes.

memcmp-sse4 code size reduction : -3487 bytes
wmemcmp-sse4 code size reduction: -1472 bytes

The current memcmp-sse4.S implementation has a large code size
cost. This has serious adverse affects on the ICache / ITLB. While
in micro-benchmarks the implementations appears fast, traces of
real-world code have shown that the speed in micro benchmarks does not
translate when the ICache/ITLB are not primed, and that the cost
of the code size has measurable negative affects on overall
application performance.

See https://research.google/pubs/pub48320/ for more details.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-10 20:12:10 -06:00
Joseph Myers
3387c40a8b Update syscall lists for Linux 5.15
Linux 5.15 has one new syscall, process_mrelease (and also enables the
clone3 syscall for RV32).  It also has a macro __NR_SYSCALL_MASK for
Arm, which is not a syscall but matches the pattern used for syscall
macro names.

Add __NR_SYSCALL_MASK to the names filtered out in the code dealing
with syscall lists, update syscall-names.list for the new syscall and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.

Tested with build-many-glibcs.py.
2021-11-10 15:21:19 +00:00
Florian Weimer
98966749f2 s390: Use long branches across object boundaries (jgh instead of jh)
Depending on the layout chosen by the linker, the 16-bit displacement
of the jh instruction is insufficient to reach the target label.

Analysis of the linker failure was carried out by Nick Clifton.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
2021-11-10 15:21:37 +01:00
H.J. Lu
0bd356df1a Remove the unused +mkdep/+make-deps/s-proto.S/s-proto-cancel.S
Since

commit d73f5331ce
Author: Roland McGrath <roland@gnu.org>
Date:   Fri May 2 02:20:45 2003 +0000

    2003-05-01  Roland McGrath  <roland@redhat.com>

dependency is generated by passing -MD -MF to compiler.  Remove the unused
+mkdep, +make-deps, s-proto.S and s-proto-cancel.S.

This fixes BZ #28554.
2021-11-10 04:54:18 -08:00
Adhemerval Zanella
824dd3ec49 Fix build a chec failures after b05fae4d8e
The include cleanup on dl-minimal.c removed too much for some
targets.

Also for Hurd, __sbrk is removed from localplt.data now that
tunables allocated memory through mmap.

Checked with a build for all affected architectures.
2021-11-09 23:21:22 -03:00
Adhemerval Zanella
b05fae4d8e elf: Use the minimal malloc on tunables_strdup
The rtld_malloc functions are moved to its own file so it can be
used on csu code.  Also, the functiosn are renamed to __minimal_*
(since there are now used not only on loader code).

Using the __minimal_malloc on tunables_strdup() avoids potential
issues with sbrk() calls while processing the tunables (I see
sporadic elf/tst-dso-ordering9 on powerpc64le with different
tests failing due ASLR).

Also, using __minimal_malloc over plain mmap optimizes the memory
allocation on both static and dynamic case (since it will any unused
space in either the last page of data segments, avoiding mmap() call,
or from the previous mmap() call).

Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-11-09 14:11:25 -03:00
Samuel Thibault
d41985b71e hurd: Remove unused __libc_close_range
That was just cargo-culted.
2021-11-07 16:23:51 +01:00
Sergey Bugaev
f2c996597d hurd: Implement close_range and closefrom
The close_range () function implements the same API as the Linux and
FreeBSD syscalls. It operates atomically and reliably. The specified
upper bound is clamped to the actual size of the file descriptor table;
it is expected that the most common use case is with last = UINT_MAX.

Like in the Linux syscall, it is also possible to pass the
CLOSE_RANGE_CLOEXEC flag to mark the file descriptors in the range
cloexec instead of acually closing them.

Also, add a Hurd version of the closefrom () function. Since unlike on
Linux, close_range () cannot fail due to being unuspported by the
running kernel, a fallback implementation is never necessary.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20211106153524.82700-1-bugaevc@gmail.com>
2021-11-07 16:16:11 +01:00
Noah Goldstein
475b63702e x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h
No bug.

This patch doubles the rep_movsb_threshold when using ERMS. Based on
benchmarks the vector copy loop, especially now that it handles 4k
aliasing, is better for these medium ranged.

On Skylake with ERMS:

Size,   Align1, Align2, dst>src,(rep movsb) / (vec copy)
4096,   0,      0,      0,      0.975
4096,   0,      0,      1,      0.953
4096,   12,     0,      0,      0.969
4096,   12,     0,      1,      0.872
4096,   44,     0,      0,      0.979
4096,   44,     0,      1,      0.83
4096,   0,      12,     0,      1.006
4096,   0,      12,     1,      0.989
4096,   0,      44,     0,      0.739
4096,   0,      44,     1,      0.942
4096,   12,     12,     0,      1.009
4096,   12,     12,     1,      0.973
4096,   44,     44,     0,      0.791
4096,   44,     44,     1,      0.961
4096,   2048,   0,      0,      0.978
4096,   2048,   0,      1,      0.951
4096,   2060,   0,      0,      0.986
4096,   2060,   0,      1,      0.963
4096,   2048,   12,     0,      0.971
4096,   2048,   12,     1,      0.941
4096,   2060,   12,     0,      0.977
4096,   2060,   12,     1,      0.949
8192,   0,      0,      0,      0.85
8192,   0,      0,      1,      0.845
8192,   13,     0,      0,      0.937
8192,   13,     0,      1,      0.939
8192,   45,     0,      0,      0.932
8192,   45,     0,      1,      0.927
8192,   0,      13,     0,      0.621
8192,   0,      13,     1,      0.62
8192,   0,      45,     0,      0.53
8192,   0,      45,     1,      0.516
8192,   13,     13,     0,      0.664
8192,   13,     13,     1,      0.659
8192,   45,     45,     0,      0.593
8192,   45,     45,     1,      0.575
8192,   2048,   0,      0,      0.854
8192,   2048,   0,      1,      0.834
8192,   2061,   0,      0,      0.863
8192,   2061,   0,      1,      0.857
8192,   2048,   13,     0,      0.63
8192,   2048,   13,     1,      0.629
8192,   2061,   13,     0,      0.627
8192,   2061,   13,     1,      0.62

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-06 16:18:08 -05:00
Noah Goldstein
a6b7502ec0 x86: Optimize memmove-vec-unaligned-erms.S
No bug.

The optimizations are as follows:

1) Always align entry to 64 bytes. This makes behavior more
   predictable and makes other frontend optimizations easier.

2) Make the L(more_8x_vec) cases 4k aliasing aware. This can have
   significant benefits in the case that:
        0 < (dst - src) < [256, 512]

3) Align before `rep movsb`. For ERMS this is roughly a [0, 30%]
   improvement and for FSRM [-10%, 25%].

In addition to these primary changes there is general cleanup
throughout to optimize the aligning routines and control flow logic.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-06 16:18:03 -05:00
Paul A. Clarke
9fea0f1a2a [powerpc] Tighten contraints for asm constant parameters
There are a few places where only known numeric values are acceptable for
`asm` parameters, yet the constraint "i" is used.  "i" can include
"symbolic constants whose values will be known only at assembly time or
later."

Use "n" instead of "i" where known numeric values are required.

Suggested-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-11-03 09:17:28 -05:00
Adhemerval Zanella
09f214528c riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN
It allows build both glibc and tests with lld (Since lld does not
support R_RISCV_ALIGN linker relaxation).

Checked with a build for riscv32-linux-gnu-rv32imafdc-ilp32d and
riscv64-linux-gnu-rv64imafdc-lp64d.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Fangrui Song <maskray@google.com>
2021-11-03 09:25:06 -03:00
Fangrui Song
6720d36b66 x86-64: Replace movzx with movzbl
Clang cannot assemble movzx in the AT&T dialect mode.

../sysdeps/x86_64/strcmp.S:2232:16: error: invalid operand for instruction
 movzx (%rsi), %ecx
               ^~~~

Change movzx to movzbl, which follows the AT&T dialect and is used
elsewhere in the file.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-02 20:59:52 -07:00
Florian Weimer
cca75bd8b5 i386: Explain why __HAVE_64B_ATOMICS has to be 0 2021-11-02 10:26:23 +01:00
Adhemerval Zanella
613cb5c7b1 arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors support
The lld linker does not support TLSDESC for arm.  The have-arm-tls-desc
is a leftover of 56583289b1 to support NaCL.

Reviewed-by: Fangrui Song <maskray@google.com>
2021-11-01 16:23:15 -03:00
Adhemerval Zanella
d6dea8c847 arm: Use internal symbol for _dl_argv on _dl_start_user
The lld does not support R_ARM_GOTOFF32 to preemptible symbol (_dl_argv
has default visibility).  Use the internal alias instead (one option
would to use HIDDEN_JUMPTARGET, bu the macro is not defined for
!__ASSEMBLER__ and I made this patch arm-specific to avoid require to
check extensivelly on other architecture it this might break something).

Checked on arm-linux-gnueabihf.

Reviewed-by: Fangrui Song <maskray@google.com>
2021-11-01 16:21:53 -03:00
H.J. Lu
14dbbf46a0 x86-64: Remove Prefer_AVX2_STRCMP
Remove Prefer_AVX2_STRCMP to enable EVEX strcmp.  When comparing 2 32-byte
strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1
VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD
and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1
VPMOVMSKB and 1 TESTL.  EVEX strcmp is now faster than AVX2 strcmp by up
to 40% on Tiger Lake and Ice Lake.
2021-11-01 07:53:04 -07:00
H.J. Lu
c46e9afb2d x86-64: Improve EVEX strcmp with masked load
In strcmp-evex.S, to compare 2 32-byte strings, replace

        VMOVU   (%rdi, %rdx), %YMM0
        VMOVU   (%rsi, %rdx), %YMM1
        /* Each bit in K0 represents a mismatch in YMM0 and YMM1.  */
        VPCMP   $4, %YMM0, %YMM1, %k0
        VPCMP   $0, %YMMZERO, %YMM0, %k1
        VPCMP   $0, %YMMZERO, %YMM1, %k2
        /* Each bit in K1 represents a NULL in YMM0 or YMM1.  */
        kord    %k1, %k2, %k1
        /* Each bit in K1 represents a NULL or a mismatch.  */
        kord    %k0, %k1, %k1
        kmovd   %k1, %ecx
        testl   %ecx, %ecx
        jne     L(last_vector)

with

        VMOVU   (%rdi, %rdx), %YMM0
        VPTESTM %YMM0, %YMM0, %k2
        /* Each bit cleared in K1 represents a mismatch or a null CHAR
           in YMM0 and 32 bytes at (%rsi, %rdx).  */
        VPCMP   $0, (%rsi, %rdx), %YMM0, %k1{%k2}
        kmovd   %k1, %ecx
        incl    %ecx
        jne     L(last_vector)

It makes EVEX strcmp faster than AVX2 strcmp by up to 40% on Tiger Lake
and Ice Lake.

Co-Authored-By: Noah Goldstein <goldstein.w.n@gmail.com>
2021-11-01 07:52:56 -07:00
Stafford Horne
6446c725d4 Fix compiler issue with mmap_internal
Compiling mmap_internal fails to compile when we use -1 for MMAP2_PAGE_UNIT
on 32 bit architectures.  The error is as follows:

    ../sysdeps/unix/sysv/linux/mmap_internal.h:30:8: error: unknown type
    name 'uint64_t'
    |
       30 | static uint64_t page_unit;
	  |
	  |        ^~~~~~~~

Fix by adding including stdint.h.
2021-10-29 09:21:37 -03:00
Noah Goldstein
1d56fd3bae x86_64: Add memcmpeq.S to fix disable-multi-arch build
The following commit:

    commit cf4fd28ea4
    Author: Noah Goldstein <goldstein.w.n@gmail.com>
    Date:   Tue Oct 26 19:43:18 2021 -0500

Broke --disable-multi-arch build for x86_64 because x86_64/memcmpeq.S
was not defined outside of multiarch and the alias for __memcmpeq in
x86_64/memcmp.S was removed.

This commit fixes that issue by adding x86_64/memcmpeq.S.

make xcheck passes on x86_64 with and without --disable-multi-arch
2021-10-28 16:35:50 -05:00
Fangrui Song
6838920383 riscv: Fix incorrect jal with HIDDEN_JUMPTARGET
A non-local STV_DEFAULT defined symbol is by default preemptible in a
shared object. j/jal cannot target a preemptible symbol. On other
architectures, such a jump instruction either causes PLT [BZ #18822], or
if short-ranged, sometimes rejected by the linker (but not by GNU ld's
riscv port [ld PR/28509]).

Use HIDDEN_JUMPTARGET to target a non-preemptible symbol instead.

With this patch, ld.so and libc.so can be linked with LLD if source
files are compiled/assembled with -mno-relax/-Wa,-mno-relax.

Acked-by: Palmer Dabbelt <palmer@dabbelt.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-28 11:39:49 -07:00
Noah Goldstein
9b7cfab180 x86_64: Add evex optimized __memcmpeq in memcmpeq-evex.S
No bug. This commit adds new optimized __memcmpeq implementation for
evex.

The primary optimizations are:

1) skipping the logic to find the difference of the first mismatched
byte.

2) not updating src/dst addresses as the non-equals logic does not
need to be reused by different areas.
2021-10-27 13:03:46 -05:00
Noah Goldstein
b4ed69ba16 x86_64: Add avx2 optimized __memcmpeq in memcmpeq-avx2.S
No bug. This commit adds new optimized __memcmpeq implementation for
avx2.

The primary optimizations are:

1) skipping the logic to find the difference of the first mismatched
byte.

2) not updating src/dst addresses as the non-equals logic does not
need to be reused by different areas.
2021-10-27 13:03:46 -05:00
Noah Goldstein
fa7f63d8d6 x86_64: Add sse2 optimized __memcmpeq in memcmp-sse2.S
No bug. This commit does not modify any of the memcmp
implementation. It just adds __memcmpeq ifdefs to skip obvious cases
where computing the proper 1/-1 required by memcmp is not needed.
2021-10-27 13:03:46 -05:00
Noah Goldstein
cf4fd28ea4 x86_64: Add support for __memcmpeq using sse2, avx2, and evex
No bug. This commit adds support for __memcmpeq to be implemented
seperately from memcmp. Support is added for versions optimized with
sse2, avx2, and evex.
2021-10-27 13:03:46 -05:00
Noah Goldstein
9894127d20 String: Add hidden defs for __memcmpeq() to enable internal usage
No bug.

This commit adds hidden defs for all declarations of __memcmpeq. This
enables usage of __memcmpeq without the PLT for usage internal to
GLIBC.
2021-10-26 16:51:29 -05:00
Noah Goldstein
44829b3ddb String: Add support for __memcmpeq() ABI on all targets
No bug.

This commit adds support for __memcmpeq() as a new ABI for all
targets. In this commit __memcmpeq() is implemented only as an alias
to the corresponding targets memcmp() implementation. __memcmpeq() is
added as a new symbol starting with GLIBC_2.35 and defined in string.h
with comments explaining its behavior. Basic tests that it is callable
and works where added in string/tester.c

As discussed in the proposal "Add new ABI '__memcmpeq()' to libc"
__memcmpeq() is essentially a reserved namespace for bcmp(). The means
is shares the same specifications as memcmp() except the return value
for non-equal byte sequences is any non-zero value. This is less
strict than memcmp()'s return value specification and can be better
optimized when a boolean return is all that is needed.

__memcmpeq() is meant to only be called by compilers if they can prove
that the return value of a memcmp() call is only used for its boolean
value.

All tests in string/tester.c passed. As well build succeeds on
x86_64-linux-gnu target.
2021-10-26 16:51:29 -05:00
Fangrui Song
8438135d34 configure: Don't check LD -v --help for LIBC_LINKER_FEATURE
When LIBC_LINKER_FEATURE is used to check a linker option with the equal
sign, it will likely fail because the LD -v --help output may look like
`-z lam-report=[none|warning|error]` while the needle is something like
`-z lam-report=warning`.

The LD -v --help filter doesn't save much time, so just remove it.
2021-10-25 13:17:44 -07:00
Noah Goldstein
bad852b61b x86: Replace sse2 instructions with avx in memcmp-evex-movbe.S
This commit replaces two usages of SSE2 'movups' with AVX 'vmovdqu'.

it could potentially be dangerous to use SSE2 if this function is ever
called without using 'vzeroupper' beforehand. While compilers appear
to use 'vzeroupper' before function calls if AVX2 has been used, using
SSE2 here is more brittle. Since it is not absolutely necessary it
should be avoided.

It costs 2-extra bytes but the extra bytes should only eat into
alignment padding.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-23 13:02:42 -05:00
Sunil K Pandey
4f690aad9e x86_64: Add missing libmvec ABI tests
Add vector ABI tests for cos, exp, log, pow and sin functions.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-22 06:46:49 -07:00
Adhemerval Zanella
927246e188 elf: Fix e6fd79f379 build with --enable-tunables=no
The _dl_sort_maps_init() is not defined when tunables is not enabled.

Checked on x86_64-linux-gnu.
2021-10-21 17:26:32 -03:00
Chung-Lin Tang
15a0c5730d elf: Fix slow DSO sorting behavior in dynamic loader (BZ #17645)
This second patch contains the actual implementation of a new sorting algorithm
for shared objects in the dynamic loader, which solves the slow behavior that
the current "old" algorithm falls into when the DSO set contains circular
dependencies.

The new algorithm implemented here is simply depth-first search (DFS) to obtain
the Reverse-Post Order (RPO) sequence, a topological sort. A new l_visited:1
bitfield is added to struct link_map to more elegantly facilitate such a search.

The DFS algorithm is applied to the input maps[nmap-1] backwards towards
maps[0]. This has the effect of a more "shallow" recursion depth in general
since the input is in BFS. Also, when combined with the natural order of
processing l_initfini[] at each node, this creates a resulting output sorting
closer to the intuitive "left-to-right" order in most cases.

Another notable implementation adjustment related to this _dl_sort_maps change
is the removing of two char arrays 'used' and 'done' in _dl_close_worker to
represent two per-map attributes. This has been changed to simply use two new
bit-fields l_map_used:1, l_map_done:1 added to struct link_map. This also allows
discarding the clunky 'used' array sorting that _dl_sort_maps had to sometimes
do along the way.

Tunable support for switching between different sorting algorithms at runtime is
also added. A new tunable 'glibc.rtld.dynamic_sort' with current valid values 1
(old algorithm) and 2 (new DFS algorithm) has been added. At time of commit
of this patch, the default setting is 1 (old algorithm).

Signed-off-by: Chung-Lin Tang  <cltang@codesourcery.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-21 11:23:53 -03:00
Fangrui Song
aa783f9a7b linux: Fix a possibly non-constant expression in _Static_assert
According to C11 6.6p6, `const int` as an operand may not make up a
constant expression. GCC -O0 errors:

../sysdeps/unix/sysv/linux/opendir.c:107:19: error: static_assert expression is not an integral constant expression
  _Static_assert (allocation_size >= sizeof (struct dirent64),

-O2 -Wpedantic has a similar warning.
See https://gcc.gnu.org/PR102502 for GCC's inconsistency.

Use enum which is guaranteed to be a constant expression.
This also makes the file compilable with Clang.

Fixes: 4b962c9e85 ("linux: Simplify opendir buffer allocation")
2021-10-20 14:22:43 -07:00
H.J. Lu
d962cce139 x86-64: Add sysdeps/x86_64/fpu/Makeconfig
1. Add sysdeps/x86_64/fpu/Makeconfig to auto-generate libmvec.mk, which
contains libmvec ABI test dependencies and CFLAGS, in the build directory.
2. Include libmvec.mk for libmvec ABI test dependencies and CFLAGS.

Tested on SSE4, AVX, AVX2 and AVX512 machines.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-10-20 11:53:45 -07:00
Adhemerval Zanella
82fd7314c7 powerpc: Remove backtrace implementation
The powerpc optimization to provide a fast stacktrace requires some
ad-hoc code to handle Linux signal frames and the change is fragile
once the kernel decides to slight change its execution sequence [1].

The generic implementation work as-is and it should be future proof
since the kernel provides the expected CFI directives in vDSO shared
page.

Checked on powerpc-linux-gnu, powerpc64le-linux-gnu, and
powerpc64-linux-gnu.

[1] https://sourceware.org/pipermail/libc-alpha/2021-January/122027.html
2021-10-20 10:40:53 -03:00
H.J. Lu
2ec99d8c42 ld.so: Initialize bootstrap_map.l_ld_readonly [BZ #28340]
1. Define DL_RO_DYN_SECTION to initalize bootstrap_map.l_ld_readonly
before calling elf_get_dynamic_info to get dynamic info in bootstrap_map,
2. Define a single

static inline bool
dl_relocate_ld (const struct link_map *l)
{
  /* Don't relocate dynamic section if it is readonly  */
  return !(l->l_ld_readonly || DL_RO_DYN_SECTION);
}

This updates BZ #28340 fix.
2021-10-19 06:40:38 -07:00
Stafford Horne
1d550265a7 timex: Use 64-bit fields on 32-bit TIMESIZE=64 systems (BZ #28469)
This was found when testing the OpenRISC port I am working on.  These
two tests fail with SIGSEGV:

  FAIL: misc/tst-ntp_gettime
  FAIL: misc/tst-ntp_gettimex

This was found to be due to the kernel overwriting the stack space
allocated by the timex structure.  The reason for the overwrite being
that the kernel timex has 64-bit fields and user space code only
allocates enough stack space for timex with 32-bit fields.

On 32-bit systems with TIMESIZE=64 __USE_TIME_BITS64 is not defined.
This causes the timex structure to use 32-bit fields with type
__syscall_slong_t.

This patch adjusts the ifdef condition to allow 32-bit systems with
TIMESIZE=64 to use the 64-bit long long timex definition.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-18 17:17:20 -03:00
Samuel Thibault
1d3decee99 hurd if_index: Explicitly use AF_INET for if index discovery
5bf07e1b3a ("Linux: Simplify __opensock and fix race condition [BZ #28353]")
made __opensock try NETLINK then UNIX then INET. On the Hurd, only INET
knows about network interfaces, so better actually specify that in
if_index.
2021-10-18 01:39:02 +02:00
Samuel Thibault
1d20f33ff4 hurd: Fix intr-msg parameter/stack kludge
INTR_MSG_TRAP was tinkering with esp to make it point to
_hurd_intr_rpc_mach_msg's parameters, and notably use (&msg)[-1] which is
meaningless in C.

Instead, just push the parameters on the stack, which also avoids leaving
local variables of _hurd_intr_rpc_mach_msg below esp. We now also
properly express that OPTION and TIMEOUT may be updated during the trap
call.
2021-10-18 00:50:41 +02:00
H.J. Lu
9d3c9a046a x86-64: Add test-vector-abi.h/test-vector-abi-sincos.h
Add templates for vector ABI test and use them for vector sincos/sincosf
ABI tests.
2021-10-14 11:59:12 -07:00
Adhemerval Zanella
d6d89608ac elf: Fix dynamic-link.h usage on rtld.c
The 4af6982e4c fix does not fully handle RTLD_BOOTSTRAP usage on
rtld.c due two issues:

  1. RTLD_BOOTSTRAP is also used on dl-machine.h on various
     architectures and it changes the semantics of various machine
     relocation functions.

  2. The elf_get_dynamic_info() change was done sideways, previously
     to 490e6c62aa get-dynamic-info.h was included by the first
     dynamic-link.h include *without* RTLD_BOOTSTRAP being defined.
     It means that the code within elf_get_dynamic_info() that uses
     RTLD_BOOTSTRAP is in fact unused.

To fix 1. this patch now includes dynamic-link.h only once with
RTLD_BOOTSTRAP defined.  The ELF_DYNAMIC_RELOCATE call will now have
the relocation fnctions with the expected semantics for the loader.

And to fix 2. part of 4af6982e4c is reverted (the check argument
elf_get_dynamic_info() is not required) and the RTLD_BOOTSTRAP
pieces are removed.

To reorganize the includes the static TLS definition is moved to
its own header to avoid a circular dependency (it is defined on
dynamic-link.h and dl-machine.h requires it at same time other
dynamic-link.h definition requires dl-machine.h defitions).

Also ELF_MACHINE_NO_REL, ELF_MACHINE_NO_RELA, and ELF_MACHINE_PLT_REL
are moved to its own header.  Only ancient ABIs need special values
(arm, i386, and mips), so a generic one is used as default.

The powerpc Elf64_FuncDesc is also moved to its own header, since
csu code required its definition (which would require either include
elf/ folder or add a full path with elf/).

Checked on x86_64, i686, aarch64, armhf, powerpc64, powerpc32,
and powerpc64le.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-10-14 14:52:07 -03:00
Noah Goldstein
e59ced2384 x86: Optimize memset-vec-unaligned-erms.S
No bug.

Optimization are

1. change control flow for L(more_2x_vec) to fall through to loop and
   jump for L(less_4x_vec) and L(less_8x_vec). This uses less code
   size and saves jumps for length > 4x VEC_SIZE.

2. For EVEX/AVX512 move L(less_vec) closer to entry.

3. Avoid complex address mode for length > 2x VEC_SIZE

4. Slightly better aligning code for the loop from the perspective of
   code size and uops.

5. Align targets so they make full use of their fetch block and if
   possible cache line.

6. Try and reduce total number of icache lines that will need to be
   pulled in for a given length.

7. Include "local" version of stosb target. For AVX2/EVEX/AVX512
   jumping to the stosb target in the sse2 code section will almost
   certainly be to a new page. The new version does increase code size
   marginally by duplicating the target but should get better iTLB
   behavior as a result.

test-memset, test-wmemset, and test-bzero are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-12 13:38:02 -05:00
Noah Goldstein
1bd8b8d58f x86: Optimize memcmp-evex-movbe.S for frontend behavior and size
No bug.

The frontend optimizations are to:
1. Reorganize logically connected basic blocks so they are either in
   the same cache line or adjacent cache lines.
2. Avoid cases when basic blocks unnecissarily cross cache lines.
3. Try and 32 byte align any basic blocks possible without sacrificing
   code size. Smaller / Less hot basic blocks are used for this.

Overall code size shrunk by 168 bytes. This should make up for any
extra costs due to aligning to 64 bytes.

In general performance before deviated a great deal dependending on
whether entry alignment % 64 was 0, 16, 32, or 48. These changes
essentially make it so that the current implementation is at least
equal to the best alignment of the original for any arguments.

The only additional optimization is in the page cross case. Branch on
equals case was removed from the size == [4, 7] case. As well the [4,
7] and [2, 3] case where swapped as [4, 7] is likely a more hot
argument size.

test-memcmp and test-wmemcmp are both passing.
2021-10-12 12:02:12 -05:00
Adhemerval Zanella
4af6982e4c elf: Fix elf_get_dynamic_info definition
Before to 490e6c62aa ('elf: Avoid nested functions in the loader
[BZ #27220]'), elf_get_dynamic_info() was defined twice on rtld.c: on
the first dynamic-link.h include and later within _dl_start().  The
former definition did not define DONT_USE_BOOTSTRAP_MAP and it is used
on setup_vdso() (since it is a global definition), while the former does
define DONT_USE_BOOTSTRAP_MAP and it is used on loader self-relocation.

With the commit change, the function is now included and defined once
instead of defined as a nested function.  So rtld.c defines without
defining RTLD_BOOTSTRAP and it brokes at least powerpc32.

This patch fixes by moving the get-dynamic-info.h include out of
dynamic-link.h, which then the caller can corirectly set the expected
semantic by defining STATIC_PIE_BOOTSTRAP, RTLD_BOOTSTRAP, and/or
RESOLVE_MAP.

It also required to enable some asserts only for the loader bootstrap
to avoid issues when called from setup_vdso().

As a side note, this is another issues with nested functions: it is
not clear from pre-processed output (-E -dD) how the function will
be build and its semantic (since nested function will be local and
extra C defines may change it).

I checked on x86_64-linux-gnu (w/o --enable-static-pie),
i686-linux-gnu, powerpc64-linux-gnu, powerpc-linux-gnu-power4,
aarch64-linux-gnu, arm-linux-gnu, sparc64-linux-gnu, and
s390x-linux-gnu.

Reviewed-by: Fangrui Song <maskray@google.com>
2021-10-12 13:25:43 -03:00
Joseph Myers
4912c738fc Fix nios2 localplt failure
Building for nios2-linux-gnu has recently started showing a localplt
test failure, arising from a reference to __floatunsidf from
getloadavg after commit b5c8a3aa82
("Linux: implement getloadavg(3) using sysinfo(2)") (this is an
architecture with soft-fp in libc).  Add this as a permitted local PLT
reference in localplt.data.

Tested with build-many-glibcs.py for nios2-linux-gnu.
2021-10-11 21:47:32 +00:00
Fangrui Song
bf433b849a elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT)
Intel MPX failed to gain wide adoption and has been deprecated for a
while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in
2019.

This patch removes the support code from the dynamic loader.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-11 11:14:02 -07:00
Noah Goldstein
fc5bd179ef x86: Modify ENTRY in sysdep.h so that p2align can be specified
No bug.

This change adds a new macro ENTRY_P2ALIGN which takes a second
argument, log2 of the desired function alignment.

The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this
doesn't affect any existing functionality.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-10-08 11:30:52 -05:00
Cristian Rodríguez
b5c8a3aa82 Linux: implement getloadavg(3) using sysinfo(2)
Signed-off-by: Cristian Rodríguez <crrodriguez@opensuse.org>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-08 09:52:19 -03:00
Fangrui Song
490e6c62aa elf: Avoid nested functions in the loader [BZ #27220]
dynamic-link.h is included more than once in some elf/ files (rtld.c,
dl-conflict.c, dl-reloc.c, dl-reloc-static-pie.c) and uses GCC nested
functions. This harms readability and the nested functions usage
is the biggest obstacle prevents Clang build (Clang doesn't support GCC
nested functions).

The key idea for unnesting is to add extra parameters (struct link_map
*and struct r_scope_elm *[]) to RESOLVE_MAP,
ELF_MACHINE_BEFORE_RTLD_RELOC, ELF_DYNAMIC_RELOCATE, elf_machine_rel[a],
elf_machine_lazy_rel, and elf_machine_runtime_setup. (This is inspired
by Stan Shebs' ppc64/x86-64 implementation in the
google/grte/v5-2.27/master which uses mixed extra parameters and static
variables.)

Future simplification:
* If mips elf_machine_runtime_setup no longer needs RESOLVE_GOTSYM,
  elf_machine_runtime_setup can drop the `scope` parameter.
* If TLSDESC no longer need to be in elf_machine_lazy_rel,
  elf_machine_lazy_rel can drop the `scope` parameter.

Tested on aarch64, i386, x86-64, powerpc64le, powerpc64, powerpc32,
sparc64, sparcv9, s390x, s390, hppa, ia64, armhf, alpha, and mips64.
In addition, tested build-many-glibcs.py with {arc,csky,microblaze,nios2}-linux-gnu
and riscv64-linux-gnu-rv64imafdc-lp64d.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-07 11:55:02 -07:00
H.J. Lu
349b0441da Add run-time check for indirect external access
When performing symbol lookup for references in executable without
indirect external access:

1. Disallow copy relocations in executable against protected data symbols
in a shared object with indirect external access.
2. Disallow non-zero symbol values of undefined function symbols in
executable, which are used as the function pointer, against protected
function symbols in a shared object with indirect external access.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-07 10:26:48 -07:00
H.J. Lu
1bd888d0b7 Initial support for GNU_PROPERTY_1_NEEDED
1. Add GNU_PROPERTY_1_NEEDED:

 #define GNU_PROPERTY_1_NEEDED      GNU_PROPERTY_UINT32_OR_LO

to indicate the needed properties by the object file.
2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:

 #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)

to indicate that the object file requires canonical function pointers and
cannot be used with copy relocation.
3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-07 10:26:08 -07:00
Stefan Liebler
f2e06656d0 S390: Add PCI_MIO and SIE HWCAPs
Both new HWCAPs were introduced in these kernel commits:
- 7e8403ecaf884f307b627f3c371475913dd29292
  "s390: add HWCAP_S390_PCI_MIO to ELF hwcaps"
- 7e82523f2583e9813e4109df3656707162541297
  "s390/hwcaps: make sie capability regular hwcap"

Also note that the kernel commit 511ad531afd4090625def4d9aba1f5227bd44b8e
"s390/hwcaps: shorten HWCAP defines" has shortened the prefix of the macros
from "HWCAP_S390_" to "HWCAP_".  For compatibility reasons, we do not
change the prefix in public glibc header file.
2021-10-07 06:49:39 +02:00
Stefan Liebler
47252e4336 S390: update libm test ulps
Update after

 commit 6bbf729832.
 Fixed inaccuracy of j0f (BZ #28185)

See also e.g.
commit c75b106145
aarch64: update libm test ulps
2021-10-06 16:34:40 +02:00
Adhemerval Zanella
260d3032ad powerpc: update libm test ulps
Update after commit 6bbf729832
(Fixed inaccuracy of j0f (BZ #28185)).
2021-10-06 10:50:33 -03:00
Adhemerval Zanella
d2b1254db2 y2038: Use a common definition for stat for sparc32
The sparc32 misses support for support done by 4e8521333b.

Checked on sparcv9-linux-gnu.
2021-10-06 08:10:13 -03:00
Szabolcs Nagy
c75b106145 aarch64: update libm test ulps
Update after

 commit 6bbf729832.
 Fixed inaccuracy of j0f (BZ #28185)
2021-10-05 13:44:27 +01:00
Paul Zimmermann
6bbf729832 Fixed inaccuracy of j0f (BZ #28185)
The largest errors over the full binary32 range are after this
patch (on x86_64):

RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7

Inputs that were yielding huge errors have been added to "make check".
Reviewed-by: Adhemeral Zanella  <adhemerval.zanella@linaro.org>
2021-10-05 13:45:37 +02:00
Szabolcs Nagy
83b5323261 elf: Avoid deadlock between pthread_create and ctors [BZ #28357]
The fix for bug 19329 caused a regression such that pthread_create can
deadlock when concurrent ctors from dlopen are waiting for it to finish.
Use a new GL(dl_load_tls_lock) in pthread_create that is not taken
around ctors in dlopen.

The new lock is also used in __tls_get_addr instead of GL(dl_load_lock).

The new lock is held in _dl_open_worker and _dl_close_worker around
most of the logic before/after the init/fini routines.  When init/fini
routines are running then TLS is in a consistent, usable state.
In _dl_open_worker the new lock requires catching and reraising dlopen
failures that happen in the critical section.

The new lock is reinitialized in a fork child, to keep the existing
behaviour and it is kept recursive in case malloc interposition or TLS
access from signal handlers can retake it.  It is not obvious if this
is necessary or helps, but avoids changing the preexisting behaviour.

The new lock may be more appropriate for dl_iterate_phdr too than
GL(dl_load_write_lock), since TLS state of an incompletely loaded
module may be accessed.  If the new lock can replace the old one,
that can be a separate change.

Fixes bug 28357.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-04 15:07:05 +01:00
Florian Weimer
eae81d7057 nptl: pthread_kill must send signals to a specific thread [BZ #28407]
The choice between the kill vs tgkill system calls is not just about
the TID reuse race, but also about whether the signal is sent to the
whole process (and any thread in it) or to a specific thread.

This was caught by the openposix test suite:

  LTP: openposix test suite - FAIL: SIGUSR1 is member of new thread pendingset.
  <https://gitlab.com/cki-project/kernel-tests/-/issues/764>

Fixes commit 526c3cf11e ("nptl: Fix race
between pthread_kill and thread exit (bug 12889)").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-10-01 18:16:41 +02:00
Adhemerval Zanella
2313ab153d nptl: Add CLOCK_MONOTONIC support for PI mutexes
Linux added FUTEX_LOCK_PI2 to support clock selection
(commit bf22a6976897977b0a3f1aeba6823c959fc4fdae).  With the new
flag we can now proper support CLOCK_MONOTONIC for
pthread_mutex_clocklock with Priority Inheritance.  If kernel
does not support, EINVAL is returned instead.

The difference is the futex operation will be issued and the kernel
will advertise the missing support (instead of hard-code error
return).

Checked on x86_64-linux-gnu and i686-linux-gnu on Linux 5.14, 5.11,
and 4.15.
2021-10-01 10:11:11 -03:00
Adhemerval Zanella
8352b6df37 nptl: Use FUTEX_LOCK_PI2 when available
This patch uses the new futex PI operation provided by Linux v5.14
when it is required.

The futex_lock_pi64() is moved to futex-internal.c (since it used on
two different places and its code size might be large depending of the
kernel configuration) and clockid is added as an argument.

Co-authored-by: Kurt Kanzenbach <kurt@linutronix.de>
2021-10-01 08:09:13 -03:00
Kurt Kanzenbach
dd5adb515c Linux: Add FUTEX_LOCK_PI2
Linux v5.14.0 introduced a new futex operation called FUTEX_LOCK_PI2.

This kernel feature can be used to implement
pthread_mutex_clocklock(MONOTONIC)/PI.

Signed-off-by: Kurt Kanzenbach <kurt@linutronix.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-01 08:09:13 -03:00
Adhemerval Zanella
beca615c5e Update alpha libm-test-ulps 2021-09-30 09:06:21 -03:00
Paul A. Clarke
ee874f44fd powerpc: Fix unrecognized instruction errors with recent binutils
Recent versions of binutils (with commit
b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a) stopped preserving "sticky"
options across a base `.machine` directive, nullifying the use of
passing "-many" through GCC to the assembler.  As a result, some
instructions which were recognized even under older, more stringent
`.machine` directives become unrecognized instructions in that
context.

In `sysdeps/powerpc/tst-set_ppr.c`, the use of the `mfppr32` extended
mnemonic became unrecognized, as the default compilation with GCC for
32bit powerpc adds a `.machine ppc` in the resulting assembly, so the
command line option `-Wa,-many` is essentially ignored, and the ISA 2.06
instructions and mnemonics, like `mfppr32`, are unrecognized.

The compilation of `sysdeps/powerpc/tst-set_ppr.c` fails with:
Error: unrecognized opcode: `mfppr32'

Add appropriate `.machine` directives in the assembly to bracket the
`mfppr32` instruction.

Part of a 2019 fix (commit 9250e6610f) to
the above test's Makefile to add `-many` to the compilation when GCC
itself stopped passing `-many` to the assember no longer has any effect,
so remove that.

Reported-by: Joseph Myers <joseph@codesourcery.com>
2021-09-29 14:42:20 -05:00
Joseph Myers
90f0ac10a7 Add fmaximum, fminimum functions
C2X adds new <math.h> functions for floating-point maximum and
minimum, corresponding to the new operations that were added in IEEE
754-2019 because of concerns about the old operations not being
associative in the presence of signaling NaNs.  fmaximum and fminimum
handle NaNs like most <math.h> functions (any NaN argument means the
result is a quiet NaN).  fmaximum_num and fminimum_num handle both
quiet and signaling NaNs the way fmax and fmin handle quiet NaNs (if
one argument is a number and the other is a NaN, return the number),
but still raise "invalid" for a signaling NaN argument, making them
exceptions to the normal rule that a function with a floating-point
result raising "invalid" also returns a quiet NaN.  fmaximum_mag,
fminimum_mag, fmaximum_mag_num and fminimum_mag_num are corresponding
functions returning the argument with greatest or least absolute
value.  All these functions also treat +0 as greater than -0.  There
are also corresponding <tgmath.h> type-generic macros.

Add these functions to glibc.  The implementations use type-generic
templates based on those for fmax, fmin, fmaxmag and fminmag, and test
inputs are based on those for those functions with appropriate
adjustments to the expected results.  The RISC-V maintainers might
wish to add optimized versions of fmaximum_num and fminimum_num (for
float and double), since RISC-V (F extension version 2.2 and later)
provides instructions corresponding to those functions - though it
might be at least as useful to add architecture-independent built-in
functions to GCC and teach the RISC-V back end to expand those
functions inline, which is what you generally want for functions that
can be implemented with a single instruction.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2021-09-28 23:31:35 +00:00
Florian Weimer
5bf07e1b3a Linux: Simplify __opensock and fix race condition [BZ #28353]
AF_NETLINK support is not quite optional on modern Linux systems
anymore, so it is likely that the first attempt will always succeed.
Consequently, there is no need to cache the result.  Keep AF_UNIX
and the Internet address families as a fallback, for the rare case
that AF_NETLINK is missing.  The other address families previously
probed are totally obsolete be now, so remove them.

Use this simplified version as the generic implementation, disabling
Netlink support as needed.
2021-09-28 18:55:49 +02:00
Stafford Horne
9874ca536b pthread/tst-cancel28: Fix barrier re-init race condition
When running this test on the OpenRISC port I am working on this test
fails with a timeout.  The test passes when being straced or debugged.
Looking at the code there seems to be a race condition in that:

  1 main thread: calls xpthread_cancel
  2 sub thread : receives cancel signal
  3 sub thread : cleanup routine waits on barrier
  4 main thread: re-inits barrier
  5 main thread: waits on barrier

After getting to 5 the main thread and sub thread wait forever as the 2
barriers are no longer the same.

Removing the barrier re-init seems to fix this issue.  Also, the barrier
does not need to be reinitialized as that is done by default.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-09-28 10:47:08 -03:00
Fangrui Song
8e2557a2b8 powerpc: Delete unneeded ELF_MACHINE_BEFORE_RTLD_RELOC
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-09-27 10:12:50 -07:00
Adhemerval Zanella
8f42a98654 posix: Remove spawni.c
Although it provide an alternate implementation that communicates
using pipe() instead of shared memory, no port uses and it adds extra
burden for posix_spawn() extensions.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-09-27 12:44:25 -03:00
H.J. Lu
b0a33dc967 Disable symbol hack in libc_nonshared.a
Don't reference __GI_memmove, __GI_memset, __GI_memcpy, __divdi3_internal,
__udivdi3_internal and __moddi3_internal in libc_nonshared.a.
2021-09-27 07:46:25 -07:00
Adhemerval Zanella
342298278e linux: Revert the use of sched_getaffinity on get_nproc (BZ #28310)
The use of sched_getaffinity on get_nproc and
sysconf (_SC_NPROCESSORS_ONLN) done in 903bc7dcc2 (BZ #27645)
breaks the top command in common hypervisor configurations and also
other monitoring tools.

The main issue using sched_getaffinity changed the symbols semantic
from system-wide scope of online CPUs to per-process one (which can
be changed with kernel cpusets or book parameters in VM).

This patch reverts mostly of the 903bc7dcc2, with the
exceptions:

  * No more cached values and atomic updates, since they are inherent
    racy.

  * No /proc/cpuinfo fallback, since /proc/stat is already used and
    it would require to revert more arch-specific code.

  * The alloca is replace with a static buffer of 1024 bytes.

So the implementation first consult the sysfs, and fallbacks to procfs.

Checked on x86_64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-09-27 09:18:43 -03:00
Adhemerval Zanella
33099d72e4 linux: Simplify get_nprocs
This patch simplifies the memory allocation code and uses the sched
routines instead of reimplement it.  This still uses a stack
allocation buffer, so it can be used on malloc initialization code.

Linux currently supports at maximum of 4096 cpus for most architectures:

$ find -iname Kconfig | xargs git grep -A10 -w NR_CPUS | grep -w range
arch/alpha/Kconfig-	range 2 32
arch/arc/Kconfig-	range 2 4096
arch/arm/Kconfig-	range 2 16 if DEBUG_KMAP_LOCAL
arch/arm/Kconfig-	range 2 32 if !DEBUG_KMAP_LOCAL
arch/arm64/Kconfig-	range 2 4096
arch/csky/Kconfig-	range 2 32
arch/hexagon/Kconfig-	range 2 6 if SMP
arch/ia64/Kconfig-	range 2 4096
arch/mips/Kconfig-	range 2 256
arch/openrisc/Kconfig-	range 2 32
arch/parisc/Kconfig-	range 2 32
arch/riscv/Kconfig-	range 2 32
arch/s390/Kconfig-	range 2 512
arch/sh/Kconfig-	range 2 32
arch/sparc/Kconfig-	range 2 32 if SPARC32
arch/sparc/Kconfig-	range 2 4096 if SPARC64
arch/um/Kconfig-	range 1 1
arch/x86/Kconfig-# [NR_CPUS_RANGE_BEGIN ... NR_CPUS_RANGE_END] range.
arch/x86/Kconfig-	range NR_CPUS_RANGE_BEGIN NR_CPUS_RANGE_END
arch/xtensa/Kconfig-	range 2 32

With x86 supporting 8192:

arch/x86/Kconfig
 976 config NR_CPUS_RANGE_END
 977         int
 978         depends on X86_64
 979         default 8192 if  SMP && CPUMASK_OFFSTACK
 980         default  512 if  SMP && !CPUMASK_OFFSTACK
 981         default    1 if !SMP

So using a maximum of 32k cpu should cover all cases (and I would
expect once we start to have many more CPUs that Linux would provide
a more straightforward way to query for such information).

A test is added to check if sched_getaffinity can successfully return
with large buffers.

Checked on x86_64-linux-gnu and i686-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-09-27 09:18:12 -03:00
Adhemerval Zanella
11a02b035b misc: Add __get_nprocs_sched
This is an internal function meant to return the number of avaliable
processor where the process can scheduled, different than the
__get_nprocs which returns a the system available online CPU.

The Linux implementation currently only calls __get_nprocs(), which
in tuns calls sched_getaffinity.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-09-27 09:13:06 -03:00
Samuel Thibault
1cc205c510 htl: make pthread_sigstate read/write set/oset outside sigstate section
so that if a segfault occurs, the handler can run fine.
2021-09-26 01:04:13 +02:00
Joseph Myers
b26901b26e Fix sysdeps/x86/fpu/s_ffma.c for 32-bit FMA processor case
It turns out the __SSE2_MATH__ conditional in sysdeps/x86/fpu/s_ffma.c
does not cover all cases where the x86 fenv_private.h macros might
manipulate one of the SSE and 387 floating-point state, while the
actual fma implementation uses the other.  Specifically, in the 32-bit
case, with a compiler not defaulting to -mfpmath=sse, but testing on a
processor with hardware FMA support, the multiarch fma function
implementations will end up using SSE, while the fenv_private.h macros
will use the 387 state for double.  Change the conditional to use the
default macros rather than the optimized ones in all cases except when
the compiler inlines an fma instruction (in which case, since all
those instructions are SSE instructions and -mfpmath=sse must be in
effect for them to be inlined, the optimized macros will only use the
SSE state and it's OK for them to only use the SSE state).

Tested for x86_64 and x86.  H.J. reports in
<https://sourceware.org/pipermail/libc-alpha/2021-September/131367.html>
that it fixes the problems he observed.
2021-09-24 17:59:22 +00:00
Florian Weimer
5ad9d62c3b Linux: Avoid closing -1 on failure in __closefrom_fallback
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-09-24 19:51:52 +02:00
Fangrui Song
91e92272ca i386: Port elf_machine_{load_address,dynamic} from x86-64
This drops reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time
address of _DYNAMIC.

The code sequence length does not change.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-09-24 09:36:32 -07:00
Naohiro Tamura
381b29616a aarch64: Disable A64FX memcpy/memmove BTI unconditionally
This patch disables A64FX memcpy/memmove BTI instruction insertion
unconditionally such as A64FX memset patch [1] for performance.

[1] commit 07b427296b

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-09-24 13:26:59 +01:00
Tulio Magno Quites Machado Filho
54ff4f1e39 powerpc64le: Avoid conflicting types for f64xfmaf128 when IFUNC is not used
Avoid defining f64xfmaf128 twice when building s_fmaf128.c.
This can be reproduced on powerpc64le whenever f128 functions do not
have IFUNC enabled, e.g. using "--with-cpu=power8 --disable-multi-arch", or
when using "-with-cpu=power9".

Fixes: b3f27d8150 ("Add narrowing fma functions")
2021-09-23 19:29:54 -03:00
Joseph Myers
4ed7a383f9 Fix ffma use of round-to-odd on x86
On 32-bit x86 with -mfpmath=sse, and on x86_64 with
--disable-multi-arch, the tests of ffma and its aliases (fma narrowing
from binary64 to binary32) fail.  This is probably the issue reported
by H.J. in
<https://sourceware.org/pipermail/libc-alpha/2021-September/131277.html>.

The problem is the use of fenv_private.h macros in the round-to-odd
implementation.  Those macros are set up to manipulate only one of the
SSE and 387 floating-point state, whichever is relevant for the type
indicated by the suffix on the macro name.  But x86 configurations
sometimes use the ldbl-96 implementation of binary64 fma (that's where
--disable-multi-arch is relevant for x86_64: it causes the ldbl-96
implementation to be used, instead of an IFUNC implementation that
falls back to the dbl-64 version), contrary to the expectations of
those macros for functions operating on double when __SSE2_MATH__ is
defined.

This can be addressed by using the default versions of those macros
(giving x86 its own version of s_ffma.c), as is done for the *f128
macro variants where it depends on the details of how GCC was
configured when building libgcc which floating-point state is affected
by _Float128 arithmetic.  The issue only applies when __SSE2_MATH__ is
defined, and doesn't apply when __FP_FAST_FMA is defined (because in
that case, fma will be inlined by the compiler, meaning it's
definitely an SSE operation; for the same reason, this is not an issue
for narrowing sqrt, as hardware sqrt is always inlined in that
implementation for x86), but in other cases it's safest to use the
default versions of the fenv_private.h macros to ensure things work
whichever fma implementation is used.

Tested for x86_64 (with and without --disable-multi-arch) and x86
(with and without -mfpmath=sse).
2021-09-23 21:18:31 +00:00
Florian Weimer
2849e2f533 nptl: Avoid setxid deadlock with blocked signals in thread exit [BZ #28361]
As part of the fix for bug 12889, signals are blocked during
thread exit, so that application code cannot run on the thread that
is about to exit.  This would cause problems if the application
expected signals to be delivered after the signal handler revealed
the thread to still exist, despite pthread_kill can no longer be used
to send signals to it.  However, glibc internally uses the SIGSETXID
signal in a way that is incompatible with signal blocking, due to the
way the setxid handshake delays thread exit until the setxid operation
has completed.  With a blocked SIGSETXID, the handshake can never
complete, causing a deadlock.

As a band-aid, restore the previous handshake protocol by not blocking
SIGSETXID during thread exit.

The new test sysdeps/pthread/tst-pthread-setuid-loop.c is based on
a downstream test by Martin Osvald.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-09-23 09:56:07 +02:00
Joseph Myers
b3f27d8150 Add narrowing fma functions
This patch adds the narrowing fused multiply-add functions from TS
18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal,
f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x,
f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128,
f64xfmaf128 for configurations with _Float64x and _Float128;
__f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case
(for calls to ffmal and dfmal when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.

The changes are mostly similar to those for the other narrowing
functions previously added, especially that for sqrt, so the
description of those generally applies to this patch as well.  As with
sqrt, I reused the same test inputs in auto-libm-test-in as for
non-narrowing fma rather than adding extra or separate inputs for
narrowing fma.  The tests in libm-test-narrow-fma.inc also follow
those for non-narrowing fma.

The non-narrowing fma has a known bug (bug 6801) that it does not set
errno on errors (overflow, underflow, Inf * 0, Inf - Inf).  Rather
than fixing this or having narrowing fma check for errors when
non-narrowing does not (complicating the cases when narrowing fma can
otherwise be an alias for a non-narrowing function), this patch does
not attempt to check for errors from narrowing fma and set errno; the
CHECK_NARROW_FMA macro is still present, but as a placeholder that
does nothing, and this missing errno setting is considered to be
covered by the existing bug rather than needing a separate open bug.
missing-errno annotations are duly added to many of the
auto-libm-test-in test inputs for fma.

This completes adding all the new functions from TS 18661-1 to glibc,
so will be followed by corresponding stdc-predef.h changes to define
__STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support
for TS 18661-1 will be at a similar level to that for C standard
floating-point facilities up to C11 (pragmas not implemented, but
library functions done).  (There are still further changes to be done
to implement changes to the types of fromfp functions from N2548.)

Tested as followed: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float).  The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
2021-09-22 21:25:31 +00:00
H.J. Lu
b413280cfb ld.so: Replace DL_RO_DYN_SECTION with dl_relocate_ld [BZ #28340]
We can't relocate entries in dynamic section if it is readonly:

1. Add a l_ld_readonly field to struct link_map to indicate if dynamic
section is readonly and set it based on p_flags of PT_DYNAMIC segment.
2. Replace DL_RO_DYN_SECTION with dl_relocate_ld to decide if dynamic
section should be relocated.
3. Remove DL_RO_DYN_TEMP_CNT.
4. Don't use a static dynamic section to make readonly dynamic section
in vDSO writable.
5. Remove the temp argument from elf_get_dynamic_info.

This fixes BZ #28340.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-09-22 11:12:43 -07:00
Joseph Myers
4eff749e8f Adjust new narrowing div/mul tests for IBM long double, update powerpc ULPs
Testing for powerpc shows some of the new narrowing div/mul tests need
XFAILing for IBM long double and some ULPs updates are needed for
those tests.
2021-09-22 12:35:44 +00:00
Joseph Myers
1356f38df5 Fix f64xdivf128, f64xmulf128 spurious underflows (bug 28358)
As described in bug 28358, the round-to-odd computations used in the
libm functions that round their results to a narrower format can yield
spurious underflow exceptions in the following circumstances: the
narrowing only narrows the precision of the type and not the exponent
range (i.e., it's narrowing _Float128 to _Float64x on x86_64, x86 or
ia64), the architecture does after-rounding tininess detection (which
applies to all those architectures), the result is inexact, tiny
before rounding but not tiny after rounding (with the chosen rounding
mode) for _Float64x (which is possible for narrowing mul, div and fma,
not for narrowing add, sub or sqrt), so the underflow exception
resulting from the toward-zero computation in _Float128 is spurious
for _Float64x.

Fixed by making ROUND_TO_ODD call feclearexcept (FE_UNDERFLOW) in the
problem cases (as indicated by an extra argument to the macro); there
is never any need to preserve underflow exceptions from this part of
the computation, because the conversion of the round-to-odd value to
the narrower type will underflow in exactly the cases in which the
function should raise that exception, but it may be more efficient to
avoid the extra manipulation of the floating-point environment when
not needed.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2021-09-21 21:54:37 +00:00
Florian Weimer
f3e6645633 nptl: Fix type of pthread_mutexattr_getrobust_np, pthread_mutexattr_setrobust_np (bug 28036)
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-09-21 07:13:05 +02:00
Paul A. Clarke
064b475a2e powerpc: Fix unrecognized instruction errors with recent GCC
Recent binutils commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a
changes the behavior of `.machine` directives to override, rather
than augment, the base CPU. This can result in _reduced_ functionality
when, for example, compiling for default machine "power8", but explicitly
asking for ".machine power5", which loses Altivec instructions.

In tst-ucontext-ppc64-vscr.c, while the instructions provoking the new
error messages are bracketed by ".machine power5", which is ostensibly
Power ISA 2.03 (POWER5), the POWER5 processor did not support the
VSX subset, so these instructions are not recognized as "power5".

Error: unrecognized opcode: `vspltisb'
Error: unrecognized opcode: `vpkuwus'
Error: unrecognized opcode: `mfvscr'
Error: unrecognized opcode: `stvx'

Manually adding the VSX subset via ".machine altivec" is sufficient.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-09-20 16:52:38 -05:00
Florian Weimer
95dba35bf0 nptl: pthread_kill needs to return ESRCH for old programs (bug 19193)
The fix for bug 19193 breaks some old applications which appear
to use pthread_kill to probe if a thread is still running, something
that is not supported by POSIX.
2021-09-20 14:56:08 +02:00
H.J. Lu
a93d9e03a3 Extend struct r_debug to support multiple namespaces [BZ #15971]
Glibc does not provide an interface for debugger to access libraries
loaded in multiple namespaces via dlmopen.

The current rtld-debugger interface is described in the file:

elf/rtld-debugger-interface.txt

under the "Standard debugger interface" heading.  This interface only
provides access to the first link-map (LM_ID_BASE).

1. Bump r_version to 2 when multiple namespaces are used.  This triggers
the GDB bug:

https://sourceware.org/bugzilla/show_bug.cgi?id=28236

2. Add struct r_debug_extended to extend struct r_debug into a linked-list,
where each element correlates to an unique namespace.
3. Initialize the r_debug_extended structure.  Bump r_version to 2 for
the new namespace and add the new namespace to the namespace linked list.
4. Add _dl_debug_update to return the address of struct r_debug' of a
namespace.
5. Add a hidden symbol, _r_debug_extended, for struct r_debug_extended.
6. Provide the symbol, _r_debug, with size of struct r_debug, as an alias
of _r_debug_extended, for programs which reference _r_debug.

This fixes BZ #15971.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-09-19 13:51:35 -07:00
Sergey Bugaev
c484da9087 elf: Remove THREAD_GSCOPE_IN_TCB
All the ports now have THREAD_GSCOPE_IN_TCB set to 1. Remove all
support for !THREAD_GSCOPE_IN_TCB, along with the definition itself.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20210915171110.226187-4-bugaevc@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2021-09-16 01:04:20 +02:00
Sergey Bugaev
ed2f9aaf5e htl: Reimplement GSCOPE
This is a new implementation of GSCOPE which largely mirrors its NPTL
counterpart. Same as in NPTL, instead of a global flag shared between
threads, there is now a per-thread GSCOPE flag stored in each thread's
TCB. This makes entering and exiting a GSCOPE faster at the expense of
making THREAD_GSCOPE_WAIT () slower.

The largest win is the elimination of many redundant gsync_wake () RPC
calls; previously, even simplest programs would make dozens of fully
redundant gsync_wake () calls.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20210915171110.226187-3-bugaevc@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2021-09-16 01:04:17 +02:00
Sergey Bugaev
166bb3eac3 htl: Move thread table to ld.so
The next commit is going to introduce a new implementation of
THREAD_GSCOPE_WAIT which needs to access the list of threads.
Since it must be usable from the dynamic laoder, we have to move
the symbols for the list of threads into the loader.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20210915171110.226187-2-bugaevc@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2021-09-16 01:04:05 +02:00
Joseph Myers
4b6574a6f6 Redirect fma calls to __fma in libm
include/math.h has a mechanism to redirect internal calls to various
libm functions, that can often be inlined by the compiler, to call
non-exported __* names for those functions in the case when the calls
aren't inlined, with the redirection being disabled when
NO_MATH_REDIRECT.  Add fma to the functions to which this mechanism is
applied.

At present, libm-internal fma calls (generally to __builtin_fma*
functions) are only done when it's known the call will be inlined,
with alternative code not relying on an fma operation being used in
the caller otherwise.  This patch is in preparation for adding the TS
18661 / C2X narrowing fma functions to glibc; it will be natural for
the narrowing function implementations to call the underlying fma
functions unconditionally, with this either being inlined or resulting
in an __fma* call.  (Using two levels of round-to-odd computation like
that, in the case where there isn't an fma hardware instruction, isn't
optimal but is certainly a lot simpler for the initial implementation
than writing different narrowing fma implementations for all the
various pairs of formats.)

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by the patch (using
<https://sourceware.org/pipermail/libc-alpha/2021-September/130991.html>
to fix installed library stripping in build-many-glibcs.py).  Also
tested for x86_64.
2021-09-15 22:57:35 +00:00
Samuel Thibault
2444ce5421 mach lll_lock/unlock: Explicitly request private locking
0 was actually LLL_PRIVATE, so this does not actually change the code.
2021-09-15 01:36:08 +02:00
Sergey Bugaev
520a588705 elf: Replace most uses of THREAD_GSCOPE_IN_TCB
While originally this definition was indeed used to distinguish between
the cases where the GSCOPE flag was stored in TCB or not, it has since
become used as a general way to distinguish between HTL and NPTL.

THREAD_GSCOPE_IN_TCB will be removed in the following commits, as HTL,
which currently is the only port that does not put the flag into TCB,
will get ported to put the GSCOPE flag into the TCB as well. To prepare
for that change, migrate all code that wants to distinguish between HTL
and NPTL to use PTHREAD_IN_LIBC instead, which is a better choice since
the distinction mostly has to do with whether libc has access to the
list of thread structures and therefore can initialize thread-local
storage.

The parts of code that actually depend on whether the GSCOPE flag is in
TCB are left unchanged.

Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
Message-Id: <20210907133325.255690-2-bugaevc@gmail.com>
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2021-09-15 01:29:23 +02:00
Joseph Myers
3561106278 Add MADV_POPULATE_READ and MADV_POPULATE_WRITE from Linux 5.14 to bits/mman-linux.h
Linux 5.14 adds constants MADV_POPULATE_READ and MADV_POPULATE_WRITE
(with the same values on all architectures).  Add these to glibc's
bits/mman-linux.h.

Tested for x86_64.
2021-09-14 14:19:24 +00:00
Joseph Myers
4b39e34983 Update kernel version to 5.14 in tst-mman-consts.py
This patch updates the kernel version in the test tst-mman-consts.py
to 5.14.  (There are no new MAP_* constants covered by this test in
5.14 that need any other header changes.)

Tested with build-many-glibcs.py.
2021-09-14 13:51:58 +00:00
Florian Weimer
526c3cf11e nptl: Fix race between pthread_kill and thread exit (bug 12889)
A new thread exit lock and flag are introduced.  They are used to
detect that the thread is about to exit or has exited in
__pthread_kill_internal, and the signal is not sent in this case.

The test sysdeps/pthread/tst-pthread_cancel-select-loop.c is derived
from a downstream test originally written by Marek Polacek.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-09-13 11:06:08 +02:00
Florian Weimer
8af8456004 nptl: pthread_kill, pthread_cancel should not fail after exit (bug 19193)
This closes one remaining race condition related to bug 12889: if
the thread already exited on the kernel side, returning ESRCH
is not correct because that error is reserved for the thread IDs
(pthread_t values) whose lifetime has ended.  In case of a
kernel-side exit and a valid thread ID, no signal needs to be sent
and cancellation does not have an effect, so just return 0.

sysdeps/pthread/tst-kill4.c triggers undefined behavior and is
removed with this commit.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-09-13 11:06:08 +02:00
Joseph Myers
abd383584b Add narrowing square root functions
This patch adds the narrowing square root functions from TS 18661-1 /
TS 18661-3 / C2X to glibc's libm: fsqrt, fsqrtl, dsqrtl, f32sqrtf64,
f32sqrtf32x, f32xsqrtf64 for all configurations; f32sqrtf64x,
f32sqrtf128, f64sqrtf64x, f64sqrtf128, f32xsqrtf64x, f32xsqrtf128,
f64xsqrtf128 for configurations with _Float64x and _Float128;
__f32sqrtieee128 and __f64sqrtieee128 aliases in the powerpc64le case
(for calls to fsqrtl and dsqrtl when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.

The changes are mostly similar to those for the other narrowing
functions previously added, so the description of those generally
applies to this patch as well.  However, the not-actually-narrowing
cases (where the two types involved in the function have the same
floating-point format) are aliased to sqrt, sqrtl or sqrtf128 rather
than needing a separately built not-actually-narrowing function such
as was needed for add / sub / mul / div.  Thus, there is no
__nldbl_dsqrtl name for ldbl-opt because no such name was needed
(whereas the other functions needed such a name since the only other
name for that entry point was e.g. f32xaddf64, not reserved by TS
18661-1); the headers are made to arrange for sqrt to be called in
that case instead.

The DIAG_* calls in sysdeps/ieee754/soft-fp/s_dsqrtl.c are because
they were observed to be needed in GCC 7 testing of
riscv32-linux-gnu-rv32imac-ilp32.  The other sysdeps/ieee754/soft-fp/
files added didn't need such DIAG_* in any configuration I tested with
build-many-glibcs.py, but if they do turn out to be needed in more
files with some other configuration / GCC version, they can always be
added there.

I reused the same test inputs in auto-libm-test-in as for
non-narrowing sqrt rather than adding extra or separate inputs for
narrowing sqrt.  The tests in libm-test-narrow-sqrt.inc also follow
those for non-narrowing sqrt.

Tested as followed: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float).  The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
2021-09-10 20:56:22 +00:00
Joseph Myers
89dc0372a9 Update syscall lists for Linux 5.14
Linux 5.14 has two new syscalls, memfd_secret (on some architectures
only) and quotactl_fd.  Update syscall-names.list and regenerate the
arch-syscall.h headers with build-many-glibcs.py update-syscalls.

Tested with build-many-glibcs.py.
2021-09-08 12:42:06 +00:00
Jiaxun Yang
66016ec8ae MIPS: Setup errno for {f,l,}xstat
{f,l,}xstat stub for MIPS is using INTERNAL_SYSCALL
to do xstat syscall for glibc ver, However it leaves
errno untouched and thus giving bad errno output.

Setup errno properly when syscall returns non-zero.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-09-07 10:09:54 -03:00
John David Anglin
d8cf84ac7e Update hppa libm-test-ulps 2021-09-06 17:37:29 +00:00
Naohiro Tamura
1d9f99ce1b AArch64: Update A64FX memset not to degrade at 16KB
This patch updates unroll8 code so as not to degrade at the peak
performance 16KB for both FX1000 and FX700.

Inserted 2 instructions at the beginning of the unroll8 loop,
cmp and branch, are a workaround that is found heuristically.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2021-09-06 10:23:24 +01:00
Szabolcs Nagy
f873adf3df Revert "AArch64: Update A64FX memset not to degrade at 16KB"
Because of wrong commit author. Will recommit it with right author.

This reverts commit 23777232c2.
2021-09-06 10:23:25 +01:00
Siddhesh Poyarekar
30891f35fa Remove "Contributed by" lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date.  Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.

Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions.  These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.

The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively.  These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:

https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-09-03 22:06:44 +05:30
Naohiro Tamura via Libc-alpha
23777232c2 AArch64: Update A64FX memset not to degrade at 16KB
This patch updates unroll8 code so as not to degrade at the peak
performance 16KB for both FX1000 and FX700.

Inserted 2 instructions at the beginning of the unroll8 loop,
cmp and branch, are a workaround that is found heuristically.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2021-09-03 15:59:46 +01:00
Fangrui Song
224edada60 configure: Allow LD to be LLD 13.0.0 or above [BZ #26558]
When using LLD (LLVM linker) as the linker, configure prints a confusing
message.

    *** These critical programs are missing or too old: GNU ld

LLD>=13.0.0 can build glibc --enable-static-pie. (8.0.0 needs one
workaround for -Wl,-defsym=_begin=0. 9.0.0 works with
--disable-static-pie).

XFAIL two tests sysdeps/x86/tst-ifunc-isa-* which have the BZ #28154
issue (LLD follows the PowerPC port of GNU ld for ifunc by placing
IRELATIVE relocations in .rela.dyn, triggering a glibc ifunc fragility).

The set of dynamic symbols is the same with GNU ld and LLD,
modulo unused SHN_ABS version node symbols.

For comparison, gold does not support --enable-static-pie
yet (--no-dynamic-linker is unsupported BZ #22221), yet
has 6 failures more than LLD. gold linked libc.so has
larger .dynsym differences with GNU ld and LLD
(non-default version symbols are changed to default versions
by a version script BZ #28196).
2021-08-31 20:23:34 -07:00
Samuel Thibault
60dfb30976 hurd msync: Drop bogus test
MS_SYNC is actually 0, so we cannot test that both MS_SYNC and MS_ASYNC
are set.
2021-08-31 19:41:02 +02:00
Samuel Thibault
e2930d8777 hurd: Fix typo in msync
== has higher priority than &
2021-08-31 14:55:25 +02:00
H.J. Lu
3c8b9879ca x86-64: Use testl to check __x86_string_control
Use testl, instead of andl, to check __x86_string_control to avoid
updating __x86_string_control.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-08-30 10:39:53 -07:00
H.J. Lu
d4877540e5 i686: Don't include multiarch memove in libc.a
On i686, there is no multiarch memove in libc.a, don't include multiarch
memove in ifunc-impl-list.c in libc.a.
2021-08-30 05:57:49 -07:00
Adhemerval Zanella
6b20880b22 Use support_open_dev_null_range io/tst-closefrom, misc/tst-close_range, and posix/tst-spawn5 (BZ #28260)
It ensures a continuous range of file descriptor and avoid hitting
the RLIMIT_NOFILE.

Checked on x86_64-linux-gnu.
2021-08-26 17:13:47 -03:00
Fangrui Song
f9cd7d5d19 powerpc: Use --no-tls-get-addr-optimize in test only if the linker supports it
LLD doesn't support --{,no-}tls-get-addr-optimize.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-08-24 09:26:44 -07:00
H.J. Lu
528f9ff6bf x86-64: Remove assembler AVX512DQ check
The minimum GNU binutils requirement is 2.25 which supports AVX512DQ.
Remove assembler AVX512DQ check.
2021-08-24 07:05:35 -07:00
H.J. Lu
5359c3bc91 x86-64: Remove compiler -mavx512f check
The minimum GCC requirement is GCC 6.2 which supports -mavx512f.  Remove
compiler -mavx512f check.  Tested with GCC 6.4.1 on Linux/x86-64.
2021-08-24 07:05:35 -07:00
Samuel Thibault
c5e4c0dd0f hurd: Remove old test-err_np.c file
This is not referenced any more and includes a non-existing file.
2021-08-23 19:05:58 +02:00
H.J. Lu
78c9ec9000 x86-64: Optimize load of all bits set into ZMM register [BZ #28252]
Optimize loads of all bits set into ZMM register in AVX512 SVML codes
by replacing

	vpbroadcastq .L_2il0floatpacket.16(%rip), %zmmX

and

	vmovups   .L_2il0floatpacket.13(%rip), %zmmX

with
	vpternlogd $0xff, %zmmX, %zmmX, %zmmX

This fixes BZ #28252.
2021-08-22 06:23:37 -07:00
Matt Whitlock
0835c0f0ba x86: fix Autoconf caching of instruction support checks [BZ #27991]
The Autoconf documentation for the AC_CACHE_CHECK macro states:

  The commands-to-set-it must have no side effects except for setting
  the variable cache-id, see below.

However, the tests for support of -msahf and -mmovbe were embedded in
the commands-to-set-it for lib_cv_include_x86_isa_level. This had the
consequence that libc_cv_have_x86_lahf_sahf and libc_cv_have_x86_movbe
were not defined whenever lib_cv_include_x86_isa_level was read from
cache. These variables' being undefined meant that their unquoted use
in later test expressions led to the 'test' built-in's misparsing its
arguments and emitting errors like "test: =: unexpected operator" or
"test: =: unary operator expected", depending on the particular shell.

This commit refactors the tests for LAHF/SAHF and MOVBE instruction
support into their own AC_CACHE_CHECK macro invocations to obey the
rule that the commands-to-set-it must have no side effects other than
setting the variable named by cache-id.

Signed-off-by: Matt Whitlock <sourceware@mattwhitlock.name>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-08-19 09:11:35 -03:00
Fangrui Song
bca0f5cbc9 arm: Simplify elf_machine_{load_address,dynamic}
and drop reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time
address of _DYNAMIC. &__ehdr_start is a better way to get the load address.

This is similar to commits b37b75d269
(x86-64) and 43d06ed218 (aarch64).

Reviewed-by: Joseph Myers <joseph@codesourcery.com>
2021-08-18 11:13:03 -07:00
Fangrui Song
34b4624b04 riscv: Drop reliance on _GLOBAL_OFFSET_TABLE_[0]
&__ehdr_start is a better way to get the load address.

This is similar to commits b37b75d269
(x86-64) and 43d06ed218 (aarch64).

Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
2021-08-18 10:01:31 -07:00
Fangrui Song
710ba420fd Remove sysdeps/*/tls-macros.h
They provide TLS_GD/TLS_LD/TLS_IE/TLS_IE macros for TLS testing.  Now
that we have migrated to __thread and tls_model attributes, these macros
are unused and the tls-macros.h files can retire.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-08-18 09:15:20 -07:00
Fangrui Song
b37b75d269 x86_64: Simplify elf_machine_{load_address,dynamic}
and drop reliance on _GLOBAL_OFFSET_TABLE_[0] being the link-time
address of _DYNAMIC. &__ehdr_start is a better way to get the load address.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-08-17 10:45:57 -07:00
Fangrui Song
33c50ef428 elf: Drop elf/tls-macros.h in favor of __thread and tls_model attributes [BZ #28152] [BZ #28205]
elf/tls-macros.h was added for TLS testing when GCC did not support
__thread. __thread and tls_model attributes are mature now and have been
used by many newer tests.

Also delete tst-tls2.c which tests .tls_common (unused by modern GCC and
unsupported by Clang/LLD). .tls_common and .tbss definition are almost
identical after linking, so the runtime test doesn't add additional
coverage.  Assembler and linker tests should be on the binutils side.

When LLD 13.0.0 is allowed in configure.ac
(https://sourceware.org/pipermail/libc-alpha/2021-August/129866.html),
`make check` result is on par with glibc built with GNU ld on aarch64
and x86_64.

As a future clean-up, TLS_GD/TLS_LD/TLS_IE/TLS_IE macros can be removed from
sysdeps/*/tls-macros.h. We can add optional -mtls-dialect={gnu2,trad}
tests to ensure coverage.

Tested on aarch64-linux-gnu, powerpc64le-linux-gnu, and x86_64-linux-gnu.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-08-16 09:59:30 -07:00
Samuel Thibault
cbb2aa337b hurd: Drop fmh kludge
Gnumach's 0650a4ee30e3 implements support for high bits being set in the
mask parameter of vm_map. This allows to remove the fmh kludge that was
masking away the address range by mapping a dumb area there.
2021-08-16 11:20:38 +02:00
Xi Ruoyao
0f62fe0532 mips: increase stack alignment in clone to match the ABI
In "mips: align stack in clone [BZ #28223]"
(commit 1f51cd9a86) I made a mistake: I
misbelieved one "word" was 2-byte and "doubleword" should be 4-byte.
But in MIPS ABI one "word" is defined 32-bit (4-byte), so "doubleword" is
8-byte [1], and "quadword" is 16-byte [2].

[1]: "System V Application Binary Interface: MIPS(R) RISC Processor
      Supplement, 3rd edition", page 3-31
[2]: "MIPSpro(TM) 64-Bit Porting and Transition Guide", page 23
2021-08-13 16:01:14 +00:00
Xi Ruoyao
1f51cd9a86 mips: align stack in clone [BZ #28223]
The MIPS O32 ABI requires 4 byte aligned stack, and the MIPS N64 and N32
ABI require 8 byte aligned stack.  Previously if the caller passed an
unaligned stack to clone the the child misbehaved.

Fixes bug 28223.
2021-08-12 20:31:59 +00:00
Sergey Bugaev
5a5358b749 hurd mmap: Reduce the requested max vmprot
When the memory object is read-only, the kernel would be right in
refusing max vmprot containing VM_PROT_WRITE.

Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2021-08-11 18:39:51 +02:00
Sergey Bugaev
08fc6df294 hurd mmap: Factorize MAP_SHARED flag check
Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
2021-08-11 18:39:51 +02:00
Fangrui Song
43d06ed218 aarch64: Make elf_machine_{load_address,dynamic} robust [BZ #28203]
The AArch64 ABI is largely platform agnostic and does not specify
_GLOBAL_OFFSET_TABLE_[0] ([1]). glibc ld.so turns out to be probably the
only user of _GLOBAL_OFFSET_TABLE_[0] and GNU ld defines the value
to the link-time address _DYNAMIC. [2]

In 2012, __ehdr_start was implemented in GNU ld and gold in binutils
2.23.  Using adrp+add / (-mcmodel=tiny) adr to access
__ehdr_start/_DYNAMIC gives us a robust way to get the load address and
the link-time address of _DYNAMIC.

[1]: From a psABI maintainer, https://bugs.llvm.org/show_bug.cgi?id=49672#c2
[2]: LLD's aarch64 port does not set _GLOBAL_OFFSET_TABLE_[0] to the
link-time address _DYNAMIC.
LLD is widely used on aarch64 Android and ChromeOS devices.  Software
just works without the need for _GLOBAL_OFFSET_TABLE_[0].

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-08-11 09:00:38 -07:00
Wilco Dijkstra
a5db6a5cae [5/5] AArch64: Improve A64FX memset medium loops
Simplify the code for memsets smaller than L1. Improve the unroll8 and
L1_prefetch loops.

Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10 13:46:20 +01:00
Wilco Dijkstra
e69d9981f8 [4/5] AArch64: Improve A64FX memset by removing unroll32
Remove unroll32 code since it doesn't improve performance.

Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10 13:44:27 +01:00
Wilco Dijkstra
186092c6ba [3/5] AArch64: Improve A64FX memset for remaining bytes
Simplify handling of remaining bytes. Avoid lots of taken branches and complex
whilelo computations, instead unconditionally write vectors from the end.

Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10 13:42:07 +01:00
Wilco Dijkstra
9bc2ed8f46 [2/5] AArch64: Improve A64FX memset for large sizes
Improve performance of large memsets. Simplify alignment code. For zero memset
use DC ZVA, which almost doubles performance. For non-zero memsets use the
unroll8 loop which is about 10% faster.

Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10 13:39:37 +01:00
Wilco Dijkstra
07b427296b [1/5] AArch64: Improve A64FX memset for small sizes
Improve performance of small memsets by reducing instruction counts and
improving code alignment. Bench-memset shows 35-45% performance gain for
small sizes.

Reviewed-by: Naohiro Tamura <naohirot@fujitsu.com>
2021-08-10 13:30:27 +01:00
Joseph Myers
98149b16d6 Add PTRACE_GET_RSEQ_CONFIGURATION from Linux 5.13 to sys/ptrace.h
Linux 5.13 adds a PTRACE_GET_RSEQ_CONFIGURATION constant, with an
associated ptrace_rseq_configuration structure.

Add this constant to the various sys/ptrace.h headers in glibc, with
the structure in bits/ptrace-shared.h (named struct
__ptrace_rseq_configuration in glibc, as with other such structures).

Tested for x86_64, and with build-many-glibcs.py.
2021-08-09 16:51:38 +00:00
Nikita Popov
b805aebd42 librt: fix NULL pointer dereference (bug 28213)
Helper thread frees copied attribute on NOTIFY_REMOVED message
received from the OS kernel.  Unfortunately, it fails to check whether
copied attribute actually exists (data.attr != NULL).  This worked
earlier because free() checks passed pointer before actually
attempting to release corresponding memory.  But
__pthread_attr_destroy assumes pointer is not NULL.

So passing NULL pointer to __pthread_attr_destroy will result in
segmentation fault.  This scenario is possible if
notification->sigev_notify_attributes == NULL (which means default
thread attributes should be used).

Signed-off-by: Nikita Popov <npv1310@gmail.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-08-09 20:17:34 +05:30
Anton Blanchard
60b4dd2579 powerpc64: Add checks for Altivec and VSX in ifunc selection
We'd like to support processors without Altivec or VSX, so check
the relevant hwcap bits before selecting them.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-08-06 16:10:08 -03:00
Anton Blanchard
f2a15dd668 powerpc64: Check cacheline size before using optimised memset routines
A number of optimised memset routines assume the cacheline size is 128B,
so we better check before using them.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-08-06 16:09:59 -03:00
Anton Blanchard
e4ca6de1bc powerpc64: Replace some PPC_FEATURE_HAS_VSX with PPC_FEATURE_ARCH_2_06
We use PPC_FEATURE_HAS_VSX to select a number of POWER7 optimised
functions. These functions don't use any VSX instructions, so
PPC_FEATURE_ARCH_2_06 seems like a better fit.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-08-06 16:09:52 -03:00
Florian Weimer
c87fcacc50 Linux: Fix fcntl, ioctl, prctl redirects for _TIME_BITS=64 (bug 28182)
__REDIRECT and __THROW are not compatible with C++ due to the ordering of the
__asm__ alias and the throw specifier. __REDIRECT_NTH has to be used
instead.

Fixes commit 8a40aff86b ("io: Add time64 alias
for fcntl"), commit 82c395d91e ("misc: Add
time64 alias for ioctl"), commit b39ffab860
("Linux: Add time64 alias for prctl").

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-08-06 09:52:00 +02:00
Adhemerval Zanella
c52eb066bc Update sparc libm-test-ulps 2021-08-04 17:38:38 -03:00
Adhemerval Zanella
5b86241a03 linux: Add sparck brk implementation
It turned that the generic implementation of brk() does not work
for sparc, since on failure kernel will just return the previous
input value without setting the conditional register.

This patches adds back a sparc32 and sparc64 implementation removed
by 720480934a.

Checked on sparc64-linux-gnu and sparcv9-linux-gnu.
2021-08-04 17:38:30 -03:00
Siddhesh Poyarekar
b17e842a60 gethosts: Remove unused argument _type
The generated code is unchanged.
2021-08-04 02:23:43 +05:30
Siddhesh Poyarekar
77a34079d8 gaiconf_init: Avoid double-free in label and precedence lists
labellist and precedencelist could get freed a second time if there
are allocation failures, so set them to NULL to avoid a double-free.

Reviewed-by: Arjun Shankar <arjun@redhat.com>
2021-08-03 21:11:03 +05:30
H.J. Lu
91cc803d27 x86-64: Add Avoid_Short_Distance_REP_MOVSB
commit 3ec5d83d2a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jan 25 14:19:40 2020 -0800

    x86-64: Avoid rep movsb with short distance [BZ #27130]

introduced some regressions on Intel processors without Fast Short REP
MOV (FSRM).  Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with
short distance only on Intel processors with FSRM.  bench-memmove-large
on Skylake server shows that cycles of __memmove_evex_unaligned_erms
improves for the following data size:

                                  before    after    Improvement
length=4127, align1=3, align2=0:  479.38    349.25      27%
length=4223, align1=9, align2=5:  405.62    333.25      18%
length=8223, align1=3, align2=0:  786.12    496.38      37%
length=8319, align1=9, align2=5:  727.50    501.38      31%
length=16415, align1=3, align2=0: 1436.88   840.00      41%
length=16511, align1=9, align2=5: 1375.50   836.38      39%
length=32799, align1=3, align2=0: 2890.00   1860.12     36%
length=32895, align1=9, align2=5: 2891.38   1931.88     33%
2021-07-28 13:23:57 -07:00
H.J. Lu
c25c32165d Typo: Rename HAVE_CLONE3_WAPPER to HAVE_CLONE3_WRAPPER 2021-07-28 10:19:08 -07:00
Samuel Thibault
de2f68c3c7 hurd: _Fork: unlock malloc before calling fork child hooks
The setitimer fork hook, fork_itimer, needs to call malloc inside
__mach_setup_tls, so we need to unlock malloc before calling it.
2021-07-27 02:03:01 +02:00
Arjun Shankar
e785361ce3 i386: Regenerate ulps
These failures were caught while building glibc master for Fedora Rawhide
which is built with `-mtune=generic -msse2 -mfpmath=sse'.
2021-07-25 22:29:27 +02:00
H.J. Lu
7c124e3714 x86: Install <bits/platform/x86.h> [BZ #27958]
1. Install <bits/platform/x86.h> for <sys/platform/x86.h> which includes
<bits/platform/x86.h>.
2. Rename HAS_CPU_FEATURE to CPU_FEATURE_PRESENT which checks if the
processor has the feature.
3. Rename CPU_FEATURE_USABLE to CPU_FEATURE_ACTIVE which checks if the
feature is active.  There may be other preconditions, like sufficient
stack space or further setup for AMX, which must be satisfied before the
feature can be used.

This fixes BZ #27958.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-23 05:12:51 -07:00
Siddhesh Poyarekar
5b8d271571 Fix build and tests with --disable-tunables
Remove unused code and declare __libc_mallopt when !IS_IN (libc) to
allow the debug hook to build with --disable-tunables.

Also, run tst-ifunc-isa-2* tests only when tunables are enabled since
the result depends on it.

Tested on x86_64.

Reported-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-23 13:57:56 +05:30
Samuel Thibault
9a7ab0769b hurd: Fix glob lstat compatibility
84f7ce8447 ("posix: Add glob64 with 64-bit time_t support") replaced
GLOB_NO_LSTAT with defining GLOB_LSTAT and GLOB_LSTAT64, but the posix
and gnu versions of the change were missing in the commit.
2021-07-22 20:31:52 +02:00
Florian Weimer
f032ac3b83 socket: Add time64 alias for setsockopt
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-22 19:16:26 +02:00
Florian Weimer
02c17c8c14 socket: Add time64 alias for getsockopt
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-22 19:16:25 +02:00
Siddhesh Poyarekar
0552fd2c7d Move malloc_{g,s}et_state to libc_malloc_debug
These deprecated functions are only safe to call from
__malloc_initialize_hook and as a result, are not useful in the
general case.  Move the implementations to libc_malloc_debug so that
existing binaries that need it will now have to preload the debug DSO
to work correctly.

This also allows simplification of the core malloc implementation by
dropping all the undumping support code that was added to make
malloc_set_state work.

One known breakage is that of ancient emacs binaries that depend on
this.  They will now crash when running with this libc.  With
LD_BIND_NOW=1, it will terminate immediately because of not being able
to find malloc_set_state but with lazy binding it will crash in
unpredictable ways.  It will need a preloaded libc_malloc_debug.so so
that its initialization hook is executed to allow its malloc
implementation to work properly.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-22 18:38:10 +05:30
Siddhesh Poyarekar
b5bd5bfe88 glibc.malloc.check: Wean away from malloc hooks
The malloc-check debugging feature is tightly integrated into glibc
malloc, so thanks to an idea from Florian Weimer, much of the malloc
implementation has been moved into libc_malloc_debug.so to support
malloc-check.  Due to this, glibc malloc and malloc-check can no
longer work together; they use altogether different (but identical)
structures for heap management.  This should not make a difference
though since the malloc check hook is not disabled anywhere.
malloc_set_state does, but it does so early enough that it shouldn't
cause any problems.

The malloc check tunable is now in the debug DSO and has no effect
when the DSO is not preloaded.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-22 18:38:08 +05:30
Siddhesh Poyarekar
9dad716d4d mtrace: Wean away from malloc hooks
Wean mtrace away from the malloc hooks and move them into the debug
DSO.  Split the API away from the implementation so that we can add
the API to libc.so as well as libc_malloc_debug.so, with the libc
implementations being empty.

Update localplt data since memalign no longer has any callers after
this change.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-22 18:38:06 +05:30
Siddhesh Poyarekar
c142eb253f mcheck: Wean away from malloc hooks [BZ #23489]
Split the mcheck implementation into the debugging hooks and API so
that the API can be replicated in libc and libc_malloc_debug.so.  The
libc APIs always result in failure.

The mcheck implementation has also been moved entirely into
libc_malloc_debug.so and with it, all of the hook initialization code
can now be moved into the debug library.  Now the initialization can
be done independently of libc internals.

With this patch, libc_malloc_debug.so can no longer be used with older
libcs, which is not its goal anyway.  tst-vfork3 breaks due to this
since it spawns shell scripts, which in turn execute using the system
glibc.  Move the test to tests-container so that only the built glibc
is used.

This move also fixes bugs in the mcheck version of memalign and
realloc, thus allowing removal of the tests from tests-mcheck
exclusion list.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-22 18:38:02 +05:30
Siddhesh Poyarekar
2d2d9f2b48 Move malloc hooks into a compat DSO
Remove all malloc hook uses from core malloc functions and move it
into a new library libc_malloc_debug.so.  With this, the hooks now no
longer have any effect on the core library.

libc_malloc_debug.so is a malloc interposer that needs to be preloaded
to get hooks functionality back so that the debugging features that
depend on the hooks, i.e. malloc-check, mcheck and mtrace work again.
Without the preloaded DSO these debugging features will be nops.
These features will be ported away from hooks in subsequent patches.

Similarly, legacy applications that need hooks functionality need to
preload libc_malloc_debug.so.

The symbols exported by libc_malloc_debug.so are maintained at exactly
the same version as libc.so.

Finally, static binaries will no longer be able to use malloc
debugging features since they cannot preload the debugging DSO.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-22 18:37:59 +05:30
Samuel Thibault
094ed6b0cc posix: Add sysconf(_SC_{MIN,}SIGSTKSZ) support 2021-07-22 01:24:52 +02:00
Vineet Gupta
8eb4f2e404 ARC: elf: make type safe
Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-07-21 13:13:45 -07:00
Vineet Gupta
31aefa93f3 ARC: fp: (micro)optimize FPU_STATUS read by eliding FWE bit clearing
Any FPU_STATUS write needs setting the FWE bit (31) whcih just provides
a "control signal" to enable explicit write (vs. the side-effect of FPU
instructions).  However this bit is RAZ and write-only, thus effectively
never stored in FPU_STATUS register. Thus when reading the register
there is no need to clear it. This shaves off a BCLR instruction from
the fe*exceptino family of functions and while no big deal still makes
sense to do.

This came up when debugging a race in math/test-fenv-tls [1]

[1]: https://github.com/foss-for-synopsys-dwc-arc-processors/linux/issues/54

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-07-21 13:13:44 -07:00
Florian Weimer
77ede5f010 socket: Add time64 alias for sendmsg
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-21 11:58:16 +02:00
Florian Weimer
0a921c52b3 socket: Add time64 alias for recvmsg
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-21 11:58:16 +02:00
Florian Weimer
8b2c706a9d socket: Add time64 alias for sendmmsg
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-21 11:58:16 +02:00
Florian Weimer
b39ffab860 Linux: Add time64 alias for prctl
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-21 11:58:16 +02:00
Florian Weimer
8a40aff86b io: Add time64 alias for fcntl
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-21 11:58:16 +02:00
Florian Weimer
82c395d91e misc: Add time64 alias for ioctl
Reviewed-by: Lukasz Majewski <lukma@denx.de>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-21 11:58:09 +02:00
Darius Rad
39e8eb5973 RISC-V: Update rv64 ULPs
Generated on a Microsemi Polarfire Icicle Kit running Linux version 5.6.18.
Same ULPs were also produced on QEMU 5.2.0 running Linux 5.10.46-1.
2021-07-21 08:44:09 +02:00
Samuel Thibault
ff417d4017 hurd: Add support for spawn_do_closefrom 2021-07-19 23:47:18 +02:00
Adhemerval Zanella
469761eac8 elf: Fix tst-cpu-features-cpuinfo on some AMD systems (BZ #28090)
The SSBD feature is implemented in 2 different ways on AMD processors:
newer systems (Zen3) provides AMD_SSBD (function 8000_0008, EBX[24]),
while older system provides AMD_VIRT_SSBD (function 8000_0008, EBX[25]).
However for AMD_VIRT_SSBD, kernel shows both 'ssdb' and 'virt_ssdb' on
/proc/cpuinfo; while for AMD_SSBD only 'ssdb' is provided.

This now check is AMD_SSBD is set to check for 'ssbd', otherwise check
if AMD_VIRT_SSDB is set to check for 'virt_ssbd'.

Checked on x86_64-linux-gnu on a Ryzen 9 5900x.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-07-19 14:12:29 -03:00
H.J. Lu
5adb0e14a5 i386: Add the clone3 wrapper
extern int clone3 (struct clone_args *__cl_args, size_t __size,
		   int (*__func) (void *__arg), void *__arg);

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:36:19 -07:00
Florian Weimer
ea9878ec27 resolv: Move res_query functions into libc
This switches to public symbols without __ prefixes, due to improved
namespace management in glibc.

The script was used with --no-new-version to move the symbols
__res_nquery, __res_nquerydomain, __res_nsearch, __res_query,
__res_querydomain, __res_search, res_query, res_querydomain,
res_search.  The public symbols res_nquery, res_nquerydomain,
res_nsearch, res_ownok, res_query, res_querydomain, res_search
were added with make update-all-abi.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:57 +02:00
Florian Weimer
21a497cc58 resolv: Move res_mkquery, res_nmkquery into libc
This switches to public symbols without __ prefixes, due to improved
namespace management in glibc.

The symbols res_mkquery, __res_mkquery, __res_nmkquery were
moved with the script (using --no-new-version).
res_mkquery@@GLIBC_2.34, res_nmkquery@@GLIBC_2.34 were added using
make update-all-abi.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:57 +02:00
Florian Weimer
b165c65c35 resolv: Move res_send, res_nsend into libc
Switch to public symbols without __ prefix (due to improved
namespace management).

__res_send, __res_nsend were moved using the script (with
--no-new-version).  res_send@@GLIBC_2.34 and res_nsend@@GLIBC_2.34
were added using make update-all-abi.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:45 +02:00
Florian Weimer
2fbe5860d3 resolv: Rename res_comp.c to res-name-checking.c and move into libc
This reflects what the remaining functions in the file do.

The __res_dnok, __res_hnok, __res_mailok, __res_ownok were moved
with the script, using --no-new-version, and turned into compat
symbols.  __libc_res_dnok@@GLIBC_PRIVATE and
__libc_res_hnok@@GLIBC_PRIVATE are added for internal use, to avoid
accidentally binding to compatibility symbols.  The new public
symbols res_dnok, res_hnok, res_mailok, res_ownok were added using
make update-all-abi.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
391e02236b resolv: Move dn_skipname to its own file and into libc
And reformat it to GNU style.

dn_skipname is used outside glibc, so do not deprecate it,
and export it as dn_skipname (not __dn_skipname).  Due to internal
users, provide a __libc_dn_skipname alias, and keep __dn_skipname
as a pure compatibility symbol.

__dn_skipname@GLIBC_2.0 was moved using the script, and
dn_skipname@@GLIBC_2.34 was added using make update-all-abi.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
fd8a87c0c1 resolv: Move dn_comp to its own file and into libc
And reformat it to GNU style.

dn_comp is used in various programs, so keep it as a non-deprecated
symbol.  Switch to dn_comp (not __dn_comp) for the ABI name.  There
are no internal users, so interposition is not a problem.

The __dn_comp symbol was moved with scripts/move-symbol-to-libc.py
--no-new-version.  dn_comp@@GLIBC_2.34 was added with
make update-all-abi.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
640bbdf71c resolv: Move dn_expand to its own file and into libc
And reformat to GNU style.

This switches back to the dn_expand name for the ABI symbol and turns
__dn_expand into a compatibility symbol.  With the improved namespace
management in current glibc, it is no longer necessary to use a
private namespace symbol.  To avoid old code binding to a
GLIBC_PRIVATE symbol by accident, use __libc_dn_expand for the
internal symbol name.

The symbols dn_expand, __dnexpand were moved using
scripts/move-symbol-to-libc.py, followed by an adjustment to make
dn_expand the only GLIBC_2.34 symbol.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
13e1f86706 resolv: Move ns_name_compress into its own file and into libc
And reformat to GNU style.

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
7ed1ac6da3 resolv: Move ns_name_pack into its own file and into libc
And reformat to GNU style, and eliminate the labellen function.

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
276e9822b3 resolv: Move ns_name_pton into its own file and into libc
And reformat to GNU style, and eliminate the digits variable.

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
4e1d3db1e8 resolv: Move ns_name_uncompress into its own file and into libc
And reformat to GNU style.  Check for negative error returns
(instead of -1).

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-19 07:56:21 +02:00
Florian Weimer
cff2c78c51 resolv: Move ns_name_skip to its own file and into libc (bug 28091)
And reformat to GNU style.  Avoid out-of-bounds pointer arithmetic.
This also results in a fix of bug 28091 due to the additional packet
length checks.

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
2021-07-19 07:56:13 +02:00
Samuel Thibault
0b217e5969 htl: Do not expose pthread hidden proto outside libpthread
Only libpthread.so can access them.
2021-07-18 20:25:33 +00:00
Florian Weimer
820bb23ff0 resolv: Move ns_name_unpack to its own file and into libc
Reformat to GNU style. Avoid out-of-bounds buffer arithmetic.
Eliminate the labellen function.

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-15 09:00:27 +02:00
Florian Weimer
adcc572a29 resolv: Move ns_name_ntop to its own file and into libc
Reformat to GNU style.  Avoid out-of-bounds pointer arithmetic
(e.g., use eom - dn < 2 instead of dn + 1 >= eom).  Inline the
labellen function and fold the compression pointer check into
the length check (l >= 64).  Assume ASCII encoding.

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-15 08:39:31 +02:00
Florian Weimer
b8f889064d socket: Add hidden prototype for setsockopt
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
2021-07-15 08:35:45 +02:00
Adhemerval Zanella
ba33937be2 elf: Fix DTV gap reuse logic (BZ #27135)
This is updated version of the 572bd547d5 (reverted by 40ebfd016a)
that fixes the _dl_next_tls_modid issues.

This issue with 572bd547d5 patch is the DTV entry will be only
update on dl_open_worker() with the update_tls_slotinfo() call after
all dependencies are being processed by _dl_map_object_deps().  However
_dl_map_object_deps() itself might call _dl_next_tls_modid(), and since
the _dl_tls_dtv_slotinfo_list::map is not yet set the entry will be
wrongly reused.

This patch fixes by renaming the _dl_next_tls_modid() function to
_dl_assign_tls_modid() and by passing the link_map so it can set
the slotinfo value so a subsequente _dl_next_tls_modid() call will
see the entry as allocated.

The intermediary value is cleared up on remove_slotinfo() for the case
a library fails to load with RTLD_NOW.

This patch fixes BZ #27135.

Checked on x86_64-linux-gnu.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-07-14 15:10:27 -03:00
H.J. Lu
84d40d702f Add static tests for __clone_internal
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-14 06:55:04 -07:00
H.J. Lu
24c78e2c75 x86-64: Add the clone3 wrapper
extern int clone3 (struct clone_args *__cl_args, size_t __size,
		   int (*__func) (void *__arg), void *__arg);

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-07-14 06:34:13 -07:00
H.J. Lu
d8ea0d0168 Add an internal wrapper for clone, clone2 and clone3
The clone3 system call (since Linux 5.3) provides a superset of the
functionality of clone and clone2.  It also provides a number of API
improvements, including the ability to specify the size of the child's
stack area which can be used by kernel to compute the shadow stack size
when allocating the shadow stack.  Add:

extern int __clone_internal (struct clone_args *__cl_args,
			     int (*__func) (void *__arg), void *__arg);

to provide an abstract interface for clone, clone2 and clone3.

1. Simplify stack management for thread creation by passing both stack
base and size to create_thread.
2. Consolidate clone vs clone2 differences into a single file.
3. Call __clone3 if HAVE_CLONE3_WAPPER is defined.  If __clone3 returns
-1 with ENOSYS, fall back to clone or clone2.
4. Use only __clone_internal to clone a thread.  Since the stack size
argument for create_thread is now unconditional, always pass stack size
to create_thread.
5. Enable the public clone3 wrapper in the future after it has been
added to all targets.

NB: Sandbox will return ENOSYS on clone3 in both Chromium:

The following revision refers to this bug:
  218438259d

commit 218438259dd795456f0a48f67cbe5b4e520db88b
Author: Matthew Denton <mpdenton@chromium.org>
Date: Thu Jun 03 20:06:13 2021

Linux sandbox: return ENOSYS for clone3

Because clone3 uses a pointer argument rather than a flags argument, we
cannot examine the contents with seccomp, which is essential to
preventing sandboxed processes from starting other processes. So, we
won't be able to support clone3 in Chromium. This CL modifies the
BPF policy to return ENOSYS for clone3 so glibc always uses the fallback
to clone.

Bug: 1213452
Change-Id: I7c7c585a319e0264eac5b1ebee1a45be2d782303
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/2936184
Reviewed-by: Robert Sesek <rsesek@chromium.org>
Commit-Queue: Matthew Denton <mpdenton@chromium.org>
Cr-Commit-Position: refs/heads/master@{#888980}

[modify] https://crrev.com/218438259dd795456f0a48f67cbe5b4e520db88b/sandbox/linux/seccomp-bpf-helpers/baseline_policy.cc

and Firefox:

https://hg.mozilla.org/integration/autoland/rev/ecb4011a0c76

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-07-14 06:33:58 -07:00
Samuel Thibault
d7fe71d842 htl: Fix linking static examples against libpthread
libpthread.a uses some mach and hurd RPCs so we need to link them in.
2021-07-13 23:49:53 +02:00
Samuel Thibault
c27bcc9588 htl: Let libc call __pthread_mutex_{,try,un}lock
Now that NPTL was moved to libc, libc makes internal __pthread calls, so
htl has to expose them internally.
2021-07-13 23:36:58 +02:00
H.J. Lu
84ea6ea24b mcheck: Align struct hdr to MALLOC_ALIGNMENT bytes [BZ #28068]
1. Align struct hdr to MALLOC_ALIGNMENT bytes so that malloc hooks in
libmcheck align memory to MALLOC_ALIGNMENT bytes.
2. Remove tst-mallocalign1 from tests-exclude-mcheck for i386 and x32.
3. Add tst-pvalloc-fortify and tst-reallocarray to tests-exclude-mcheck
since they use malloc_usable_size (see BZ #22057).

This fixed BZ #28068.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-07-12 18:13:32 -07:00
Adhemerval Zanella
72e84d1db2 Linux: Use 32-bit vDSO for clock_gettime, gettimeofday, time (BZ# 28071)
The previous approach defeats the vDSO optimization on older kernels
because a failing clock_gettime64 system call is performed on every
function call.  It also results in a clobbered errno value, exposing
an OpenJDK bug (JDK-8270244).

This patch fixes by open-code INLINE_VSYSCALL macro and replace all
INLINE_SYSCALL_CALL with INTERNAL_SYSCALL_CALLS.  Now for
__clock_gettime64x, the 64-bit vDSO is used and the 32-bit vDSO is
tried before falling back to 64-bit syscalls.

The previous code preferred 64-bit syscall for the case where the kernel
provides 64-bit time_t syscalls *and* also a 32-bit vDSO (in this case
the *64-bit* syscall should be preferable over the vDSO).  All
architectures that provides 32-bit vDSO (i386, mips, powerpc, s390)
modulo sparc; but I am not sure if some kernels versions do provide
only 32-bit vDSO while still providing 64-bit time_t syscall.
Regardless, for such cases the 64-bit time_t syscall is used if the
vDSO returns overflowed 32-bit time_t.

Tested on i686-linux-gnu (with a time64 and non-time64 kernel),
x86_64-linux-gnu.  Built with build-many-glibcs.py.

Co-authored-by: Florian Weimer <fweimer@redhat.com>
2021-07-12 17:37:56 -03:00
Florian Weimer
aaacde11f2 Reduce <limits.h> pollution due to dynamic PTHREAD_STACK_MIN
<limits.h> used to be a header file with no declarations.
GCC's libgomp includes it in a #pragma GCC visibility hidden block.
Including <unistd.h> from <limits.h> (indirectly) declares everything
in <unistd.h> with hidden visibility, resulting in linker failures.

This commit avoids C declarations in assembler mode and only declares
__sysconf in <limits.h> (and not the entire contents of <unistd.h>).
The __sysconf symbol is already part of the ABI.  PTHREAD_STACK_MIN
is no longer defined for __USE_DYNAMIC_STACK_SIZE && __ASSEMBLER__
because there is no possible definition.

Additionally, PTHREAD_STACK_MIN is now defined by <pthread.h> for
__USE_MISC because this is what developers expect based on the macro
name.  It also helps to avoid libgomp linker failures in GCC because
libgomp includes <pthread.h> before its visibility hacks.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-12 18:43:32 +02:00
Samuel Thibault
83b7008e11 hurd _Fork: Drop duplicate malloc_fork_lock calls
This was put in __libc_fork by c32c868ab8 ("posix: Add _Fork [BZ #4737]")
so we need to avoid locking them again in _Fork called by __libc_lock, otherwise
we deadlock.
2021-07-11 17:52:52 +00:00
H.J. Lu
5d98a7dae9 Define PTHREAD_STACK_MIN to sysconf(_SC_THREAD_STACK_MIN)
The constant PTHREAD_STACK_MIN may be too small for some processors.
Rename _SC_SIGSTKSZ_SOURCE to _DYNAMIC_STACK_SIZE_SOURCE.  When
_DYNAMIC_STACK_SIZE_SOURCE or _GNU_SOURCE are defined, define
PTHREAD_STACK_MIN to sysconf(_SC_THREAD_STACK_MIN) which is changed
to MIN (PTHREAD_STACK_MIN, sysconf(_SC_MINSIGSTKSZ)).

Consolidate <bits/local_lim.h> with <bits/pthread_stack_min.h> to
provide a constant target specific PTHREAD_STACK_MIN value.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-09 15:10:35 -07:00
Florian Weimer
7c241325d6 Force building with -fno-common
As a result, is not necessary to specify __attribute__ ((nocommon))
on individual definitions.

GCC 10 defaults to -fno-common on all architectures except ARC,
but this change is compatible with older GCC versions and ARC, too.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-09 20:09:14 +02:00
H.J. Lu
dc76a059fd Add a generic malloc test for MALLOC_ALIGNMENT
1. Add sysdeps/generic/malloc-size.h to define size related macros for
malloc.
2. Move x86_64/tst-mallocalign1.c to malloc and replace ALIGN_MASK with
MALLOC_ALIGN_MASK.
3. Add tst-mallocalign1 to tests-exclude-mcheck for i386 and x32 since
mcheck doesn't honor MALLOC_ALIGNMENT.
2021-07-09 06:39:30 -07:00
Florian Weimer
508ee037a3 nptl: Use out-of-line wake function in __libc_lock_unlock slow path
This slightly reduces code size, as can be seen below.
__libc_lock_unlock is usually used along with __libc_lock_lock in
the same function.  __libc_lock_lock already has an out-of-line
slow path, so this change should not introduce many additional
non-leaf functions.

This change also fixes a link failure in 32-bit Arm thumb mode
because commit 1f9c804fbd
("nptl: Use internal low-level lock type for !IS_IN (libc)")
introduced __libc_do_syscall calls outside of libc.

Before x86-64:

   text	   data	    bss	    dec	    hex	filename
1937748	  20456	  54896	2013100	 1eb7ac	libc.so.6
  25601	    856	  12768	  39225	   9939	nss/libnss_db.so.2
  40310	    952	  25144	  66406	  10366	nss/libnss_files.so.2

After x86-64:
   text	   data	    bss	    dec	    hex	filename
1935312	  20456	  54896	2010664	 1eae28	libc.so.6
  25559	    864	  12768	  39191	   9917	nss/libnss_db.so.2
  39764	    960	  25144	  65868	  1014c	nss/libnss_files.so.2

Before i686:

2110961	  11272	  39144	2161377	 20fae1	libc.so.6
  27243	    428	  12652	  40323	   9d83	nss/libnss_db.so.2
  43062	    476	  25028	  68566	  10bd6	nss/libnss_files.so.2

After i686:

2107347	  11272	  39144	2157763	 20ecc3	libc.so.6
  26929	    432	  12652	  40013	   9c4d	nss/libnss_db.so.2
  43132	    480	  25028	  68640	  10c20	nss/libnss_files.so.2

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-09 10:59:22 +02:00
Anton Blanchard
01d7806282 powerpc64le: Fix typo in configure
The configure script checks for -mlong-double-128 but mentions -mlongdouble
when it fails.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-08 21:59:28 -03:00
Tulio Magno Quites Machado Filho
20f0491c67 powerpc64: Remove strcspn ifunc from the loader
5 years ago, commit 8f1b841e45
unintentionally added an ifunc to the loader.
That modification has not caused any harm so far, but it doesn't add any
value either, because the hwcap information is available later during
libc initialization.

Suggested-by: Anton Blanchard <anton@ozlabs.org>
2021-07-08 21:59:28 -03:00
Noah Goldstein
0679442def x86: Remove wcsnlen-sse4_1 from wcslen ifunc-impl-list [BZ #28064]
The following commit

commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 23 01:19:34 2021 -0400

    x86-64: Add wcslen optimize for sse4.1

Added wcsnlen-sse4.1 to the wcslen ifunc implementation list and did
not add wcslen-sse4.1 to wcslen ifunc implementation list. This commit
fixes that by removing wcsnlen-sse4.1 from the wcslen ifunc
implementation list and adding wcslen-sse4.1 to the ifunc
implementation list.

Testing:
test-wcslen.c, test-rsi-wcslen.c, and test-rsi-strlen.c are passing as
well as all other tests in wcsmbs and string.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-07-08 18:55:43 -04:00
H.J. Lu
a6e7c3745d x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064]
commit 6f573a27b6
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Jun 23 01:19:34 2021 -0400

    x86-64: Add wcslen optimize for sse4.1

added wcsnlen-sse4.1 to the wcslen ifunc implementation list.  Since the
random value in the the RSI register is larger than the wide-character
string length in the existing wcslen test, it didn't trigger the wcslen
test failure.  Add a test to force 0 into the RSI register before calling
wcslen.
2021-07-08 18:55:40 -04:00
Fangrui Song
115d242456 x86_64: Remove unneeded static PIE check for undefined weak diagnostic
https://sourceware.org/bugzilla/show_bug.cgi?id=21782 dropped an ld
diagnostic for R_X86_64_PC32 referencing an undefined weak symbol in
-pie links.  Arguably keeping the diagnostic like other ports is more
correct, since statically resolving movl foo(%rip), %eax to the
link-time zero address produces a corrupted output.

It turns out that --enable-static-pie builds do not depend on the ld
behavior. GCC generates GOT indirection for weak declarations for
-fPIE/-fPIC, so what ld does with the PC-relative relocation doesn't
really matter.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-07-08 14:26:22 -07:00
Adhemerval Zanella
882d6e17bc posix: Add posix_spawn_file_actions_addclosefrom_np
This patch adds a way to close a range of file descriptors on
posix_spawn as a new file action.  The API is similar to the one
provided by Solaris 11 [1], where the file action causes the all open
file descriptors greater than or equal to input on to be closed when
the new process is spawned.

The function posix_spawn_file_actions_addclosefrom_np is safe to be
implemented by iterating over /proc/self/fd, since the Linux spawni.c
helper process does not use CLONE_FILES, so its has own file descriptor
table and any failure (in /proc operation) aborts the process creation
and returns an error to the caller.

I am aware that this file action might be redundant to the current
approach of POSIX in promoting O_CLOEXEC in more interfaces. However
O_CLOEXEC is still not the default and for some specific usages, the
caller needs to close all possible file descriptors to avoid them
leaking.  Some examples are CPython (discussed in BZ#10353) and OpenJDK
jspawnhelper [2] (where OpenJDK spawns a helper process to exactly
closes all file descriptors).  Most likely any environment which calls
functions that might open file descriptor under the hood and aim to use
posix_spawn might face the same requirement.

Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.

[1] https://docs.oracle.com/cd/E36784_01/html/E36874/posix-spawn-file-actions-addclosefrom-np-3c.html
[2] https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libjava/childproc.c#L82
2021-07-08 14:08:15 -03:00
Adhemerval Zanella
607449506f io: Add closefrom [BZ #10353]
The function closes all open file descriptors greater than or equal to
input argument.  Negative values are clamped to 0, i.e, it will close
all file descriptors.

As indicated by the bug report, this is a common symbol provided by
different systems (Solaris, OpenBSD, NetBSD, FreeBSD) and, although
its has inherent issues with not taking in consideration internal libc
file descriptors (such as syslog), this is also a common feature used
in multiple projects [1][2][3][4][5].

The Linux fallback implementation iterates over /proc and close all
file descriptors sequentially.  Although it was raised the questioning
whether getdents on /proc/self/fd might return disjointed entries
when file descriptor are closed; it does not seems the case on my
testing on multiple kernel (v4.18, v5.4, v5.9) and the same strategy
is used on different projects [1][2][3][5].

Also, the interface is set a fail-safe meaning that a failure in the
fallback results in a process abort.

Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.

[1] 5238e95759/src/basic/fd-util.c (L217)
[2] ddf4b77e11/src/lxc/start.c (L236)
[3] 9e4f2f3a6b/Modules/_posixsubprocess.c (L220)
[4] 5f47c0613e/src/libstd/sys/unix/process2.rs (L303-L308)
[5] https://github.com/openjdk/jdk/blob/master/src/java.base/unix/native/libjava/childproc.c#L82
2021-07-08 14:08:14 -03:00
Adhemerval Zanella
286286283e linux: Add close_range
It was added on Linux 5.9 (278a5fbaed89) with CLOSE_RANGE_CLOEXEC
added on 5.11 (582f1fb6b721f).  Although FreeBSD has added the same
syscall, this only adds the symbol on Linux ports.  This syscall is
required to provided a fail-safe way to implement the closefrom
symbol (BZ #10353).

Checked on x86_64-linux-gnu and i686-linux-gnu on kernel 5.11 and 4.15.
2021-07-08 14:08:13 -03:00
Florian Weimer
7fcdb53253 libio: Replace internal _IO_getdelim symbol with __getdelim
__getdelim is exported, _IO_getdelim is not.  Add a hidden prototype
for __getdelim.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-07 18:33:52 +02:00
Joseph Myers
26890e1cd0 Update MIPS libm-test-ulps 2021-07-07 15:50:18 +00:00
Joseph Myers
f517610f3a Update powerpc-nofpu libm-test-ulps 2021-07-07 15:35:04 +00:00
Joseph Myers
b46cfcef3f Update kernel version to 5.13 in tst-mman-consts.py
This patch updates the kernel version in the test tst-mman-consts.py
to 5.13.  (There are no new MAP_* constants covered by this test in
5.13 that need any other header changes.)

Tested with build-many-glibcs.py.
2021-07-07 13:24:05 +00:00
Florian Weimer
8ec022a037 nptl: Remove GLIBC_2.34 versions of __pthread_mutex_lock, __pthread_mutex_unlock
Now that there are no internal users anymore, these new symbol
versions can be removed from the public ABI.  The compatibility
symbols remain.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-07 08:41:17 +02:00
Florian Weimer
1f9c804fbd nptl: Use internal low-level lock type for !IS_IN (libc)
This avoids an ABI hazard (types changing between different modules
of glibc) without introducing linknamespace issues. In particular,
NSS modules now call __lll_lock_wait_private@@GLIBC_PRIVATE to wait
on internal locks (the unlock path is inlined and performs a direct
system call).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-07 08:41:14 +02:00
Adhemerval Zanella
cf2256196c linux: Fix setsockopt fallback
The final 2 arguments for SO_TIMESTAMP/SO_TIMESTAMPNS are being set
wrongly.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2021-07-06 11:45:35 -03:00
Adhemerval Zanella
f7de21498d linux: Use the expected size for SO_TIMESTAMP{NS} convertion
Kernel returns 32-bit values for COMPAT_SO_TIMESTAMP{NS}_OLD,
not 64-bit values.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2021-07-06 11:45:35 -03:00
Adhemerval Zanella
4b93a93e40 linux: Consolidate Linux setsockopt implementation
This patch consolidates the setsockopt implementation on
sysdeps/unix/sysv/linux/getsockopt.c.  The changes are:

  1. Remove it from auto-generation syscalls.list on all architectures.

  2. Add __ASSUME_SETSOCKOPT_SYSCALL as default and undef if for
     specific kernel versions on some architectures.

This also fix a potential issue where 32-bit time_t ABI should use the
linux setsockopt which overrides the underlying SO_* constants used for
socket timestamping for _TIME_BITS=64.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2021-07-06 11:45:35 -03:00
Adhemerval Zanella
1c46663a70 linux: Consolidate Linux getsockopt implementation
This patch consolidates the getsockopt Linux syscall implementation on
sysdeps/unix/sysv/linux/getsockopt.c.  The changes are:

  1. Remove it from auto-generation syscalls.list on all architectures.

  2. Add __ASSUME_GETSOCKOPT_SYSCALL as default and undef if for
     specific kernel versions on some architectures.

This also fix a potential issue where 32-bit time_t ABI should use the
linux getsockopt which overrides the underlying SO_* constants used for
socket timestamping for _TIME_BITS=64.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2021-07-06 11:45:35 -03:00
Khem Raj
c8935581de linux: Check for null value msghdr struct before use
This avoids crashes in libc when cmsg is null and refrencing msg
structure when it is null

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-07-05 15:11:13 -03:00
Florian Weimer
dbb949f53d resolv: Move libanl into libc (if libpthread is in libc)
The symbols gai_cancel, gai_error, gai_suspend, getaddrinfo_a,
__gai_suspend_time64 were moved using scripts/move-symbol-to-libc.py.

For Hurd (which remains !PTHREAD_IN_LIBC), a few #define redirects
had to be added because several pthread functions are not available
under __.  (Linux uses __ prefixes for most hidden aliases, and has
to in some cases to avoid linknamespace issues.)
2021-07-02 11:45:00 +02:00
Pedro Franco de Carvalho
813c6ec808 powerpc: optimize strcpy/stpcpy for POWER9/10
This patch modifies the current POWER9 implementation of strcpy and
stpcpy to optimize it for POWER9/10.

Since no new POWER10 instructions are used, the original POWER9 strcpy is
modified instead of creating a new implementation for POWER10.  This
implementation is based on both the original POWER9 implementation of
strcpy and the preamble of the new POWER10 implementation of strlen.

The changes also affect stpcpy, which uses the same implementation with
some additional code before returning.

On POWER9, averaging improvements across the benchmark
inputs (length/source alignment/destination alignment), for an
experiment that ran the benchmark five times, bench-strcpy showed an
improvement of 5.23%, and bench-stpcpy showed an improvement of 6.59%.

On POWER10, bench-strcpy showed 13.16%, and bench-stpcpy showed 13.59%.

The changes are:

1. Removed the null string optimization.

   Although this results in a few extra cycles for the null string, in
   combination with the second change, this resulted in improvements for
   for other cases.

2. Adapted the preamble from strlen for POWER10.

   This is the part of the function that handles up to the first 16 bytes
   of the string.

3. Increased number of unrolled iterations in the main loop to 6.

Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Tested-by: Matheus Castanho <msc@linux.ibm.com>
2021-07-01 17:58:53 -03:00
H.J. Lu
ea8e465a6b x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033]
From

https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

* Intel TSX will be disabled by default.
* The processor will force abort all Restricted Transactional Memory (RTM)
  transactions by default.
* A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated,
  which is set to indicate to updated software that the loaded microcode is
  forcing RTM abort.
* On processors that enumerate support for RTM, the CPUID enumeration bits
  for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to
  be set by default after microcode update.
* Workloads that were benefited from Intel TSX might experience a change
  in performance.
* System software may use a new bit in Model-Specific Register (MSR) 0x10F
  TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock
  Elision (HLE) and RTM bits to indicate to software that Intel TSX is
  disabled.

1. Add RTM_ALWAYS_ABORT to CPUID features.
2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set.  This skips the
string/tst-memchr-rtm etc. testcases on the affected processors, which
always fail after a microcde update.
3. Check RTM feature, instead of usability, against /proc/cpuinfo.

This fixes BZ #28033.
2021-07-01 10:47:35 -07:00
Joseph Myers
b1b4f7209e Update syscall lists for Linux 5.13
Linux 5.13 has three new syscalls (landlock_create_ruleset,
landlock_add_rule, landlock_restrict_self).  Update syscall-names.list
and regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.

Tested with build-many-glibcs.py.
2021-07-01 17:37:36 +00:00
Stefan Liebler
7c45df18e1 s390: Fix MEMCHR_Z900_G5 ifunc-variant if n>=0x80000000 [BZ #28024]
On s390 (31bit), the pointer to the first byte after s always wraps
around with n >= 0x80000000 and can lead to stop searching before
end of s.

Thus this patch just use NULL as byte after s in this case and
the srst instruction stops searching with "not found" when wrapping
around from top address to zero.

This is observable with testcase string/test-memchr
starting with commit "String: Add overflow tests for strnlen, memchr,
and strncat [BZ #27974]"
https://sourceware.org/git/?p=glibc.git;a=commit;h=da5a6fba0febbfc90896ce1b2eb75c6d8a88a72d
2021-07-01 16:46:59 +02:00
Stefan Liebler
ba436665b1 Fix extra PLT reference in libc.so due to __glob64_time64 if build with gcc 7.5 on 32bit.
Starting with recent commit 84f7ce8447
"posix: Add glob64 with 64-bit time_t support", elf/check-localplt
fails due to extra PLT reference __glob64_time64 in __glob64_time64
itself.

This is observable with gcc 7.5 on x86_64 with -m32 or s390x with
-m31.  E.g. if build with gcc 10, gcc is generating a call to
__glob64_time64.localalias.

This patch is adding a hidden version of __glob64_time64 in the
same way as for __globfree64_time64.
2021-07-01 16:46:59 +02:00
Wilco Dijkstra
6a34c928c2 AArch64: Add hp-timing.h
Add hp-timing.h using the cntvct_el0 counter. Return timing in nanoseconds
so it is fully compatible with generic hp-timing. Don't set HP_TIMING_INLINE
in the dynamic linker since it adds unnecessary overheads and some ancient
kernels may not handle emulating cntcvt correctly. Currently cntvct_el0 is
only used for timing in the benchtests.

Reviewed-by: Szabolcs Nagy  <szabolcs.nagy@arm.com>
2021-07-01 15:42:05 +01:00
Wilco Dijkstra
252cad02d4 AArch64: Improve strnlen performance
Optimize strnlen by avoiding UMINV which is slow on most cores. On Neoverse N1
large strings are 1.8x faster than the current version, and bench-strnlen is
50% faster overall. This version is MTE compatible.

Reviewed-by: Szabolcs Nagy  <szabolcs.nagy@arm.com>
2021-07-01 15:32:36 +01:00
Florian Weimer
eb68d7d23c Linux: Avoid calling malloc indirectly from __get_nprocs
malloc initialization depends on __get_nprocs, so using
scratch buffers in __get_nprocs may result in infinite recursion.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-06-30 17:41:47 +02:00
Florian Weimer
734c60ebb6 login: Move libutil into libc
The symbols forkpty, login, login_tty, logout, logwtmp, openpty
were moved using scripts/move-symbol-to-libc.py.

This is a single commit because most of the symbols are tied together
via forkpty, for example.

Several changes to use hidden prototypes are needed.  This commit
also updates pseudoterminal terminology on modified lines.

For 390 (31-bit), this commit follows the existing style for the
compat symbol version creation.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-30 08:43:37 +02:00
Florian Weimer
8d1f854d60 login: Hidden prototypes for _getpt, __ptsname_r, grantpt, unlockpt
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-30 07:28:12 +02:00
Andreas Roeseler
9dc7dc5708 Add RFC 8335 Definitions from Linux 5.13
RFC 8335 defines the network utility PROBE, which builds off of the
capabilities of Ping to query more detailed interface information from
networking nodes.

The definitions included in this patchset have been accepted into the
linux net-next branch and will be included in Linux 5.13. This
patchset adds the same definitions to the glibc for use in the
iputils package.

The relevant commits for the Linux definitions can be found here:
e542d29ca8
750f4fc2a1

These changes have been tested by running the glibc tests on x86_64

Signed-off-by: Andreas Roeseler <andreas.a.roeseler@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-29 15:38:27 -03:00
Stefan Liebler
259a17cc98 s390x: Update math: redirect roundeven function
After recent commit
447954a206
"math: redirect roundeven function", building on
s390x fails with:
Error: symbol `__roundevenl' is already defined

Similar to aarch64/riscv fix, this patch redirects target
specific functions for s390x:
commit 3213ed770c
"Update math: redirect roundeven function"
2021-06-29 09:07:14 +02:00
Adhemerval Zanella
c32c868ab8 posix: Add _Fork [BZ #4737]
Austin Group issue 62 [1] dropped the async-signal-safe requirement
for fork and provided a async-signal-safe _Fork replacement that
does not run the atfork handlers.  It will be included in the next
POSIX standard.

It allow to close a long standing issue to make fork AS-safe (BZ#4737).
As indicated on the bug, besides the internal lock for the atfork
handlers itself; there is no guarantee that the handlers itself will
not introduce more AS-safe issues.

The idea is synchronize fork with the required internal locks to allow
children in multithread processes to use mostly of standard function
(even though POSIX states only AS-safe function should be used).  On
signal handles, _Fork should be used intead and only AS-safe functions
should be used.

For testing, the new tst-_Fork only check basic usage.  I also added
a new tst-mallocfork3 which uses the same strategy to check for
deadlock of tst-mallocfork2 but using threads instead of subprocesses
(and it does deadlock if it replaces _Fork with fork).

[1] https://austingroupbugs.net/view.php?id=62
2021-06-28 15:55:56 -03:00
Florian Weimer
dd45734e32 nptl: Add glibc.pthread.stack_cache_size tunable
The valgrind/helgrind test suite needs a way to make stack dealloction
more prompt, and this feature seems to be generally useful.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-06-28 16:41:58 +02:00
Szabolcs Nagy
3101b96787 arm: align stack in clone [BZ 28020]
The arm PCS requires 8 byte aligned stack at function entry.
Previously unaligned stack could crash the clone child.

Fixes bug 28020.
2021-06-28 11:35:44 +01:00
Florian Weimer
30639e79d3 Linux: Cleanups after librt move
librt.so is no longer installed for PTHREAD_IN_LIBC, and tests
are not linked against it.  $(librt) is introduced globally for
shared tests that need to be linked for both PTHREAD_IN_LIBC
and !PTHREAD_IN_LIBC.

GLIBC_PRIVATE symbols that were needed during the transition are
removed again.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 09:51:01 +02:00
Florian Weimer
477910b83e Linux: Move timer_settime, __timer_settime64 from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

The way the ABI intransition is implemented is changed with this
commit: the implementation is now consolidated in one file with a
TIMER_T_WAS_INT_COMPAT check.

The shared librt is now empty, so this commit adds a placeholder
symbol at the base version, GLIBC_2.2, and potentially at the
GLIBC_2.3.3 version as well (the leftover from the int/timer_t ABI
transition).

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:01 +02:00
Florian Weimer
a1d6ed027b Linux: Move timer_gettime, __timer_gettime64 from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

The way the ABI intransition is implemented is changed with this
commit: the implementation is now consolidated in one file with a
TIMER_T_WAS_INT_COMPAT check.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:01 +02:00
Florian Weimer
df6d227e69 Linux: Move timer_getoverrun from librt to libc
The symbol was moved using scripts/move-symbol-to-libc.py.

The way the ABI intransition is implemented is changed with this
commit: the implementation is now consolidated in one file with a
TIMER_T_WAS_INT_COMPAT check.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:00 +02:00
Florian Weimer
273a2a2ae8 Linux: Move timer_create, timer_delete from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

timer_create and timer_delete are tied together via the int/timer_t
compatibility code.  The way the ABI intransition is implemented
is changed with this commit: the implementation is now consolidated
in one file with a TIMER_T_WAS_INT_COMPAT check.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-28 09:51:00 +02:00
Florian Weimer
d7d0efec47 Linux: Define TIMER_T_WAS_INT_COMPAT in kernel-posix-timers.h
This is almost equivalent to __WORDSIZE == 64
&& OTHER_SHLIB_COMPAT (librt, GLIBC_2_1, GLIBC_2_3_3), except
that this expression is true for mips64/n64 targets as well,
even though those did not undergo the timer_t transition.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-06-28 09:51:00 +02:00
H.J. Lu
3213ed770c Update math: redirect roundeven function
Redirect target specific roundeven functions for aarch64, ldbl-128ibm
and riscv.
2021-06-27 07:56:57 -07:00
Shen-Ta Hsieh
eb9066203f Use GCC builtins for roundeven functions if desired.
This patch is using the corresponding GCC builtin for roundevenf,
roundeven and roundevenl if the USE_FUNCTION_BUILTIN macros are defined
to one in math-use-builtins.h.

These builtin functions is supported since GCC 10.

The code of the generic implementation is not changed.

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
Shen-Ta Hsieh
1683249d17 x86_64: roundeven with sse4.1 support
This patch adds support for the sse4.1 hardware floating point
roundeven.

Here is some benchmark results on my systems:

=AMD Ryzen 9 3900X 12-Core Processor=

* benchmark result before this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  3.75587e+09 |  3.75114e+09 |
| iterations |  3.93053e+08 |  4.35402e+08 |
| max        | 52.592       | 58.71        |
| min        |  7.98        |  7.22        |
| mean       |  9.55563     |  8.61535     |

* benchmark result after this commit
|            |     roundeven |   roundevenf |
|------------|---------------|--------------|
| duration   |   3.73815e+09 |  3.73738e+09 |
| iterations |   5.82692e+08 |  5.91498e+08 |
| max        |  56.468       | 51.642       |
| min        |   6.27        |  6.156       |
| mean       |   6.41532     |  6.3185      |

=Intel(R) Pentium(R) CPU D1508 @ 2.20GHz=

* benchmark result before this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  2.18208e+09 |  2.18258e+09 |
| iterations |  2.39932e+08 |  2.46924e+08 |
| max        | 96.378       | 98.035       |
| min        |  6.776       |  5.94        |
| mean       |  9.09456     |  8.83907     |

* benchmark result after this commit
|            |    roundeven |   roundevenf |
|------------|--------------|--------------|
| duration   |  2.17415e+09 |  2.17005e+09 |
| iterations |  3.56193e+08 |  4.09824e+08 |
| max        | 51.693       | 97.192       |
| min        |  5.926       |  5.093       |
| mean       |  6.10385     |  5.29507     |

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
Shen-Ta Hsieh
447954a206 math: redirect roundeven function
This patch redirect roundeven function for futhermore changes.

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
Florian Weimer
2c16cb88a6 Linux: Move timer helper routines from librt to libc
This adds several temporary GLIBC_PRIVATE exports.  The symbol names
are changed so that they all start with __timer_.

It is now possible to invoke the fork handler directly, so
pthread_atfork is no longer necessary.  The associated error cannot
happen anymore, and cancellation handling can be removed from
the helper thread routine.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:21:12 +02:00
Florian Weimer
1a5a653be2 Linux: Move mq_unlink from librt to libc
The symbol was moved using scripts/move-symbol-to-libc.py.
A placeholder symbol is needed on some architectures for the
GLIBC_2.3.4 version.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:21:12 +02:00
Florian Weimer
5b3a2abfb3 Linux: Move mq_send, mq_timedsend, __mq_timedsend_time64 to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:21:12 +02:00
Florian Weimer
903e6f9960 Linux: Move mq_receive, mq_timedreceive, __mq_timedreceive_time64 to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:21:12 +02:00
Florian Weimer
983f43b57b Linux: Move mq_open, __mq_open_2 from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.
A placeholder symbol is required to keep the GLIBC_2.7 version.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:21:12 +02:00
Florian Weimer
2da5f22fff Linux: Move mq_notify from librt to libc
The symbol was moved using scripts/move-symbol-to-libc.py.

An explicit call from fork into the mq_notify implementation replaces
the previous use of pthread_atfork.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:20:47 +02:00
Florian Weimer
f66d9abca7 Linux: Move mq_getattr from librt to libc
The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:19:58 +02:00
Florian Weimer
a752cb670a Linux: Move mq_setattr from librt to libc
The symbol was moved using scripts/move-symbol-to-libc.py.

To introduce the proper symbol versioning, the implementation of
the system call wrapper us moved to a C file.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:19:58 +02:00
Florian Weimer
12028b5031 Linux: Move mq_close from librt to libc
The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:19:58 +02:00
Florian Weimer
3fe3f8076e Linux: Move lio_listio, lio_listio64 from librt to libc
The symbols were moved using scripts/move-symbol-to-libc.py.
Placeholder symbols are needed on some architectures, to keep the
GLIBC_2.1 and GLIBC_2.4 symbol versions around.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:19:58 +02:00
Florian Weimer
3353a5a4cf rt: Rework lio_listio implementation
Move the common code into rt/lio_listio-common.c and include
the file in both rt/lio_listio.c and rt/lio_listio64.c.  The common
code automatically defines both public symbols for __WORDSIZE == 64.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:19:57 +02:00
Florian Weimer
496919b12f Linux: Move aio_write, aio_write64 into libc
Both symbols have to be moved at the same time because they
are intertwined for __WORDSIZE == 64.  The treatment of this case
is also changed to match more closely how the other files suppress
the declaration of the *64 identifier.

The symbols were moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 12:19:15 +02:00
Florian Weimer
32e750516c Linux: Move aio_suspend, aio_suspend64, __aio_suspend_time64 to libc
The symbols were moved using scripts/move-symbol-to-libc.py.

There is a minor oddity here: This is generic code shared with Hurd,
and Hurd does not have time64 support.  This is why the
versioned_symbol export for __aio_suspend_time64 is restricted to
the PTHREAD_IN_LIBC code.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 11:55:27 +02:00
Florian Weimer
406fb327fb Linux: Move aio_return, aio_return64 into libc
The symbols were moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerva Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 11:55:01 +02:00
Florian Weimer
7ad553b96e Linux: Move aio_read, aio_read64 into libc
Both symbols have to be moved at the same time because they
are intertwined for __WORDSIZE == 64.  The treatment of this case
is also changed to match more closely how the other files suppress
the declaration of the *64 identifier.

The symbols were moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 11:53:37 +02:00
Florian Weimer
1f3a8e716d Linux: Move aio_fsync, aio_fsync64 into libc
The symbols were moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 11:50:24 +02:00
Florian Weimer
1a7d0dedf0 Linux: Move aio_error, aio_error64 into libc
The symbols were moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 11:49:55 +02:00
Florian Weimer
3df6dcc5c7 Linux: Move aio_cancel, aio_cancel64 into libc
The symbols were moved using scripts/move-symbol-to-libc.py.

A version placeholder symbol is needed on alpha and sparc because
of the additional symbols formerly at version GLIBC_2.3.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>:
2021-06-25 11:48:46 +02:00
Florian Weimer
d12506b2db Linux: Move aio_init from librt into libc
This commit also moves the aio_misc and aio_sigquue helper,
so GLIBC_PRIVATE exports need to be added.

The symbol was moved using scripts/move-symbol-to-libc.py.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-25 11:48:25 +02:00
Noah Goldstein
08cbcd4dbc x86: Remove unnecessary overflow check from wcsnlen-sse4_1.S
No bug. The way wcsnlen will check if near the end of maxlen
is the following macro:

	mov	%r11, %rsi;	\
	subq	%rax, %rsi;	\
	andq	$-64, %rax;	\
	testq	$-64, %rsi;	\
	je	L(strnlen_ret)

Which words independently of s + maxlen overflowing. So the
second overflow check is unnecissary for correctness and
just extra overhead in the common no overflow case.

test-strlen.c, test-wcslen.c, test-strnlen.c and test-wcsnlen.c are
all passing

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-24 19:14:19 -04:00
Adhemerval Zanella
9f70985569 Consolidate pthread_atfork
The pthread_atfork is similar between Linux and Hurd, only the compat
version bits differs.  The generic version is place at sysdeps/pthread
with a common name.

It also fixes an issue with Hurd license, where the static-only object
did not use LGPL + exception.

Checked on x86_64-linux-gnu, i686-linux-gnu, and with a build for
i686-gnu.
2021-06-24 10:04:41 -03:00
Adhemerval Zanella
9a75654037 posix: Consolidate fork implementation
The Linux nptl implementation is used as base for generic fork
implementation to handle the internal locks and mutexes.  The
system specific bits are moved a new internal _Fork symbol.

(This new implementation will be used to provide a async-signal-safe
_Fork now that POSIX has clarified that fork might not be
async-signal-safe [1]).

For Hurd it means that the __nss_database_fork_prepare_parent and
__nss_database_fork_subprocess will be run in a slight different
order.

[1] https://austingroupbugs.net/view.php?id=62
2021-06-24 10:02:06 -03:00
Adhemerval Zanella
e3e3eb0a2e x86: Fix tst-cpu-features-cpuinfo on Ryzen 9 (BZ #27873)
AMD define different flags for IRPB, IBRS, and STIPBP [1], so new
x86_64_cpu are added and IBRS_IBPB is only tested for Intel.

The SSDB is also defined and implemented different on AMD [2],
and also a new AMD_SSDB flag is added.  It should map to the
cpuinfo 'ssdb' on recent AMD cpus.

It fixes tst-cpu-features-cpuinfo and tst-cpu-features-cpuinfo-static
on recent AMD cpus.

Checked on x86_64-linux-gnu on AMD Ryzen 9 5900X.

[1] https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf
[2] https://bugzilla.kernel.org/show_bug.cgi?id=199889

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-24 09:57:46 -03:00
H.J. Lu
ea26ff0322 x86: Copy IBT and SHSTK usable only if CET is enabled
IBT and SHSTK usable bits are copied from CPUID feature bits and later
cleared if kernel doesn't support CET.  Copy IBT and SHSTK usable only
if CET is enabled so that they aren't set on CET capable processors
with non-CET enabled glibc.
2021-06-23 17:35:47 -07:00
Noah Goldstein
a775a7a3eb x86: Fix overflow bug in wcsnlen-sse4_1 and wcsnlen-avx2 [BZ #27974]
This commit fixes the bug mentioned in the previous commit.

The previous implementations of wmemchr in these files relied
on maxlen * sizeof(wchar_t) which was not guranteed by the standard.

The new overflow tests added in the previous commit now
pass (As well as all the other tests).

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-23 14:13:06 -04:00
Noah Goldstein
645a158978 x86: Fix overflow bug with wmemchr-sse2 and wmemchr-avx2 [BZ #27974]
This commit fixes the bug mentioned in the previous commit.

The previous implementations of wmemchr in these files relied
on n * sizeof(wchar_t) which was not guranteed by the standard.

The new overflow tests added in the previous commit now
pass (As well as all the other tests).

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-23 14:13:03 -04:00
Noah Goldstein
6f573a27b6 x86-64: Add wcslen optimize for sse4.1
No bug. This comment adds the ifunc / build infrastructure
necessary for wcslen to prefer the sse4.1 implementation
in strlen-vec.S. test-wcslen.c is passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-23 14:12:36 -04:00
H.J. Lu
a0db678071 x86-64: Move strlen.S to multiarch/strlen-vec.S
Since strlen.S contains SSE2 version of strlen/strnlen and SSE4.1
version of wcslen/wcsnlen, move strlen.S to multiarch/strlen-vec.S
and include multiarch/strlen-vec.S from SSE2 and SSE4.1 variants.
This also removes the unused symbols, __GI___strlen_sse2 and
__GI___wcsnlen_sse4_1.
2021-06-23 10:24:35 -07:00
Adhemerval Zanella
30adcf5adb hurd: Fix build after 52a5fe70a2
Hurd does not support 64-bit time_t internally.
2021-06-23 14:14:48 -03:00
Adhemerval Zanella
6d97330d7a linux: Only use 64-bit syscall if required for clock_nanosleep
For !__ASSUME_TIME64_SYSCALLS there is no need to issue a 64-bit syscall
if the provided timeout fits in a 32-bit one.  The 64-bit usage should
be rare since the timeout is a relative one.

Checked on i686-linux-gnu on a 4.15 kernel and on a 5.11 kernel
(with and without --enable-kernel=5.1) and on x86_64-linux-gnu.

Reviewed-by: Lukasz Majewski <lukma@denx.de>
2021-06-22 12:09:52 -03:00
Adhemerval Zanella
b769b0a2cb linux: Only use 64-bit syscall if required for internal futex
For !__ASSUME_TIME64_SYSCALLS there is no need to issue a 64-bit syscall
if the provided timeout fits in a 32-bit one.  The 64-bit usage should
be rare since the timeout is a relative one.

Checked on i686-linux-gnu on a 4.15 kernel and on a 5.11 kernel
(with and without --enable-kernel=5.1) and on x86_64-linux-gnu.

Reviewed-by: Lukasz Majewski <lukma@denx.de>
2021-06-22 12:09:52 -03:00
Adhemerval Zanella
b286eca5d4 linux: Only use 64-bit syscall if required for utimensat family
For !__ASSUME_TIME64_SYSCALLS there is no need to issue a 64-bit syscall
if the provided timeout fits in a 32-bit one.  The 64-bit usage should
be rare since the timeout is a relative one.

The large timeout are already tests by io/tst-utimensat-skeleton.c.

Checked on i686-linux-gnu on a 4.15 kernel and on a 5.11 kernel
(with and without --enable-kernel=5.1) and on x86_64-linux-gnu.

Reviewed-by: Lukasz Majewski <lukma@denx.de>
2021-06-22 12:09:52 -03:00
Adhemerval Zanella
dafab287b4 linux: Only use 64-bit syscall if required for sigtimedwait
For !__ASSUME_TIME64_SYSCALLS there is no need to issue a 64-bit syscall
if the provided timeout fits in a 32-bit one.  The 64-bit usage should
be rare since the timeout is a relative one.

Checked on i686-linux-gnu on a 4.15 kernel and on a 5.11 kernel
(with and without --enable-kernel=5.1) and on x86_64-linux-gnu.

Reviewed-by: Lukasz Majewski <lukma@denx.de>
2021-06-22 12:09:52 -03:00
Adhemerval Zanella
1faff27011 linux: Only use 64-bit syscall if required for mq_timedsend
For !__ASSUME_TIME64_SYSCALLS there is no need to issue a 64-bit syscall
if the provided timeout fits in a 32-bit one.  The 64-bit usage should
be rare since the timeout is a relative one.

Checked on i686-linux-gnu on a 4.15 kernel and on a 5.11 kernel
(with and without --enable-kernel=5.1) and on x86_64-linux-gnu.

Reviewed-by: Lukasz Majewski <lukma@denx.de>
2021-06-22 12:09:52 -03:00