Commit Graph

13906 Commits

Author SHA1 Message Date
Szabolcs Nagy
2208066603 elf: Remove lazy tlsdesc relocation related code
Remove generic tlsdesc code related to lazy tlsdesc processing since
lazy tlsdesc relocation is no longer supported.  This includes removing
GL(dl_load_lock) from _dl_make_tlsdesc_dynamic which is only called at
load time when that lock is already held.

Added a documentation comment too.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-21 14:35:53 +01:00
Noah Goldstein
aaa23c3507 x86: Optimize strlen-avx2.S
No bug. This commit optimizes strlen-avx2.S. The optimizations are
mostly small things but they add up to roughly 10-30% performance
improvement for strlen. The results for strnlen are bit more
ambiguous. test-strlen, test-strnlen, test-wcslen, and test-wcsnlen
are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 18:03:49 -07:00
Noah Goldstein
4ba6558684 x86: Optimize strlen-evex.S
No bug. This commit optimizes strlen-evex.S. The
optimizations are mostly small things but they add up to roughly
10-30% performance improvement for strlen. The results for strnlen are
bit more ambiguous. test-strlen, test-strnlen, test-wcslen, and
test-wcsnlen are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 18:03:49 -07:00
Noah Goldstein
f53790272c x86: Optimize less_vec evex and avx512 memset-vec-unaligned-erms.S
No bug. This commit adds optimized cased for less_vec memset case that
uses the avx512vl/avx512bw mask store avoiding the excessive
branches. test-memset and test-wmemset are passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-19 15:08:04 -07:00
H.J. Lu
83c5b36822 x86-64: Require BMI2 for strchr-avx2.S
Since strchr-avx2.S updated by

commit 1f745ecc21
Author: noah <goldstein.w.n@gmail.com>
Date:   Wed Feb 3 00:38:59 2021 -0500

    x86-64: Refactor and improve performance of strchr-avx2.S

uses sarx:

c4 e2 72 f7 c0       	sarx   %ecx,%eax,%eax

for strchr-avx2 family functions, require BMI2 in ifunc-impl-list.c and
ifunc-avx2.h.
2021-04-19 11:01:45 -07:00
H.J. Lu
55bf411b45 x86-64: Require BMI2 for __strlen_evex and __strnlen_evex
Since __strlen_evex and __strnlen_evex added by

commit 1fd8c163a8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Mar 5 06:24:52 2021 -0800

    x86-64: Add ifunc-avx2.h functions with 256-bit EVEX

use sarx:

c4 e2 6a f7 c0       	sarx   %edx,%eax,%eax

require BMI2 for __strlen_evex and __strnlen_evex in ifunc-impl-list.c.
ifunc-avx2.h already requires BMI2 for EVEX implementation.
2021-04-19 07:51:33 -07:00
noah
1a8605b6cd x86: Update large memcpy case in memmove-vec-unaligned-erms.S
No Bug. This commit updates the large memcpy case (no overlap). The
update is to perform memcpy on either 2 or 4 contiguous pages at
once. This 1) helps to alleviate the affects of false memory aliasing
when destination and source have a close 4k alignment and 2) In most
cases and for most DRAM units is a modestly more efficient access
pattern. These changes are a clear performance improvement for
VEC_SIZE =16/32, though more ambiguous for VEC_SIZE=64. test-memcpy,
test-memccpy, test-mempcpy, test-memmove, and tst-memmove-overflow all
pass.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-04-16 10:06:56 -07:00
Matheus Castanho
5d61fc2021 powerpc: Add missing registers to clobbers list for syscalls [BZ #27623]
Some registers that can be clobbered by the kernel during a syscall are not
listed on the clobbers list in sysdeps/unix/sysv/linux/powerpc/sysdep.h.

For syscalls using sc:
    - XER is zeroed by the kernel on exit

For syscalls using scv:
    - XER is zeroed by the kernel on exit
    - Different from the sc case, most CR fields can be clobbered (according to
      the ELF ABI and the Linux kernel's syscall ABI for powerpc
      (linux/Documentation/powerpc/syscall64-abi.rst)

The same should apply to vsyscalls, which effectively execute a function call
but are not currently adding these registers as clobbers either.

These are likely not causing issues today, but they should be added to the
clobbers list just in case things change on the kernel side in the future.

Reported-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-04-16 08:40:37 -03:00
Adhemerval Zanella
ded3cef361 misc: syslog: Assume MSG_NOSIGNAL support (BZ #17144)
MSG_NOSIGNAL was added on POSIX 2008 and Hurd seems to support it.
The SIGPIPE handling also makes the implementation not thread-safe
(due the sigaction usage).

Checked on x86_64-linux-gnu.
2021-04-15 11:32:40 -03:00
Adhemerval Zanella
243339d055 io: Move file timestamps tests out of Linux
Now that libsupport abstract Linux possible missing support (either
due FS limitation that can't handle 64 bit timestamp or architectures
that do not handle values larger than unsigned 32 bit values) the
tests can be turned generic.

Checked on x86_64-linux-gnu and i686-linux-gnu.  I also built the
tests for i686-gnu.

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-04-15 09:39:43 -03:00
Stefan Liebler
07c245a76b s390: Update ulps
Required after 9acda61d94 "Fix the inaccuracy of j0f/j1f/y0f/y1f
[BZ #14469, #14470, #14471, #14472]".
2021-04-15 11:05:43 +02:00
Szabolcs Nagy
a75a02a696 i386: Remove lazy tlsdesc relocation related code
Like in commit e75711ebfa976d5468ec292282566a18b07e4d67 for x86_64,
remove unused lazy tlsdesc relocation processing code:

  _dl_tlsdesc_resolve_abs_plus_addend
  _dl_tlsdesc_resolve_rel
  _dl_tlsdesc_resolve_rela
  _dl_tlsdesc_resolve_hold
2021-04-15 09:47:59 +01:00
Szabolcs Nagy
55c9f32380 x86_64: Remove lazy tlsdesc relocation related code
_dl_tlsdesc_resolve_rela and _dl_tlsdesc_resolve_hold are only used for
lazy tlsdesc relocation processing which is no longer supported.
2021-04-15 09:47:47 +01:00
Szabolcs Nagy
ddcacd91cc i386: Avoid lazy relocation of tlsdesc [BZ #27137]
Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This similar to the commit b7cf203b5c
for aarch64, but it fixes a different race: bug 27137.

On i386 the code is a bit more complicated than on x86_64 because both
rel and rela relocs are supported.
2021-04-15 09:47:43 +01:00
Szabolcs Nagy
8f7e09f4db x86_64: Avoid lazy relocation of tlsdesc [BZ #27137]
Lazy tlsdesc relocation is racy because the static tls optimization and
tlsdesc management operations are done without holding the dlopen lock.

This similar to the commit b7cf203b5c
for aarch64, but it fixes a different race: bug 27137.

Another issue is that ld auditing ignores DT_BIND_NOW and thus tries to
relocate tlsdesc lazily, but that does not work in a BIND_NOW module
due to missing DT_TLSDESC_PLT. Unconditionally relocating tlsdesc at
load time fixes this bug 27721 too.
2021-04-15 09:47:37 +01:00
Vineet Gupta
aecbe50c9d ARC: Update ulps
Needed after 43576de04a

Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
2021-04-14 09:24:45 -07:00
Szabolcs Nagy
f4596d9540 Remove PR_TAGGED_ADDR_ENABLE from sys/prctl.h
The value of PR_TAGGED_ADDR_ENABLE was incorrect in the installed
headers and the prctl command macros were missing that are needed
for it to be useful (PR_SET_TAGGED_ADDR_CTRL).  Linux headers have
the definitions since 5.4 so it's widely available, we don't need
to repeat these definitions.  The remaining definitions are from
Linux 5.10.

To build glibc with --enable-memory-tagging, Linux 5.4 headers and
binutils 2.33.1 or newer is needed.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-04-14 08:45:21 +01:00
Adhemerval Zanella
bdc12a77b7 linux: sysconf: Use a more explicit maximum_ARG_MAX 2021-04-13 17:45:14 -03:00
Michal Nazarewicz
a9880586ee linux: sysconf: limit _SC_MAX_ARG to 6 MiB (BZ #25305)
Since Linux 4.13, kernel limits the maximum command line arguments
length to 6 MiB [1].  Normally the limit is still quarter of the maximum
stack size but if that limit exceeds 6 MiB it's clamped down.

glibc's __sysconf implementation for Linux platform is not aware of
this limitation and for stack sizes of over 24 MiB it returns higher
ARG_MAX than Linux will actually accept.  This can be verified by
executing the following application on Linux 4.13 or newer:

    #include <stdio.h>
    #include <string.h>
    #include <sys/resource.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void) {
            const struct rlimit rlim = { 40 * 1024 * 1024,
                                         40 * 1024 * 1024 };
            if (setrlimit(RLIMIT_STACK, &rlim) < 0) {
                    perror("setrlimit: RLIMIT_STACK");
                    return 1;
            }

            printf("ARG_MAX     : %8ld\n", sysconf(_SC_ARG_MAX));
            printf("63 * 100 KiB: %8ld\n", 63L * 100 * 1024);
            printf("6 MiB       : %8ld\n", 6L * 1024 * 1024);

            char str[100 * 1024], *argv[64], *envp[1];
            memset(&str, 'A', sizeof str);
            str[sizeof str - 1] = '\0';
            for (size_t i = 0; i < sizeof argv / sizeof *argv - 1; ++i) {
                    argv[i] = str;
            }
            argv[sizeof argv / sizeof *argv - 1] = envp[0] = 0;

            execve("/bin/true", argv, envp);
            perror("execve");
            return 1;
    }

On affected systems the program will report ARG_MAX as 10 MiB but
despite that executing /bin/true with a bit over 6 MiB of command line
arguments will fail with E2BIG error.  Expected result is that ARG_MAX
is reported as 6 MiB.

Update the __sysconf function to clamp ARG_MAX value to 6 MiB if it
would otherwise exceed it.  This resolves bug #25305 which was market
WONTFIX as suggested solution was to cap ARG_MAX at 128 KiB.

As an aside and point of comparison, bionic (a libc implementation for
Android systems) decided to resolve this issue by always returning 128
KiB ignoring any potential xargs regressions [2].

On older kernels this results in returning overly conservative value
but that's a safer option than being aggressive and returning invalid
value on recent systems.  It's also worth noting that at this point
all supported Linux releases have the 6 MiB barrier so only someone
running an unsupported kernel version would get incorrectly truncated
result.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

[1] See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da029c11e6b12f321f36dac8771e833b65cec962
[2] See baed51ee3a
2021-04-13 17:10:02 -03:00
Adhemerval Zanella
58137d00ba s390: Update ulps
Required after 43576de04a "Improve the accuracy of tgamma
(BZ #26983)"
2021-04-13 16:33:27 -03:00
Adhemerval Zanella
30c2a0e41b i386: Update ulps
Required after 43576de04a "Improve the accuracy of tgamma
(BZ #26983)"
2021-04-13 16:33:27 -03:00
Adhemerval Zanella
cedbf6d5f3 linux: always update select timeout (BZ #27706)
The timeout should be updated even on failure for time64 support.

Checked on i686-linux-gnu.
2021-04-12 18:38:37 -03:00
Adhemerval Zanella
9d7c5cc38e linux: Normalize and return timeout on select (BZ #27651)
The commit 2433d39b69, which added time64 support to select, changed
the function to use __NR_pselect6 (or __NR_pelect6_time64) on all
architectures.  However, on architectures where the symbol was
implemented with __NR_select the kernel normalizes the passed timeout
instead of return EINVAL.  For instance, the input timeval
{ 0, 5000000 } is interpreted as { 5, 0 }.

And as indicated by BZ #27651, this semantic seems to be expected
and changing it results in some performance issues (most likely
the program does not check the return code and keeps issuing
select with unormalized tv_usec argument).

To avoid a different semantic depending whether which syscall the
architecture used to issue, select now always normalize the timeout
input.  This is a slight change for some ABIs (for instance aarch64).

Checked on x86_64-linux-gnu and i686-linux-gnu.
2021-04-12 18:38:37 -03:00
Szabolcs Nagy
8d4d77f6c8 arm: Fix an incorrect check in ____longjmp_chk [BZ #27709]
An incorrect check in __longjmp_chk could fail on valid code causing

FAIL: debug/tst-longjmp_chk2

The original check was

  altstack_sp + altstack_size - setjmp_sp > altstack_size

i.e. sp at setjmp was outside of the altstack range. Here we know that
longjmp is called from a signal handler on the altstack (SS_ONSTACK),
and that it jumps in the wrong direction (sp decreases), so the check
wants to ensure the jump goes to another stack.

The check is wrong when altstack_sp == setjmp_sp which can happen
when the altstack is a local buffer in the function that calls setjmp,
so the patch allows == too. This fixes bug 27709.

Note that the generic __longjmp_chk check seems to be different.
(it checks if longjmp was on the altstack but does not check setjmp,
so it would not catch incorrect longjmp use within the signal handler).
2021-04-12 14:28:07 +01:00
Samuel Thibault
0385d5fff8 hurd: Export _hurd_libc_proc_init
hurd's libdiskfs needs to be able to call _hurd_init + _hurd_libc_proc_init
for bootstrap initialization.
2021-04-12 00:23:36 +02:00
Tulio Magno Quites Machado Filho
667d9c8d55 powerpc: Update libm test ulps
Update after commit 43576de04a.
2021-04-09 17:41:22 -03:00
Szabolcs Nagy
2d690bbb17 arm: update libm test ulps
Updated after commits 9acda61d94
and 43576de04a.
2021-04-08 09:55:33 +01:00
Szabolcs Nagy
e06e6554c3 aarch64: update libm test ulps
Update after commit 43576de04a.
2021-04-08 08:24:30 +01:00
Paul Zimmermann
43576de04a Improve the accuracy of tgamma (BZ #26983)
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).

Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-07 13:23:39 +02:00
John David Anglin
e9eeeb3a58 Update hppa libm-test-ulps 2021-04-06 18:55:58 +00:00
Adhemerval Zanella
5f6ff07dbf m68: Fix build after 9acda61d94
The j0f/j1f/y0f/y1f now uses __inv_pio4.
2021-04-06 15:10:31 -03:00
Szabolcs Nagy
69499bb6ee aarch64: free tlsdesc data on dlclose [BZ #27403]
DL_UNMAP_IS_SPECIAL and DL_UNMAP were not defined. The definitions are
now copied from arm, since the same is needed on aarch64. The cleanup
of tlsdesc data is handled by the custom _dl_unmap.

Fixes bug 27403.
2021-04-06 14:35:05 +01:00
Adhemerval Zanella
edb0ba79a1 ia64: Update ulps
Required after 9acda61d94 "Fix the inaccuracy of j0f/j1f/y0f/y1f
[BZ #14469, #14470, #14471, #14472]" and db3f7bb558 "math: Remove
slow paths from asin and acos [BZ #15267]".
2021-04-05 10:11:09 -03:00
Adhemerval Zanella
52c512bc56 ia64: Fix build after 9acda61d94
The j0f/j1f/y0f/y1f now uses __inv_pio4 and call roundf (which turns
to __roundf on ia64).
2021-04-05 10:07:42 -03:00
Adhemerval Zanella
1d64e962ab i386: Update ulps
Required after 9acda61d94 "Fix the inaccuracy of j0f/j1f/y0f/y1f
[BZ #14469, #14470, #14471, #14472]".
2021-04-05 10:02:15 -03:00
Paul Zimmermann
9acda61d94 Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.

The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average.  Two different algorithms are used:

* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
  degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)

* for large inputs, an asymptotic formula from [1] is used

[1] Fast and Accurate Bessel Function Computation,
    John Harrison, Proceedings of Arith 19, 2009.

Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).

Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-02 06:15:48 +02:00
Sunil K Pandey
595c22ecd8 x86-64: Fix ifdef indentation in strlen-evex.S
Fix some indentations of ifdef in file strlen-evex.S which are off by 1
and confusing to read.
2021-04-01 16:13:33 -07:00
Joseph Myers
e21b7c87e8 Update Nios II libm-test-ulps. 2021-04-01 19:41:40 +00:00
Adhemerval Zanella
be60d70166 Update arm libm-tests-ulps
Required after db3f7bb558 "math: Remove slow paths from asin and
acos [BZ #15267]".
2021-04-01 14:02:05 -03:00
H.J. Lu
b1ec623ed5 x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591]
config/i386/constraints.md in GCC has

(define_constraint "e"
  "32-bit signed integer constant, or a symbolic reference known
   to fit that range (for immediate operands in sign-extending x86-64
   instructions)."
  (match_operand 0 "x86_64_immediate_operand"))

Since movq takes a signed 32-bit immediate or a register source operand,
use "er", instead of "nr"/"ir", constraint for 32-bit signed integer
constant or register on movq.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-04-01 07:00:22 -07:00
Andreas Schwab
5ccea9a011 powerpc64le: Use ifunc for _Float128 functions also in libc
This fixes missing definition of math functions in libc in a static link
that are no longer built for libm after commit 4898d9712b ("Avoid adding
duplicated symbols into static libraries").
2021-04-01 10:55:42 +02:00
Stefan Liebler
01e0451175 S390: Allow "v" constraint for long double math_opt_barrier and math_force_eval with GCC 11.
Starting with GCC 11, long double values can also be processed in vector
registers if build with -march >= z14.  Then GCC defines the
__LONG_DOUBLE_VX__ macro.

FYI: GCC commit "IBM Z: Introduce __LONG_DOUBLE_VX__ macro"
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=f47df2af313d2ce7f9149149010a142c2237beda
2021-04-01 09:14:20 +02:00
Stefan Liebler
18f0afa848 Fix conform linknamespace tests due to gnu_dev_makedev
If building on s390 / i686 with -Os, various conformance
tests are failing with e.g.
conform/ISO/assert.h/linknamespace.out:
[initial] __assert_fail -> [libc.a(assert.o)] __dcgettext -> [libc.a(dcgettext.o)] __dcigettext -> [libc.a(dcigettext.o)] __getcwd -> [libc.a(getcwd.o)] __fstatat64 -> [libc.a(fstatat64.o)] gnu_dev_makedev

The usage of gnu_dev_makedev was recently introduced by
usage of the makedev makro in commit:
5b980d4809
linux: Use statx for MIPSn64

This patch is now linking against __gnu_dev_makedev as
also done in commit:
8b4a118222
Fix -Os gnu_dev_* linknamespace, localplt issues (bug 15105, bug 19463).
2021-03-31 16:10:14 +02:00
Adhemerval Zanella
42624c7dc7 Update sparc libm-tests-ulps
Required after db3f7bb558 "math: Remove slow paths from asin and
acos [BZ #15267]".
2021-03-30 14:04:11 -03:00
Siddhesh Poyarekar
abadbef5c8 Move __isnanf128 to libc.so
All of the isnan functions are in libc.so due to printf_fp, so move
__isnanf128 there too for consistency.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-03-30 14:58:19 +05:30
Samuel Thibault
64786a7090 fork.h: replace with register-atfork.h
UNREGISTER_ATFORK is now defined for all ports in register-atfork.h, so most
previous includes of fork.h actually only need register-atfork.h now, and
cxa_finalize.c does not need an ifdef UNREGISTER_ATFORK any more.

The nptl-specific fork generation counters can then go to pthreadP.h, and
fork.h be removed.

Checked on x86_64-linux-gnu and i686-gnu.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-03-29 21:41:09 +02:00
H.J. Lu
e4fda46310 x86-64: Use ZMM16-ZMM31 in AVX512 memmove family functions
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
4e2d8f3527 x86-64: Use ZMM16-ZMM31 in AVX512 memset family functions
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
2021-03-29 07:40:17 -07:00
H.J. Lu
4bd660be40 x86: Add string/memory function tests in RTM region
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort.   When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation.  Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
2021-03-29 07:40:17 -07:00
H.J. Lu
7ebba91361 x86-64: Add AVX optimized string/memory functions for RTM
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
2021-03-29 07:40:17 -07:00