14521 Commits

Author SHA1 Message Date
Wilco Dijkstra
ccfa865a82 math: Improve hypot performance with FMA
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency.  The non-fma version has 30% higher throughput and 10%
higher latency.

Max ULP error is 0.949 with fma and 0.792 without fma.

Passes GLIBC testsuite.
2021-12-13 09:02:34 -03:00
Wilco Dijkstra
6c848d7038 math: Use an improved algorithm for hypot (dbl-64)
This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' to avoid spurious underflow
   due to the multiplication and fix the return value for upwards
   rounding mode.
 - Handle required underflow exception for denormal results.

The main advantage of the new algorithm is its precision: with 1e9
random input pairs in the range of [DBL_MIN, DBL_MAX], the current
glibc implementation shows around 0.34% of results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of the total (18851).

The performance results are also only slightly worse than the current
implementation.  On x86_64 (Ryzen 5900X) with gcc 12:

Before:

  "hypot": {
   "workload-random": {
    "duration": 3.73319e+09,
    "iterations": 1.12e+08,
    "reciprocal-throughput": 22.8737,
    "latency": 43.7904,
    "max-throughput": 4.37184e+07,
    "min-throughput": 2.28361e+07
   }
  }

After:

  "hypot": {
   "workload-random": {
    "duration": 3.7597e+09,
    "iterations": 9.8e+07,
    "reciprocal-throughput": 23.7547,
    "latency": 52.9739,
    "max-throughput": 4.2097e+07,
    "min-throughput": 1.88772e+07
   }
  }

Co-Authored-By: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
2021-12-13 09:02:34 -03:00
Adhemerval Zanella
7fe0ace3e2 math: Simplify hypotf implementation
Use a more optimized comparison to check for NaN and infinity, and
add an inlined issignaling implementation for float.  With gcc it
results in 2 FP comparisons.

The file Copyright is also changed to use GPL: the implementation was
completely changed by 7c10fd3515 to use double precision instead of
scaling, and this change removes all the GET_FLOAT_WORD usage.

Checked on x86_64-linux-gnu.
2021-12-13 09:02:30 -03:00
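As an illustration of what an inlined issignaling check for float (7fe0ace3e2) has to decide, here is a minimal integer-based sketch. It assumes the IEEE 754-2008 encoding in which the top mantissa bit set means a quiet NaN, which is not true for every port. The function names are hypothetical and this is not the code from the commit.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: classify a binary32 bit pattern as a signaling NaN.
   Assumption: quiet NaNs have mantissa bit 22 set.  */
static bool
is_signaling_bits (uint32_t bits)
{
  uint32_t abs_bits = bits & UINT32_C (0x7fffffff);	/* Drop the sign.  */
  bool is_nan = abs_bits > UINT32_C (0x7f800000);	/* Above +Inf.  */
  bool is_quiet = (abs_bits & UINT32_C (0x00400000)) != 0;
  return is_nan && !is_quiet;
}

/* Same test on a float value; memcpy avoids aliasing problems.  */
static bool
is_signaling_float (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  return is_signaling_bits (bits);
}
```

Working on the bit pattern avoids a floating-point operation that could itself raise an invalid exception on a signaling NaN.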
Siddhesh Poyarekar
5afe4c0d69 Cleanup encoding in comments
Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8
equivalents so that files don't end up with mixed encodings.  With this,
all files (except tests that actually test different encodings) have a
single encoding.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-13 10:01:45 +05:30
Siddhesh Poyarekar
23645707f1 Replace --enable-static-pie with --disable-default-pie
Build glibc programs and tests as PIE by default and enable static-pie
automatically if the architecture and toolchain support it.

Also add a new configuration option --disable-default-pie to prevent
building programs as PIE.

Only the following architectures now have PIE disabled by default
because they do not work at the moment.  hppa, ia64, alpha and csky
don't work because the linker is unable to handle a pcrel relocation
generated from PIE objects.  The microblaze compiler is currently
failing with an ICE.  GNU Hurd tries to enable static-pie, which does
not work and hence fails.  All these targets have default PIE disabled
at the moment and I have left it to the target maintainers to enable PIE
on their targets.

build-many-glibcs runs clean for all targets.  I also tested x86_64 on
Fedora and Ubuntu, to verify that the default build as well as
--disable-default-pie work as expected with both system toolchains.

Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-12-13 08:08:59 +05:30
Samuel Thibault
556a6126f8 hurd: Add rules for static PIE build
This fixes [BZ #28671].
2021-12-12 00:42:13 +01:00
Samuel Thibault
26803075e4 hurd: Fix gmon-static
We need to use crt0 for gmon-static too.
2021-12-12 00:42:12 +01:00
H.J. Lu
ea5814467a x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]
Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since
the first PT_LOAD segment is no longer executable due to defaulting to
-z separate-code.

This fixes [BZ #28656].

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-10 14:01:34 -08:00
Florian Weimer
5cc3385654 nptl: Add one more barrier to nptl/tst-create1
Without the bar_ctor_finish barrier, it was possible that thread2
re-locked user_lock before ctor had a chance to lock it.  ctor then
blocked in its locking operation, xdlopen from the main thread
did not return, and thread2 was stuck waiting in bar_dtor:

thread 1: started.
thread 2: started.
thread 2: locked user_lock.
constructor started: 0.
thread 1: in ctor: started.
thread 3: started.
thread 3: done.
thread 2: unlocked user_lock.
thread 2: locked user_lock.

Fixes the test in commit 83b5323261
("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]").

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-10 11:51:25 +01:00
Florian Weimer
627f5ede70 Remove TLS_TCB_ALIGN and TLS_INIT_TCB_ALIGN
TLS_INIT_TCB_ALIGN is not actually used.  TLS_TCB_ALIGN was likely
introduced to support a configuration where the thread pointer
does not have the same alignment as THREAD_SELF.  Only ia64 seems to use
that, but for the stack/pointer guard, not for storing tcbhead_t.
Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift
the thread pointer, potentially landing in a different residue class
modulo the alignment, but the changes should not impact that.

In general, given that TLS variables have their own alignment
requirements, having different alignment for the (unshifted) thread
pointer and struct pthread would potentially result in dynamic
offsets, leading to more complexity.

hppa had different values before: __alignof__ (tcbhead_t), which
seems to be 4, and __alignof__ (struct pthread), which was 8
(old default) and is now 32.  However, it defines THREAD_SELF as:

/* Return the thread descriptor for the current thread.  */
# define THREAD_SELF \
  ({ struct pthread *__self;			\
	__self = __get_cr27();			\
	__self - 1;				\
   })

So the thread pointer points after struct pthread (hence __self - 1),
and they have to have the same alignment on hppa as well.

Similarly, on ia64, the definitions were different.  We have:

# define TLS_PRE_TCB_SIZE \
  (sizeof (struct pthread)						\
   + (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t)		\
      ? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1)	\
	 & ~(__alignof__ (struct pthread) - 1))				\
      : 0))
# define THREAD_SELF \
  ((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE))

And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment
(confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c).

On m68k, we have a larger gap between tcbhead_t and struct pthread.
But as far as I can tell, the port is fine with that.  The definition
of TCB_OFFSET is sufficient to handle the shifted TCB scenario.

This fixes commit 23c77f6018
("nptl: Increase default TCB alignment to 32").

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-09 23:47:49 +01:00
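The ia64 TLS_PRE_TCB_SIZE expression quoted in 627f5ede70 relies on the usual power-of-two round-up idiom. A minimal sketch of that arithmetic follows; the function names and the sizes used in the usage note are hypothetical, and the alignment is assumed to be a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Round SIZE up to a multiple of ALIGN (ALIGN must be a power of two).  */
static size_t
round_up (size_t size, size_t align)
{
  return (size + align - 1) & ~(align - 1);
}

/* Hypothetical model of the quoted macro: the descriptor size plus, if
   the end padding is too small, room for two pointers rounded up to the
   descriptor alignment.  */
static size_t
pre_tcb_size_sketch (size_t descr_size, size_t end_padding, size_t align)
{
  size_t reserve = 2 * sizeof (uintptr_t);
  return descr_size + (end_padding < reserve ? round_up (reserve, align) : 0);
}
```

With a descriptor size that is itself a multiple of the alignment (say 2304 bytes at alignment 32), either branch yields a multiple of the alignment, which is the property the commit message says the new _Static_assert confirms.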
Florian Weimer
c901c3e764 nptl: Add public rseq symbols and <sys/rseq.h>
The relationship between the thread pointer and the rseq area
is made explicit.  The constant offset can be used by JIT compilers
to optimize rseq access (e.g., for really fast sched_getcpu).

Extensibility is provided through __rseq_size and __rseq_flags.
(In the future, the kernel could request a different rseq size
via the auxiliary vector.)

Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09 09:49:32 +01:00
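To make the "constant offset" idea in c901c3e764 concrete, the sketch below simulates it with an ordinary byte buffer: the rseq area is found by adding a fixed signed offset to the thread pointer. Everything here is a stand-in. The struct only mirrors the first two fields of the kernel's struct rseq, and the function takes the thread pointer and offset as parameters instead of using the real thread pointer and __rseq_offset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the leading fields of struct rseq.  */
struct rseq_head_sketch
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
};

/* Read cpu_id from the area located at THREAD_POINTER + RSEQ_OFFSET.
   In a real program the offset would be a per-process constant.  */
static uint32_t
read_cpu_id_sketch (const unsigned char *thread_pointer, ptrdiff_t rseq_offset)
{
  struct rseq_head_sketch head;
  memcpy (&head, thread_pointer + rseq_offset, sizeof head);
  return head.cpu_id;
}
```

Because the offset does not change at run time, a JIT compiler could fold it into a single load relative to the thread pointer register.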
Florian Weimer
e3e589829d nptl: Add glibc.pthread.rseq tunable to control rseq registration
This tunable allows applications to register the rseq area instead
of glibc.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-09 09:49:32 +01:00
Florian Weimer
1d350aa060 Linux: Use rseq to accelerate sched_getcpu
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09 09:49:32 +01:00
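A rough sketch of the fast path that 1d350aa060 suggests: read the cpu_id field of the rseq area and fall back to a slower call when the area is not registered. The kernel's rseq ABI uses negative cpu_id values for the uninitialized and registration-failed states, so a sign check is enough. The names, the function-pointer parameter, and the stub are hypothetical and only keep the example self-contained.

```c
#include <stdint.h>

/* Hypothetical sketch: return the CPU number from the rseq area when it
   is valid (non-negative), otherwise ask the slow path.  */
static int
getcpu_sketch (const volatile int32_t *cpu_id, int (*slow_path) (void))
{
  int32_t cpu = *cpu_id;	/* Single volatile read of the field.  */
  return cpu >= 0 ? cpu : slow_path ();
}

/* Stand-in for a syscall-based fallback; returns a fixed value here.  */
static int
slow_path_stub (void)
{
  return 7;
}
```

The volatile read matters because the kernel may update the field at any time.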
Florian Weimer
95e114a091 nptl: Add rseq registration
The rseq area is placed directly into struct pthread.  rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.

<sys/rseq.h> is not yet installed as a public header.

Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-12-09 09:49:32 +01:00
Florian Weimer
8d1927d8dc nptl: Introduce THREAD_GETMEM_VOLATILE
This will be needed for rseq TCB access.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09 09:49:32 +01:00
Florian Weimer
ce2248ab91 nptl: Introduce <tcb-access.h> for THREAD_* accessors
These are common between most architectures.  Only the x86 targets
are outliers.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09 09:49:32 +01:00
Florian Weimer
8dbeb0561e nptl: Add <thread_pointer.h> for defining __thread_pointer
<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.

Only architectures for which rseq support is added are covered.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09 09:49:32 +01:00
H.J. Lu
ceeffe968c x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
2021-12-06 07:14:12 -08:00
Adhemerval Zanella
a329f68f2e linux: Add generic ioctl implementation
The powerpc implementation is refactored to use the default implementation.
2021-12-06 08:03:18 -03:00
Adhemerval Zanella
00baddbb93 linux: Add generic syscall implementation
It also allows removing the hppa-specific implementation and simplifying
the riscv implementation a bit.
2021-12-06 08:03:11 -03:00
Florian Weimer
4fb4e7e821 csu: Always use __executable_start in gmon-start.c
Current binutils defines __executable_start as the lowest text
address, so using the entry point address as a fallback is no
longer necessary.  As a result, overriding <entry.h> is only
necessary if the entry point is not called _start.

The previous approach to define __ASSEMBLY__ to suppress the
declaration breaks if headers included by <entry.h> are not
compatible with __ASSEMBLY__.  This happens with rseq integration
because it is necessary to include kernel headers in more places.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-05 13:50:21 +01:00
Florian Weimer
c1cb2deeca elf: execve statically linked programs instead of crashing [BZ #28648]
Programs without dynamic dependencies and without a program
interpreter are now run via execve.

Previously, the dynamic linker either crashed while attempting to
read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT
data), or the self-relocation in the static PIE executable crashed
because the outer dynamic linker had already applied RELRO protection.

<dl-execve.h> is needed because execve is not available in the
dynamic loader on Hurd.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-05 11:28:34 +01:00
Noah Goldstein
4df1fa6ddc x86-64: Use notl in EVEX strcmp [BZ #28646]
Must use notl %edi here, as the lower bits are for CHAR comparisons that
are potentially out of range and thus can be 0 without indicating a mismatch.
This fixes BZ #28646.

Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
2021-12-03 21:14:11 -08:00
Florian Weimer
23c77f6018 nptl: Increase default TCB alignment to 32
rseq support will use a 32-byte aligned field in struct pthread,
so the whole struct needs to have at least that alignment.

nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h>
to obtain the fallback definition.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-03 20:43:31 +01:00
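The reasoning in 23c77f6018 is a general C rule: a struct is at least as aligned as its most strictly aligned member. A small sketch with hypothetical type names shows it; these are not the real struct pthread or rseq definitions.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical 32-byte aligned area, standing in for the rseq field.  */
struct rseq_area_sketch
{
  alignas (32) unsigned char bytes[32];
};

/* Hypothetical thread descriptor containing that area.  */
struct descr_sketch
{
  void *self;
  struct rseq_area_sketch rseq_area;
  int tid;
};

/* The containing struct inherits the member's alignment requirement.  */
_Static_assert (alignof (struct descr_sketch) >= 32,
		"descriptor must be at least 32-byte aligned");
```

Any allocator for such a descriptor then has to provide 32-byte alignment, which is why the default TCB alignment had to grow.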
Wilco Dijkstra
b31bd11454 AArch64: Improve A64FX memcpy
v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-02 18:36:03 +00:00
Wilco Dijkstra
b51eb35c57 AArch64: Optimize memcmp
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-02 18:35:53 +00:00
Matheus Castanho
d120fb9941 powerpc64[le]: Fix CFI and LR save address for asm syscalls [BZ #28532]
Syscalls based on the assembly templates are missing CFI for r31, which gets
clobbered when scv is used, and info for LR is inaccurate, placed in the wrong
LOC and not using the proper offset. LR was also being saved to the callee's
frame, while the ABI mandates it to be saved to the caller's frame. These are
fixed by this commit.

After this change:

$ readelf -wF libc.so.6 | grep 0004b9d4.. -A 7 && objdump --disassemble=kill libc.so.6
00004a48 0000000000000020 00004a4c FDE cie=00000000 pc=000000000004b9d4..000000000004ba3c
   LOC           CFA      r31   ra
000000000004b9d4 r1+0     u     u
000000000004b9e4 r1+48    u     u
000000000004b9e8 r1+48    c-16  u
000000000004b9fc r1+48    c-16  c+16
000000000004ba08 r1+48    c-16
000000000004ba18 r1+48    u
000000000004ba1c r1+0     u

libc.so.6:     file format elf64-powerpcle

Disassembly of section .text:

000000000004b9d4 <kill>:
   4b9d4:       1f 00 4c 3c     addis   r2,r12,31
   4b9d8:       2c c3 42 38     addi    r2,r2,-15572
   4b9dc:       25 00 00 38     li      r0,37
   4b9e0:       d1 ff 21 f8     stdu    r1,-48(r1)
   4b9e4:       20 00 e1 fb     std     r31,32(r1)
   4b9e8:       98 8f ed eb     ld      r31,-28776(r13)
   4b9ec:       10 00 ff 77     andis.  r31,r31,16
   4b9f0:       1c 00 82 41     beq     4ba0c <kill+0x38>
   4b9f4:       a6 02 28 7d     mflr    r9
   4b9f8:       40 00 21 f9     std     r9,64(r1)
   4b9fc:       01 00 00 44     scv     0
   4ba00:       40 00 21 e9     ld      r9,64(r1)
   4ba04:       a6 03 28 7d     mtlr    r9
   4ba08:       08 00 00 48     b       4ba10 <kill+0x3c>
   4ba0c:       02 00 00 44     sc
   4ba10:       00 00 bf 2e     cmpdi   cr5,r31,0
   4ba14:       20 00 e1 eb     ld      r31,32(r1)
   4ba18:       30 00 21 38     addi    r1,r1,48
   4ba1c:       18 00 96 41     beq     cr5,4ba34 <kill+0x60>
   4ba20:       01 f0 20 39     li      r9,-4095
   4ba24:       40 48 23 7c     cmpld   r3,r9
   4ba28:       20 00 e0 4d     bltlr+
   4ba2c:       d0 00 63 7c     neg     r3,r3
   4ba30:       08 00 00 48     b       4ba38 <kill+0x64>
   4ba34:       20 00 e3 4c     bnslr+
   4ba38:       c8 32 fe 4b     b       2ed00 <__syscall_error>
        ...
   4ba44:       40 20 0c 00     .long 0xc2040
   4ba48:       68 00 00 00     .long 0x68
   4ba4c:       06 00 5f 5f     rlwnm   r31,r26,r0,0,3
   4ba50:       6b 69 6c 6c     xoris   r12,r3,26987
2021-11-30 15:18:52 -03:00
Adhemerval Zanella
efc6b2dbc4 linux: Implement pipe in terms of __NR_pipe2
The pipe2 syscall was added in Linux 2.6.27 and glibc requires Linux
3.2.0.  The patch removes the arch-specific implementations for alpha,
ia64, mips, sh, and sparc, which require a different kernel ABI
than the usual one.

Checked on x86_64-linux-gnu and with a build for the affected ABIs.
2021-11-30 13:13:03 -03:00
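The idea in efc6b2dbc4 can be sketched in a few lines: pipe is pipe2 with no flags. The version below goes through the generic syscall() wrapper so it stays self-contained on Linux. The name pipe_sketch is hypothetical and glibc uses its internal syscall macros instead.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1	/* Declares syscall() under strict -std modes.  */
#endif
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical sketch: create a pipe through the pipe2 syscall with
   flags set to 0.  Returns 0 on success, -1 with errno set on failure.  */
static int
pipe_sketch (int fds[2])
{
  return (int) syscall (SYS_pipe2, fds, 0);
}
```

Since pipe2 has a single, regular calling convention, no per-architecture assembly is needed for the two-result return that the old pipe syscall uses on some ABIs.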
Adhemerval Zanella
5b3e31e312 linux: Implement mremap in C
Variadic function calls in syscalls.list do not work for all ABIs
(for instance where the arguments are passed on the stack instead of
registers) and might have underlying issues depending on the variadic
type (for instance if a 64-bit argument is used).

Checked on x86_64-linux-gnu.
2021-11-30 13:13:03 -03:00
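A C wrapper of the kind 5b3e31e312 describes can read the optional argument explicitly and then pass a fixed number of arguments to the kernel. The sketch below is Linux-only and uses the generic syscall() wrapper. The name mremap_sketch is hypothetical, and the fallback flag values are an assumption based on the Linux UAPI headers.

```c
#ifndef _DEFAULT_SOURCE
# define _DEFAULT_SOURCE 1	/* Declares syscall() and MAP_ANONYMOUS.  */
#endif
#include <stdarg.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks in case <sys/mman.h> hides the GNU-only flags.
   Assumption: these are the Linux UAPI values.  */
#ifndef MREMAP_MAYMOVE
# define MREMAP_MAYMOVE 1
#endif
#ifndef MREMAP_FIXED
# define MREMAP_FIXED 2
#endif

/* Hypothetical sketch: the fifth argument exists only with MREMAP_FIXED,
   so it is fetched with va_arg only in that case.  */
static void *
mremap_sketch (void *addr, size_t old_len, size_t new_len, int flags, ...)
{
  void *new_addr = NULL;
  if (flags & MREMAP_FIXED)
    {
      va_list ap;
      va_start (ap, flags);
      new_addr = va_arg (ap, void *);
      va_end (ap);
    }
  /* Always pass five arguments; a failure of -1 maps to MAP_FAILED.  */
  return (void *) syscall (SYS_mremap, addr, old_len, new_len, flags,
			   new_addr);
}
```

The variadic handling happens in C, where the compiler knows the ABI, rather than in a generated assembly stub.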
Adhemerval Zanella
83008fa495 linux: Add prlimit64 C implementation
The LFS prlimit64 requires an arch-specific implementation in
syscalls.list.  Instead add a generic one that handles the
required symbol alias for __RLIM_T_MATCHES_RLIM64_T.

HPPA is the only outlier which requires a different default
symbol.

Checked on x86_64-linux-gnu and with build for the affected ABIs.
2021-11-30 13:13:03 -03:00
Adhemerval Zanella
137ed5ac44 linux: Use /proc/stat fallback for __get_nprocs_conf (BZ #28624)
The /proc/stat fallback, used if sysfs is not available, was removed
by f13fb81ad3; reinstate it.

Checked on x86_64-linux-gnu.
2021-11-25 11:00:42 -03:00
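A /proc/stat fallback such as the one named in 137ed5ac44 amounts to counting the per-CPU lines, which start with "cpu" followed by a digit; the aggregate "cpu " line is skipped. The sketch below takes an already opened stream so it can be tried on any text. The function name is hypothetical and this is not the glibc implementation.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: count lines of the form "cpuN ..." in FP.
   Assumes each relevant line fits in the buffer.  */
static int
count_cpu_lines (FILE *fp)
{
  char line[1024];
  int count = 0;
  while (fgets (line, sizeof line, fp) != NULL)
    if (strncmp (line, "cpu", 3) == 0 && isdigit ((unsigned char) line[3]))
      ++count;
  return count;
}
```

A caller would open /proc/stat, pass the stream in, and treat a result of 0 as "unknown".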
Adhemerval Zanella
d150181d73 linux: Add fanotify_mark C implementation
Passing 64-bit arguments in syscalls.list is tricky: it requires
reimplementing the expected kernel ABI in each architecture.  This
is much better represented in C code, where we already have
macros for this (SYSCALL_LL64).

Checked on x86_64-linux-gnu.
2021-11-25 09:56:57 -03:00
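What a macro like SYSCALL_LL64 has to do on a 32-bit ABI is split one 64-bit value into two register-sized halves. A minimal sketch of that split follows; the type and function names are hypothetical, and the order in which the halves are passed to the kernel depends on the ABI, which the sketch does not model.

```c
#include <stdint.h>

/* Hypothetical pair of 32-bit register values.  */
struct reg_pair_sketch
{
  uint32_t high;
  uint32_t low;
};

/* Split a 64-bit syscall argument into its upper and lower halves.  */
static struct reg_pair_sketch
split_ll64_sketch (uint64_t value)
{
  struct reg_pair_sketch pair;
  pair.high = (uint32_t) (value >> 32);
  pair.low = (uint32_t) (value & UINT32_C (0xffffffff));
  return pair;
}
```

Keeping this in one C macro means each port only has to state its register order once.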
Adhemerval Zanella
c3b023a782 linux: Only build fstatat fallback if required
For 32-bit architectures with __ASSUME_STATX there is no need to
build fstatat64_time64_stat.

Checked on i686-linux-gnu.
2021-11-25 09:28:27 -03:00
Sunil K Pandey
c58d3b7d00 x86-64: Add vector sin/sinf to libmvec microbenchmark
Add vector sin/sinf and input files to libmvec microbenchmark.

libmvec-sin-inputs:
  90% Normal random distribution
  range: (-DBL_MAX, DBL_MAX)
  mean: 0.0
  sigma: 5.0
  10% uniform random distribution in range (-1000.0, 1000.0)

libmvec-sinf-inputs:
  90% Normal random distribution
  range: (-FLT_MAX, FLT_MAX)
  mean: 0.0f
  sigma: 5.0f
  10% uniform random distribution in range (-1000.0f, 1000.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:50:23 -08:00
Sunil K Pandey
6a556bac81 x86-64: Add vector pow/powf to libmvec microbenchmark
Add vector pow/powf and input files to libmvec microbenchmark.

libmvec-pow-inputs:
  arg1:
    90% Normal random distribution
    range: (0.0, 256.0)
    mean: 0.0
    sigma: 32.0
    10% uniform random distribution in range (0.0, 256.0)
  arg2:
    90% Normal random distribution
    range: (-127.0, 127.0)
    mean: 0.0
    sigma: 16.0
    10% uniform random distribution in range (-127.0, 127.0)

libmvec-powf-inputs:
  arg1:
    90% Normal random distribution
    range: (0.0f, 100.0f)
    mean: 0.0f
    sigma: 16.0f
    10% uniform random distribution in range (0.0f, 100.0f)
  arg2:
    90% Normal random distribution
    range: (-10.0f, 10.0f)
    mean: 0.0f
    sigma: 8.0f
    10% uniform random distribution in range (-10.0f, 10.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:49:14 -08:00
Sunil K Pandey
8ab8afb336 x86-64: Add vector log/logf to libmvec microbenchmark
Add vector log/logf and input files to libmvec microbenchmark.

libmvec-log-inputs:
  70% Normal random distribution
  range: (0.0, DBL_MAX)
  mean: 1.0
  sigma: 50.0
  30% uniform random distribution in range (0.0, DBL_MAX)

libmvec-logf-inputs:
  70% Normal random distribution
  range: (0.0f, FLT_MAX)
  mean: 1.0f
  sigma: 50.0f
  30% uniform random distribution in range (0.0f, FLT_MAX)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:48:14 -08:00
Sunil K Pandey
37df38bd5f x86-64: Add vector exp/expf to libmvec microbenchmark
Add vector exp/expf and input files to libmvec microbenchmark.

libmvec-exp-inputs:
  90% Normal random distribution
  range: (-708.0, 709.0)
  mean: 0.0
  sigma: 16.0
  10% uniform random distribution in range (-500.0, 500.0)

libmvec-expf-inputs:
  90% Normal random distribution
  range: (-87.0f, 88.0f)
  mean: 0.0f
  sigma: 8.0f
  10% uniform random distribution in range (-50.0f, 50.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:46:59 -08:00
Sunil K Pandey
4443695598 x86-64: Add vector cos/cosf to libmvec microbenchmark
Add vector cos/cosf and input files to libmvec microbenchmark.

libmvec-cos-inputs:
  90% Normal random distribution
  range: (-DBL_MAX, DBL_MAX)
  mean: 0.0
  sigma: 5.0
  10% uniform random distribution in range (-1000.0, 1000.0)

libmvec-cosf-inputs:
  90% Normal random distribution
  range: (-FLT_MAX, FLT_MAX)
  mean: 0.0f
  sigma: 5.0f
  10% uniform random distribution in range (-1000.0f, 1000.0f)

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-24 07:45:20 -08:00
Adhemerval Zanella
456b3c08b6 io: Refactor close_range and closefrom
Now that Hurd implements both close_range and closefrom (f2c996597d),
we can make close_range() a base ABI, and make the default closefrom()
implementation on top of close_range().

The generic closefrom() implementation based on __getdtablesize() is
moved to generic close_range().  On Linux it will be overridden by
the auto-generated syscall while on Hurd it will be a system-specific
implementation.

The closefrom() now calls close_range() and __closefrom_fallback().
Since on Hurd close_range() does not fail, __closefrom_fallback() is an
empty static inline function set by __ASSUME_CLOSE_RANGE.

The __ASSUME_CLOSE_RANGE also allows optimizing the Linux
__closefrom_fallback() implementation when --enable-kernel=5.9 or
higher is used.

Finally the Linux-specific tst-close_range.c is moved to io and
enabled by default.  The Linuxisms and CLOSE_RANGE_UNSHARE are
guarded so it can be built for Hurd (I have not actually tested it).

Checked on x86_64-linux-gnu, i686-linux-gnu, and with a i686-gnu
build.
2021-11-24 09:09:37 -03:00
Florian Weimer
e186fc5a31 nptl: Do not set signal mask on second setjmp return [BZ #28607]
__libc_signal_restore_set was in the wrong place: It also ran
when setjmp returned the second time (after pthread_exit or
pthread_cancel).  This is observable with blocked pending
signals during thread exit.

Fixes commit b3cae39dcb
("nptl: Start new threads with all signals blocked [BZ #25098]").

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-11-24 08:59:54 +01:00
Adhemerval Zanella
aac54dcd37 powerpc: Define USE_PPC64_NOTOC iff compiler supports it
The @notoc usage only yields an advantage on ISA 3.1+ machines (power10)
and, for ld.bfd, only when it also sees pcrel relocations used in the code
(generated if the compiler targets ISA 3.1+).  In the bfd case, ISA 3.1+
instructions in stubs are used iff the linker also sees the new pc-relative
relocations (for instance R_PPC64_D34), otherwise it generates default
stubs (ppc64_elf_check_relocs:4700).

This patch also helps with linkers that do not implement this optimization,
since building for an older ISA (such as 3.0 / power9) will also trigger
power10 stub generation if the assembly code uses the NOTOC macro.

Checked on powerpc64le-linux-gnu.

Reviewed-by: Fangrui Song <maskray@google.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
2021-11-22 14:49:11 -03:00
Adhemerval Zanella
bc801b3a40 setjmp: Replace jmp_buf-macros.h with jmp_buf-macros.sym
It requires less boilerplate code for newer ports.  The _Static_assert
checks from internal setjmp are moved to its own internal test since
setjmp.h is included early by multiple headers (to generate
rtld-sizes.sym).

The riscv jmp_buf-macros.h check is also redundant, it is already
done by riscv configure.ac.

Checked with a build for the affected architectures.
2021-11-22 13:43:22 -03:00
Joseph Myers
5c3ece451d Update kernel version to 5.15 in tst-mman-consts.py
This patch updates the kernel version in the test tst-mman-consts.py
to 5.15.  (There are no new MAP_* constants covered by this test in
5.15 that need any other header changes.)

Tested with build-many-glibcs.py.
2021-11-22 15:30:12 +00:00
Joseph Myers
bdeb7a8fa9 Add PF_MCTP, AF_MCTP from Linux 5.15 to bits/socket.h
Linux 5.15 adds a new address / protocol family PF_MCTP / AF_MCTP; add
these constants to bits/socket.h.

Tested for x86_64.
2021-11-17 14:25:16 +00:00
Florian Weimer
f1d333b5bf elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres
This will be used to deallocate memory allocated using the non-minimal
malloc.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-11-17 12:20:29 +01:00
Florian Weimer
8bd336a00a nptl: Extract <bits/atomic_wide_counter.h> from pthread_cond_common.c
And make it an installed header.  This addresses a few aliasing
violations (which do not seem to result in miscompilation due to
the use of atomics), and also enables use of wide counters in other
parts of the library.

The debug output in nptl/tst-cond22 has been adjusted to print
the 32-bit values instead because it avoids a big-endian/little-endian
difference.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-11-17 12:20:13 +01:00
Sunil K Pandey
a43c0b5483 x86-64: Create microbenchmark infrastructure for libmvec
Add python script to generate libmvec microbenchmark from the input
values for each libmvec function using skeleton benchmark template.

Creates double and float benchmarks with vector length 1, 2, 4, 8,
and 16 for each libmvec function.  Vector length 1 corresponds to
the scalar version of the function and is included for vector function
perf comparison.

Co-authored-by: Haochen Jiang <haochen.jiang@intel.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-16 11:37:39 -08:00
Noah Goldstein
2f9062d717 x86: Shrink memcmp-sse4.S code size
No bug.

This implementation refactors memcmp-sse4.S primarily with minimizing
code size in mind. It does this by removing the lookup table logic and
removing the unrolled check from (256, 512] bytes.

memcmp-sse4 code size reduction : -3487 bytes
wmemcmp-sse4 code size reduction: -1472 bytes

The current memcmp-sse4.S implementation has a large code size
cost. This has serious adverse effects on the ICache / ITLB. While
in micro-benchmarks the implementation appears fast, traces of
real-world code have shown that the speed in micro benchmarks does not
translate when the ICache/ITLB are not primed, and that the cost
of the code size has measurable negative effects on overall
application performance.

See https://research.google/pubs/pub48320/ for more details.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-10 20:12:10 -06:00
Joseph Myers
3387c40a8b Update syscall lists for Linux 5.15
Linux 5.15 has one new syscall, process_mrelease (and also enables the
clone3 syscall for RV32).  It also has a macro __NR_SYSCALL_MASK for
Arm, which is not a syscall but matches the pattern used for syscall
macro names.

Add __NR_SYSCALL_MASK to the names filtered out in the code dealing
with syscall lists, update syscall-names.list for the new syscall and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.

Tested with build-many-glibcs.py.
2021-11-10 15:21:19 +00:00
Florian Weimer
98966749f2 s390: Use long branches across object boundaries (jgh instead of jh)
Depending on the layout chosen by the linker, the 16-bit displacement
of the jh instruction is insufficient to reach the target label.

Analysis of the linker failure was carried out by Nick Clifton.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
2021-11-10 15:21:37 +01:00