Commit Graph

14572 Commits

Author SHA1 Message Date
Sunil K Pandey
3fc9ccc20b x86-64: Add vector exp2/exp2f implementation to libmvec
Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:29 -08:00
Sunil K Pandey
37475ba883 x86-64: Add vector hypot/hypotf implementation to libmvec
Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:21 -08:00
Sunil K Pandey
11c01de14c x86-64: Add vector asin/asinf implementation to libmvec
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:03 -08:00
Sunil K Pandey
146310177a x86-64: Add vector atan/atanf implementation to libmvec
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:36:46 -08:00
Florian Weimer
5d28a8962d elf: Add _dl_find_object function
It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed.  It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).

It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested
by Torvald Riegel.  Two copies of the supporting data structures are
used, also achieving full async-signal-safety.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-12-28 22:52:56 +01:00
Adhemerval Zanella
83b8d5027d malloc: Remove memusage.h
And use machine-sp.h instead.  The Linux implementation is based on
already provided CURRENT_STACK_FRAME (used on nptl code) and
STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.
2021-12-28 14:57:57 -03:00
Adhemerval Zanella
a75b1e35c5 malloc: Use hp-timing on libmemusage
Instead of reimplemeting on GETTIME macro.
2021-12-28 14:57:57 -03:00
Adhemerval Zanella
92ff345137 Remove atomic-machine.h atomic typedefs
Now that memusage.c uses generic types we can remove them.
2021-12-28 14:57:57 -03:00
Adhemerval Zanella
5a5f7a160d malloc: Remove atomic_* usage
These typedef are used solely on memusage and can be replaced with
generic types.
2021-12-28 14:57:57 -03:00
Thomas Petazzoni
c75aa9246a microblaze: Add missing implementation when !__ASSUME_TIME64_SYSCALLS
In commit a92f4e6299 ("linux: Add time64
pselect support"), a Microblaze specific implementation of
__pselect32() was added to cover the case of kernels < 3.15 which lack
the pselect6 system call.

This new file sysdeps/unix/sysv/linux/microblaze/pselect32.c takes
precedence over the default implementation
sysdeps/unix/sysv/linux/pselect32.c.

However sysdeps/unix/sysv/linux/pselect32.c provides an implementation
of __pselect32() which is needed when __ASSUME_TIME64_SYSCALLS is not
defined. On Microblaze, which is a 32-bit architecture,
__ASSUME_TIME64_SYSCALLS is only true for kernels >= 5.1.

Due to sysdeps/unix/sysv/linux/microblaze/pselect32.c taking
precedence over sysdeps/unix/sysv/linux/pselect32.c, it means that
when we are with a kernel >= 3.15 but < 5.1, we need a __pselect32()
implementation, but sysdeps/unix/sysv/linux/microblaze/pselect32.c
doesn't provide it, and sysdeps/unix/sysv/linux/pselect32.c which
would provide it is not compiled in.

This causes the following build failure on Microblaze with for example
Linux kernel headers 4.9:

[...]/build/libc_pic.os: in function `__pselect64':
(.text+0x120b44): undefined reference to `__pselect32'
collect2: error: ld returned 1 exit status

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-12-28 09:09:49 -03:00
Adhemerval Zanella
8c0664e2b8 elf: Add _dl_audit_pltexit
It consolidates the code required to call la_pltexit audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
eff687e846 elf: Add _dl_audit_pltenter
It consolidates the code required to call la_pltenter audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
0b98a87487 elf: Add _dl_audit_preinit
It consolidates the code required to call la_preinit audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
cda4f265c6 elf: Add _dl_audit_symbind_alt and _dl_audit_symbind
It consolidates the code required to call la_symbind{32,64} audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
311c9ee54e elf: Add _dl_audit_objclose
It consolidates the code required to call la_objclose audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
c91008d349 elf: Add _dl_audit_objsearch
It consolidates the code required to call la_objsearch audit
callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
3dac3959a5 elf: Add _dl_audit_activity_map and _dl_audit_activity_nsid
It consolidates the code required to call la_activity audit
callback.

Also for a new Lmid_t the namespace link_map list are empty, so it
requires to check if before using it.  This can happen for when audit
module is used along with dlmopen.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Adhemerval Zanella
aee6e90f93 elf: Add _dl_audit_objopen
It consolidates the code required to call la_objopen audit callback.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-28 08:40:38 -03:00
Samuel Thibault
ae49f218da hurd: Fix static-PIE startup
hurd initialization stages use RUN_HOOK to run various initialization
functions.  That is however using absolute addresses which need to be
relocated, which is done later by csu.  We can however easily make the
linker compute relative addresses which thus don't need a relocation.
The new SET_RELHOOK and RUN_RELHOOK macros implement this.
2021-12-28 10:28:22 +01:00
Samuel Thibault
2ce0481d26 hurd: let csu initialize tls
Since 9cec82de71 ("htl: Initialize later"), we let csu initialize
pthreads. We can thus let it initialize tls later too, to better align
with the generic order.  Initialization however accesses ports which
links/unlinks into the sigstate for unwinding.  We can however easily
skip that during initialization.
2021-12-28 10:15:52 +01:00
Samuel Thibault
7b358de1af hurd: Fix XFAIL-ing mallocfork2 tests
They are using setpshared but are outside the htl directory.
2021-12-27 22:21:08 +01:00
Samuel Thibault
1c6e6e52e5 hurd: XFAIL more tests that require setpshared support 2021-12-27 22:15:43 +01:00
Noah Goldstein
cca457f9c5 x86: Optimize L(less_vec) case in memcmpeq-evex.S
No bug.
Optimizations are twofold.

1) Replace page cross and 0/1 checks with masked load instructions in
   L(less_vec). In applications this reduces branch-misses in the
   hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.

Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range.  From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-27 03:18:58 -06:00
Noah Goldstein
abddd61de0 x86: Optimize L(less_vec) case in memcmp-evex-movbe.S
No bug.
Optimizations are twofold.

1) Replace page cross and 0/1 checks with masked load instructions in
   L(less_vec). In applications this reduces branch-misses in the
   hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.

Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range.  From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-27 03:17:59 -06:00
Adhemerval Zanella
a4b4131355 Set default __TIMESIZE default to 64
This is expected size for newer ABIs.
2021-12-23 11:41:08 -03:00
Sunil K Pandey
f20f980c71 x86-64: Add vector acos/acosf implementation to libmvec
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-22 13:03:14 -08:00
H.J. Lu
d3e4f5a101 s_sincosf.h: Change pio4 type to float [BZ #28713]
s_cosf.c and s_sinf.c have

  if (abstop12 (y) < abstop12 (pio4))

where abstop12 takes a float argument, but pio4 is static const double.
pio4 is used only in calls to abstop12 and never in arithmetic.  Apply

-static const double pio4 = 0x1.921FB54442D18p-1;
+static const float pio4 = 0x1.921FB6p-1f;

to fix:

FAIL: math/test-float-cos
FAIL: math/test-float-sin
FAIL: math/test-float-sincos
FAIL: math/test-float32-cos
FAIL: math/test-float32-sin
FAIL: math/test-float32-sincos

when compiling with GCC 12.

Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-12-21 08:56:12 -08:00
maminjie
e0fc721ce6 Linux: Fix 32-bit vDSO for clock_gettime on powerpc32
When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID,
on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully (
because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S
does not support these two IDs, the 32-bit time_t syscall will be used),
but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used,
resulting in two system calls.

Fix commit 72e84d1db2.

Signed-off-by: maminjie  <maminjie2@huawei.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-12-21 09:47:16 -03:00
H.J. Lu
de8a0897e3 Regenerate ulps on x86_64 with GCC 12
Fix

FAIL: math/test-float-clog10
FAIL: math/test-float32-clog10

on Intel Core i7-1165G7 with GCC 12.
2021-12-20 15:25:00 -08:00
Joseph Myers
a94d9659cd Add ARPHRD_CAN, ARPHRD_MCTP to net/if_arp.h
Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along
with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit
cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol
numbers for PF_CAN") but apparently missed for glibc at the time.

Tested for x86_64.
2021-12-20 15:38:32 +00:00
Adhemerval Zanella
691d9ae9e6 Remove ununsed tcb-offset
Some architectures do not use the auto-generated tcb-offsets.h.
2021-12-17 17:47:29 -03:00
Aurelien Jarno
225da459ce riscv: align stack before calling _dl_init [BZ #28703]
Align the stack pointer to 128 bits during the call to _dl_init() as
specified by the RISC-V ABI [1]. This fixes the elf/tst-align2 test.

Fixes bug 28703.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
2021-12-17 20:29:34 +01:00
Aurelien Jarno
d2e594d715 riscv: align stack in clone [BZ #28702]
The RISC-V ABI [1] mandates that "the stack pointer shall be aligned to
a 128-bit boundary upon procedure entry". This as not the case in clone.

This fixes the misc/tst-misalign-clone-internal and
misc/tst-misalign-clone tests.

Fixes bug 28702.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
2021-12-17 20:29:32 +01:00
Aurelien Jarno
94058f6cde elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704]
On KVM guests running on some AMD systems, the IBRS feature is reported
as a synthetic feature using the Intel feature, while the cpuinfo entry
keeps the same. Handle that by first checking the presence of the Intel
feature on AMD systems.

Fixes bug 28704.
2021-12-17 20:20:15 +01:00
Matheus Castanho
ae91d3df24 powerpc64[le]: Allocate extra stack frame on syscall.S
The syscall function does not allocate the extra stack frame for scv like other
assembly syscalls using DO_CALL_SCV. So after commit d120fb9941 changed the
offset that is used to save LR, syscall ended up using an invalid offset,
causing regressions on powerpc64. So make sure the extra stack frame is
allocated in syscall.S as well to make it consistent with other uses of
DO_CALL_SCV and avoid similar issues in the future.

Tested on powerpc, powerpc64, and powerpc64le (with and without scv)

Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-12-17 15:40:53 -03:00
Florian Weimer
ce1e5b1122 arm: Guard ucontext _rtld_global_ro access by SHARED, not PIC macro
Due to PIE-by-default, PIC is now defined in more cases.  libc.a
does not have _rtld_global_ro, and statically linking setcontext
fails.  SHARED is the right condition to use, so that libc.a
references _dl_hwcap instead of _rtld_global_ro.

For static PIE support, the !SHARED case would still have to be made
PIC.  This patch does not achieve that.

Fixes commit 23645707f1
("Replace --enable-static-pie with --disable-default-pie").

Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-17 11:48:44 +01:00
Adhemerval Zanella
98d5fcb8d0 malloc: Add Huge Page support for mmap
With the morecore hook removed, there is not easy way to provide huge
pages support on with glibc allocator without resorting to transparent
huge pages.  And some users and programs do prefer to use the huge pages
directly instead of THP for multiple reasons: no splitting, re-merging
by the VM, no TLB shootdowns for running processes, fast allocation
from the reserve pool, no competition with the rest of the processes
unlike THP, no swapping all, etc.

This patch extends the 'glibc.malloc.hugetlb' tunable: the value
'2' means to use huge pages directly with the system default size,
while a positive value means and specific page size that is matched
against the supported ones by the system.

Currently only memory allocated on sysmalloc() is handled, the arenas
still uses the default system page size.

To test is a new rule is added tests-malloc-hugetlb2, which run the
addes tests with the required GLIBC_TUNABLE setting.  On systems without
a reserved huge pages pool, is just stress the mmap(MAP_HUGETLB)
allocation failure.  To improve test coverage it is required to create
a pool with some allocated pages.

Checked on x86_64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-12-15 17:35:38 -03:00
Adhemerval Zanella
5f6d8d97c6 malloc: Add madvise support for Transparent Huge Pages
Linux Transparent Huge Pages (THP) current supports three different
states: 'never', 'madvise', and 'always'.  The 'never' is
self-explanatory and 'always' will enable THP for all anonymous
pages.  However, 'madvise' is still the default for some system and
for such case THP will be only used if the memory range is explicity
advertise by the program through a madvise(MADV_HUGEPAGE) call.

To enable it a new tunable is provided, 'glibc.malloc.hugetlb',
where setting to a value diffent than 0 enables the madvise call.

This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call at sysmalloc() with sizes larger than the default huge
page size.  The madvise() call is disable is system does not support
THP or if it has the mode set to "never" and on Linux only support
one page size for THP, even if the architecture supports multiple
sizes.

To test is a new rule is added tests-malloc-hugetlb1, which run the
addes tests with the required GLIBC_TUNABLE setting.

Checked on x86_64-linux-gnu.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-12-15 17:35:14 -03:00
Florian Weimer
cb976fba4c powerpc: Use global register variable in <thread_pointer.h>
A local register variable is merely a compiler hint, and so not
appropriate in this context.  Move the global register variable into
<thread_pointer.h> and include it from <tls.h>, as there can only
be one global definition for one particular register.

Fixes commit 8dbeb0561e
("nptl: Add <thread_pointer.h> for defining __thread_pointer").

Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
2021-12-15 16:06:25 +01:00
H.J. Lu
4435c29892 Support target specific ALIGN for variable alignment test [BZ #28676]
Add <tst-file-align.h> to support target specific ALIGN for variable
alignment test:

1. Alpha: Use 0x10000.
2. MicroBlaze and Nios II: Use 0x8000.
3. All others: Use 0x200000.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-12-14 14:50:33 -08:00
Samuel Thibault
ec06717856 hurd: Do not set PIE_UNSUPPORTED
This is now supported.
2021-12-14 08:38:05 +01:00
Akila Welihinda
3b1402b3fc sysdeps: Simplify sin Taylor Series calculation
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes
of regaining some precision as a function of da. However the
comment says we add the term `-0.5*da*a^2 + 0.5*da` which is
different. This fix updates the comment to reflect the
code and also simplifies the calculation by replacing `a` with `x`
because they always have the same value.

Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu>
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-12-13 15:31:05 +01:00
Adhemerval Zanella
104d2005d5 math: Remove the error handling wrapper from hypot and hypotf
The error handling is moved to sysdeps/ieee754 version with no SVID
support.  The compatibility symbol versions still use the wrapper with
SVID error handling around the new code.  There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
2021-12-13 10:08:46 -03:00
Wilco Dijkstra
2f44eef584 math: Use fmin/fmax on hypot
It optimizes for architectures that provides fast builtins.

Checked on aarch64-linux-gnu.
2021-12-13 10:08:46 -03:00
Adhemerval Zanella
ecb94e9587 aarch64: Add math-use-builtins-f{max,min}.h
It allows to remove the arch-specific implementations.
2021-12-13 10:08:46 -03:00
Adhemerval Zanella
583c4d424e math: Add math-use-builtinds-fmin.h
It allows the architecture to use the builtin instead of generic
implementation.
2021-12-13 10:08:43 -03:00
Adhemerval Zanella
72ab1eaec7 math: Add math-use-builtinds-fmax.h
It allows the architecture to use the builtin instead of generic
implementation.
2021-12-13 09:08:07 -03:00
Adhemerval Zanella
2eb1cd2f47 math: Remove powerpc e_hypot
The generic implementation is shows only slight worse performance:

POWER10    reciprocal-throughput    latency
master                   8.28478    13.7253
new hypot                7.21945    13.1933

POWER9     reciprocal-throughput    latency
master                   13.4024    14.0967
new hypot                14.8479    15.8061

POWER8     reciprocal-throughput    latency
master                   15.5767    16.8885
new hypot                16.5371    18.4057

One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).

Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
2021-12-13 09:08:07 -03:00
Adhemerval Zanella
a1d3c9b642 i386: Move hypot implementation to C
The generic hypotf is slight slower, mostly due the tricks the assembly
does to optimize the isinf/isnan/issignaling.  The generic hypot is way
slower, since the optimized implementation uses the i386 default
excessive precision to issue the operation directly.  A similar
implementation is provided instead of using the generic implementation:

Checked on i686-linux-gnu.
2021-12-13 09:08:02 -03:00
Adhemerval Zanella
c212d6397e math: Use an improved algorithm for hypotl (ldbl-128)
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:

  - Handle qNaN and sNaN.
  - Tune the 'widely varying operands' to avoid spurious underflow
    due the multiplication and fix the return value for upwards
    rounding mode.
  - Handle required underflow exception for subnormal results.

The main advantage of the new algorithm is its precision.  With a
random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.05% results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of total (1280).

Checked on aarch64-linux-gnu and x86_64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
2021-12-13 09:02:34 -03:00