This commit contains following formatting changes
1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
between it and the first operand.
3. Instruction greater than 7 characters in length have a
space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
This commit contains following formatting changes
1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
between it and the first operand.
3. Instruction greater than 7 characters in length have a
space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
8. 1 space between line content and line comment.
9. Space after all commas.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
This commit contains following formatting changes
1. Instructions proceeded by a tab.
2. Instruction less than 8 characters in length have a tab
between it and the first operand.
3. Instruction greater than 7 characters in length have a
space between it and the first operand.
4. Tabs after `#define`d names and their value.
5. 8 space at the beginning of line replaced by tab.
6. Indent comments with code.
7. Remove redundent .text section.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Remove libc-do-syscall from sysdep-dl-routines added by:
commit 3b33d6ed60
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Sun Jan 8 11:38:23 2017 -0200
Rework -fno-omit-frame-pointer support on i386
and use auto-generated io/rtld-libc-do-syscall.os instead. This fixes
BZ #28936.
And optimize it slightly.
This is commit 8c8510ab27 revised.
In _dl_aux_init in elf/dl-support.c, use an explicit loop
and -fno-tree-loop-distribute-patterns to avoid memset.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
1. Also generate .d dependency files for $(tests-container) and
$(tests-printers).
2. elf: Add tst-auditmod17.os to extra-test-objs.
3. iconv: Add tst-gconv-init-failure-mod.os to extra-test-objs.
4. malloc: Rename extra-tests-objs to extra-test-objs.
5. linux: Add tst-sysconf-iov_max-uapi.o to extra-test-objs.
6. x86_64: Add tst-x86_64mod-1.o, tst-platformmod-2.o, test-libmvec.o,
test-libmvec-avx.o, test-libmvec-avx2.o and test-libmvec-avx512f.o to
extra-test-objs.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The symbol is not present current POSIX specification and compiler
already generates memset call. The arch specific implementation
is just to avoid the __bzero symbol creation (which ia64 abi does
not export).
This change fixes two warnings from _dl_lookup_address.
The first warning comes from dropping the volatile keyword from
desc in the call to _dl_read_access_allowed. We now have a full
atomic barrier between loading desc[0] and the access check, so
desc no longer needs to be declared as volatile.
The second warning comes from the implicit declaration of
_dl_fix_reloc_arg. This is fixed by including dl-runtime.h and
declaring _dl_fix_reloc_arg in dl-runtime.h.
The current getcontext return trampoline is overly complex and it
unnecessarily clobbers several registers. By saving the context
pointer (r26) in the context, __getcontext_ret can restore any
registers not restored by setcontext. This allows getcontext to
save and restore the entire register context present when getcontext
is entered. We use the unused oR0 context slot for the return
from __getcontext_ret.
While this is not directly useful in C, it can be exploited in
assembly code. Registers r20, r23, r24 and r25 are not clobbered
in the call path to getcontext. This allows a small simplification
of swapcontext.
It also allows saving and restoring the 6-bit SAR register in the
LSB of the oSAR context slot. The getcontext flag value can be
stored in the MSB of the oSAR slot.
Previously TEST_NAME was passing a function pointer. This didn't fail
because of the -Wno-error flag (to allow for overflow sizes passed
to strncmp/wcsncmp)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would
call strcmp-avx2 and wcscmp-avx2 respectively. This would have
not checks around vzeroupper and would trigger spurious
aborts. This commit fixes that.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
AVX2 machines with and without RTM.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This change fixes the failure of stdlib/tst-setcontext2 and
stdlib/tst-setcontext7 on hppa. The implementation of swapcontext
in C is broken. C saves the return pointer (rp) and any non
call-clobbered registers (in this case r3, r4 and r5) on the
stack. However, the setcontext call in swapcontext pops the
stack and subsequent calls clobber the saved registers. When
the context in oucp is restored, both tests fault.
Here we rewrite swapcontext in assembly code to avoid using
the stack for register values that need to be used after
restoration. The getcontext and setcontext routines are
revised to save and restore register ret1 for normal returns.
We copy the oucp pointer to ret1. This allows access to
the old context after calling getcontext and setcontext.
In the overflow fallback strncmp-avx2-rtm and wcsncmp-avx2-rtm would
call strcmp-avx2 and wcscmp-avx2 respectively. This would have
not checks around vzeroupper and would trigger spurious
aborts. This commit fixes that.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass on
AVX2 machines with and without RTM.
Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
On Microblaze only __NR_newselect is implemented, even though kernel
advertise __NR_select on asm/unistd.h. Since microblaze is the
only architecture that undef __ASSUME_PSELECT, the generic code
change is simpler than chaging the architecture syscall number.
Acked-by: Mark Hatle <mark.hatle@xilinx.com>
This patch updates the kernel version in the test tst-mman-consts.py
to 5.16. (There are no new MAP_* constants covered by this test in
5.16 that need any other header changes.)
Tested with build-many-glibcs.py.
The __sem_check_add_mapping internal stat calls fails with
EOVERFLOW if system time is larger than 32 bit.
It is a missing spot from 52a5fe70a2 fix to use 64 bit stat
internally.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Logic can read before the start of `s1` / `s2` if both `s1` and `s2`
are near the start of a page. To avoid having the result contimated by
these comparisons the `strcmp` variants would mask off these
comparisons. This was missing in the `strncmp` variants causing
the bug. This commit adds the masking to `strncmp` so that out of
range comparisons don't affect the result.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass as
well a full xcheck on x86_64 linux.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Since __pthread_key_create might be concurrently reallocating the
__pthread_key_destructors array, it's not safe to access it without the
mutex held. Posix explicitly says we are allowed to prefer performance
over error detection.
commit 3d9f171bfb
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Mon Feb 7 05:55:15 2022 -0800
x86-64: Optimize bzero
added the optimized bzero. Remove bzero weak alias in SS2 memset to
avoid undefined __bzero in memset-sse2-unaligned-erms.
The kernel header might not define the SO_TIMESTAMP{NS}_OLD or
SO_TIMESTAMP{NS}_NEW if it older than v5.1.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
The test elf/tst-audit2 fails on hppa with a segmentation fault in the
long branch stub used to call malloc from calloc. This occurs because
the test is not a PIC executable and calloc is called from the dynamic
linker before the dp register is initialized in _dl_start_user.
The fix is to move the dp register initialization into
elf_machine_runtime_setup. Since the address of $global$ can't be
loaded directly, we continue to use the DT_PLTGOT value from the
the main_map to initialize dp.
commit 3d9f171bfb
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Mon Feb 7 05:55:15 2022 -0800
x86-64: Optimize bzero
Remove setting the .text section for the code. This commit
adds that back.
Otherwise, <dl-auxv.h> on POWER ends up being included twice,
once in dl-sysdep.c, once in dl-support.c. That leads to a linker
failure due to multiple definitions of _dl_cache_line_size.
Fixes commit d96d2995c1
("Revert "Linux: Consolidate auxiliary vector parsing").
This reverts commit 8c8510ab27. The
revert is not perfect because the commit included a bug fix for
_dl_sysdep_start with an empty argv, introduced in commit
2d47fa6862 ("Linux: Remove
DL_FIND_ARG_COMPONENTS"), and this bug fix is kept.
The revert is necessary because the reverted commit introduced an
early memset call on aarch64, which leads to crash due to lack of TCB
initialization.
Prelinked binaries and libraries still work, the dynamic tags
DT_GNU_PRELINKED, DT_GNU_LIBLIST, DT_GNU_CONFLICT just ignored
(meaning the process is reallocated as default).
The loader environment variable TRACE_PRELINKING is also removed,
since it used solely on prelink.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
And optimize it slightly.
The large switch statement in _dl_sysdep_start can be replaced with
a large array. This reduces source code and binary size. On
i686-linux-gnu:
Before:
text data bss dec hex filename
7791 12 0 7803 1e7b elf/dl-sysdep.os
After:
text data bss dec hex filename
7135 12 0 7147 1beb elf/dl-sysdep.os
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The generic version is the de-facto Linux implementation. It
requires an auxiliary vector, so Hurd does not use it.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
On hppa, a function pointer returned by la_symbind is actually a function
descriptor has the plabel bit set (bit 30). This must be cleared to get
the actual address of the descriptor. If the descriptor has been bound,
the first word of the descriptor is the physical address of theA function,
otherwise, the first word of the descriptor points to a trampoline in the
PLT.
This patch also adds a workaround on tests because on hppa (and it seems
to be the only ABI I have see it), some shared library adds a dynamic PLT
relocation to am empty symbol name:
$ readelf -r elf/tst-audit25mod1.so
[...]
Relocation section '.rela.plt' at offset 0x464 contains 6 entries:
Offset Info Type Sym.Value Sym. Name + Addend
00002008 00000081 R_PARISC_IPLT 508
[...]
It breaks some assumptions on the test, where a symbol with an empty
name ("") is passed on la_symbind.
Checked on x86_64-linux-gnu and hppa-linux-gnu.
memset with zero as the value to set is by far the majority value (99%+
for Python3 and GCC).
bzero can be slightly more optimized for this case by using a zero-idiom
xor for broadcasting the set value to a register (vector or GPR).
Co-developed-by: Noah Goldstein <goldstein.w.n@gmail.com>
get_nprocs() and get_nprocs_conf() use various methods to obtain an
accurate number of processors. Re-introduce __get_nprocs_sched() as
a source of information, and fix the order in which these methods are
used to return the most accurate information. The primary source of
information used in both functions remains unchanged.
This also changes __get_nprocs_sched() error return value from 2 to 0,
but all its users are already prepared to handle that.
Old fallback order:
get_nprocs:
/sys/devices/system/cpu/online -> /proc/stat -> 2
get_nprocs_conf:
/sys/devices/system/cpu/ -> /proc/stat -> 2
New fallback order:
get_nprocs:
/sys/devices/system/cpu/online -> /proc/stat -> sched_getaffinity -> 2
get_nprocs_conf:
/sys/devices/system/cpu/ -> /proc/stat -> sched_getaffinity -> 2
Fixes: 342298278e ("linux: Revert the use of sched_getaffinity on get_nproc")
Closes: BZ #28865
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
commit b62ace2740
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date: Sun Feb 6 00:54:18 2022 -0600
x86: Improve vec generation in memset-vec-unaligned-erms.S
Revert usage of 'pshufb' in broadcast logic as it is an SSSE3
instruction and memset.S is restricted to only SSE2 instructions.
No bug.
Split vec generation into multiple steps. This allows the
broadcast in AVX2 to use 'xmm' registers for the L(less_vec)
case. This saves an expensive lane-cross instruction and removes
the need for 'vzeroupper'.
For SSE2 replace 2x 'punpck' instructions with zero-idiom 'pxor' for
byte broadcast.
Results for memset-avx2 small (geomean of N = 20 benchset runs).
size, New Time, Old Time, New / Old
0, 4.100, 3.831, 0.934
1, 5.074, 4.399, 0.867
2, 4.433, 4.411, 0.995
4, 4.487, 4.415, 0.984
8, 4.454, 4.396, 0.987
16, 4.502, 4.443, 0.987
All relevant string/wcsmbs tests are passing.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector tan/tanf and input files to libmvec microbenchmark.
libmvec-tan-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 5.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-tanf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 5.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector erfc/erfcf and input files to libmvec microbenchmark.
libmvec-erfc-inputs:
90% Normal random distribution
range: (-6.0, 6.0)
mean: 0.0
sigma: 1.0
10% uniform random distribution in range (-5.9, 5.9)
libmvec-erfcf-inputs:
90% Normal random distribution
range: (-4.0f, 4.0f)
mean: 0.0f
sigma: 1.0f
10% uniform random distribution in range (-3.9f, 3.9f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector asinh/asinhf and input files to libmvec microbenchmark.
libmvec-asinh-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 2.0
10% uniform random distribution in range (-1.0e6, 1.0e6)
libmvec-asinhf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 2.0f
10% uniform random distribution in range (-1.0e6f, 1.0e6f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector tanh/tanhf and input files to libmvec microbenchmark.
libmvec-tanh-inputs:
90% Normal random distribution
range: (-19.0, 19.0)
mean: 0.0
sigma: 2.0
10% uniform random distribution in range (-16.0, 16.0)
libmvec-tanhf-inputs:
90% Normal random distribution
range: (-10.0f, 10.0f)
mean: 0.0f
sigma: 2.0f
10% uniform random distribution in range (-8.0f, 8.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector erf/erff and input files to libmvec microbenchmark.
libmvec-erf-inputs:
90% Normal random distribution
range: (-6.0, 6.0)
mean: 0.0
sigma: 1.0
10% uniform random distribution in range (-5.9, 5.9)
libmvec-erff-inputs:
90% Normal random distribution
range: (-4.0f, 4.0f)
mean: 0.0f
sigma: 1.0f
10% uniform random distribution in range (-3.9f, 3.9f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector acosh/acoshf and input files to libmvec microbenchmark.
libmvec-acosh-inputs:
90% Normal random distribution
range: (1.0, DBL_MAX)
mean: 1.0
sigma: 8.0
10% uniform random distribution in range (1.0, 1.0e6)
libmvec-acoshf-inputs:
90% Normal random distribution
range: (1.0f, FLT_MAX)
mean: 1.0f
sigma: 4.0f
10% uniform random distribution in range (1.0f, 1.0e6f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector atanh/atanhf and input files to libmvec microbenchmark.
libmvec-atanh-inputs:
90% Normal random distribution
range: (-1.0, 1.0)
mean: 0.0
sigma: 1.0
10% uniform random distribution in range (-1.0, 1.0)
libmvec-atanhf-inputs:
90% Normal random distribution
range: (-1.0f, 1.0f)
mean: 0.0f
sigma: 1.0f
10% uniform random distribution in range (-1.0f, 1.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector log1p/log1pf and input files to libmvec microbenchmark.
libmvec-log1p-inputs:
70% Normal random distribution
range: (-1.0, DBL_MAX)
mean: 0.0
sigma: 50.0
30% uniform random distribution in range (-1.0, 1.0e6)
libmvec-log1pf-inputs:
70% Normal random distribution
range: (-1.0f, FLT_MAX)
mean: 0.0f
sigma: 50.0f
30% uniform random distribution in range (-1.0f, 1.0e6f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector log2/log2f and input files to libmvec microbenchmark.
libmvec-log2-inputs:
70% Normal random distribution
range: (0.0, DBL_MAX)
mean: 1.0
sigma: 50.0
30% uniform random distribution in range (0.0, 1.0e6)
libmvec-log2f-inputs:
70% Normal random distribution
range: (0.0f, FLT_MAX)
mean: 1.0f
sigma: 50.0f
30% uniform random distribution in range (0.0f, 1.0e6f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector log10/log10f and input files to libmvec microbenchmark.
libmvec-log10-inputs:
70% Normal random distribution
range: (0.0, DBL_MAX)
mean: 1.0
sigma: 50.0
30% uniform random distribution in range (0.0, 1.0e6)
libmvec-log10f-inputs:
70% Normal random distribution
range: (0.0f, FLT_MAX)
mean: 1.0f
sigma: 50.0f
30% uniform random distribution in range (0.0f, 1.0e6f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector atan2/atan2f and input files to libmvec microbenchmark.
libmvec-atan2-inputs:
arg1:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 4.0
10% uniform random distribution in range (-1.0e6, 1.0e6)
arg2:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 4.0
10% uniform random distribution in range (-1.0e6, 1.0e6)
libmvec-atan2f-inputs:
arg1:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 4.0f
10% uniform random distribution in range (-1.0e6f, 1.0e6f)
arg2:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 4.0f
10% uniform random distribution in range (-1.0e6f, 1.0e6f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector cbrt/cbrtf and input files to libmvec microbenchmark.
libmvec-cbrt-inputs:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 10.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-cbrtf-inputs:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 10.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector sinh/sinhf and input files to libmvec microbenchmark.
libmvec-sinh-inputs:
90% Normal random distribution
range: (-710.0, 710.0)
mean: 0.0
sigma: 32.0
10% uniform random distribution in range (-500.0, 500.0)
libmvec-sinhf-inputs:
90% Normal random distribution
range: (-89.0f, 89.0f)
mean: 0.0f
sigma: 16.0f
10% uniform random distribution in range (-50.0f, 50.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector expm1/expm1f and input files to libmvec microbenchmark.
libmvec-expm1-inputs:
90% Normal random distribution
range: (-708.0, 709.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-500.0, 500.0)
libmvec-expm1f-inputs:
90% Normal random distribution
range: (-87.0f, 88.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-50.0f, 50.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector cosh/coshf and input files to libmvec microbenchmark.
libmvec-cosh-inputs:
90% Normal random distribution
range: (-710.0, 710.0)
mean: 0.0
sigma: 32.0
10% uniform random distribution in range (-500.0, 500.0)
libmvec-coshf-inputs:
90% Normal random distribution
range: (-89.0f, 89.0f)
mean: 0.0f
sigma: 16.0f
10% uniform random distribution in range (-50.0f, 50.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector exp10/exp10f and input files to libmvec microbenchmark.
libmvec-exp10-inputs:
90% Normal random distribution
range: (-307.0, 308.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-250.0, 250.0)
libmvec-exp10f-inputs:
90% Normal random distribution
range: (-37.0f, 38.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-25.0f, 25.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector exp2/exp2f and input files to libmvec microbenchmark.
libmvec-exp2-inputs:
90% Normal random distribution
range: (-1022.0, 1024.0)
mean: 0.0
sigma: 16.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-exp2f-inputs:
90% Normal random distribution
range: (-126.0f, 128.0f)
mean: 0.0f
sigma: 8.0f
10% uniform random distribution in range (-100.0f, 100.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector hypot/hypotf and input files to libmvec microbenchmark.
libmvec-hypot-inputs:
arg1:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 10.0
10% uniform random distribution in range (-1000.0, 1000.0)
arg1:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 10.0
10% uniform random distribution in range (-1000.0, 1000.0)
libmvec-hypotf-inputs:
arg1:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 10.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
arg2:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 10.0f
10% uniform random distribution in range (-1000.0f, 1000.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector asin/asinf and input files to libmvec microbenchmark.
libmvec-asin-inputs:
90% Normal random distribution
range: (-1.0, 1.0)
mean: 0.0
sigma: 1.0
10% uniform random distribution in range (-1.0, 1.0)
libmvec-asinf-inputs:
90% Normal random distribution
range: (-1.0f, 1.0f)
mean: 0.0f
sigma: 1.0f
10% uniform random distribution in range (-1.0f, 1.0f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector atan/atanf and input files to libmvec microbenchmark.
libmvec-atan-inputs:
arg1:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 4.0
10% uniform random distribution in range (-1.0e6, 1.0e6)
arg2:
90% Normal random distribution
range: (-DBL_MAX, DBL_MAX)
mean: 0.0
sigma: 4.0
10% uniform random distribution in range (-1.0e6, 1.0e6)
libmvec-atanf-inputs:
arg1:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 4.0f
10% uniform random distribution in range (-1.0e6f, 1.0e6f)
arg2:
90% Normal random distribution
range: (-FLT_MAX, FLT_MAX)
mean: 0.0f
sigma: 4.0f
10% uniform random distribution in range (-1.0e6f, 1.0e6f)
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Add vector acos/acosf and input files to libmvec microbenchmark.
libmvec-acos-inputs:
90% Normal random distribution
range: (-1.0, 1.0)
mean: 0.0
sigma: 1.0
10% uniform random distribution in range (-1.0, 1.0)
libmvec-acosf-inputs:
90% Normal random distribution
range: (-1.0f, 1.0f)
mean: 0.0f
sigma: 1.0f
10% uniform random distribution in range (-1.0f, 1.0f)
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Optimization are primarily to the loop logic and how the page cross
logic interacts with the loop.
The page cross logic is at times more expensive for short strings near
the end of a page but not crossing the page. This is done to retest
the page cross conditions with a non-faulty check and to improve the
logic for entering the loop afterwards. This is only particular cases,
however, and is general made up for by more than 10x improvements on
the transition from the page cross -> loop case.
The non-page cross cases as well are nearly universally improved.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Optimization are primarily to the loop logic and how the page cross
logic interacts with the loop.
The page cross logic is at times more expensive for short strings near
the end of a page but not crossing the page. This is done to retest
the page cross conditions with a non-faulty check and to improve the
logic for entering the loop afterwards. This is only particular cases,
however, and is general made up for by more than 10x improvements on
the transition from the page cross -> loop case.
The non-page cross cases are improved most for smaller sizes [0, 128]
and go about even for (128, 4096]. The loop page cross logic is
improved so some more significant speedup is seen there as well.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Commit 948ce73b31 made recvmsg/recvmmsg to always call
__convert_scm_timestamps for 64 bit time_t symbol, so adjust it to
always build it for __TIMESIZE != 64.
It fixes build for architecture with 32 bit time_t support when
configured with minimum kernel of 5.1.
Pass the actual number of bytes returned by the kernel.
Fixes: 33099d72e4 ("linux: Simplify get_nprocs")
Reviewed-by: Dmitry V. Levin <ldv@altlinux.org>
This matches the data size initial-exec relocations use on most
targets.
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The posix_spawnattr_tcsetpgrp_np works on a file descriptor (the
controlling terminal), so it would make more sense to actually fit
it on the file actions API.
Also, POSIX_SPAWN_TCSETPGROUP is not really required since it is
implicit by the presence of tcsetpgrp file action.
The posix/tst-spawn6.c is also fixed when TTY can is not present.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
PI_STATIC_AND_HIDDEN means that references to static functions, data
and symbols with hidden visibility do not need any run-time relocations
after the final link, with the build flags used by glibc.
OpenRISC follows this so enabled PI_STATIC_AND_HIDDEN by adding
configure.ac and generating configure.
Suggested-by: Florian Weimer <fweimer@redhat.com>
It is not Hurd-specific, but H.J. Lu wants it there.
Also, dc.a can be used to avoid hardcoding .long vs .quad and thus use
the same implementation for i386 and x86_64.
The rtld audit support show two problems on aarch64:
1. _dl_runtime_resolve does not preserve x8, the indirect result
location register, which might generate wrong result calls
depending of the function signature.
2. The NEON Q registers pushed onto the stack by _dl_runtime_resolve
were twice the size of D registers extracted from the stack frame by
_dl_runtime_profile.
While 2. might result in wrong information passed on the PLT tracing,
1. generates wrong runtime behaviour.
The aarch64 rtld audit support is changed to:
* Both La_aarch64_regs and La_aarch64_retval are expanded to include
both x8 and the full sized NEON V registers, as defined by the
ABI.
* dl_runtime_profile needed to extract registers saved by
_dl_runtime_resolve and put them into the new correctly sized
La_aarch64_regs structure.
* The LAV_CURRENT check is change to only accept new audit modules
to avoid the undefined behavior of not save/restore x8.
* Different than other architectures, audit modules older than
LAV_CURRENT are rejected (both La_aarch64_regs and La_aarch64_retval
changed their layout and there are no requirements to support multiple
audit interface with the inherent aarch64 issues).
* A new field is also reserved on both La_aarch64_regs and
La_aarch64_retval to support variant pcs symbols.
Similar to x86, a new La_aarch64_vector type to represent the NEON
register is added on the La_aarch64_regs (so each type can be accessed
directly).
Since LAV_CURRENT was already bumped to support bind-now, there is
no need to increase it again.
Checked on aarch64-linux-gnu.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The audit symbind callback is not called for binaries built with
-Wl,-z,now or when LD_BIND_NOW=1 is used, nor the PLT tracking callbacks
(plt_enter and plt_exit) since this would change the expected
program semantics (where no PLT is expected) and would have performance
implications (such as for BZ#15533).
LAV_CURRENT is also bumped to indicate the audit ABI change (where
la_symbind flags are set by the loader to indicate no possible PLT
trace).
To handle powerpc64 ELFv1 function descriptor, _dl_audit_symbind
requires to know whether bind-now is used so the symbol value is
updated to function text segment instead of the OPD (for lazy binding
this is done by PPC64_LOAD_FUNCPTR on _dl_runtime_resolve).
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
powerpc64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
For audit modules and dependencies with initial-exec TLS, we can not
set the initial TLS image on default loader initialization because it
would already be set by the audit setup. However, subsequent thread
creation would need to follow the default behaviour.
This patch fixes it by setting l_auditing link_map field not only
for the audit modules, but also for all its dependencies. This is
used on _dl_allocate_tls_init to avoid the static TLS initialization
at load time.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
Add <dl-r_debug.h> to get the adddress of the r_debug structure after
relocation and its offset before relocation from the PT_DYNAMIC segment
to support DT_DEBUG, DT_MIPS_RLD_MAP_REL and DT_MIPS_RLD_MAP.
Co-developed-by: Xi Ruoyao <xry111@mengyan1223.wang>
The timestamps created by __convert_scm_timestamps only make sense for
64 bit time_t programs, 32 bit time_t programs will ignore 64 bit time_t
timestamps since SO_TIMESTAMP will be defined to old values (either by
glibc or kernel headers).
Worse, if the buffer is not suffice MSG_CTRUNC is set to indicate it
(which breaks some programs [1]).
This patch makes only 64 bit time_t recvmsg and recvmmsg to call
__convert_scm_timestamps. Also, the assumption to called it is changed
from __ASSUME_TIME64_SYSCALLS to __TIMESIZE != 64 since the setsockopt
might be called by libraries built without __TIME_BITS=64. The
MSG_CTRUNC is only set for the 64 bit symbols, it should happen only
if 64 bit time_t programs run older kernels.
Checked on x86_64-linux-gnu and i686-linux-gnu.
[1] https://github.com/systemd/systemd/pull/20567
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The __convert_scm_timestamps only updates the control message last
pointer for SOL_SOCKET type, so if the message control buffer contains
multiple ancillary message types the converted timestamp one might
overwrite a valid message.
The test checks if the extra ancillary space is correctly handled
by recvmsg/recvmmsg, where if there is no extra space for the 64-bit
time_t converted message the control buffer should be marked with
MSG_TRUNC. It also check if recvmsg/recvmmsg handle correctly multiple
ancillary data.
Checked on x86_64-linux and on i686-linux-gnu on both 5.11 and
4.15 kernel.
Co-authored-by: Fabian Vogt <fvogt@suse.de>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The glibc 2.34 release really should have added a GLIBC_2.34
symbol to the dynamic loader. With it, we could move functions such
as dlopen or pthread_key_create that work on process-global state
into the dynamic loader (once we have fixed a longstanding issue
with static linking). Without the GLIBC_2.34 symbol, yet another
new symbol version would be needed because old glibc will fail to
load binaries due to the missing symbol version in ld.so that newly
linked programs will require.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Currently there is no proper way to set the controlling terminal through
posix_spawn in race free manner [1]. This forces shell implementations
to keep using fork+exec when launching background process groups,
even when using posix_spawn yields better performance.
This patch adds a new GNU extension so the creating process can
configure the created process terminal group. This is done with a new
flag, POSIX_SPAWN_TCSETPGROUP, along with two new attribute functions:
posix_spawnattr_tcsetpgrp_np, and posix_spawnattr_tcgetpgrp_np.
The function sets a new attribute, spawn-tcgroupfd, that references to
the controlling terminal.
The controlling terminal is set after the spawn-pgroup attribute, and
uses the spawn-tcgroupfd along with current creating process group
(so it is composable with POSIX_SPAWN_SETPGROUP).
To create a process and set the controlling terminal, one can use the
following sequence:
posix_spawnattr_t attr;
posix_spawnattr_init (&attr);
posix_spawnattr_setflags (&attr, POSIX_SPAWN_TCSETPGROUP);
posix_spawnattr_tcsetpgrp_np (&attr, tcfd);
If the idea is also to create a new process groups:
posix_spawnattr_t attr;
posix_spawnattr_init (&attr);
posix_spawnattr_setflags (&attr, POSIX_SPAWN_TCSETPGROUP
| POSIX_SPAWN_SETPGROUP);
posix_spawnattr_tcsetpgrp_np (&attr, tcfd);
posix_spawnattr_setpgroup (&attr, 0);
The controlling terminal file descriptor is ignored if the new flag is
not set.
This interface is slight different than the one provided by QNX [2],
which only provides the POSIX_SPAWN_TCSETPGROUP flag. The QNX
documentation does not specify how the controlling terminal is obtained
nor how it iteracts with POSIX_SPAWN_SETPGROUP. Since a glibc
implementation is library based, it is more straightforward and avoid
requires additional file descriptor operations to request the caller
to setup the controlling terminal file descriptor (and it also allows
a bit less error handling by posix_spawn).
Checked on x86_64-linux-gnu and i686-linux-gnu.
[1] https://github.com/ksh93/ksh/issues/79
[2] https://www.qnx.com/developers/docs/7.0.0/index.html#com.qnx.doc.neutrino.lib_ref/topic/p/posix_spawn.html
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
No valid path returned by getcwd would fit into 1 byte, so reject the
size early and return NULL with errno set to ERANGE. This change is
prompted by CVE-2021-3999, which describes a single byte buffer
underflow and overflow when all of the following conditions are met:
- The buffer size (i.e. the second argument of getcwd) is 1 byte
- The current working directory is too long
- '/' is also mounted on the current working directory
Sequence of events:
- In sysdeps/unix/sysv/linux/getcwd.c, the syscall returns ENAMETOOLONG
because the linux kernel checks for name length before it checks
buffer size
- The code falls back to the generic getcwd in sysdeps/posix
- In the generic func, the buf[0] is set to '\0' on line 250
- this while loop on line 262 is bypassed:
while (!(thisdev == rootdev && thisino == rootino))
since the rootfs (/) is bind mounted onto the directory and the flow
goes on to line 449, where it puts a '/' in the byte before the
buffer.
- Finally on line 458, it moves 2 bytes (the underflowed byte and the
'\0') to the buf[0] and buf[1], resulting in a 1 byte buffer overflow.
- buf is returned on line 469 and errno is not set.
This resolves BZ #28769.
Reviewed-by: Andreas Schwab <schwab@linux-m68k.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Qualys Security Advisory <qsa@qualys.com>
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
If any RPC fails, the reply port will already be deallocated.
__pthread_thread_terminate thus has to defer taking its name until the very last
__thread_terminate_release which doesn't reply a message. But then we
have to read from the pthread structure.
This introduces __pthread_dealloc_finish() which does the recording of
the thread termination, so the slot can be reused really only just before
the __thread_terminate_release call. Only the real thread can set it, so
let's decouple this from the pthread_state by just removing the
PTHREAD_TERMINATED state and add a terminated field.
We were getting
../scripts/evaluate-test.sh posix/annexc $? true false > /usr/src/glibc-upstream/build/posix/annexc.test-result
In file included from ../include/pthread.h:1,
from <stdin>:1:
../sysdeps/htl/include/pthread.h:7:62: error: missing binary operator before token "("
7 | # if defined __USE_EXTERN_INLINES && defined _LIBC && !IS_IN (libsupport)
| ^
In some cases (e.g QEMU, non-Intel/AMD CPU) the cache information can
not be retrieved and the corresponding values are set to 0.
Commit 2d651eb926 ("x86: Move x86 processor cache info to
cpu_features") changed the behaviour in such case by defining the
__x86_shared_cache_size and __x86_data_cache_size variables to 0 instead
of using the default values. This cause an issue with the i686 SSE2
optimized bzero/routine which assumes that the cache size is at least
128 bytes, and otherwise tries to zero/set the whole address space minus
128 bytes.
Fix that by restoring the original code to only update
__x86_shared_cache_size and __x86_data_cache_size variables if the
corresponding cache sizes are not zero.
Fixes bug 28784
Fixes commit 2d651eb926
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
It is similar to epoll_wait, with the difference the timeout has
nanosecond resoluting by using struct timespec instead of int.
Although Linux interface only provides 64 bit time_t support, old
32 bit interface is also provided (so keep in sync with current
practice and to no force opt-in on 64 bit time_t).
Checked on x86_64-linux-gnu and i686-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
The usage of internal static symbol for statically linked binaries
does not work correctly for objects built with -D_TIME_BITS=64,
since the internal definition does not provide the expected aliases.
This patch makes it to use the default stat functions instead (which
uses the default 64 time_t alias and types).
Checked on i686-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
The content of the structure is only used internally, so we can make
__pthread_attr_getschedparam and __pthread_attr_setschedparam convert
between the public sched_param type and an internal __sched_param.
This allows to avoid to spuriously expose the sched_param type.
This fixes BZ #23088.
We have to drop the kernel_thread port from the thread structure, to
avoid pthread_kill's call to _hurd_thread_sigstate trying to reference
it and fail.
This is required so that the checks still work if $(early-cflags)
selects a different ISA level.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
This ISA level covers the glibc build itself. <dl-hwcap-check.h>
cannot be used because this check (by design) happens before
DL_PLATFORM_INIT and the x86 CPU flags initialization.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This is required so that the checks still work if $(early-cflags)
selects a different ISA level.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
HAVE_X86_LAHF_SAHF is implied by x86-64-v2, and HAVE_X86_MOVBE by
x86-64-v3.
The individual flag does not appear in -fverbose-asm flag output
even if the ISA level implies it.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This patch adds following inputs:
0x1.bcab29da0e947p-54 0x1.bc41f4d2294b8p-54
0x1.a11891ec004d4p-348 0x1.814830510be26p-348
0x1.b836ed678be29p-588 0x1.b7be6f5a03a8cp-588
0x1.a83f842ef3f73p-633 0x1.a799d8a6677ep-633
to atan2 tests and updates x86_64 double atan2 ulps.
This fixes BZ #28765.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Linux 5.16 has one new syscall, futex_waitv. Update
syscall-names.list and regenerate the arch-syscall.h headers with
build-many-glibcs.py update-syscalls.
Tested with build-many-glibcs.py.
The configure check for CAN_USE_REGISTER_ASM_EBP tried to compile a
simple function that uses %ebp as an inline assembly operand. If
compilation failed, CAN_USE_REGISTER_ASM_EBP was set 0, which
eventually had these consequences:
(1) %ebx was avoided as an inline assembly operand, with an
assembler macro hack to avoid unnecessary register moves.
(2) %ebp was avoided as an inline assembly operand, using an
out-of-line syscall function for 6-argument system calls.
(1) is no longer needed for any GCC version that is supported for
building glibc. %ebx can be used directly as a register operand.
Therefore, this commit removes the %ebx avoidance completely. This
avoids the assembler macro hack, which turns out to be incompatible
with the current Systemtap probe macros (which switch to .altmacro
unconditionally).
(2) is still needed in many build configurations. The existing
configure check cannot really capture that because the simple function
succeeds to compile, while the full glibc build still fails.
Therefore, this commit removes the check, the CAN_USE_REGISTER_ASM_EBP
macro, and uses the out-of-line syscall function for 6-argument system
calls unconditionally.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This patch fixes SSE4.2 libmvec atan2 function accuracy for following
inputs to less than 4 ulps.
{0x1.bcab29da0e947p-54,0x1.bc41f4d2294b8p-54} 4.19888 ulps
{0x1.b836ed678be29p-588,0x1.b7be6f5a03a8cp-588} 4.09889 ulps
This fixes BZ #28765.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The __convert_scm_timestamps() only updates the control message last
pointer for SOL_SOCKET type, so if the message control buffer contains
multiple ancillary message types the converted timestamp one might
overwrite a valid message.
The test check if the extra ancillary space is correctly handled
by recvmsg/recvmmsg, where if there is no extra space for the 64-bit
time_t converted message the control buffer should be marked with
MSG_TRUNC. It also check if recvmsg/recvmmsg handle correctly multiple
ancillary data.
Checked on x86_64-linux and on i686-linux-gnu on both 5.11 and
4.15 kernel.
Co-authored-by: Fabian Vogt <fvogt@suse.de>
The usage of internal static symbol for statically linked binaries
does not work correctly for objects built with -D_TIME_BITS=64,
since the internal definition does not provide the expected aliases.
This patch makes it to use the default stat functions instead (which
uses the default 64 time_t alias and types).
Checked on i686-linux-gnu.
Indicates the availability of enhanced counter virtualization extension
of armv8.6-a with self-synchronized virtual counter CNTVCTSS_EL0 usable
in userspace.
Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
__wcscmp_evex. For x86_64 this covers the entire address range so any
length larger could not possibly be used to bound `s1` or `s2`.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Fixes [BZ# 28755] for wcsncmp by redirecting length >= 2^56 to
__wcscmp_avx2. For x86_64 this covers the entire address range so any
length larger could not possibly be used to bound `s1` or `s2`.
test-strcmp, test-strncmp, test-wcscmp, and test-wcsncmp all pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Converting double precision constants to float is now affected by the
runtime dynamic rounding mode instead of being evaluated at compile
time with default rounding mode (except static object initializers).
This can change the computed result and cause performance regression.
The known correctness issues (increased ulp errors) are already fixed,
this patch fixes remaining cases of unnecessary runtime conversions.
Add float M_* macros to math.h as new GNU extension API. To avoid
conversions the new M_* macros are used and instead of casting double
literals to float, use float literals (only required if the conversion
is inexact).
The patch was tested on aarch64 where the following symbols had new
spurious conversion instructions that got fixed:
__clog10f
__gammaf_r_finite@GLIBC_2.17
__j0f_finite@GLIBC_2.17
__j1f_finite@GLIBC_2.17
__jnf_finite@GLIBC_2.17
__kernel_casinhf
__lgamma_negf
__log1pf
__y0f_finite@GLIBC_2.17
__y1f_finite@GLIBC_2.17
cacosf
cacoshf
casinhf
catanf
catanhf
clogf
gammaf_positive
Fixes bug 28713.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Trapping SIGSEGV within the process is error-prone, adds security
issues, and modern analysis design tends to happen out of the
process (either by attaching a debugger or by post-mortem analysis).
The libSegfault also has some design problems, it uses non
async-signal-safe function (backtrace) on signal handler.
There are multiple alternatives if users do want to use similar
functionality, such as sigsegv gnulib module or libsegfault.
Here we define the minumum linux kernel version at 5.4.0, as that is the
long term support version where 32-bit architectures start to support
64-bit time API's. The OpenRISC kernel had some bugs up until version 5.8
which caused issues with glibc fork/clone, they have been backported to
5.4 but not previous versions.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
OpenRISC support hard float but I will like to submit that after glibc
soft float goes upstream. The hard float support depends on adding user
access to the FPCSR, which is not supported by the kernel yet.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
OpenRISC includes 3 TLS addressing models. Local Dynamic optimizations
are not done in the linker and therefore use the same code sequences as
Global Dynamic.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add <dl-debug.h> to setup debugging entry in PT_DYNAMIC segment to support
DT_DEBUG, DT_MIPS_RLD_MAP_REL and DT_MIPS_RLD_MAP.
Tested on x86-64, x32 and i686 as well as with build-many-glibcs.py.
I've updated copyright dates in glibc for 2022. This is the patch for
the changes not generated by scripts/update-copyrights and subsequent
build / regeneration of generated files. As well as the usual annual
updates, mainly dates in --version output (minus csu/version.c which
previously had to be handled manually but is now successfully updated
by update-copyrights), there is a small change to the copyright notice
in NEWS which should let NEWS get updated automatically next year.
Please remember to include 2022 in the dates for any new files added
in future (which means updating any existing uncommitted patches you
have that add new files to use the new copyright dates in them).
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
407765e9f2 ("hurd: Fix ELF_MACHINE_USER_ADDRESS_MASK value") switched
ELF_MACHINE_USER_ADDRESS_MASK from 0xf8000000UL to 0xf0000000UL to let
libraries etc. get loaded at 0x08000000. But
ELF_MACHINE_USER_ADDRESS_MASK is actually only meaningful for the main
program anyway, so keep it at 0xf8000000UL to prevent the program loader
from putting ld.so beyond 0x08000000. And conversely, drop the use of
ELF_MACHINE_USER_ADDRESS_MASK for shared objects, which don't need any
constraints since the program will have already be loaded by then.
Commit a4b4131355 changed default __TIMESIZE to 64, however
it added sub-architecture timesize.h for powerpc, s390, and
sparc.
Also simplify mips by removing _MIPS_SIM usage (which would require
to add sgidefs inclusion.
glibc uses /dev/urandom for getrandom(), and from version 2.34 malloc
initialization uses it. We have to detect when we are running the random
translator itself, in which case we can't read ourself.
When running tests on OpenRISC which has 32-bit wordsize but 64-bit
timesize it was found that O_LARGEFILE is not being set when calling
open64. For 64-bit architectures the O_LARGEFILE flag is generally
implied by the kernel according to force_o_largefile. However, for
32-bit architectures this is not done.
For this patch we unconditionally now set the O_LARGEFILE flag for
open64 class syscalls as there is no harm in doing so.
Tested on the OpenRISC the build works and timezone/tst-tzset passes
which was failing before. I would expect this also would fix arc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Implement vectorized tan/tanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector tan/tanf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector erfc/erfcf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector asinh/asinhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector tanh/tanhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized erf/erff containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector erf/erff with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acosh/acoshf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atanh/atanhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log1p/log1pf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized log2/log2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log2/log2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized log10/log10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log10/log10f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan2/atan2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector sinh/sinhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cosh/coshf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).
_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed. It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).
It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.
The implementation uses software transactional memory, as suggested
by Torvald Riegel. Two copies of the supporting data structures are
used, also achieving full async-signal-safety.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
And use machine-sp.h instead. The Linux implementation is based on
already provided CURRENT_STACK_FRAME (used on nptl code) and
STACK_GROWS_UPWARD is replaced with _STACK_GROWS_UP.
In commit a92f4e6299 ("linux: Add time64
pselect support"), a Microblaze specific implementation of
__pselect32() was added to cover the case of kernels < 3.15 which lack
the pselect6 system call.
This new file sysdeps/unix/sysv/linux/microblaze/pselect32.c takes
precedence over the default implementation
sysdeps/unix/sysv/linux/pselect32.c.
However sysdeps/unix/sysv/linux/pselect32.c provides an implementation
of __pselect32() which is needed when __ASSUME_TIME64_SYSCALLS is not
defined. On Microblaze, which is a 32-bit architecture,
__ASSUME_TIME64_SYSCALLS is only true for kernels >= 5.1.
Due to sysdeps/unix/sysv/linux/microblaze/pselect32.c taking
precedence over sysdeps/unix/sysv/linux/pselect32.c, it means that
when we are with a kernel >= 3.15 but < 5.1, we need a __pselect32()
implementation, but sysdeps/unix/sysv/linux/microblaze/pselect32.c
doesn't provide it, and sysdeps/unix/sysv/linux/pselect32.c which
would provide it is not compiled in.
This causes the following build failure on Microblaze with for example
Linux kernel headers 4.9:
[...]/build/libc_pic.os: in function `__pselect64':
(.text+0x120b44): undefined reference to `__pselect32'
collect2: error: ld returned 1 exit status
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It consolidates the code required to call la_pltexit audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It consolidates the code required to call la_pltenter audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It consolidates the code required to call la_preinit audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It consolidates the code required to call la_symbind{32,64} audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It consolidates the code required to call la_objclose audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It consolidates the code required to call la_objsearch audit
callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It consolidates the code required to call la_activity audit
callback.
Also for a new Lmid_t the namespace link_map list are empty, so it
requires to check if before using it. This can happen for when audit
module is used along with dlmopen.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
It consolidates the code required to call la_objopen audit callback.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
hurd initialization stages use RUN_HOOK to run various initialization
functions. That is however using absolute addresses which need to be
relocated, which is done later by csu. We can however easily make the
linker compute relative addresses which thus don't need a relocation.
The new SET_RELHOOK and RUN_RELHOOK macros implement this.
Since 9cec82de71 ("htl: Initialize later"), we let csu initialize
pthreads. We can thus let it initialize tls later too, to better align
with the generic order. Initialization however accesses ports which
links/unlinks into the sigstate for unwinding. We can however easily
skip that during initialization.
No bug.
Optimizations are twofold.
1) Replace page cross and 0/1 checks with masked load instructions in
L(less_vec). In applications this reduces branch-misses in the
hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.
Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range. From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
No bug.
Optimizations are twofold.
1) Replace page cross and 0/1 checks with masked load instructions in
L(less_vec). In applications this reduces branch-misses in the
hot [0, 32] case.
2) Change controlflow so that L(less_vec) case gets the fall through.
Change 2) helps copies in the [0, 32] size range but comes at the cost
of copies in the [33, 64] size range. From profiles of GCC and
Python3, 94%+ and 99%+ of calls are in the [0, 32] range so this
appears to the the right tradeoff.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
s_cosf.c and s_sinf.c have
if (abstop12 (y) < abstop12 (pio4))
where abstop12 takes a float argument, but pio4 is static const double.
pio4 is used only in calls to abstop12 and never in arithmetic. Apply
-static const double pio4 = 0x1.921FB54442D18p-1;
+static const float pio4 = 0x1.921FB6p-1f;
to fix:
FAIL: math/test-float-cos
FAIL: math/test-float-sin
FAIL: math/test-float-sincos
FAIL: math/test-float32-cos
FAIL: math/test-float32-sin
FAIL: math/test-float32-sincos
when compiling with GCC 12.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
When the clock_id is CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID,
on the 5.10 kernel powerpc 32-bit, the 32-bit vDSO is executed successfully (
because the __kernel_clock_gettime in arch/powerpc/kernel/vdso32/gettimeofday.S
does not support these two IDs, the 32-bit time_t syscall will be used),
but tp32.tv_sec is equal to 0, causing the 64-bit time_t syscall to continue to be used,
resulting in two system calls.
Fix commit 72e84d1db2.
Signed-off-by: maminjie <maminjie2@huawei.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add the constant ARPHRD_MCTP, from Linux 5.15, to net/if_arp.h, along
with ARPHRD_CAN which was added to Linux in version 2.6.25 (commit
cd05acfe65ed2cf2db683fa9a6adb8d35635263b, "[CAN]: Allocate protocol
numbers for PF_CAN") but apparently missed for glibc at the time.
Tested for x86_64.
Align the stack pointer to 128 bits during the call to _dl_init() as
specified by the RISC-V ABI [1]. This fixes the elf/tst-align2 test.
Fixes bug 28703.
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
The RISC-V ABI [1] mandates that "the stack pointer shall be aligned to
a 128-bit boundary upon procedure entry". This as not the case in clone.
This fixes the misc/tst-misalign-clone-internal and
misc/tst-misalign-clone tests.
Fixes bug 28702.
[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc
On KVM guests running on some AMD systems, the IBRS feature is reported
as a synthetic feature using the Intel feature, while the cpuinfo entry
keeps the same. Handle that by first checking the presence of the Intel
feature on AMD systems.
Fixes bug 28704.
The syscall function does not allocate the extra stack frame for scv like other
assembly syscalls using DO_CALL_SCV. So after commit d120fb9941 changed the
offset that is used to save LR, syscall ended up using an invalid offset,
causing regressions on powerpc64. So make sure the extra stack frame is
allocated in syscall.S as well to make it consistent with other uses of
DO_CALL_SCV and avoid similar issues in the future.
Tested on powerpc, powerpc64, and powerpc64le (with and without scv)
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Due to PIE-by-default, PIC is now defined in more cases. libc.a
does not have _rtld_global_ro, and statically linking setcontext
fails. SHARED is the right condition to use, so that libc.a
references _dl_hwcap instead of _rtld_global_ro.
For static PIE support, the !SHARED case would still have to be made
PIC. This patch does not achieve that.
Fixes commit 23645707f1
("Replace --enable-static-pie with --disable-default-pie").
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
With the morecore hook removed, there is not easy way to provide huge
pages support on with glibc allocator without resorting to transparent
huge pages. And some users and programs do prefer to use the huge pages
directly instead of THP for multiple reasons: no splitting, re-merging
by the VM, no TLB shootdowns for running processes, fast allocation
from the reserve pool, no competition with the rest of the processes
unlike THP, no swapping all, etc.
This patch extends the 'glibc.malloc.hugetlb' tunable: the value
'2' means to use huge pages directly with the system default size,
while a positive value means and specific page size that is matched
against the supported ones by the system.
Currently only memory allocated on sysmalloc() is handled, the arenas
still uses the default system page size.
To test is a new rule is added tests-malloc-hugetlb2, which run the
addes tests with the required GLIBC_TUNABLE setting. On systems without
a reserved huge pages pool, is just stress the mmap(MAP_HUGETLB)
allocation failure. To improve test coverage it is required to create
a pool with some allocated pages.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
Linux Transparent Huge Pages (THP) current supports three different
states: 'never', 'madvise', and 'always'. The 'never' is
self-explanatory and 'always' will enable THP for all anonymous
pages. However, 'madvise' is still the default for some system and
for such case THP will be only used if the memory range is explicity
advertise by the program through a madvise(MADV_HUGEPAGE) call.
To enable it a new tunable is provided, 'glibc.malloc.hugetlb',
where setting to a value diffent than 0 enables the madvise call.
This patch issues the madvise(MADV_HUGEPAGE) call after a successful
mmap() call at sysmalloc() with sizes larger than the default huge
page size. The madvise() call is disable is system does not support
THP or if it has the mode set to "never" and on Linux only support
one page size for THP, even if the architecture supports multiple
sizes.
To test is a new rule is added tests-malloc-hugetlb1, which run the
addes tests with the required GLIBC_TUNABLE setting.
Checked on x86_64-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
A local register variable is merely a compiler hint, and so not
appropriate in this context. Move the global register variable into
<thread_pointer.h> and include it from <tls.h>, as there can only
be one global definition for one particular register.
Fixes commit 8dbeb0561e
("nptl: Add <thread_pointer.h> for defining __thread_pointer").
Reported-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Add <tst-file-align.h> to support target specific ALIGN for variable
alignment test:
1. Alpha: Use 0x10000.
2. MicroBlaze and Nios II: Use 0x8000.
3. All others: Use 0x200000.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes
of regaining some precision as a function of da. However the
comment says we add the term `-0.5*da*a^2 + 0.5*da` which is
different. This fix updates the comment to reflect the
code and also simplifies the calculation by replacing `a` with `x`
because they always have the same value.
Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu>
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
The generic implementation is shows only slight worse performance:
POWER10 reciprocal-throughput latency
master 8.28478 13.7253
new hypot 7.21945 13.1933
POWER9 reciprocal-throughput latency
master 13.4024 14.0967
new hypot 14.8479 15.8061
POWER8 reciprocal-throughput latency
master 15.5767 16.8885
new hypot 16.5371 18.4057
One way to improve might to make gcc generate xsmaxdp/xsmindp for
fmax/fmin (it onl does for -ffast-math, clang does for default
options).
Checked on powerpc64-linux-gnu (power8) and powerpc64le-linux-gnu
(power9).
The generic hypotf is slight slower, mostly due the tricks the assembly
does to optimize the isinf/isnan/issignaling. The generic hypot is way
slower, since the optimized implementation uses the i386 default
excessive precision to issue the operation directly. A similar
implementation is provided instead of using the generic implementation:
Checked on i686-linux-gnu.
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With a
random 1e9 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.05% results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of total (1280).
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With a
random 1e8 input pairs in the range of [LDBL_MIN, LDBL_MAX], glibc
current implementation shows around 0.02% results with an error of
1 ulp (23158 results) while the new implementation only shows
0.0001% of total (111).
[1] https://arxiv.org/pdf/1904.09481.pdf
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency. The non-fma version has 30% higher throughput and 10%
higher latency.
Max ULP error is 0.949 with fma and 0.792 without fma.
Passes GLIBC testsuite.
This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for denormal results.
The main advantage of the new algorithm is its precision: with a
random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc
current implementation shows around 0.34% results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of total (18851).
The performance result are also only slight worse than current
implementation. On x86_64 (Ryzen 5900X) with gcc 12:
Before:
"hypot": {
"workload-random": {
"duration": 3.73319e+09,
"iterations": 1.12e+08,
"reciprocal-throughput": 22.8737,
"latency": 43.7904,
"max-throughput": 4.37184e+07,
"min-throughput": 2.28361e+07
}
}
After:
"hypot": {
"workload-random": {
"duration": 3.7597e+09,
"iterations": 9.8e+07,
"reciprocal-throughput": 23.7547,
"latency": 52.9739,
"max-throughput": 4.2097e+07,
"min-throughput": 1.88772e+07
}
}
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
Use a more optimized comparison for check for NaN and infinite and
add an inlined issignaling implementation for float. With gcc it
results in 2 FP comparisons.
The file Copyright is also changed to use GPL, the implementation was
completely changed by 7c10fd3515 to use double precision instead of
scaling and this change removes all the GET_FLOAT_WORD usage.
Checked on x86_64-linux-gnu.
Replace non-UTF-8 and non-ASCII characters in comments with their UTF-8
equivalents so that files don't end up with mixed encodings. With this,
all files (except tests that actually test different encodings) have a
single encoding.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Build glibc programs and tests as PIE by default and enable static-pie
automatically if the architecture and toolchain supports it.
Also add a new configuration option --disable-default-pie to prevent
building programs as PIE.
Only the following architectures now have PIE disabled by default
because they do not work at the moment. hppa, ia64, alpha and csky
don't work because the linker is unable to handle a pcrel relocation
generated from PIE objects. The microblaze compiler is currently
failing with an ICE. GNU hurd tries to enable static-pie, which does
not work and hence fails. All these targets have default PIE disabled
at the moment and I have left it to the target maintainers to enable PIE
on their targets.
build-many-glibcs runs clean for all targets. I also tested x86_64 on
Fedora and Ubuntu, to verify that the default build as well as
--disable-default-pie work as expected with both system toolchains.
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since
the first PT_LOAD segment is no longer executable due to defaulting to
-z separate-code.
This fixes [BZ #28656].
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Without the bar_ctor_finish barrier, it was possible that thread2
re-locked user_lock before ctor had a chance to lock it. ctor then
blocked in its locking operation, xdlopen from the main thread
did not return, and thread2 was stuck waiting in bar_dtor:
thread 1: started.
thread 2: started.
thread 2: locked user_lock.
constructor started: 0.
thread 1: in ctor: started.
thread 3: started.
thread 3: done.
thread 2: unlocked user_lock.
thread 2: locked user_lock.
Fixes the test in commit 83b5323261
("elf: Avoid deadlock between pthread_create and ctors [BZ #28357]").
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
TLS_INIT_TCB_ALIGN is not actually used. TLS_TCB_ALIGN was likely
introduced to support a configuration where the thread pointer
has not the same alignment as THREAD_SELF. Only ia64 seems to use
that, but for the stack/pointer guard, not for storing tcbhead_t.
Some ports use TLS_TCB_OFFSET and TLS_PRE_TCB_SIZE to shift
the thread pointer, potentially landing in a different residue class
modulo the alignment, but the changes should not impact that.
In general, given that TLS variables have their own alignment
requirements, having different alignment for the (unshifted) thread
pointer and struct pthread would potentially result in dynamic
offsets, leading to more complexity.
hppa had different values before: __alignof__ (tcbhead_t), which
seems to be 4, and __alignof__ (struct pthread), which was 8
(old default) and is now 32. However, it defines THREAD_SELF as:
/* Return the thread descriptor for the current thread. */
# define THREAD_SELF \
({ struct pthread *__self; \
__self = __get_cr27(); \
__self - 1; \
})
So the thread pointer points after struct pthread (hence __self - 1),
and they have to have the same alignment on hppa as well.
Similarly, on ia64, the definitions were different. We have:
# define TLS_PRE_TCB_SIZE \
(sizeof (struct pthread) \
+ (PTHREAD_STRUCT_END_PADDING < 2 * sizeof (uintptr_t) \
? ((2 * sizeof (uintptr_t) + __alignof__ (struct pthread) - 1) \
& ~(__alignof__ (struct pthread) - 1)) \
: 0))
# define THREAD_SELF \
((struct pthread *) ((char *) __thread_self - TLS_PRE_TCB_SIZE))
And TLS_PRE_TCB_SIZE is a multiple of the struct pthread alignment
(confirmed by the new _Static_assert in sysdeps/ia64/libc-tls.c).
On m68k, we have a larger gap between tcbhead_t and struct pthread.
But as far as I can tell, the port is fine with that. The definition
of TCB_OFFSET is sufficient to handle the shifted TCB scenario.
This fixes commit 23c77f6018
("nptl: Increase default TCB alignment to 32").
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The relationship between the thread pointer and the rseq area
is made explicit. The constant offset can be used by JIT compilers
to optimize rseq access (e.g., for really fast sched_getcpu).
Extensibility is provided through __rseq_size and __rseq_flags.
(In the future, the kernel could request a different rseq size
via the auxiliary vector.)
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
This tunable allows applications to register the rseq area instead
of glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The rseq area is placed directly into struct pthread. rseq
registration failure is not treated as an error, so it is possible
that threads run with inconsistent registration status.
<sys/rseq.h> is not yet installed as a public header.
Co-Authored-By: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.
Only architectures for which rseq support is added are covered.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Current binutils defines __executable_start as the lowest text
address, so using the entry point address as a fallback is no
longer necessary. As a result, overriding <entry.h> is only
necessary if the entry point is not called _start.
The previous approach to define __ASSEMBLY__ to suppress the
declaration breaks if headers included by <entry.h> are not
compatible with __ASSEMBLY__. This happens with rseq integration
because it is necessary to include kernel headers in more places.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Programs without dynamic dependencies and without a program
interpreter are now run via execve.
Previously, the dynamic linker either crashed while attempting to
read a non-existing dynamic segment (looking for DT_AUDIT/DT_DEPAUDIT
data), or the self-relocated in the static PIE executable crashed
because the outer dynamic linker had already applied RELRO protection.
<dl-execve.h> is needed because execve is not available in the
dynamic loader on Hurd.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Must use notl %edi here as lower bits are for CHAR comparisons
potentially out of range thus can be 0 without indicating mismatch.
This fixes BZ #28646.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
rseq support will use a 32-byte aligned field in struct pthread,
so the whole struct needs to have at least that alignment.
nptl/tst-tls3mod.c uses TCB_ALIGNMENT, therefore include <descr.h>
to obtain the fallback definition.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
v2 is a complete rewrite of the A64FX memcpy. Performance is improved
by streamlining the code, aligning all large copies and using a single
unrolled loop for all sizes. The code size for memcpy and memmove goes
down from 1796 bytes to 868 bytes. Performance is better in all cases:
bench-memcpy-random is 2.3% faster overall, bench-memcpy-large is ~33%
faster for large sizes, bench-memcpy-walk is 25% faster for small sizes
and 20% for the largest sizes. The geomean of all tests in bench-memcpy
is 5.1% faster, and total time is reduced by 4%.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
Rewrite memcmp to improve performance. On small and medium inputs performance
is 10-20% better. Large inputs use a SIMD loop processing 64 bytes per
iteration, which is 30-50% faster depending on the size.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>