https://sourceware.org/bugzilla/show_bug.cgi?id=21782 dropped an ld
diagnostic for R_X86_64_PC32 referencing an undefined weak symbol in
-pie links. Arguably keeping the diagnostic like other ports is more
correct, since statically resolving movl foo(%rip), %eax to the
link-time zero address produces a corrupted output.
It turns out that --enable-static-pie builds do not depend on the ld
behavior. GCC generates GOT indirection for weak declarations for
-fPIE/-fPIC, so what ld does with the PC-relative relocation doesn't
really matter.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Refactor the sincos implementation - rather than rely on odd partial inlining
of preprocessed portions from sin and cos, explicitly write out the cases.
This makes sincos much easier to maintain and provides an additional 16-20%
speedup between 0 and 2^27. The overall speedup of sincos is 48% over this range.
Between 0 and PI it is 66% faster.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Cleanup ifdefs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (__sincos): Refactor using the same
logic as sin and cos.
Refactor duplicated code into do_sin. Since all calls to do_sin use copysign to
set the sign of the result, move it inside do_sin. Small inputs use a separate
polynomial, so move this into do_sin as well (the check is based on the more
conservative case when doing large range reduction, but could be relaxed).
* sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Use TAYLOR_SIN for small
inputs. Return correct sign.
(do_sincos): Remove small input check before do_sin, let do_sin set
the sign.
(__sin): Likewise.
(__cos): Likewise.
For huge inputs use the improved do_sincos function as well. Now no cases use
the correction factor returned by do_sin, do_cos and TAYLOR_SIN, so remove it.
* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SIN): Remove cor parameter.
(do_cos): Remove corp parameter and calculations.
(do_sin): Likewise.
(do_sincos): Remove cor variable.
(__sin): Use do_sincos for huge inputs.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
(reduce_and_compute_sincos): Remove unused function.
This patch improves the accuracy of the range reduction. When the input is
large (2^27) and very close to a multiple of PI/2, using 110 bits of PI is not
enough. Improve range reduction accuracy to 136 bits. As a result the special
checks for results close to zero can be removed. The ULP of the polynomials is
at worst 0.55ULP, so there is no reason for the slow functions, and they can be
removed.
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_1): Rename to
reduce_sincos, improve accuracy to 136 bits.
(do_sincos_1): Rename to do_sincos, remove fallbacks to slow functions.
(__sin): Use improved reduction and simplified do_sincos calculation.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
This patch removes the large range reduction code and defers to the huge range
reduction code. The first level range reducer supports inputs up to 2^27,
which is way too large given that inputs for sin/cos are typically small
(< 10), and optimizing for a smaller range would give a significant speedup.
Input values above 2^27 are practically never used, so there is no reason for
supporting range reduction between 2^27 and 2^48. Removing it significantly
simplifies code and enables further speedups. There is about a 2.3x slowdown
in this range due to __branred being extremely slow (a better algorithm could
easily more than double performance).
* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_2): Remove function.
(do_sincos_2): Likewise.
(__sin): Remove middle range reduction case.
(__cos): Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Remove middle range
reduction case.
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).
ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos. The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.
Tested on AArch64 and x86_64 with no regressions.
The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction. Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.
* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
inputs.
(__cos): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
Unlike GCC, llvm always uses an integrated assembler, which attempts to
recognized all `asm` statements written in the C code. glibc uses some
syntactically invalid asm statements to emit constants into assembly that
are later extracted with a sed or AWK script.
This change fixes two such invalid `asm` statements by wrapping the
output in a `.ascii` directive.. This does not break the sed/AWK (the same
special sequence is output) but it makes the statement syntactically valid.
See cf8e3f8757 for a previous fix for the same issue.
+ Use DOT_MACHINE macro instead of ".machine" instruction.
+ Use __isinf and __isinff instead of builtin versions.
+ In s_logb, s_logbf and s_logbl functions, used float versions to
calculate "ret = x & 0x7f800000;" expression.
When compiled as mempcpy, the return value is the end of the destination
buffer, thus it cannot be used to refer to the start of it.
(cherry picked from commit 9aaaab7c6e)
On s390 (31bit) if glibc is build with -Os, pthread_join sometimes
blocks indefinitely. This is e.g. observable with
testcase intl/tst-gettext6.
pthread_join is calling lll_wait_tid(tid), which performs the futex-wait
syscall in a loop as long as tid != 0 (thread is alive).
On s390 (and build with -Os), tid is loaded from memory before
comparing against zero and then the tid is loaded a second time
in order to pass it to the futex-wait-syscall.
If the thread exits in between, then the futex-wait-syscall is
called with the value zero and it waits until a futex-wake occurs.
As the thread is already exited, there won't be a futex-wake.
In lll_wait_tid, the tid is stored to the local variable __tid,
which is then used as argument for the futex-wait-syscall.
But unfortunately the compiler is allowed to reload the value
from memory.
With this patch, the tid is loaded with atomic_load_acquire.
Then the compiler is not allowed to reload the value for __tid from memory.
ChangeLog:
[BZ #23137]
* sysdeps/nptl/lowlevellock.h (lll_wait_tid):
Use atomic_load_acquire to load __tid.
(cherry picked from commit 1660901840)
This patch adds the PTRACE_SECCOMP_GET_METADATA constant from Linux
4.16 to all relevant sys/ptrace.h files. A type struct
__ptrace_seccomp_metadata, analogous to other such types, is also
added.
Tested for x86_64, and with build-many-glibcs.py.
* sysdeps/unix/sysv/linux/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): New enum value and macro.
* sysdeps/unix/sysv/linux/bits/ptrace-shared.h
(struct __ptrace_seccomp_metadata): New type.
* sysdeps/unix/sysv/linux/aarch64/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
* sysdeps/unix/sysv/linux/arm/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
* sysdeps/unix/sysv/linux/ia64/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
* sysdeps/unix/sysv/linux/powerpc/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
* sysdeps/unix/sysv/linux/s390/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
* sysdeps/unix/sysv/linux/sparc/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
* sysdeps/unix/sysv/linux/tile/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
* sysdeps/unix/sysv/linux/x86/sys/ptrace.h
(PTRACE_SECCOMP_GET_METADATA): Likewise.
(cherry picked from commit 9320ca88a1)