11746 Commits

Author SHA1 Message Date
Fangrui Song
a21d58a0dc x86_64: Remove unneeded static PIE check for undefined weak diagnostic
https://sourceware.org/bugzilla/show_bug.cgi?id=21782 dropped an ld
diagnostic for R_X86_64_PC32 referencing an undefined weak symbol in
-pie links.  Arguably keeping the diagnostic like other ports is more
correct, since statically resolving movl foo(%rip), %eax to the
link-time zero address produces a corrupted output.

It turns out that --enable-static-pie builds do not depend on the ld
behavior. GCC generates GOT indirection for weak declarations for
-fPIE/-fPIC, so what ld does with the PC-relative relocation doesn't
really matter.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-08-27 17:26:06 -07:00
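The weak-reference pattern discussed in the message above can be sketched in C as follows. This is only an illustration, assuming an ELF target and a GCC-compatible compiler; `read_foo_or` is a hypothetical helper, and `foo` is deliberately left undefined so that its address resolves to NULL at link time.

```c
#include <assert.h>
#include <stddef.h>

/* Weak declaration with no definition linked in (assumption of this
   sketch).  With -fPIE/-fPIC the compiler reaches foo through the GOT,
   so the address test below does not rely on a PC-relative relocation.  */
extern int foo __attribute__ ((weak));

/* Return the value of foo if a definition exists, otherwise FALLBACK.
   Dereferencing foo without the address check would read address zero.  */
int
read_foo_or (int fallback)
{
  int *p = &foo;
  if (p != NULL)
    return *p;
  return fallback;
}
```

Code that skips the address check and reads `foo` directly is the case where a statically resolved zero address would produce a broken binary.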
Wilco Dijkstra
2d20ffe431 [PATCH 7/7] sin/cos slow paths: refactor sincos implementation
Refactor the sincos implementation - rather than rely on odd partial inlining
of preprocessed portions from sin and cos, explicitly write out the cases.
This makes sincos much easier to maintain and provides an additional 16-20%
speedup between 0 and 2^27.  The overall speedup of sincos is 48% over this range.
Between 0 and PI it is 66% faster.

	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Cleanup ifdefs.
	(__cos): Likewise.
	* sysdeps/ieee754/dbl-64/s_sin.c (__sincos): Refactor using the same
	logic as sin and cos.
2021-08-27 17:26:06 -07:00
Wilco Dijkstra
c8aaaf67f6 [PATCH 6/7] sin/cos slow paths: refactor duplicated code into do_sin
Refactor duplicated code into do_sin.  Since all calls to do_sin use copysign to
set the sign of the result, move it inside do_sin.  Small inputs use a separate
polynomial, so move this into do_sin as well (the check is based on the more
conservative case when doing large range reduction, but could be relaxed).

	* sysdeps/ieee754/dbl-64/s_sin.c (do_sin): Use TAYLOR_SIN for small
	inputs.  Return correct sign.
	(do_sincos): Remove small input check before do_sin, let do_sin set
	the sign.
	(__sin): Likewise.
	(__cos): Likewise.
2021-08-27 17:26:06 -07:00
Wilco Dijkstra
c015f0cc57 [PATCH 5/7] sin/cos slow paths: remove unused slowpath functions
Remove all unused slowpath functions.

	* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SLOW): Remove.
	(do_cos_slow): Likewise.
	(do_sin_slow): Likewise.
	(reduce_and_compute): Likewise.
	(slow): Likewise.
	(slow1): Likewise.
	(slow2): Likewise.
	(sloww): Likewise.
	(sloww1): Likewise.
	(sloww2): Likewise.
	(bslow): Likewise.
	(bslow1): Likewise.
	(bslow2): Likewise.
	(cslow2): Likewise.
2021-08-27 17:26:06 -07:00
Wilco Dijkstra
d4d26acd8a [PATCH 4/7] sin/cos slow paths: remove slow paths from huge range reduction
For huge inputs use the improved do_sincos function as well.  Now no cases use
the correction factor returned by do_sin, do_cos and TAYLOR_SIN, so remove it.

	* sysdeps/ieee754/dbl-64/s_sin.c (TAYLOR_SIN): Remove cor parameter.
	(do_cos): Remove corp parameter and calculations.
	(do_sin): Likewise.
	(do_sincos): Remove cor variable.
	(__sin): Use do_sincos for huge inputs.
	(__cos): Likewise.
	* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
	(reduce_and_compute_sincos): Remove unused function.
2021-08-27 17:26:05 -07:00
Wilco Dijkstra
76f9784421 [PATCH 3/7] sin/cos slow paths: remove slow paths from small range reduction
This patch improves the accuracy of the range reduction.  When the input is
large (2^27) and very close to a multiple of PI/2, using 110 bits of PI is not
enough.  Improve range reduction accuracy to 136 bits.  As a result the special
checks for results close to zero can be removed.  The ULP of the polynomials is
at worst 0.55ULP, so there is no reason for the slow functions, and they can be
removed.

	* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_1): Rename to
	reduce_sincos, improve accuracy to 136 bits.
	(do_sincos_1): Rename to do_sincos, remove fallbacks to slow functions.
	(__sin): Use improved reduction and simplified do_sincos calculation.
	(__cos): Likewise.
	* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Likewise.
2021-08-27 17:26:05 -07:00
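Why the number of bits of PI matters can be illustrated with a two-part reduction by PI/2. The sketch below is not the code from s_sin.c: the function name is hypothetical, the constants are the fdlibm-style split of PI/2 (a 33-bit leading part plus a tail), and the input range is restricted to keep the multiplication by the leading part exact. Its precision is far below the 136 bits mentioned above.

```c
#include <assert.h>

/* PI/2 split into a short leading part and a tail (fdlibm-style
   constants, an assumption of this sketch, not glibc's tables).  */
static const double INV_PIO2 = 6.36619772367581382433e-01;
static const double PIO2_HI = 1.57079632673412561417e+00;
static const double PIO2_LO = 6.07710050650619224932e-11;

/* Reduce 0 <= x < 2^20 to r = x - k * PI/2 with r roughly in
   [-PI/4, PI/4].  Stores k in *K; the quadrant is k & 3.  */
double
reduce_pio2_sketch (double x, int *k)
{
  assert (x >= 0.0 && x < 1048576.0);
  /* Nearest multiple of PI/2; truncation after adding 0.5 rounds
     correctly because x is non-negative.  */
  double n = (double) (long) (x * INV_PIO2 + 0.5);
  *k = (int) n;
  /* n * PIO2_HI is exact here (33 + 20 bits fit in a double), so the
     rounding error comes from the tail term and the final subtraction.  */
  return (x - n * PIO2_HI) - n * PIO2_LO;
}
```

When x lies very close to a multiple of PI/2, the leading bits cancel in the subtraction and only the bits carried by the tail remain, which is why a longer representation of PI gives a more accurate result for large inputs.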
Wilco Dijkstra
e525ff25df [PATCH 2/7] sin/cos slow paths: remove large range reduction
This patch removes the large range reduction code and defers to the huge range
reduction code.  The first level range reducer supports inputs up to 2^27,
which is way too large given that inputs for sin/cos are typically small
(< 10), and optimizing for a smaller range would give a significant speedup.

Input values above 2^27 are practically never used, so there is no reason for
supporting range reduction between 2^27 and 2^48.  Removing it significantly
simplifies code and enables further speedups.  There is about a 2.3x slowdown
in this range due to __branred being extremely slow (a better algorithm could
easily more than double performance).

	* sysdeps/ieee754/dbl-64/s_sin.c (reduce_sincos_2): Remove function.
	(do_sincos_2): Likewise.
	(__sin): Remove middle range reduction case.
	(__cos): Likewise.
	* sysdeps/ieee754/dbl-64/s_sincos.c (__sincos): Remove middle range
	reduction case.
2021-08-27 17:26:05 -07:00
Wilco Dijkstra
bc57e68bbb [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputs
This series of patches removes the slow paths from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).

ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos.  The number of incorrectly rounded results
(i.e. ULP > 0.5) is at most ~2750 per million inputs between 0.125 and 0.5;
the average is ~850 per million between 0 and PI.

Tested on AArch64 and x86_64 with no regressions.

The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction.  Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.

	* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
	inputs.
	(__cos): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
2021-08-27 17:26:05 -07:00
Stan Shebs
d548adb4ef Let time and gettimeofday use vdso by removing old clang workaround 2021-08-27 17:26:04 -07:00
Stan Shebs
2c9e5207e4 Do not use ppc-specific long double pack/unpack when compiling with clang 2021-08-27 17:26:04 -07:00
Stan Shebs
c3064d5f50 Remove old workaround in power7 logb functions, clang no longer crashes on the inline assembly 2021-08-27 17:26:04 -07:00
Josh Kunz
cb90884046 Additional fixes for llvm-as
Unlike GCC, llvm always uses an integrated assembler, which attempts to
recognize all `asm` statements written in the C code. glibc uses some
syntactically invalid asm statements to emit constants into assembly that
are later extracted with a sed or AWK script.

This change fixes two such invalid `asm` statements by wrapping the
output in a `.ascii` directive. This does not break the sed/AWK (the same
special sequence is output) but it makes the statement syntactically valid.

See cf8e3f8757 for a previous fix for the same issue.
2021-08-27 17:26:04 -07:00
Stan Shebs
144448d566 Add workaround for infinite looping in ppc vsyscall for sched_getcpu. 2021-08-27 17:26:03 -07:00
Stan Shebs
6a12504329 Add an LD_DEBUG=tls option to help debug thread-local storage handling in ld.so 2021-08-27 17:26:03 -07:00
Stan Shebs
c4d57c29b5 Make multi-arch ifunc support work with clang 2021-08-27 17:26:02 -07:00
Ambrose Feinstein
af63681769 Redesign the fastload support for additional performance 2021-08-27 17:26:02 -07:00
Stan Shebs
8d141ab782 Fix sense of a test in the static-linking version of ppc get_clockfreq 2021-08-27 17:26:02 -07:00
Shu-Chun Weng
e1c6d2b0f4 Makes it compile for AArch64
De-nesting fix in 83c02e85 changed function signature but AArch64 was untested.
2021-08-27 17:26:01 -07:00
Shu-Chun Weng
83bede0cfc Makes AArch64 assembly acceptable to clang
According to ARMv8 architecture reference manual section C7.2.188, SIMD MOV (to
general) instruction format is

  MOV <Xd>, <Vn>.D[<index>]

gas appears to accept "<Vn>.2D[<index>]" as well, but clang's assembler does
not. Cf. https://community.arm.com/developer/ip-products/processors/f/cortex-a-forum/5214/aarch64-assembly-syntax-for-armclang
2021-08-27 17:26:01 -07:00
Siva Chandra Reddy
038be62f96 Include STATIC_PIE_BOOTSTRAP with !NESTING in powerpc64/dl-machine.h 2021-08-27 17:26:01 -07:00
Siva Chandra Reddy
738baca865 Enable relaxed relocations when building certain object files for x86_64. 2021-08-27 17:26:01 -07:00
Siva Chandra Reddy
0337af1396 Un-nest an include in dl-reloc-static-pie.c.
A corresponding adjustment in sysdeps/x86_64/dl-machine.h has also been
made.
2021-08-27 17:26:01 -07:00
Stan Shebs
43afb70033 Disable -mfloat128 for clang, lets power9 insns into power8 executables 2021-08-27 17:26:00 -07:00
Stan Shebs
895947a3ca Also work around clang bctrl issue in get_clockfreq.c 2021-08-27 17:26:00 -07:00
Raman Tenneti
9e8081d123 Changes to compile glibc-2.27 on PPC (Power8) with clang.
+ Use DOT_MACHINE macro instead of ".machine" instruction.
+ Use __isinf and __isinff instead of builtin versions.
+ In s_logb, s_logbf and s_logbl functions, used float versions to
  calculate "ret = x & 0x7f800000;" expression.
2021-08-27 17:23:15 -07:00
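The mask 0x7f800000 quoted above selects the exponent field of an IEEE 754 single-precision value. A minimal sketch of that extraction in portable C follows; the helper name is hypothetical and this is not the power7 logb code, which operates on the value differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Unbiased exponent of a finite, normal float (hypothetical helper).  */
int
float_exponent_sketch (float x)
{
  uint32_t bits;
  /* Copy the representation instead of casting pointers, which would
     violate the aliasing rules.  */
  memcpy (&bits, &x, sizeof bits);
  uint32_t biased = (bits & 0x7f800000u) >> 23;
  /* Zero, subnormals, infinities and NaNs are out of scope here.  */
  assert (biased != 0 && biased != 0xff);
  return (int) biased - 127;
}
```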
Raman Tenneti
bb9e16c6ea Undid the dl_enable_fastload environment variable changes. 2021-08-27 17:23:15 -07:00
Paul Pluzhnikov
590786950c Add "fastload" support. 2021-08-27 17:23:15 -07:00
Stan Shebs
3372bfe221 Work around lack of mfppr in clang 2021-08-27 17:23:14 -07:00
Stan Shebs
960ba7975c Work around mtfsb0 syntax limitation with clang 2021-08-27 17:23:14 -07:00
Stan Shebs
e04e10b431 Avoid passing gcc-specific options to clang 2021-08-27 17:23:14 -07:00
Stan Shebs
452fe68a53 Make asm-based constraints be gcc-only 2021-08-27 17:23:14 -07:00
Stan Shebs
4b86f820b8 Make xxland syntax gcc-only 2021-08-27 17:23:14 -07:00
Stan Shebs
5e4f72b895 Add a first approximation of float definitions for ppc clang 2021-08-27 17:23:14 -07:00
Stan Shebs
e21102f77e Make powerpc .machine directives be gcc-only 2021-08-27 17:23:14 -07:00
Stan Shebs
bb112e11de Make mutex hints gcc-only, improve a type in __arch_compare_and_exchange_bool_32_acq 2021-08-27 17:23:14 -07:00
Stan Shebs
7724302310 Make power6 directives be gcc-only 2021-08-27 17:23:13 -07:00
Stan Shebs
1e88b203b3 Add power9 flag to go with -mfloat128 2021-08-27 17:23:13 -07:00
Stan Shebs
6fd7bec86f Disable more attempts to pass -mlong-double-128 to clang 2021-08-27 17:23:13 -07:00
Stan Shebs
d21dfbccdc Disable attempts to pass -mlong-double-128 to clang 2021-08-27 17:23:13 -07:00
Stan Shebs
b2d69ea7ac Add workaround for infinite looping in ppc vsyscalls 2021-08-27 17:23:13 -07:00
Stan Shebs
6ea6782b69 Work around clang crash by skipping apparently-unneeded asm 2021-08-27 17:23:13 -07:00
Stan Shebs
b35774068a Work around clang problem with ifuncs and vdso 2021-08-27 17:23:12 -07:00
Stan Shebs
96509a9dce Work around a ppc clang inlining bug 2021-08-27 17:23:12 -07:00
Stan Shebs
0f93e3333f Change de-nesting fix to use added argument instead of globals 2021-08-27 17:23:12 -07:00
Stan Shebs
21991760c7 Fix regressions in async-safe TLS, add run-time control for debugging, add more comments 2021-08-27 17:23:12 -07:00
Stan Shebs
c0ab16f8cc Fix TLS problems not handled by cherrypick 2021-08-27 17:23:12 -07:00
Brooks Moses
3e9a530aae Revert upstream removal of async-safe TLS patches. 2021-08-27 17:23:11 -07:00
Andreas Schwab
c4fde9669a Don't write beyond destination in __mempcpy_avx512_no_vzeroupper (bug 23196)
When compiled as mempcpy, the return value is the end of the destination
buffer, thus it cannot be used to refer to the start of it.

(cherry picked from commit 9aaaab7c6e)
2021-08-27 16:22:13 -07:00
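The return-value difference behind this fix can be shown with a small C sketch. `mempcpy_sketch` is a hypothetical stand-in built on memcpy, not the AVX512 implementation: it returns a pointer just past the last byte written, so code shared between the two entry points cannot treat the return value as the start of the destination.

```c
#include <assert.h>
#include <string.h>

/* Copy N bytes and return the end of the destination, as mempcpy does;
   memcpy would return DEST itself.  */
void *
mempcpy_sketch (void *dest, const void *src, size_t n)
{
  assert (dest != NULL && src != NULL);
  memcpy (dest, src, n);
  return (char *) dest + n;
}
```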
Stefan Liebler
b3356fb4a1 Fix blocking pthread_join. [BZ #23137]
On s390 (31bit) if glibc is built with -Os, pthread_join sometimes
blocks indefinitely. This is e.g. observable with
testcase intl/tst-gettext6.

pthread_join is calling lll_wait_tid(tid), which performs the futex-wait
syscall in a loop as long as tid != 0 (thread is alive).

On s390 (and built with -Os), tid is loaded from memory before
comparing against zero and then the tid is loaded a second time
in order to pass it to the futex-wait-syscall.
If the thread exits in between, then the futex-wait-syscall is
called with the value zero and it waits until a futex-wake occurs.
As the thread has already exited, there won't be a futex-wake.

In lll_wait_tid, the tid is stored to the local variable __tid,
which is then used as argument for the futex-wait-syscall.
But unfortunately the compiler is allowed to reload the value
from memory.

With this patch, the tid is loaded with atomic_load_acquire.
Then the compiler is not allowed to reload the value for __tid from memory.

ChangeLog:

	[BZ #23137]
	* sysdeps/nptl/lowlevellock.h (lll_wait_tid):
	Use atomic_load_acquire to load __tid.

(cherry picked from commit 1660901840)
2021-08-27 16:22:12 -07:00
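The single-load pattern described in this message can be sketched with C11 atomics. Everything here is illustrative: `wait_tid_sketch` and `fake_futex_wait` are hypothetical names, the real change uses glibc's internal atomic_load_acquire macro, and the futex syscall is replaced by a stub that records its argument and simulates the thread exiting.

```c
#include <assert.h>
#include <stdatomic.h>

/* Value most recently passed as the expected futex word.  */
static int last_expected = -1;

/* Stand-in for the futex-wait syscall (assumption of this sketch).
   Waiting with an expected value of zero is the hang described above.  */
static void
fake_futex_wait (atomic_int *addr, int expected)
{
  assert (expected != 0);
  last_expected = expected;
  /* Simulate the thread exiting and the kernel clearing the tid.  */
  atomic_store_explicit (addr, 0, memory_order_release);
}

/* Wait until *TID becomes zero.  The tid is read exactly once per
   iteration, and the same snapshot is both tested and passed on, so
   the compiler cannot substitute a second load from memory.  */
void
wait_tid_sketch (atomic_int *tid)
{
  int snapshot;
  while ((snapshot = atomic_load_explicit (tid, memory_order_acquire)) != 0)
    fake_futex_wait (tid, snapshot);
}
```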
Joseph Myers
1ab675ca63 Add PTRACE_SECCOMP_GET_METADATA from Linux 4.16 to sys/ptrace.h.
This patch adds the PTRACE_SECCOMP_GET_METADATA constant from Linux
4.16 to all relevant sys/ptrace.h files.  A type struct
__ptrace_seccomp_metadata, analogous to other such types, is also
added.

Tested for x86_64, and with build-many-glibcs.py.

	* sysdeps/unix/sysv/linux/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): New enum value and macro.
	* sysdeps/unix/sysv/linux/bits/ptrace-shared.h
	(struct __ptrace_seccomp_metadata): New type.
	* sysdeps/unix/sysv/linux/aarch64/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.
	* sysdeps/unix/sysv/linux/arm/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.
	* sysdeps/unix/sysv/linux/ia64/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.
	* sysdeps/unix/sysv/linux/powerpc/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.
	* sysdeps/unix/sysv/linux/s390/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.
	* sysdeps/unix/sysv/linux/sparc/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.
	* sysdeps/unix/sysv/linux/tile/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.
	* sysdeps/unix/sysv/linux/x86/sys/ptrace.h
	(PTRACE_SECCOMP_GET_METADATA): Likewise.

(cherry picked from commit 9320ca88a1)
2021-08-27 16:22:11 -07:00