Commit Graph

1215 Commits

Author SHA1 Message Date
Joseph Myers
e44acb2063 Use floor functions not __floor functions in glibc libm.
Similar to the changes that were made to call sqrt functions directly
in glibc, instead of __ieee754_sqrt variants, so that the compiler
could inline them automatically without needing special inline
definitions in lots of math_private.h headers, this patch makes libm
code call floor functions directly instead of __floor variants,
removing the inlines / macros for x86_64 (SSE4.1) and powerpc
(POWER5).

The redirection used to ensure that __ieee754_sqrt does still get
called when the compiler doesn't inline a built-in function expansion
is refactored so it can be applied to other functions; the refactoring
is arranged so it's not limited to unary functions either (it would be
reasonable to use this mechanism for copysign - removing the inline in
math_private_calls.h but also eliminating unnecessary local PLT entry
use in the cases (powerpc soft-float and e500v1, for IBM long double)
where copysign calls don't get inlined).

The point of this change is that more architectures can get floor
calls inlined where they weren't previously (AArch64, for example),
without needing special inline definitions in their math_private.h,
and existing such definitions in math_private.h headers can be
removed.

Note that it's possible that in some cases an inline may be used where
an IFUNC call was previously used - this is the case on x86_64, for
example.  I think the direct calls to floor are still appropriate; if
there's any significant performance cost from inline SSE2 floor
instead of an IFUNC call ending up with SSE4.1 floor, that indicates
that either the function should be doing something else that's faster
than using floor at all, or it should itself have IFUNC variants, or
that the compiler choice of inlining for generic tuning should change
to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
might be called at runtime - but not that glibc should avoid calling
floor internally.  (After all, all the same considerations would apply
to any user program calling floor, where it might either be inlined or
left as an out-of-line call allowing for a possible IFUNC.)

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT):
	New macro.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (floor): Likewise.
	* sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_floorf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_floor.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise.
	* sysdeps/ieee754/float128/s_floorf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_floorf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
	* sysdeps/riscv/rvf/s_floorf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor):
	Remove macro.
	[_ARCH_PWR5X] (__floorf): Likewise.
	* sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove
	inline function.
	[__SSE4_1__] (__floorf): Likewise.
	* math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions
	instead of __floor variants.
	* math/w_lgamma_r_compat.c (__lgamma_r): Likewise.
	* math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise.
	* math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise.
	* math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise.
	* math/w_lgammal_r_compat.c (__lgammal_r): Likewise.
	* math/w_tgamma_compat.c (__tgamma): Likewise.
	* math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise.
	* math/w_tgammaf_compat.c (__tgammaf): Likewise.
	* math/w_tgammal_compat.c (__tgammal): Likewise.
	* sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
	Likewise.
	* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
	* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise.
	* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
	* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-14 13:09:01 +00:00
Joseph Myers
4e7fbdd7c2 Remove x86_64 math_private.h asms.
The x86_64 math_private.h has asm versions of the macros to
reinterpret between floating-point and integer types.

This is the sort of thing we now strongly discourage; the expectation
in such cases, where the generic C code gives the compiler all the
information needed about the required semantics, is that you should
get the compiler to do the right thing for the generic C code rather
than writing an asm version.

Trivial tests showed GCC generates the expected single instructions
for reinterpretation from floating point to integer.  In the other
direction, it goes via memory when the asms don't; I asked about this
in GCC bug 87236 and was advised this was deliberate for generic
tuning because it was faster that way on some AMD processors (but
-mtune=intel, and -Os with the latest GCC, avoid going via memory).
The asms don't and can't know about those tuning details, so that's
evidence that they are actually making the code worse.

This patch removes the asms accordingly.  Tested for x86_64.

	* sysdeps/x86_64/fpu/math_private.h (MOVD): Remove macro.
	(MOVQ): Likewise.
	(EXTRACT_WORDS64): Likewise.
	(INSERT_WORDS64): Likewise.
	(GET_FLOAT_WORD): Likewise.
	(SET_FLOAT_WORD): Likewise.
2018-09-11 14:51:40 +00:00
Szabolcs Nagy
e70c176825 Add new exp and exp2 implementations
Optimized exp and exp2 implementations using a lookup table for
fractional powers of 2.  There are several variants, see e_exp_data.c,
they can be selected by modifying math_config.h allowing different
tradeoffs.

The default selection should be acceptable as generic libm code.
Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on
aarch64 the rodata size is 2160 bytes, shared between exp and exp2.
On aarch64 .text + .rodata size decreased by 24912 bytes.

The non-nearest rounding error is less than 1 ULP even on targets
without efficient round implementation (although the error rate is
higher in that case).  Targets with single instruction, rounding mode
independent, to nearest integer rounding and conversion can use them
by setting TOINT_INTRINSICS and adding the necessary code to their
math_private.h.

The __exp1 code uses the same algorithm, so the error bound of pow
increased a bit.

New double precision error handling code was added following the
style of the single precision error handling code.

Improvements on Cortex-A72 compared to current glibc master:
exp thruput: 1.61x in [-9.9 9.9]
exp latency: 1.53x in [-9.9 9.9]
exp thruput: 1.13x in [0.5 1]
exp latency: 1.30x in [0.5 1]
exp2 thruput: 2.03x in [-9.9 9.9]
exp2 latency: 1.64x in [-9.9 9.9]

For small (< 1) inputs the current exp code uses a separate algorithm
so the speed up there is less.

Was tested on
aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and
x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and
powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets,
only non-nearest rounding ulp errors increase and they are within
acceptable bounds (ulp updates are in separate patches).

	* NEWS: Mention exp and exp2 improvements.
	* math/Makefile (libm-support): Remove t_exp.
	(type-double-routines): Add math_err and e_exp_data.
	* sysdeps/aarch64/libm-test-ulps: Update.
	* sysdeps/arm/libm-test-ulps: Update.
	* sysdeps/i386/fpu/e_exp_data.c: New file.
	* sysdeps/i386/fpu/math_err.c: New file.
	* sysdeps/i386/fpu/t_exp.c: Remove.
	* sysdeps/ia64/fpu/e_exp_data.c: New file.
	* sysdeps/ia64/fpu/math_err.c: New file.
	* sysdeps/ia64/fpu/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp_data.c: New file.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound.
	* sysdeps/ieee754/dbl-64/eexp.tbl: Remove.
	* sysdeps/ieee754/dbl-64/math_config.h: New file.
	* sysdeps/ieee754/dbl-64/math_err.c: New file.
	* sysdeps/ieee754/dbl-64/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/t_exp2.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file.
	* sysdeps/m68k/m680x0/fpu/math_err.c: New file.
	* sysdeps/m68k/m680x0/fpu/t_exp.c: Remove.
	* sysdeps/powerpc/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-09-05 16:22:00 +01:00
Paul Pluzhnikov
a6e8926f8d [BZ #20271] Add newlines in __libc_fatal calls. 2018-08-31 18:04:32 -07:00
Joseph Myers
ff6b24501f Split fenv_private.h out of math_private.h more consistently.
On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h.  As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.

This patch makes the fenv_private.h name an architecture-independent
convention in glibc.  The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed).  Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>.  The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64.  The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).

Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.

	* sysdeps/aarch64/fpu/fenv_private.h: New file.  Based on ....
	* sysdeps/aarch64/fpu/math_private.h: ... this file.  All contents
	moved to fenv_private.h except for ...
	(TOINT_INTRINSICS): Kept in math_private.h.
	(roundtoint): Likewise.
	(converttoint): Likewise.
	* sysdeps/arm/fenv_private.h: Change multiple-include guard to
	[ARM_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/arm/math_private.h: Remove.
	* sysdeps/generic/fenv_private.h: New file.  Contents moved from
	....
	* sysdeps/generic/math_private.h: ... this file.  Include
	<stdbool.h>.  Do not include <fenv.h> or <get-rounding-mode.h>.
	Include <fenv_private.h>.  Remove functions and macros moved to
	fenv_private.h.
	* sysdeps/i386/fpu/math_private.h: Remove.
	* sysdeps/mips/math_private.h: Move to ....
	* sysdeps/mips/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [MIPS_FENV_PRIVATE_H].  Remove
	[__mips_hard_float] conditional.  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
	guard to [POWERPC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/math_private.h: Do not include
	<fenv_private.h>.
	* sysdeps/riscv/rvf/math_private.h: Move to ....
	* sysdeps/riscv/rvf/fenv_private.h: ... here.  Change
	multiple-include guard to [RISCV_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
	to [SPARC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/sparc/fpu/math_private.h: Remove.
	* sysdeps/i386/fpu/fenv_private.h: Move to ....
	* sysdeps/x86/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [X86_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/x86_64/fpu/math_private.h: Do not include
	<sysdeps/i386/fpu/fenv_private.h>.
2018-08-28 20:48:49 +00:00
Wilco Dijkstra
49acec179c Fix spaces in x86_64 ULP file
Fix a few missing spaces, it's now identical to the regenerated version.

Passes GLIBC tests on x64.

	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.
2018-08-15 12:56:22 +01:00
Wilco Dijkstra
599cf39766 Improve performance of sinf and cosf
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials.  The returned values are identical to sincosf
for the same input.  ULP definitions for AArch64 and x64 are updated.

sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4  : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0   : 2.3x
* |x| < Inf     : 3.0x

	* NEWS: Mention sinf, cosf, sincosf.
	* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
	constants rather than including generic sincosf.h.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
	* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
	(reduced_cos): Remove.
	(sinf_poly): New function.
	* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
2018-08-14 10:45:59 +01:00
Joseph Myers
2ce7ba7d15 Move SNAN_TESTS_* out of math-tests.h.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the SNAN_TESTS_* macros for individual types out to their own
sysdeps header (while the type-generic SNAN_TESTS wrapper for those
macros remains in math-tests.h).

Tested for x86_64 and x86, and with build-many-glibcs.py.

	* sysdeps/generic/math-tests-snan.h: New file.
	* sysdeps/generic/math-tests.h: Include <math-tests-snan.h>.
	(SNAN_TESTS_float): Do not define here.
	(SNAN_TESTS_double): Likewise.
	(SNAN_TESTS_long_double): Likewise.
	(SNAN_TESTS_float128): Likewise.
	* sysdeps/i386/fpu/math-tests-snan.h: New file.
	* sysdeps/i386/fpu/math-tests.h: Remove file.
	* sysdeps/ia64/math-tests-snan.h: New file.
	* sysdeps/ia64/math-tests.h: Remove file.
	* sysdeps/x86/math-tests.h: Likewise.
	* sysdeps/x86_64/fpu/math-tests-snan.h: New file.
2018-08-10 19:22:01 +00:00
Wilco Dijkstra
ea5c662c62 Improve performance of sincosf
This patch is a complete rewrite of sincosf.  The new version is
significantly faster, as well as simple and accurate.
The worst-case ULP is 0.5607, maximum relative error is 0.5303 * 2^-23 over
all 4 billion inputs.  In non-nearest rounding modes the error is 1ULP.

The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction.  The code uses approximate integer
comparisons to quickly decide between these cases.

The small range reducer uses a single reduction step to handle values up to
120.0.  It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity.  It does a
32x96 bit multiply to compute a 64-bit modulo result.  This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4.  It could be further optimized, however it is
already much faster than necessary.

sincosf throughput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.6x
* |x| < M_PI_4  : 1.7x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0   : 1.8x
* |x| < Inf     : 2.3x

	* math/Makefile: Add s_sincosf_data.c.
	* sysdeps/ia64/fpu/s_sincosf_data.c: New file.
	* sysdeps/ieee754/flt-32/s_sincosf.h (abstop12): Add new function.
	(sincosf_poly): Likewise.
	(reduce_small): Likewise.
	(reduce_large): Likewise.
	* sysdeps/ieee754/flt-32/s_sincosf.c (sincosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf_data.c: New file with sincosf data.
	* sysdeps/m68k/m680x0/fpu/s_sincosf_data.c: New file.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: New file.
2018-08-10 17:34:39 +01:00
Ilya Leoshkevich
8d997d2253 Move __fentry__ version definition to sysdeps/{i386,x86_64}
__fentry__ symbol is currently not defined for other architectures.
Attempts to introduce it cause abicheck to fail, because it will be
available since 2.29 earliest, and not 2.13, which is the case for
Intel.  With the new code, abicheck passes for i686-linux-gnu,
x86_64-linux-gnu and x86_64-linux-gnu32 triples.

ChangeLog:

	* stdlib/Versions: Remove __fentry__.
	* sysdeps/i386/Versions: Add __fentry__.
	* sysdeps/x86_64/Versions: Add __fentry__.
2018-08-10 09:07:44 +02:00
H.J. Lu
fb4c32aef6 x86: Move STATE_SAVE_OFFSET/STATE_SAVE_MASK to sysdep.h
Move STATE_SAVE_OFFSET and STATE_SAVE_MASK to sysdep.h to make
sysdeps/x86/cpu-features.h a C header file.

	* sysdeps/x86/cpu-features.h (STATE_SAVE_OFFSET): Removed.
	(STATE_SAVE_MASK): Likewise.
	Don't check __ASSEMBLER__ to include <cpu-features-offsets.h>.
	* sysdeps/x86/sysdep.h (STATE_SAVE_OFFSET): New.
	(STATE_SAVE_MASK): Likewise.
	* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features-offsets.h>
	instead of <cpu-features.h>.
2018-08-06 06:25:43 -07:00
H.J. Lu
430388d5dc x86: Don't include <init-arch.h> in assembly codes
There is no need to include <init-arch.h> in assembly codes since all
x86 IFUNC selector functions are written in C.  Tested on i686 and
x86-64.  There is no code change in libc.so, ld.so and libmvec.so.

	* sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include
	<init-arch.h>.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
2018-08-03 08:05:00 -07:00
Siddhesh Poyarekar
dce452dc52 Rename the glibc.tune namespace to glibc.cpu
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to.  Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses.  Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.

	* NEWS: Mention the change.
	* elf/dl-tunables.list: Rename tune namespace to cpu.
	* sysdeps/powerpc/dl-tunables.list: Likewise.
	* sysdeps/x86/dl-tunables.list: Likewise.
	* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
	cpu.name.
	* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
	* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
	* manual/README.tunables: Likewise.
	* manual/tunables.texi: Likewise.
	* sysdeps/powerpc/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
	(init_cpu_features): Likewise.
	* sysdeps/x86/cpu-features.c: Likewise.
	* sysdeps/x86/cpu-features.h: Likewise.
	* sysdeps/x86/cpu-tunables.c: Likewise.
	* sysdeps/x86_64/Makefile: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-08-02 23:49:19 +05:30
H.J. Lu
9aa3113a42 x86: Rename __glibc_reserved2 to ssp_base in tcbhead_t
This will be used to record the current shadow stack base for shadow
stack switching by getcontext, makecontext, setcontext and swapcontext.
If the target shadow stack base is the same as the current shadow stack
base, we unwind the shadow stack.  Otherwise it is a stack switch and
we look for a restore token to restore the target shadow stack.

	* sysdeps/i386/nptl/tcb-offsets.sym (SSP_BASE_OFFSET): New.
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Replace __glibc_reserved2
	with ssp_base.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (SSP_BASE_OFFSET): New.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Replace __glibc_reserved2
	with ssp_base.
2018-07-25 04:39:39 -07:00
H.J. Lu
ca027e0f62 x86-64: Add endbr64 to tst-quadmod[12].S
Add endbr64 to tst-quadmod1.S and tst-quadmod2.S so that func and foo
can be called indirectly.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/x86_64/tst-quadmod1.S (func): Add endbr64 if IBT is
	enabled.
	(foo): Likewise.
	* sysdeps/x86_64/tst-quadmod2.S (func) : Likewise.
	(foo): Likewise.
2018-07-24 05:12:14 -07:00
H.J. Lu
e2d40a8822 x86-64: Use _CET_NOTRACK in memcmp-sse4.S
* sysdeps/x86_64/multiarch/memcmp-sse4.S (BRANCH_TO_JMPTBL_ENTRY):
	Add _CET_NOTRACK before indirect jump to jump table.
2018-07-18 08:07:32 -07:00
H.J. Lu
03aaf49b68 x86-64: Use _CET_NOTRACK in memcpy-ssse3.S
* sysdeps/x86_64/multiarch/memcpy-ssse3.S
	(BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump
	to jump table.
	(MEMCPY): Likewise.
2018-07-18 06:39:46 -07:00
H.J. Lu
811e9e52b2 x86-64: Use _CET_NOTRACK in memcpy-ssse3-back.S
* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S
	(BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump
	to jump table.
	(MEMCPY): Likewise.
2018-07-18 06:38:23 -07:00
H.J. Lu
8817df4265 x86-64: Use _CET_NOTRACK in strcmp-sse42.S
* sysdeps/x86_64/multiarch/strcmp-sse42.S (STRCMP_SSE42): Add
	_CET_NOTRACK before indirect jump to jump table.
2018-07-18 06:37:09 -07:00
H.J. Lu
921595d151 x86-64: Use _CET_NOTRACK in strcpy-sse2-unaligned.S
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S
	(BRANCH_TO_JMPTBL_ENTRY): Add _CET_NOTRACK before indirect jump
	to jump table.
2018-07-18 06:33:06 -07:00
H.J. Lu
4ef60d9597 x86_64: Use _CET_NOTRACK in strcmp.S
* sysdeps/x86_64/strcmp.S (STRCMP): Add _CET_NOTRACK before
	indirect jump to jump table.
2018-07-18 06:31:53 -07:00
H.J. Lu
5efc6777ad x86-64: Add _CET_ENDBR to STRCMP_SSE42
Add _CET_ENDBR to STRCMP_SSE42, which is called indirectly, to support
IBT.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/x86_64/multiarch/strcmp-sse42.S (STRCMP_SSE42): Add
	_CET_ENDBR.
2018-07-17 16:08:47 -07:00
H.J. Lu
562837c002 x86: Add _CET_ENDBR to functions in dl-tlsdesc.S
Add _CET_ENDBR to functions in dl-tlsdesc.S, which are called indirectly,
to support IBT.

Tested on i686 and x86-64.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/i386/dl-tlsdesc.S (_dl_tlsdesc_return): Add
	_CET_ENDBR.
	(_dl_tlsdesc_undefweak): Likewise.
	(_dl_tlsdesc_dynamic): Likewise.
	(_dl_tlsdesc_resolve_abs_plus_addend): Likewise.
	(_dl_tlsdesc_resolve_rel): Likewise.
	(_dl_tlsdesc_resolve_rela): Likewise.
	(_dl_tlsdesc_resolve_hold): Likewise.
	* sysdeps/x86_64/dl-tlsdesc.S (_dl_tlsdesc_return): Likewise.
	(_dl_tlsdesc_undefweak): Likewise.
	(_dl_tlsdesc_dynamic): Likewise.
	(_dl_tlsdesc_resolve_rela): Likewise.
	(_dl_tlsdesc_resolve_hold): Likewise.
2018-07-17 16:07:17 -07:00
H.J. Lu
124bcde683 x86: Add _CET_ENDBR to functions in crti.S
Add _CET_ENDBR to functions in crti.S, which are called indirectly, to
support IBT.

Tested on i686 and x86-64.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/i386/crti.S (_init): Add _CET_ENDBR.
	(_fini): Likewise.
	* sysdeps/x86_64/crti.S (_init): Likewise.
	(_fini): Likewise.
2018-07-17 16:05:18 -07:00
H.J. Lu
f753fa7dea x86: Support IBT and SHSTK in Intel CET [BZ #21598]
Intel Control-flow Enforcement Technology (CET) instructions:

https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en
forcement-technology-preview.pdf

includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK).

GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to
indicate that all executable sections are compatible with IBT when
ENDBR instruction starts each valid target where an indirect branch
instruction can land.  Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on
output only if it is set on all relocatable inputs.

On an IBT capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable IBT and
lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable.
2. When loading an executable with an interpreter, enable IBT if
GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable,
     disable IBT.
  b. Lock IBT.
3. If IBT is enabled, when loading a shared object without
GNU_PROPERTY_X86_FEATURE_1_IBT:
  a. If legacy interwork is allowed, then mark all pages in executable
     PT_LOAD segments in legacy code page bitmap.  Failure of legacy code
     page bitmap allocation causes an error.
  b. If legacy interwork isn't allowed, it causes an error.

GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to
indicate that all executable sections are compatible with SHSTK where
return address popped from shadow stack always matches return address
popped from normal stack.  Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK
on output only if it is set on all relocatable inputs.

On a SHSTK capable processor, the following steps should be taken:

1. When loading an executable without an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable.
2. When loading an executable with an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter.
  a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable
     or any shared objects loaded via the DT_NEEDED tag, disable SHSTK.
  b. Otherwise lock SHSTK.
3. After SHSTK is enabled, it is an error to load a shared object
without GNU_PROPERTY_X86_FEATURE_1_SHSTK.

To enable CET support in glibc, --enable-cet is required to configure
glibc.  When CET is enabled, both compiler and assembler must support
CET.  Otherwise, it is a configure-time error.

To support CET run-time control,

1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate
if IBT or SHSTK are enabled at run-time.  It should be initialized by
init_cpu_features.
2. For dynamic executables:
   a. A l_cet field is added to struct link_map to indicate if IBT or
      SHSTK is enabled in an ELF module.  _dl_process_pt_note or
      _rtld_process_pt_note is called to process PT_NOTE segment for
      GNU program property and set l_cet.
   b. _dl_open_check is added to check IBT and SHSTK compatibilty when
      dlopening a shared object.
3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with
_dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if
SHSTK is enabled.

CET run-time control can be changed via GLIBC_TUNABLES with

$ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off]
$ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off]

1. permissive: SHSTK is disabled when dlopening a legacy ELF module.
2. on: IBT or SHSTK are always enabled, regardless if there are IBT or
SHSTK bits in GNU program property.
3. off: IBT or SHSTK are always disabled, regardless if there are IBT or
SHSTK bits in GNU program property.

<cet.h> from CET-enabled GCC is automatically included by assembly codes
to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK
to GNU program property.  _CET_ENDBR is added at the entrance of all
assembly functions whose address may be taken.  _CET_NOTRACK is used to
insert NOTRACK prefix with indirect jump table to support IBT.  It is
defined as notrack when _CET_NOTRACK is defined in <cet.h>.

	 [BZ #21598]
	* configure.ac: Add --enable-cet.
	* configure: Regenerated.
	* elf/Makefille (all-built-dso): Add a comment.
	* elf/dl-load.c (filebuf): Moved before "dynamic-link.h".
	Include <dl-prop.h>.
	(_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE
	segment.
	* elf/dl-open.c: Include <dl-prop.h>.
	(dl_open_worker): Call _dl_open_check.
	* elf/rtld.c: Include <dl-prop.h>.
	(dl_main): Call _rtld_process_pt_note on PT_NOTE segment.  Call
	_rtld_main_check.
	* sysdeps/generic/dl-prop.h: New file.
	* sysdeps/i386/dl-cet.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise.
	* sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise.
	* sysdeps/x86/cet-tunables.h: Likewise.
	* sysdeps/x86/check-cet.awk: Likewise.
	* sysdeps/x86/configure: Likewise.
	* sysdeps/x86/configure.ac: Likewise.
	* sysdeps/x86/dl-cet.c: Likewise.
	* sysdeps/x86/dl-procruntime.c: Likewise.
	* sysdeps/x86/dl-prop.h: Likewise.
	* sysdeps/x86/libc-start.h: Likewise.
	* sysdeps/x86/link_map.h: Likewise.
	* sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
	(_dl_runtime_resolve_shstk): New.
	(_dl_runtime_profile_shstk): Likewise.
	* sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet
	if CET is enabled.
	(CFLAGS-.o): Add -fcf-protection if CET is enabled.
	(CFLAGS-.os): Likewise.
	(CFLAGS-.op): Likewise.
	(CFLAGS-.oS): Likewise.
	(asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET
	is enabled.
	(tests-special): Add $(objpfx)check-cet.out.
	(cet-built-dso): New.
	(+$(cet-built-dso:=.note)): Likewise.
	(common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note).
	($(objpfx)check-cet.out): New.
	(generated): Add check-cet.out.
	* sysdeps/x86/cpu-features.c: Include <dl-cet.h> and
	<cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New prototype.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	(init_cpu_features): Call get_cet_status to check CET status
	and update dl_x86_feature_1 with CET status.  Call
	TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK
	(set_x86_shstk).  Disable and lock CET in libc.a.
	* sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>.
	(TUNABLE_CALLBACK (set_x86_ibt)): New function.
	(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
	* sysdeps/x86/sysdep.h (_CET_NOTRACK): New.
	(_CET_ENDBR): Define if not defined.
	(ENTRY): Add _CET_ENDBR.
	* sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and
	x86_shstk.
	* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add
	_CET_ENDBR.
	(_dl_runtime_profile): Likewise.
2018-07-16 14:08:27 -07:00
H.J. Lu
faaee1f07e x86: Support shadow stack pointer in setjmp/longjmp
Save and restore shadow stack pointer in setjmp and longjmp to support
shadow stack in Intel CET.  Use feature_1 in tcbhead_t to check if
shadow stack is enabled before saving and restoring shadow stack pointer.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	* sysdeps/i386/__longjmp.S: Include <jmp_buf-ssp.h>.
	(__longjmp): Restore shadow stack pointer if shadow stack is
	enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
	isn't defined for __longjmp_cancel.
	* sysdeps/i386/bsd-_setjmp.S: Include <jmp_buf-ssp.h>.
	(_setjmp): Save shadow stack pointer if shadow stack is enabled
	and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/i386/bsd-setjmp.S: Include <jmp_buf-ssp.h>.
	(setjmp): Save shadow stack pointer if shadow stack is enabled
	and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/i386/setjmp.S: Include <jmp_buf-ssp.h>.
	(__sigsetjmp): Save shadow stack pointer if shadow stack is
	enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/unix/sysv/linux/i386/____longjmp_chk.S: Include
	<jmp_buf-ssp.h>.
	(____longjmp_chk): Restore shadow stack pointer if shadow stack
	is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/unix/sysv/linux/x86/Makefile (gen-as-const-headers):
	Remove jmp_buf-ssp.sym.
	* sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S: Include
	<jmp_buf-ssp.h>.
	(____longjmp_chk): Restore shadow stack pointer if shadow stack
	is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
	* sysdeps/x86/Makefile (gen-as-const-headers): Add
	jmp_buf-ssp.sym.
	* sysdeps/x86/jmp_buf-ssp.sym: New dummy file.
	* sysdeps/x86_64/__longjmp.S: Include <jmp_buf-ssp.h>.
	(__longjmp): Restore shadow stack pointer if shadow stack is
	enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
	isn't defined for __longjmp_cancel.
	* sysdeps/x86_64/setjmp.S: Include <jmp_buf-ssp.h>.
	(__sigsetjmp): Save shadow stack pointer if shadow stack is
	enabled and SHADOW_STACK_POINTER_OFFSET is defined.
2018-07-14 05:59:53 -07:00
H.J. Lu
ebff9c5cfa x86: Rename __glibc_reserved1 to feature_1 in tcbhead_t [BZ #22563]
feature_1 has X86_FEATURE_1_IBT and X86_FEATURE_1_SHSTK bits for CET
run-time control.

CET_ENABLED, IBT_ENABLED and SHSTK_ENABLED are defined to 1 or 0 to
indicate that if CET, IBT and SHSTK are enabled.

<tls-setup.h> is added to set up thread-local data.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>

	[BZ #22563]
	* nptl/pthread_create.c: Include <tls-setup.h>.
	(__pthread_create_2_1): Call tls_setup_tcbhead.
	* sysdeps/generic/tls-setup.h: New file.
	* sysdeps/x86/nptl/tls-setup.h: Likewise.
	* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET):
	Likewise.
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Rename __glibc_reserved1
	to feature_1.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/x86/sysdep.h (X86_FEATURE_1_IBT): New.
	(X86_FEATURE_1_SHSTK): Likewise.
	(CET_ENABLED): Likewise.
	(IBT_ENABLED): Likewise.
	(SHSTK_ENABLED): Likewise.
2018-07-14 05:56:46 -07:00
H.J. Lu
0221ce2a90 i386: Change offset of __private_ss to 0x30 [BZ #23250]
sysdeps/i386/nptl/tls.h has

typedef struct
{
  void *tcb;            /* Pointer to the TCB.  Not necessarily the
                           thread descriptor used by libpthread.  */
  dtv_t *dtv;
  void *self;           /* Pointer to the thread descriptor.  */
  int multiple_threads;
  uintptr_t sysinfo;
  uintptr_t stack_guard;
  uintptr_t pointer_guard;
  int gscope_flag;
  int __glibc_reserved1;
  /* Reservation of some values for the TM ABI.  */
  void *__private_tm[4];
  /* GCC split stack support.  */
  void *__private_ss;
} tcbhead_t;

The offset of __private_ss is 0x34.  But GCC defines

/* We steal the last transactional memory word.  */
 #define TARGET_THREAD_SPLIT_STACK_OFFSET 0x30

and libgcc/config/i386/morestack.S has

	cmpl	%gs:0x30,%eax		# See if we have enough space.
	movl	%eax,%gs:0x30		# Save the new stack boundary.
	movl	%eax,%gs:0x30		# Save the new stack boundary.
	movl	%ecx,%gs:0x30		# Save new stack boundary.
	movl	%eax,%gs:0x30
	movl	%gs:0x30,%eax
	movl	%eax,%gs:0x30

Since update TARGET_THREAD_SPLIT_STACK_OFFSET changes split stack ABI,
this patch updates tcbhead_t to match GCC.

	[BZ #23250]
	[BZ #10686]
	* sysdeps/i386/nptl/tls.h (tcbhead_t): Change __private_tm[4]
	to _private_tm[3] and add __glibc_reserved2.
	Add _Static_assert of offset of __private_ss == 0x30.
	* sysdeps/x86_64/nptl/tls.h: Add _Static_assert of offset of
	__private_ss == 0x40 for ILP32 and == 0x70 for LP64.
2018-06-12 06:34:48 -07:00
Florian Weimer
e826574c98 x86: Make strncmp usable from rtld
Due to the way the conditions were written, the rtld build of strncmp
ended up with no definition of the strncmp symbol at all: The
implementations were renamed for use within an IFUNC resolver, but the
IFUNC resolver itself was missing (because rtld does not use IFUNCs).

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2018-06-12 15:00:33 +02:00
H.J. Lu
67c0579669 Mark _init and _fini as hidden [BZ #23145]
_init and _fini are special functions provided by glibc for linker to
define DT_INIT and DT_FINI in executable and shared library.  They
should never be put in dynamic symbol table.  This patch marks them as
hidden to remove them from dynamic symbol table.

Tested with build-many-glibcs.py.

	[BZ #23145]
	* elf/Makefile (tests-special): Add $(objpfx)check-initfini.out.
	($(all-built-dso:=.dynsym): New target.
	(common-generated): Add $(all-built-dso:$(common-objpfx)%=%.dynsym).
	($(objpfx)check-initfini.out): New target.
	(generated): Add check-initfini.out.
	* scripts/check-initfini.awk: New file.
	* sysdeps/aarch64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/alpha/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/arm/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/hppa/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/i386/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/ia64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/m68k/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/microblaze/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/mips/mips32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/mips/mips64/n32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/mips/mips64/n64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/nios2/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/powerpc/powerpc32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/powerpc/powerpc64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/s390/s390-32/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/s390/s390-64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/sh/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/sparc/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
	* sysdeps/x86_64/crti.S (_init): Mark as hidden.
	(_fini): Likewise.
2018-06-08 10:28:52 -07:00
Leonardo Sandoval
1457016337 x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2
Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector
comparison as much as possible. Peak performance observed on a SkyLake
machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
respectively. The larger the comparison length, the more benefit using
avx2 functions, except on the strcmp, where peak is observed at length
== 32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
is preferred and AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and
	wcsncmp-sse2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __strcmp_avx2,
	__strncmp_avx2,	__wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2
	and __wcsncmp_sse2.
	* sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)):
	(IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX 2 machines if
	AVX unaligned load is fast and vzeroupper is preferred.
	* sysdeps/x86_64/multiarch/strncmp.c: Likewise.
	* sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp.c: Likewise.
	* sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp
	is undefined.
2018-06-01 16:32:43 -05:00
Paul Pluzhnikov
50d004c91c Update ulps with "make regen-ulps" on AMD Ryzen 7 1800X.
2018-05-30  Paul Pluzhnikov  <ppluzhnikov@google.com>

	* sysdeps/x86_64/fpu/libm-test-ulps (log_vlen8_avx2): Update for
	AMD Ryzen 7 1800X.
2018-05-30 09:17:47 -07:00
H.J. Lu
727b38df05 x86-64: Skip zero length in __mem[pcpy|move|set]_erms
This patch skips zero length in __mempcpy_erms, __memmove_erms and
__memset_erms.

Tested on x86-64.

	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
	(__mempcpy_erms): Skip zero length.
	(__memmove_erms): Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(__memset_erms): Likewise.
2018-05-23 11:25:42 -07:00
Andreas Schwab
9aaaab7c6e Don't write beyond destination in __mempcpy_avx512_no_vzeroupper (bug 23196)
When compiled as mempcpy, the return value is the end of the destination
buffer, thus it cannot be used to refer to the start of it.
2018-05-23 09:50:57 +02:00
H.J. Lu
e28e9b1ec4 x86-64: Check Prefer_FSRM in ifunc-memmove.h
Although the REP MOVSB implementations of memmove, memcpy and mempcpy
aren't used by the current processors, this patch adds Prefer_FSRM
check in ifunc-memmove.h so that they can be used in the future.

	* sysdeps/x86/cpu-features.h (bit_arch_Prefer_FSRM): New.
	(index_arch_Prefer_FSRM): Likewise.
	* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
	Also check Prefer_FSRM.
	* sysdeps/x86_64/multiarch/ifunc-memmove.h (IFUNC_SELECTOR):
	Also return OPTIMIZE (erms) for Prefer_FSRM.
2018-05-21 16:54:59 -07:00
Leonardo Sandoval
e4ebc1380d x86-64: remove duplicate line on PREFETCH_ONE_SET macro
Tested on 64-bit AVX machine

       * sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
       (PREFETCH_ONE_SET): Remove duplicate line
2018-05-17 09:09:13 -05:00
H.J. Lu
0068c08588 nptl: Remove __ASSUME_PRIVATE_FUTEX
Since __ASSUME_PRIVATE_FUTEX is always defined, this patch removes the
!__ASSUME_PRIVATE_FUTEX paths.

Tested with build-many-glibcs.py.

	* nptl/allocatestack.c (allocate_stack): Remove the
	!__ASSUME_PRIVATE_FUTEX paths.
	* nptl/descr.h (header): Remove the !__ASSUME_PRIVATE_FUTEX path.
	* nptl/nptl-init.c (__pthread_initialize_minimal_internal):
	Likewise.
	* sysdeps/i386/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Removed.
	* sysdeps/powerpc/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Likewise.
	* sysdeps/sh/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Likewise.
	* sysdeps/x86_64/nptl/tcb-offsets.sym (PRIVATE_FUTEX): Likewise.
	* sysdeps/i386/nptl/tls.h: (tcbhead_t): Remve the
	!__ASSUME_PRIVATE_FUTEX path.
	* sysdeps/s390/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/sparc/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
	* sysdeps/unix/sysv/linux/i386/lowlevellock.S: Remove the
	!__ASSUME_PRIVATE_FUTEX macros.
	* sysdeps/unix/sysv/linux/lowlevellock-futex.h: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/cancellation.S: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: Likewise.
	* sysdeps/unix/sysv/linux/kernel-features.h
	(__ASSUME_PRIVATE_FUTEX): Removed.
2018-05-17 04:25:10 -07:00
H.J. Lu
04958880e0 x86-64: Use IFUNC strncat inside libc.so
Unlike i386, we can call hidden IFUNC functions inside libc.so since
x86-64 PLT is always PIC.

Tested on x86-64.

	* sysdeps/x86_64/multiarch/strncat-c.c (STRNCAT_PRIMARY): Removed.
	Include <string/strncat.c>.
	* sysdeps/x86_64/multiarch/strncat.c (__strncat): New strong
	alias.
	(__GI___strncat): New hidden alias.
2018-05-16 09:04:35 -07:00
H.J. Lu
98ee36c7a4 x86: Add sysdeps/x86/ldsodefs.h
Merge sysdeps/i386/ldsodefs.h and sysdeps/x86_64/ldsodefs.h into
sysdeps/x86/ldsodefs.h.

Tested on i686 and x86-64.

	* sysdeps/i386/ldsodefs.h: Removed.
	* sysdeps/x86_64/ldsodefs.h: Moved to ...
	* sysdeps/x86/ldsodefs.h: This.
	(La_i86_regs): New.
	(La_i86_retval): Likewise.
	(ARCH_PLTENTER_MEMBERS): Add i86_gnu_pltenter.
	(ARCH_PLTEXIT_MEMBERS): i86_gnu_pltexit.

Acked-by: Christian Brauner (Ubuntu) christian@brauner.io
2018-05-14 09:19:41 -07:00
H.J. Lu
e322ec3282 x86-64: Remove the unnecessary testl in strlen-avx2.S
Since the result of testl is never used, this patch removes it.

Tested on 64-bit AVX2 machine.

	* sysdeps/x86_64/multiarch/strlen-avx2.S (STRLEN): Remove the
	unnecessary testl.
2018-05-14 03:41:35 -07:00
H.J. Lu
50d7d351b5 x86-64/memset: Mark the debugger symbol as hidden
When MEMSET_SYMBOL (__memset, erms) is provided for debugger, mark it
as hidden so that it will be local to the library.

	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(MEMSET_SYMBOL (__memset, erms)): Mark the debugger symbol as
	hidden.
2018-05-07 11:01:48 -07:00
Maciej W. Rozycki
10a446ddcc elf: Unify symbol address run-time calculation [BZ #19818]
Wrap symbol address run-time calculation into a macro and use it
throughout, replacing inline calculations.

There are a couple of variants, most of them different in a functionally
insignificant way.  Most calculations are right following RESOLVE_MAP,
at which point either the map or the symbol returned can be checked for
validity as the macro sets either both or neither.  In some places both
the symbol and the map has to be checked however.

My initial implementation therefore always checked both, however that
resulted in code larger by as much as 0.3%, as many places know from
elsewhere that no check is needed.  I have decided the size growth was
unacceptable.

Having looked closer I realized that it's the map that is the culprit.
Therefore I have modified LOOKUP_VALUE_ADDRESS to accept an additional
boolean argument telling it to access the map without checking it for
validity.  This in turn has brought quite nice results, with new code
actually being smaller for i686, and MIPS o32, n32 and little-endian n64
targets, unchanged in size for x86-64 and, unusually, marginally larger
for big-endian MIPS n64, as follows:

i686:
   text    data     bss     dec     hex filename
 152255    4052     192  156499   26353 ld-2.27.9000-base.so
 152159    4052     192  156403   262f3 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/el:
   text    data     bss     dec     hex filename
 142906    4396     260  147562   2406a ld-2.27.9000-base.so
 142890    4396     260  147546   2405a ld-2.27.9000-elf-symbol-value.so
MIPS/n32/el:
   text    data     bss     dec     hex filename
 142267    4404     260  146931   23df3 ld-2.27.9000-base.so
 142171    4404     260  146835   23d93 ld-2.27.9000-elf-symbol-value.so
MIPS/n64/el:
   text    data     bss     dec     hex filename
 149835    7376     408  157619   267b3 ld-2.27.9000-base.so
 149787    7376     408  157571   26783 ld-2.27.9000-elf-symbol-value.so
MIPS/o32/eb:
   text    data     bss     dec     hex filename
 142870    4396     260  147526   24046 ld-2.27.9000-base.so
 142854    4396     260  147510   24036 ld-2.27.9000-elf-symbol-value.so
MIPS/n32/eb:
   text    data     bss     dec     hex filename
 142019    4404     260  146683   23cfb ld-2.27.9000-base.so
 141923    4404     260  146587   23c9b ld-2.27.9000-elf-symbol-value.so
MIPS/n64/eb:
   text    data     bss     dec     hex filename
 149763    7376     408  157547   2676b ld-2.27.9000-base.so
 149779    7376     408  157563   2677b ld-2.27.9000-elf-symbol-value.so
x86-64:
   text    data     bss     dec     hex filename
 148462    6452     400  155314   25eb2 ld-2.27.9000-base.so
 148462    6452     400  155314   25eb2 ld-2.27.9000-elf-symbol-value.so

	[BZ #19818]
	* sysdeps/generic/ldsodefs.h (LOOKUP_VALUE_ADDRESS): Add `set'
	parameter.
	(SYMBOL_ADDRESS): New macro.
	[!ELF_FUNCTION_PTR_IS_SPECIAL] (DL_SYMBOL_ADDRESS): Use
	SYMBOL_ADDRESS for symbol address calculation.
	* elf/dl-runtime.c (_dl_fixup): Likewise.
	(_dl_profile_fixup): Likewise.
	* elf/dl-symaddr.c (_dl_symbol_address): Likewise.
	* elf/rtld.c (dl_main): Likewise.
	* sysdeps/aarch64/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/alpha/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/arm/dl-machine.h (elf_machine_rel): Likewise.
	(elf_machine_rela): Likewise.
	* sysdeps/hppa/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/hppa/dl-symaddr.c (_dl_symbol_address): Likewise.
	* sysdeps/i386/dl-machine.h (elf_machine_rel): Likewise.
	(elf_machine_rela): Likewise.
	* sysdeps/ia64/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/m68k/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/microblaze/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/mips/dl-machine.h (ELF_MACHINE_BEFORE_RTLD_RELOC):
	Likewise.
	(elf_machine_reloc): Likewise.
	(elf_machine_got_rel): Likewise.
	* sysdeps/mips/dl-trampoline.c (__dl_runtime_resolve): Likewise.
	* sysdeps/nios2/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/riscv/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/s390/s390-32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/s390/s390-64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/sh/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela):
	Likewise.
	* sysdeps/tile/dl-machine.h (elf_machine_rela): Likewise.
	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2018-04-04 23:09:37 +01:00
Wilco Dijkstra
19a8b9a300 [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputs
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).

ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos.  The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.

Tested on AArch64 and x86_64 with no regressions.

The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction.  Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.

	* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
	inputs.
	(__cos): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
2018-04-03 16:52:16 +01:00
Joseph Myers
ffec7b2740 Use x86_64 backtrace as generic version.
No glibc configuration uses the present debug/backtrace.c, whereas
several #include the x86_64 version.  The x86_64 version is
effectively a generic one (using _Unwind_Backtrace from libgcc, which
works much more reliably than the built-in functions used by
debug/backtrace.c).  This patch moves it to debug/backtrace.c and
removes all the #includes of the x86_64 version from other
architectures which are no longer required.

I do not know whether all the other architecture-specific backtrace
implementations that are based on _Unwind_Backtrace are required, or
whether, where their differences from the generic version do something
useful, suitable hooks could be added to the generic version to reduce
the duplication involved.

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by this patch.

	* sysdeps/x86_64/backtrace.c: Move to ....
	* debug/backtrace.c: ... here.
	* sysdeps/aarch64/backtrace.c: Remove file.
	* sysdeps/alpha/backtrace.c: Likewise.
	* sysdeps/hppa/backtrace.c: Likewise.
	* sysdeps/ia64/backtrace.c: Likewise.
	* sysdeps/mips/backtrace.c: Likewise.
	* sysdeps/nios2/backtrace.c: Likewise.
	* sysdeps/riscv/backtrace.c: Likewise.
	* sysdeps/sh/backtrace.c: Likewise.
	* sysdeps/tile/backtrace.c: Likewise.
2018-03-21 17:25:30 +00:00
Wilco Dijkstra
700593fdd7 Remove all target specific __ieee754_sqrt(f/l) inlines
Remove the now unused target specific__ieee754_sqrt(f/l) inlines.
Also remove inlines of sqrt which are for really old GCC versions.
Removing these is desirable, under the general principle of leaving
such inlining to the compiler rather than trying to do it in installed
headers, especially when only very old compilers are affected.

Note that removing inlines for __ieee754_sqrt disables inlining in the
sqrt wrapper functions.  Given the sqrt function will typically only be
called for negative arguments, it doesn't matter whether the inlining
happens or not.

	* sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/s390/fpu/bits/mathinline.h: Remove file.
	* sysdeps/sparc/fpu/bits/mathinline.h (sqrt) Remove.
	(sqrtf): Remove.
	(sqrtl): Remove.
	(__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove.
	* sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
2018-03-15 19:21:36 +00:00
Samuel Thibault
a5df0318ef hurd: add gscope support
* elf/dl-support.c [!THREAD_GSCOPE_IN_TCB] (_dl_thread_gscope_count):
Define variable.
* sysdeps/generic/ldsodefs.h [!THREAD_GSCOPE_IN_TCB] (struct
rtld_global): Add _dl_thread_gscope_count member.
* sysdeps/mach/hurd/tls.h: Include <atomic.h>.
[!defined __ASSEMBLER__] (THREAD_GSCOPE_GLOBAL, THREAD_GSCOPE_SET_FLAG,
THREAD_GSCOPE_RESET_FLAG, THREAD_GSCOPE_WAIT): Define macros.
* sysdeps/generic/tls.h: Document THREAD_GSCOPE_IN_TCB.
* sysdeps/aarch64/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/alpha/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/arm/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/hppa/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/i386/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/ia64/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/m68k/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/microblaze/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/mips/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/nios2/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/powerpc/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/riscv/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/s390/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/sh/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/sparc/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/tile/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
* sysdeps/x86_64/nptl/tls.h: Define THREAD_GSCOPE_IN_TCB to 1.
2018-03-11 13:06:33 +01:00
Wilco Dijkstra
610ee1fc93 Remove mplog and mpexp
Remove the now unused mplog and mpexp files.

	* math/Makefile: Remove mpexp.c and mplog.c
	* sysdeps/i386/fpu/mpexp.c: Delete file.
	* sysdeps/i386/fpu/mplog.c: Likewise.
	* sysdeps/ia64/fpu/mpexp.c: Likewise.
	* sysdeps/ia64/fpu/mplog.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog.
	* sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function.
	* sysdeps/ieee754/dbl-64/mpexp.c: Delete file.
	* sysdeps/ieee754/dbl-64/mplog.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mplog.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*.
	* sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise.
2018-02-15 12:41:05 +00:00
Szabolcs Nagy
de800d8305 Remove slow paths from exp
Remove the __slowexp code, so exp is no longer correctly rounded.  The
result is computed to about 70 bits precision so the worst case ulp
error is about 0.500007 in nearest rounding mode.

	* manual/probes.texi: Remove slowexp probes.
	* math/Makefile: Remove slowexp.
	* sysdeps/generic/math_private.h (__slowexp): Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
	document error bounds.
	* sysdeps/i386/fpu/slowexp.c: Remove.
	* sysdeps/ia64/fpu/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
	* sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
2018-02-12 11:33:33 +00:00
Wilco Dijkstra
c3d466cba1 Remove slow paths from pow
Remove the slow paths from pow.  Like several other double precision math
functions, pow is exactly rounded.  This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP.  Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.

All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP.  A simple test over a few hundred million
values shows pow is 10% faster on average.  This fixes BZ #13932.

	[BZ #13932]
	* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
	* benchtests/pow-inputs: Update comment for slow path cases.
	* manual/probes.texi (slowpow_p10): Delete removed probe.
	(slowpow_p10): Likewise.
	* math/Makefile: Remove halfulp.c and slowpow.c.
	* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/generic/math_private.h (__exp1): Remove error argument.
	(__halfulp): Remove.
	(__slowpow): Remove.
	* sysdeps/i386/fpu/halfulp.c: Delete file.
	* sysdeps/i386/fpu/slowpow.c: Likewise.
	* sysdeps/ia64/fpu/halfulp.c: Likewise.
	* sysdeps/ia64/fpu/slowpow.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
	improve comments and add error analysis.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
	(power1): Remove function:
	(log1): Remove error argument, add error analysis.
	(my_log2): Remove function.
	* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
	* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
	* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
	* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
	slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-12 10:47:09 +00:00
H.J. Lu
06fbebfff7 x86-64: Use __glibc_likely/__glibc_likely in dl-machine.h
The differences in elf/dl-reloc.os are

--- before    	2018-02-05 03:52:32.803125207 -0800
+++ after     	2018-02-05 03:52:14.913711879 -0800
@@ -1129,7 +1129,7 @@ _dl_relocate_object:
 	leaq	__PRETTY_FUNCTION__.9767(%rip), %rcx
 	leaq	.LC11(%rip), %rsi
 	leaq	.LC12(%rip), %rdi
-	movl	$540, %edx
+	movl	$539, %edx
 	call	__GI___assert_fail
 	.p2align 4,,10
 	.p2align 3

	* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Replace
	__builtin_expect with __glibc_likely and __glibc_likely.
	(elf_machine_lazy_rel): Likewise.
2018-02-05 06:08:07 -08:00