Commit Graph

483 Commits

Author SHA1 Message Date
Adhemerval Zanella
b24381e50f i386: Use builtin sqrtl
Checked on i686-linux-gnu.
2020-06-22 11:09:49 -03:00
Adhemerval Zanella
d19d25dd06 x86_64: Use builtin sqrt{f,l}
Checked on x86_64-linux-gnu.
2020-06-22 11:09:49 -03:00
Adhemerval Zanella
f721171632 Revert "x86_64: Add SSE sfp-exceptions"
The __sfp_handle_exceptions is not fully correct regarding raising
exceptions, since there is no direct way to raise only FP_EX_OVERFLOW
nor FP_EX_UNDERFLOW for SSE mode.  Both libgcc and feraiseexcept rely
on x87 mode to accomplish it.

This reverts commit 460ee50de0.

Checked on x86_64.
2020-04-20 14:56:05 -03:00
Adhemerval Zanella
460ee50de0 x86_64: Add SSE sfp-exceptions
The exported x86_64 fenv.h functions operate on both i387 and SSE (since
they should work on both float, double, and long double) while the
internal libc_fe* set either SSE (float, double, and float128) or
i387 (long double).

The libgcc __sfp_handle_exceptions (used on float128 implementation),
however, will set either SEE or i387 exception depending of the
exception to raise.  This broke the internal assumption of float128
where only SSE operations will be used.

This patch reimplements the libgcc __sfp_handle_exceptions to use only
SSE operations and sets libgcc to use it instead of its own
implementation.

And I think we should fix libgcc in a similar manner, since checking on
config/i386/64/sfp-machine.h it already only supports SSE rounding mode
and x86_64 ABI also expectes float128 to use SSE registers [1]
(although it is not clear on how future implementation might implement
it).

Checked on x86_64-linux-gnu.

[1] https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI
2020-04-17 11:42:29 -03:00
Paul Zimmermann
a9d42c09a3 math: Add inputs that yield larger errors for float type (x86_64)
The corner cases included were generated using exhaustive search
for all float/binary32 values on x86_64 (comparing to MPFR for
correct rounding to nearest).

For the j0/j1/y0 functions, only cases with ulp error <= 9 were
included.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-03-31 21:48:54 -04:00
Adhemerval Zanella
1c15464ca0 math: Remove inline math tests
With mathinline removal there is no need to keep building and testing
inline math tests.

The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries.  The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2020-03-19 11:45:44 -03:00
Wilco Dijkstra
220622dde5 Add libm_alias_finite for _finite symbols
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol.  It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).

The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.

Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).

Passes buildmanyglibc.

Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2020-01-03 10:02:04 -03:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Stefan Liebler
1c94bf0f0a Always use wordsize-64 version of s_trunc.c.
This patch replaces s_trunc.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:14 +01:00
Stefan Liebler
9f234eafe8 Always use wordsize-64 version of s_ceil.c.
This patch replaces s_ceil.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:13 +01:00
Stefan Liebler
95b0c2c431 Always use wordsize-64 version of s_floor.c.
This patch replaces s_floor.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 and sparc64 files.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
ab48bdd098 Always use wordsize-64 version of s_rint.c.
This patch replaces s_rint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:12 +01:00
Stefan Liebler
af123aa950 Always use wordsize-64 version of s_nearbyint.c.
This patch replaces s_nearbyint.c in sysdeps/dbl-64 with the one in
sysdeps/dbl-64/wordsize-64 and removes the latter one.
The code is not changed except changes in code style.

Also adjusted the include path in x86_64 file.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-12-11 15:12:11 +01:00
Wilco Dijkstra
d0007dc53c Remove x64 _finite tests and references
Remove _finite tests and references from x86_64.  Rather than calling
__exp_finite, use exp directly (since it's the same entry point).

x86_64 builds and passes testsuite.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2019-10-21 14:29:12 -03:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
H.J. Lu
7e681561a3 x86-64: Compile branred.c with -mprefer-vector-width=128 [BZ #24603]
When compiled with -O3 and AVX, GCC 8 and 9 optimize some loops in
sysdeps/ieee754/dbl-64/branred.c with 256-bit vector instructions,
which leads to store forward stall:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90579

There is no easy fix in compiler.  This patch limits vector width to
128 bits to work around this issue.  It improves performance of sin
and cos by more than 40% on Skylake compiled with -O3 -march=skylake.

Tested with GCC 7/8/9 on x86-64.

	[BZ #24603]
	* sysdeps/x86_64/configure.ac: Check if -mprefer-vector-width=128
	works.
	* sysdeps/x86_64/configure: Regenerated.
	* sysdeps/x86_64/fpu/Makefile (CFLAGS-branred.c): New.  Set
	to -mprefer-vector-width=128 if supported.
2019-07-24 14:48:43 -07:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
H.J. Lu
ba4b8fab20 x86-64: Remove s_sincosf-sse2.S
The current s_sincosf.c is faster than s_sincosf-sse2.S.  On Broadwell
with FMA disabled, bench-sincosf shows:

       Before         After      Improvement
max    154.032        114.517        34%
min    6.25           5.609          11%
mean   14.8728        12.8589        15%

	* sysdeps/x86_64/fpu/s_sincosf.S: Removed.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.c: New file.
2018-12-26 06:58:31 -08:00
H.J. Lu
9412979a43 Regenerate sysdeps/x86_64/fpu/libm-test-ulps
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
2018-12-26 06:57:30 -08:00
H.J. Lu
8700a7851b x86-64: Vectorize sincosf_poly and update s_sincosf-fma.c
Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized
sincosf_poly.  Add x86 sincosf_poly.h to vectorize sincosf_poly.  On
Broadwell, bench-sincosf shows:

       Before         After      Improvement
max    160.273        114.198        40%
min    6.25           5.625          11%
mean   13.0325        10.6462        22%

Vectorized sincosf_poly shows

       Before         After      Improvement
max    138.653        114.198        21%
min    5.004          5.625          -11%
mean   11.5934        10.6462        9%

Tested on x86-64 and i686 as well as with build-many-glibcs.py.

	* sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>.
	(sincos_t, sincosf_poly, sinf_poly): Moved to ...
	* sysdeps/ieee754/flt-32/sincosf_poly.h: Here.  New file.
	* sysdeps/x86/fpu/s_sincosf_data.c: New file.
	* sysdeps/x86/fpu/sincosf_poly.h: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include
	<sysdeps/ieee754/flt-32/s_sincosf.c>.
2018-12-26 06:56:13 -08:00
Szabolcs Nagy
a502c5294b Remove the error handling wrapper from pow
Introduce new pow symbol version that doesn't do SVID compatible error
handling.  The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.

The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_pow.c and enabled for targets with their own pow implementation or
ifunc dispatch on __ieee754_pow by including math/w_pow.c.

The compatibility symbol version still uses the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously powl was an alias of pow, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling.  This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.

The __pow_finite symbol is now an alias of pow.  Both __pow_finite and
pow set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that
may affect that header.

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add pow.
	* math/w_pow_compat.c (__pow_compat): Change to versioned compat
	symbol.
	* math/w_pow.c: New file.
	* sysdeps/i386/fpu/w_pow.c: New file.
	* sysdeps/ia64/fpu/e_pow.S: Add versioned symbols.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Rename to __pow
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_pow.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_pow.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__ieee754_pow): Rename to
	__pow.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__ieee754_pow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_pow.c (__ieee754_pow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/w_pow.c: New file.
2018-11-21 09:58:36 +00:00
Szabolcs Nagy
f29b7c492d Remove the error handling wrapper from log
Introduce new log symbol version that doesn't do SVID compatible error
handling.  The standard errno and fp exception based error handling is
inline in the new code and does not have significant overhead.

The wrapper is disabled for sysdeps/ieee754/dbl-64 by using empty
w_log.c and enabled for targets with their own log implementation by
including math/w_log.c.

The compatibility symbol version still uses the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously logl was an alias of log, now it points to
the compatibility symbol with the wrapper, because it still need the
SVID compatible error handling.  This affects NO_LONG_DOUBLE (e.g. arm)
and LONG_DOUBLE_COMPAT (e.g. alpha) targets as well.

The __log_finite symbol is now an alias of log.  Both __log_finite and
log set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that may
affect that header.

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add log.
	* math/w_log_compat.c (__log_compat): Change to versioned compat
	symbol.
	* math/w_log.c: New file.
	* sysdeps/i386/fpu/w_log.c: New file.
	* sysdeps/ia64/fpu/e_log.S: Update.
	* sysdeps/ieee754/dbl-64/e_log.c (__ieee754_log): Rename to __log
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_log.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_log.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_log-avx.c (__ieee754_log): Rename to
	__log.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log.c (__ieee754_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/w_log.c: New file.
2018-11-21 09:56:27 +00:00
Szabolcs Nagy
c20a10561a Remove the error handling wrapper from exp and exp2
Introduce new exp and exp2 symbol version that don't do SVID compatible
error handling.  The standard errno and fp exception based error handling
is inline in the new code and does not have significant overhead.

The double precision wrappers are disabled for sysdeps/ieee754/dbl-64
by using empty w_exp.c and w_exp2.c files, the math/w_exp.c and
math/w_exp2.c files use the wrapper template and can be included by
targets that have their own exp and exp2 implementations or use ifunc
on the glibc internal __ieee754_exp symbol.

The compatibility symbol versions still use the wrapper with SVID error
handling around the new code.  There is no new symbol version nor
compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

On targets where previously expl and exp2l were aliases of exp and exp2,
now they point to the compatibility symbols with the wrapper, because
they still need the SVID compatible error handling.  This affects
NO_LONG_DOUBLE (e.g arm) and LONG_DOUBLE_COMPAT (e.g. alpha) targets
as well.

The _finite symbols are now aliases of the standard symbols (they have
no performance advantage anymore).  Both the standard symbols and
_finite symbols set errno and thus not const functions.

The ia64 asm is changed so the compat and new symbol versions map to the
same address.

On x86_64 #include <math.h> was added before macro definitions that may
affect that header (the new macro name is __exp instead of __ieee754_exp
which breaks some math.h macros).

Tested with build-many-glibcs.py.

	* math/Versions (GLIBC_2.29): Add exp and exp2.
	* math/w_exp2_compat.c (__exp2_compat): Change to versioned compat
	symbol, handle NO_LONG_DOUBLE and LONG_DOUBLE_COMPAT explicitly.
	* math/w_exp_compat.c (__exp_compat): Likewise.
	* math/w_exp.c: New file.
	* math/w_exp2.c: New file.
	* sysdeps/i386/fpu/w_exp.c: New file.
	* sysdeps/i386/fpu/w_exp2.c: New file.
	* sysdeps/ia64/fpu/e_exp.S: Add versioned symbols.
	* sysdeps/ia64/fpu/e_exp2.S: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Rename to __exp
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Rename to __exp2
	and add necessary aliases.
	* sysdeps/ieee754/dbl-64/w_exp.c: New file.
	* sysdeps/ieee754/dbl-64/w_exp2.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_exp.c: New file.
	* sysdeps/m68k/m680x0/fpu/w_exp2.c: New file.
	* sysdeps/mach/hurd/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/alpha/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/arm/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/hppa/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/ia64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/nios2/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Update.
	* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sh/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Update.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Update.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__exp1): Remove.
	(__ieee754_exp): Rename to __exp.
	* sysdeps/x86_64/fpu/multiarch/e_exp.c (__ieee754_exp): Rename to
	__exp.
	* sysdeps/x86_64/fpu/multiarch/w_exp.c: New file.
2018-11-21 09:55:02 +00:00
Joseph Myers
7abf97bed9 Use trunc functions not __trunc functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __trunc functions to call the
corresponding trunc names instead, with asm redirection to __trunc
when the calls are not inlined.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (trunc): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_trunc.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_truncf.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_trunc.c: Likewise.
	* sysdeps/ieee754/float128/s_truncf128.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_trunc.c: Likewise.
	* sysdeps/ieee754/flt-32/s_truncf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_truncl.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
	* sysdeps/riscv/rvf/s_truncf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_trunc_template.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c: Likewise.
	(ceil): Redirect to __ceil.
	(floor): Redirect to __floor.
	(trunc): Redirect to __trunc.
	(__truncl): Call trunc instead of __trunc.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__trunc):
	Remove macro.
	[_ARCH_PWR5X] (__truncf): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r): Use
	trunc functions instead of __trunc variants.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
2018-09-20 21:11:10 +00:00
Szabolcs Nagy
424c4f60ed Add new pow implementation
The algorithm is exp(y * log(x)), where log(x) is computed with about
1.3*2^-68 relative error (1.5*2^-68 without fma), returning the result
in two doubles, and the exp part uses the same algorithm (and lookup
tables) as exp, but takes the input as two doubles and a sign (to handle
negative bases with odd integer exponent).  The __exp1 internal symbol
is no longer necessary.

There is separate code path when fma is not available but the worst case
error is about 0.54 ULP in both cases.  The lookup table and consts for
log are 4168 bytes.  The .rodata+.text is decreased by 37908 bytes on
aarch64.  The non-nearest rounding error is less than 1 ULP.

Improvements on Cortex-A72 compared to current glibc master:
pow thruput: 2.40x in [0.01 11.1]x[0.01 11.1]
pow latency: 1.84x in [0.01 11.1]x[0.01 11.1]

Tested on
aarch64-linux-gnu (defined __FP_FAST_FMA, TOINT_INTRINSICS) and
arm-linux-gnueabihf (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
x86_64-linux-gnu (!defined __FP_FAST_FMA, !TOINT_INTRINSICS) and
powerpc64le-linux-gnu (defined __FP_FAST_FMA, !TOINT_INTRINSICS) targets.

	* NEWS: Mention pow improvements.
	* math/Makefile (type-double-routines): Add e_pow_log_data.
	* sysdeps/generic/math_private.h (__exp1): Remove.
	* sysdeps/i386/fpu/e_pow_log_data.c: New file.
	* sysdeps/ia64/fpu/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/Makefile (CFLAGS-e_pow.c): Allow fma
	contraction.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove.
	(exp_inline): Remove.
	(__ieee754_exp): Only single double input is handled.
	* sysdeps/ieee754/dbl-64/e_pow.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_pow_log_data.c: New file.
	* sysdeps/ieee754/dbl-64/math_config.h (issignaling_inline): Define.
	(__pow_log_data): Define.
	* sysdeps/ieee754/dbl-64/upow.h: Remove.
	* sysdeps/ieee754/dbl-64/upow.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_pow_log_data.c: New file.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma.c): Allow fma
	contraction.
	(CFLAGS-e_pow-fma4.c): Likewise.
2018-09-19 10:04:51 +01:00
Joseph Myers
71223ef909 Use ceil functions not __ceil functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __ceil functions to call the
corresponding ceil names instead, with asm redirection to __ceil when
the calls are not inlined.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (ceil): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_ceil.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_ceilf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_ceil.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_ceil.c: Likewise.
	* sysdeps/ieee754/float128/s_ceilf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_ceilf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_ceill.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_ceill.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_ceil_template.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
	* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceil.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.c: Likewise.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__ceil):
	Remove macro.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use ceil
	functions instead of __ceil variants.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
	Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-17 20:42:06 +00:00
Joseph Myers
f29b6f17e4 Use rint functions not __rint functions in glibc libm.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __rint functions to call the
corresponding rint names instead, with asm redirection to __rint when
the calls are not inlined.  The x86_64 math_private.h is removed as no
longer useful after this patch.

This patch is relative to a tree with my floor patch
<https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied,
and much the same considerations arise regarding possibly replacing an
IFUNC call with a direct inline expansion.

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect
	using MATH_REDIRECT.
	* sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_rintf.c: Likewise.
	* sysdeps/alpha/fpu/s_rint.c: Likewise.
	* sysdeps/alpha/fpu/s_rintf.c: Likewise.
	* sysdeps/i386/fpu/s_rintl.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_rint.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise.
	* sysdeps/ieee754/float128/s_rintf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_rintf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
	* sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise.
	* sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise.
	* sysdeps/powerpc/fpu/s_rint.c: Likewise.
	* sysdeps/powerpc/fpu/s_rintf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_rint.c: Likewise.
	* sysdeps/riscv/rvf/s_rintf.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise.
	* sysdeps/x86_64/fpu/math_private.h: Remove file.
	* math/e_scalb.c (invalid_fn): Use rint functions instead of
	__rint variants.
	* math/e_scalbf.c (invalid_fn): Likewise.
	* math/e_scalbl.c (invalid_fn): Likewise.
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
	Likewise.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
	Likewise.
	* sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise.
	* sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
	Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise.
	* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
2018-09-14 13:10:39 +00:00
Joseph Myers
e44acb2063 Use floor functions not __floor functions in glibc libm.
Similar to the changes that were made to call sqrt functions directly
in glibc, instead of __ieee754_sqrt variants, so that the compiler
could inline them automatically without needing special inline
definitions in lots of math_private.h headers, this patch makes libm
code call floor functions directly instead of __floor variants,
removing the inlines / macros for x86_64 (SSE4.1) and powerpc
(POWER5).

The redirection used to ensure that __ieee754_sqrt does still get
called when the compiler doesn't inline a built-in function expansion
is refactored so it can be applied to other functions; the refactoring
is arranged so it's not limited to unary functions either (it would be
reasonable to use this mechanism for copysign - removing the inline in
math_private_calls.h but also eliminating unnecessary local PLT entry
use in the cases (powerpc soft-float and e500v1, for IBM long double)
where copysign calls don't get inlined).

The point of this change is that more architectures can get floor
calls inlined where they weren't previously (AArch64, for example),
without needing special inline definitions in their math_private.h,
and existing such definitions in math_private.h headers can be
removed.

Note that it's possible that in some cases an inline may be used where
an IFUNC call was previously used - this is the case on x86_64, for
example.  I think the direct calls to floor are still appropriate; if
there's any significant performance cost from inline SSE2 floor
instead of an IFUNC call ending up with SSE4.1 floor, that indicates
that either the function should be doing something else that's faster
than using floor at all, or it should itself have IFUNC variants, or
that the compiler choice of inlining for generic tuning should change
to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
might be called at runtime - but not that glibc should avoid calling
floor internally.  (After all, all the same considerations would apply
to any user program calling floor, where it might either be inlined or
left as an out-of-line call allowing for a possible IFUNC.)

Tested for x86_64, and with build-many-glibcs.py.

	* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
	__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT):
	New macro.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT.
	[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
	&& !NO_MATH_REDIRECT] (floor): Likewise.
	* sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before
	header inclusion.
	* sysdeps/aarch64/fpu/s_floorf.c: Likewise.
	* sysdeps/ieee754/dbl-64/s_floor.c: Likewise.
	* sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise.
	* sysdeps/ieee754/float128/s_floorf128.c: Likewise.
	* sysdeps/ieee754/flt-32/s_floorf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
	* sysdeps/riscv/rvf/s_floorf.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
	* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor):
	Remove macro.
	[_ARCH_PWR5X] (__floorf): Likewise.
	* sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove
	inline function.
	[__SSE4_1__] (__floorf): Likewise.
	* math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions
	instead of __floor variants.
	* math/w_lgamma_r_compat.c (__lgamma_r): Likewise.
	* math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise.
	* math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise.
	* math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise.
	* math/w_lgammal_r_compat.c (__lgammal_r): Likewise.
	* math/w_tgamma_compat.c (__tgamma): Likewise.
	* math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise.
	* math/w_tgammaf_compat.c (__tgammaf): Likewise.
	* math/w_tgammal_compat.c (__tgammal): Likewise.
	* sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
	Likewise.
	* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
	* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise.
	* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
	* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl):
	Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
	* sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
	* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
2018-09-14 13:09:01 +00:00
Joseph Myers
4e7fbdd7c2 Remove x86_64 math_private.h asms.
The x86_64 math_private.h has asm versions of the macros to
reinterpret between floating-point and integer types.

This is the sort of thing we now strongly discourage; the expectation
in such cases, where the generic C code gives the compiler all the
information needed about the required semantics, is that you should
get the compiler to do the right thing for the generic C code rather
than writing an asm version.

Trivial tests showed GCC generates the expected single instructions
for reinterpretation from floating point to integer.  In the other
direction, it goes via memory when the asms don't; I asked about this
in GCC bug 87236 and was advised this was deliberate for generic
tuning because it was faster that way on some AMD processors (but
-mtune=intel, and -Os with the latest GCC, avoid going via memory).
The asms don't and can't know about those tuning details, so that's
evidence that they are actually making the code worse.

This patch removes the asms accordingly.  Tested for x86_64.

	* sysdeps/x86_64/fpu/math_private.h (MOVD): Remove macro.
	(MOVQ): Likewise.
	(EXTRACT_WORDS64): Likewise.
	(INSERT_WORDS64): Likewise.
	(GET_FLOAT_WORD): Likewise.
	(SET_FLOAT_WORD): Likewise.
2018-09-11 14:51:40 +00:00
Szabolcs Nagy
e70c176825 Add new exp and exp2 implementations
Optimized exp and exp2 implementations using a lookup table for
fractional powers of 2.  There are several variants, see e_exp_data.c,
they can be selected by modifying math_config.h allowing different
tradeoffs.

The default selection should be acceptable as generic libm code.
Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on
aarch64 the rodata size is 2160 bytes, shared between exp and exp2.
On aarch64 .text + .rodata size decreased by 24912 bytes.

The non-nearest rounding error is less than 1 ULP even on targets
without efficient round implementation (although the error rate is
higher in that case).  Targets with single instruction, rounding mode
independent, to nearest integer rounding and conversion can use them
by setting TOINT_INTRINSICS and adding the necessary code to their
math_private.h.

The __exp1 code uses the same algorithm, so the error bound of pow
increased a bit.

New double precision error handling code was added following the
style of the single precision error handling code.

Improvements on Cortex-A72 compared to current glibc master:
exp thruput: 1.61x in [-9.9 9.9]
exp latency: 1.53x in [-9.9 9.9]
exp thruput: 1.13x in [0.5 1]
exp latency: 1.30x in [0.5 1]
exp2 thruput: 2.03x in [-9.9 9.9]
exp2 latency: 1.64x in [-9.9 9.9]

For small (< 1) inputs the current exp code uses a separate algorithm
so the speed up there is less.

Was tested on
aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and
x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and
powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets,
only non-nearest rounding ulp errors increase and they are within
acceptable bounds (ulp updates are in separate patches).

	* NEWS: Mention exp and exp2 improvements.
	* math/Makefile (libm-support): Remove t_exp.
	(type-double-routines): Add math_err and e_exp_data.
	* sysdeps/aarch64/libm-test-ulps: Update.
	* sysdeps/arm/libm-test-ulps: Update.
	* sysdeps/i386/fpu/e_exp_data.c: New file.
	* sysdeps/i386/fpu/math_err.c: New file.
	* sysdeps/i386/fpu/t_exp.c: Remove.
	* sysdeps/ia64/fpu/e_exp_data.c: New file.
	* sysdeps/ia64/fpu/math_err.c: New file.
	* sysdeps/ia64/fpu/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp_data.c: New file.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound.
	* sysdeps/ieee754/dbl-64/eexp.tbl: Remove.
	* sysdeps/ieee754/dbl-64/math_config.h: New file.
	* sysdeps/ieee754/dbl-64/math_err.c: New file.
	* sysdeps/ieee754/dbl-64/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/t_exp2.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file.
	* sysdeps/m68k/m680x0/fpu/math_err.c: New file.
	* sysdeps/m68k/m680x0/fpu/t_exp.c: Remove.
	* sysdeps/powerpc/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-09-05 16:22:00 +01:00
Joseph Myers
ff6b24501f Split fenv_private.h out of math_private.h more consistently.
On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h.  As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.

This patch makes the fenv_private.h name an architecture-independent
convention in glibc.  The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed).  Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>.  The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64.  The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).

Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.

	* sysdeps/aarch64/fpu/fenv_private.h: New file.  Based on ....
	* sysdeps/aarch64/fpu/math_private.h: ... this file.  All contents
	moved to fenv_private.h except for ...
	(TOINT_INTRINSICS): Kept in math_private.h.
	(roundtoint): Likewise.
	(converttoint): Likewise.
	* sysdeps/arm/fenv_private.h: Change multiple-include guard to
	[ARM_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/arm/math_private.h: Remove.
	* sysdeps/generic/fenv_private.h: New file.  Contents moved from
	....
	* sysdeps/generic/math_private.h: ... this file.  Include
	<stdbool.h>.  Do not include <fenv.h> or <get-rounding-mode.h>.
	Include <fenv_private.h>.  Remove functions and macros moved to
	fenv_private.h.
	* sysdeps/i386/fpu/math_private.h: Remove.
	* sysdeps/mips/math_private.h: Move to ....
	* sysdeps/mips/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [MIPS_FENV_PRIVATE_H].  Remove
	[__mips_hard_float] conditional.  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
	guard to [POWERPC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/powerpc/fpu/math_private.h: Do not include
	<fenv_private.h>.
	* sysdeps/riscv/rvf/math_private.h: Move to ....
	* sysdeps/riscv/rvf/fenv_private.h: ... here.  Change
	multiple-include guard to [RISCV_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
	to [SPARC_FENV_PRIVATE_H].  Include next <fenv_private.h>.
	* sysdeps/sparc/fpu/math_private.h: Remove.
	* sysdeps/i386/fpu/fenv_private.h: Move to ....
	* sysdeps/x86/fpu/fenv_private.h: ... here.  Change
	multiple-include guard to [X86_FENV_PRIVATE_H].  Include next
	<fenv_private.h>.
	* sysdeps/x86_64/fpu/math_private.h: Do not include
	<sysdeps/i386/fpu/fenv_private.h>.
2018-08-28 20:48:49 +00:00
Wilco Dijkstra
49acec179c Fix spaces in x86_64 ULP file
Fix a few missing spaces, it's now identical to the regenerated version.

Passes GLIBC tests on x64.

	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.
2018-08-15 12:56:22 +01:00
Wilco Dijkstra
599cf39766 Improve performance of sinf and cosf
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials.  The returned values are identical to sincosf
for the same input.  ULP definitions for AArch64 and x64 are updated.

sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4  : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0   : 2.3x
* |x| < Inf     : 3.0x

	* NEWS: Mention sinf, cosf, sincosf.
	* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
	constants rather than including generic sincosf.h.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
	* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
	(reduced_cos): Remove.
	(sinf_poly): New function.
	* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
2018-08-14 10:45:59 +01:00
Joseph Myers
2ce7ba7d15 Move SNAN_TESTS_* out of math-tests.h.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the SNAN_TESTS_* macros for individual types out to their own
sysdeps header (while the type-generic SNAN_TESTS wrapper for those
macros remains in math-tests.h).

Tested for x86_64 and x86, and with build-many-glibcs.py.

	* sysdeps/generic/math-tests-snan.h: New file.
	* sysdeps/generic/math-tests.h: Include <math-tests-snan.h>.
	(SNAN_TESTS_float): Do not define here.
	(SNAN_TESTS_double): Likewise.
	(SNAN_TESTS_long_double): Likewise.
	(SNAN_TESTS_float128): Likewise.
	* sysdeps/i386/fpu/math-tests-snan.h: New file.
	* sysdeps/i386/fpu/math-tests.h: Remove file.
	* sysdeps/ia64/math-tests-snan.h: New file.
	* sysdeps/ia64/math-tests.h: Remove file.
	* sysdeps/x86/math-tests.h: Likewise.
	* sysdeps/x86_64/fpu/math-tests-snan.h: New file.
2018-08-10 19:22:01 +00:00
Wilco Dijkstra
ea5c662c62 Improve performance of sincosf
This patch is a complete rewrite of sincosf.  The new version is
significantly faster, as well as simple and accurate.
The worst-case ULP is 0.5607, maximum relative error is 0.5303 * 2^-23 over
all 4 billion inputs.  In non-nearest rounding modes the error is 1ULP.

The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction.  The code uses approximate integer
comparisons to quickly decide between these cases.

The small range reducer uses a single reduction step to handle values up to
120.0.  It is fastest on targets which support inlined round instructions.

The large range reducer uses integer arithmetic for simplicity.  It does a
32x96 bit multiply to compute a 64-bit modulo result.  This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4.  It could be further optimized, however it is
already much faster than necessary.

sincosf throughput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.6x
* |x| < M_PI_4  : 1.7x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0   : 1.8x
* |x| < Inf     : 2.3x

	* math/Makefile: Add s_sincosf_data.c.
	* sysdeps/ia64/fpu/s_sincosf_data.c: New file.
	* sysdeps/ieee754/flt-32/s_sincosf.h (abstop12): Add new function.
	(sincosf_poly): Likewise.
	(reduce_small): Likewise.
	(reduce_large): Likewise.
	* sysdeps/ieee754/flt-32/s_sincosf.c (sincosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf_data.c: New file with sincosf data.
	* sysdeps/m68k/m680x0/fpu/s_sincosf_data.c: New file.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: New file.
2018-08-10 17:34:39 +01:00
H.J. Lu
430388d5dc x86: Don't include <init-arch.h> in assembly codes
There is no need to include <init-arch.h> in assembly codes since all
x86 IFUNC selector functions are written in C.  Tested on i686 and
x86-64.  There is no code change in libc.so, ld.so and libmvec.so.

	* sysdeps/i386/i686/multiarch/bzero-ia32.S: Don't include
	<init-arch.h>.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core-avx2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Likewise.
2018-08-03 08:05:00 -07:00
Paul Pluzhnikov
50d004c91c Update ulps with "make regen-ulps" on AMD Ryzen 7 1800X.
2018-05-30  Paul Pluzhnikov  <ppluzhnikov@google.com>

	* sysdeps/x86_64/fpu/libm-test-ulps (log_vlen8_avx2): Update for
	AMD Ryzen 7 1800X.
2018-05-30 09:17:47 -07:00
Wilco Dijkstra
19a8b9a300 [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputs
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).

ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos.  The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.

Tested on AArch64 and x86_64 with no regressions.

The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction.  Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.

	* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
	inputs.
	(__cos): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
2018-04-03 16:52:16 +01:00
Wilco Dijkstra
700593fdd7 Remove all target specific __ieee754_sqrt(f/l) inlines
Remove the now unused target specific__ieee754_sqrt(f/l) inlines.
Also remove inlines of sqrt which are for really old GCC versions.
Removing these is desirable, under the general principle of leaving
such inlining to the compiler rather than trying to do it in installed
headers, especially when only very old compilers are affected.

Note that removing inlines for __ieee754_sqrt disables inlining in the
sqrt wrapper functions.  Given the sqrt function will typically only be
called for negative arguments, it doesn't matter whether the inlining
happens or not.

	* sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	* sysdeps/s390/fpu/bits/mathinline.h: Remove file.
	* sysdeps/sparc/fpu/bits/mathinline.h (sqrt) Remove.
	(sqrtf): Remove.
	(sqrtl): Remove.
	(__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
	* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
	* sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove.
	* sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove.
	(__ieee754_sqrtf): Remove.
	(__ieee754_sqrtl): Remove.
2018-03-15 19:21:36 +00:00
Wilco Dijkstra
610ee1fc93 Remove mplog and mpexp
Remove the now unused mplog and mpexp files.

	* math/Makefile: Remove mpexp.c and mplog.c
	* sysdeps/i386/fpu/mpexp.c: Delete file.
	* sysdeps/i386/fpu/mplog.c: Likewise.
	* sysdeps/ia64/fpu/mpexp.c: Likewise.
	* sysdeps/ia64/fpu/mplog.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c: Remove mention of mpexp and mplog.
	* sysdeps/ieee754/dbl-64/mpa.h (__pow_mp): Remove unused function.
	* sysdeps/ieee754/dbl-64/mpexp.c: Delete file.
	* sysdeps/ieee754/dbl-64/mplog.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mpexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/mplog.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove mpexp* and mplog*.
	* sysdeps/x86_64/fpu/multiarch/e_log-avx.c: Remove unused defines.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_log-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-avx.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mpexp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/mplog-fma4.c: Likewise.
2018-02-15 12:41:05 +00:00
Szabolcs Nagy
de800d8305 Remove slow paths from exp
Remove the __slowexp code, so exp is no longer correctly rounded.  The
result is computed to about 70 bits precision so the worst case ulp
error is about 0.500007 in nearest rounding mode.

	* manual/probes.texi: Remove slowexp probes.
	* math/Makefile: Remove slowexp.
	* sysdeps/generic/math_private.h (__slowexp): Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c (__ieee754_exp): Remove __slowexp and
	document error bounds.
	* sysdeps/i386/fpu/slowexp.c: Remove.
	* sysdeps/ia64/fpu/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h (err_0): Remove.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Remove.
	* sysdeps/powerpc/power4/fpu/Makefile (CPPFLAGS-slowexp.c): Remove.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowexp-fma.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Remove.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Remove.
2018-02-12 11:33:33 +00:00
Wilco Dijkstra
c3d466cba1 Remove slow paths from pow
Remove the slow paths from pow.  Like several other double precision math
functions, pow is exactly rounded.  This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP.  Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.

All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP.  A simple test over a few hundred million
values shows pow is 10% faster on average.  This fixes BZ #13932.

	[BZ #13932]
	* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
	* benchtests/pow-inputs: Update comment for slow path cases.
	* manual/probes.texi (slowpow_p10): Delete removed probe.
	(slowpow_p10): Likewise.
	* math/Makefile: Remove halfulp.c and slowpow.c.
	* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/generic/math_private.h (__exp1): Remove error argument.
	(__halfulp): Remove.
	(__slowpow): Remove.
	* sysdeps/i386/fpu/halfulp.c: Delete file.
	* sysdeps/i386/fpu/slowpow.c: Likewise.
	* sysdeps/ia64/fpu/halfulp.c: Likewise.
	* sysdeps/ia64/fpu/slowpow.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
	improve comments and add error analysis.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
	(power1): Remove function:
	(log1): Remove error argument, add error analysis.
	(my_log2): Remove function.
	* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
	* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
	* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
	* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
	slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-12 10:47:09 +00:00
H.J. Lu
c70e4e9c9e x86-64: Add sincosf with vector FMA
Since the x86-64 assembly version of sincosf is higly optimized with
vector instructions, there isn't much room for improvement.  However
s_sincosf.c written in C with vector math and intrinsics can be
optimized by GCC with FMA.

On Skylake, bench-sincosf reports performance improvement:

           Assembly       FMA         improvement
max        104.042       101.008         3%
min        9.426         8.586           10%
mean       20.6209       18.2238         13%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_sincosf-sse2 and s_sincosf-fma.
	(CFLAGS-s_sincosf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise.
	* sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if
	__sincosf is defined.
2018-01-08 08:04:40 -08:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Joseph Myers
f1e005022e Revert exp reimplementation (causes test failures).
Revert:

	2017-12-19  Joseph Myers  <joseph@codesourcery.com>

	* sysdeps/x86_64/fpu/libm-test-ulps: Update.

	2017-12-19  Patrick McGehearty  <patrick.mcgehearty@oracle.com>

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 18:11:37 +00:00
Joseph Myers
6f58c10ded Update x86_64 libm-test-ulps.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2017-12-19 17:38:41 +00:00
Patrick McGehearty
6fd0a3c6a8 Improve __ieee754_exp() performance by greater than 5x on sparc/x86.
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Typical performance gains is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).

Using the glibc perf tests on sparc,
      sparc (nsec)    x86 (nsec)
      old     new     old     new
max   17629   395    5173     144
min     399    54      15      13
mean   5317   200    1349      23

The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from an value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.

Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0)  (-1.46113312244415283203125)
 new code:    2.31973271630014299393707e-01   0x1.db14cd799387ap-3
 old code:    2.31973271630014271638132e-01   0x1.db14cd7993879p-3
    exp    =  2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp

In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes.  For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures.  I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.The gamma function is known to be difficult to compute
accurately.

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 17:27:31 +00:00
H.J. Lu
4ca945e9c5 x86-64: Remove sysdeps/x86_64/fpu/s_cosf.S
On Ivy Bridge, bench-cosf reports performance improvement:

           s_cosf.S      s_cosf.c       Improvement
max        114.136       82.401            39%
min        13.119        11.301            16%
mean       22.1882       21.8938           1%

	* sysdeps/x86_64/fpu/s_cosf.S: Removed.
2017-12-14 04:39:53 -08:00
H.J. Lu
ac817e083b x86-64: Add cosf with FMA
On Skylake, bench-cosf reports performance improvement:

            Before        After         Improvement
max        135.362       94.552            43%
min        8.532         7.688             11%
mean       17.1446       11.8128           45%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_cosf-sse2 and s_cosf-fma.
	(CFLAGS-s_cosf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_cosf-sse2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_cosf.c: Likewise.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2017-12-12 15:32:58 -08:00
H.J. Lu
9d0ffa60ad x86-64: Add sinf with FMA
On Skylake, bench-sinf reports performance improvement:

            Before        After         Improvement
max        153.996       100.094           54%
min        8.546         6.852             25%
mean       18.1223       11.802            54%

	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Add s_sinf-sse2 and s_sinf-fma.
	(CFLAGS-s_sinf-fma.c): New.
	* sysdeps/x86_64/fpu/multiarch/s_sinf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/s_sinf-sse2.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_sinf.c: Likewise.
2017-12-07 10:11:16 -08:00