Commit Graph

219 Commits

Author SHA1 Message Date
Sunil K Pandey
11c01de14c x86-64: Add vector asin/asinf implementation to libmvec
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:03 -08:00
Sunil K Pandey
146310177a x86-64: Add vector atan/atanf implementation to libmvec
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:36:46 -08:00
Sunil K Pandey
f20f980c71 x86-64: Add vector acos/acosf implementation to libmvec
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-22 13:03:14 -08:00
H.J. Lu
de8a0897e3 Regenerate ulps on x86_64 with GCC 12
Fix

FAIL: math/test-float-clog10
FAIL: math/test-float32-clog10

on Intel Core i7-1165G7 with GCC 12.
2021-12-20 15:25:00 -08:00
Paul Zimmermann
e6eef0adc5 regenerate ulps on x86_64 with -march=native
On x86_64, when configuring glibc with CFLAGS="-O2 -g -march=native",
some tests fail. After this patch, "make check" succeeds.

Tested on Intel Core i5-4590 with gcc 10.2.1.
2021-04-28 12:46:00 +02:00
Paul Zimmermann
43576de04a Improve the accuracy of tgamma (BZ #26983)
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).

Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-07 13:23:39 +02:00
Paul Zimmermann
9acda61d94 Fix the inaccuracy of j0f/j1f/y0f/y1f [BZ #14469, #14470, #14471, #14472]
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.

The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average.  Two different algorithms are used:

* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
  degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)

* for large inputs, an asymptotic formula from [1] is used

[1] Fast and Accurate Bessel Function Computation,
    John Harrison, Proceedings of Arith 19, 2009.

Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).

Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-02 06:15:48 +02:00
Wilco Dijkstra
db3f7bb558 math: Remove slow paths from asin and acos [BZ #15267]
This patch series removes all remaining slow paths and related code.
First asin/acos, tan, atan, atan2 implementations are updated, and the final
patch removes the unused mpa files, headers and probes. Passes buildmanyglibc.

Remove slow paths from asin/acos. Add ULP annotations based on previous slow
path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Paul Zimmermann
5a051454a9 Add inputs that generate larger error bounds
(Using values from https://members.loria.fr/PZimmermann/papers/accuracy.pdf)
2021-02-27 06:32:11 +01:00
Paul Zimmermann
cad5ad81d2 add inputs to auto-libm-test-in yielding larger errors (binary64, x86_64) 2020-12-21 10:35:20 +05:30
Adhemerval Zanella
5ff35e9544 math: Update x86_64 ulps
From new j0 test.
2020-08-08 16:43:11 -03:00
Andreas K. Hüttel
180d5a045f Update x86-64 libm-test-ulps
x86_64 Intel(R) Core(TM) i5-8265U
gcc (Gentoo 10.1.0-r2 p3) 10.1.0
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-07-25 17:10:53 -04:00
Paul Zimmermann
a9d42c09a3 math: Add inputs that yield larger errors for float type (x86_64)
The corner cases included were generated using exhaustive search
for all float/binary32 values on x86_64 (comparing to MPFR for
correct rounding to nearest).

For the j0/j1/y0 functions, only cases with ulp error <= 9 were
included.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-03-31 21:48:54 -04:00
Adhemerval Zanella
1c15464ca0 math: Remove inline math tests
With mathinline removal there is no need to keep building and testing
inline math tests.

The gen-libm-tests.py support to generate ULP_I_* is removed and all
libm-test-ulps files are updated to longer have the
i{float,double,ldouble} entries.  The support for no-test-inline is
also removed from both gen-auto-libm-tests and the
auto-libm-test-out-* were regenerated.

Checked on x86_64-linux-gnu and i686-linux-gnu.
2020-03-19 11:45:44 -03:00
H.J. Lu
9412979a43 Regenerate sysdeps/x86_64/fpu/libm-test-ulps
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
2018-12-26 06:57:30 -08:00
Szabolcs Nagy
e70c176825 Add new exp and exp2 implementations
Optimized exp and exp2 implementations using a lookup table for
fractional powers of 2.  There are several variants, see e_exp_data.c,
they can be selected by modifying math_config.h allowing different
tradeoffs.

The default selection should be acceptable as generic libm code.
Worst case error is 0.509 ULP for exp and 0.507 ULP for exp2, on
aarch64 the rodata size is 2160 bytes, shared between exp and exp2.
On aarch64 .text + .rodata size decreased by 24912 bytes.

The non-nearest rounding error is less than 1 ULP even on targets
without efficient round implementation (although the error rate is
higher in that case).  Targets with single instruction, rounding mode
independent, to nearest integer rounding and conversion can use them
by setting TOINT_INTRINSICS and adding the necessary code to their
math_private.h.

The __exp1 code uses the same algorithm, so the error bound of pow
increased a bit.

New double precision error handling code was added following the
style of the single precision error handling code.

Improvements on Cortex-A72 compared to current glibc master:
exp thruput: 1.61x in [-9.9 9.9]
exp latency: 1.53x in [-9.9 9.9]
exp thruput: 1.13x in [0.5 1]
exp latency: 1.30x in [0.5 1]
exp2 thruput: 2.03x in [-9.9 9.9]
exp2 latency: 1.64x in [-9.9 9.9]

For small (< 1) inputs the current exp code uses a separate algorithm
so the speed up there is less.

Was tested on
aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
arm-linux-gnueabihf (!TOINT_INTRINSICS, no fma contraction) and
x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) and
powerpc64le-linux-gnu (!TOINT_INTRINSICS, fma contraction) targets,
only non-nearest rounding ulp errors increase and they are within
acceptable bounds (ulp updates are in separate patches).

	* NEWS: Mention exp and exp2 improvements.
	* math/Makefile (libm-support): Remove t_exp.
	(type-double-routines): Add math_err and e_exp_data.
	* sysdeps/aarch64/libm-test-ulps: Update.
	* sysdeps/arm/libm-test-ulps: Update.
	* sysdeps/i386/fpu/e_exp_data.c: New file.
	* sysdeps/i386/fpu/math_err.c: New file.
	* sysdeps/i386/fpu/t_exp.c: Remove.
	* sysdeps/ia64/fpu/e_exp_data.c: New file.
	* sysdeps/ia64/fpu/math_err.c: New file.
	* sysdeps/ia64/fpu/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/e_exp.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp2.c: Rewrite.
	* sysdeps/ieee754/dbl-64/e_exp_data.c: New file.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Update error bound.
	* sysdeps/ieee754/dbl-64/eexp.tbl: Remove.
	* sysdeps/ieee754/dbl-64/math_config.h: New file.
	* sysdeps/ieee754/dbl-64/math_err.c: New file.
	* sysdeps/ieee754/dbl-64/t_exp.c: Remove.
	* sysdeps/ieee754/dbl-64/t_exp2.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.h: Remove.
	* sysdeps/ieee754/dbl-64/uexp.tbl: Remove.
	* sysdeps/m68k/m680x0/fpu/e_exp_data.c: New file.
	* sysdeps/m68k/m680x0/fpu/math_err.c: New file.
	* sysdeps/m68k/m680x0/fpu/t_exp.c: Remove.
	* sysdeps/powerpc/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2018-09-05 16:22:00 +01:00
Wilco Dijkstra
49acec179c Fix spaces in x86_64 ULP file
Fix a few missing spaces, it's now identical to the regenerated version.

Passes GLIBC tests on x64.

	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.
2018-08-15 12:56:22 +01:00
Wilco Dijkstra
599cf39766 Improve performance of sinf and cosf
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials.  The returned values are identical to sincosf
for the same input.  ULP definitions for AArch64 and x64 are updated.

sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4  : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0   : 2.3x
* |x| < Inf     : 3.0x

	* NEWS: Mention sinf, cosf, sincosf.
	* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
	* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
	constants rather than including generic sincosf.h.
	* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
	* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
	* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
	(reduced_cos): Remove.
	(sinf_poly): New function.
	* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
2018-08-14 10:45:59 +01:00
Paul Pluzhnikov
50d004c91c Update ulps with "make regen-ulps" on AMD Ryzen 7 1800X.
2018-05-30  Paul Pluzhnikov  <ppluzhnikov@google.com>

	* sysdeps/x86_64/fpu/libm-test-ulps (log_vlen8_avx2): Update for
	AMD Ryzen 7 1800X.
2018-05-30 09:17:47 -07:00
Wilco Dijkstra
19a8b9a300 [PATCH 1/7] sin/cos slow paths: avoid slow paths for small inputs
This series of patches removes the slow patchs from sin, cos and sincos.
Besides greatly simplifying the implementation, the new version is also much
faster for inputs up to PI (41% faster) and for large inputs needing range
reduction (27% faster).

ULP is ~0.55 with no errors found after testing 1.6 billion inputs across most
of the range with mpsin and mpcos.  The number of incorrectly rounded results
(ie. ULP >0.5) is at most ~2750 per million inputs between 0.125 and 0.5,
the average is ~850 per million between 0 and PI.

Tested on AArch64 and x86_64 with no regressions.

The first patch removes the slow paths for the cases where the input is small
and doesn't require range reduction.  Update ULP tables for sin, cos and sincos
on AArch64 and x86_64.

	* sysdeps/aarch64/libm-test-ulps: Update ULP for sin, cos, sincos.
	* sysdeps/ieee754/dbl-64/s_sin.c (__sin): Remove slow paths for small
	inputs.
	(__cos): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sin, cos, sincos.
2018-04-03 16:52:16 +01:00
Wilco Dijkstra
c3d466cba1 Remove slow paths from pow
Remove the slow paths from pow.  Like several other double precision math
functions, pow is exactly rounded.  This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP.  Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.

All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP.  A simple test over a few hundred million
values shows pow is 10% faster on average.  This fixes BZ #13932.

	[BZ #13932]
	* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
	* benchtests/pow-inputs: Update comment for slow path cases.
	* manual/probes.texi (slowpow_p10): Delete removed probe.
	(slowpow_p10): Likewise.
	* math/Makefile: Remove halfulp.c and slowpow.c.
	* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/generic/math_private.h (__exp1): Remove error argument.
	(__halfulp): Remove.
	(__slowpow): Remove.
	* sysdeps/i386/fpu/halfulp.c: Delete file.
	* sysdeps/i386/fpu/slowpow.c: Likewise.
	* sysdeps/ia64/fpu/halfulp.c: Likewise.
	* sysdeps/ia64/fpu/slowpow.c: Likewise.
	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
	improve comments and add error analysis.
	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
	(power1): Remove function:
	(log1): Remove error argument, add error analysis.
	(my_log2): Remove function.
	* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
	* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
	* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
	* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
	slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
	* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
2018-02-12 10:47:09 +00:00
Joseph Myers
f1e005022e Revert exp reimplementation (causes test failures).
Revert:

	2017-12-19  Joseph Myers  <joseph@codesourcery.com>

	* sysdeps/x86_64/fpu/libm-test-ulps: Update.

	2017-12-19  Patrick McGehearty  <patrick.mcgehearty@oracle.com>

	* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
	<errno.h>.  Include "eexp.tbl".
	(half): New constant.
	(one): Likewise.
	(__ieee754_exp): Rewrite.
	(__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
	* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
	* sysdeps/i386/fpu/slowexp.c: Likewise.
	* sysdeps/ia64/fpu/slowexp.c: Likewise.
	* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
	* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
	* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
	* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
	comment.
	* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
	(CPPFLAGS-slowexp.c): Remove variable.
	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
	Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
	(CFLAGS-slowexp-fma.c): Remove variable.
	(CFLAGS-slowexp-fma4.c): Likewise.
	(CFLAGS-slowexp-avx.c): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
	define as macro.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
	* math/Makefile (type-double-routines): Remove slowexp.
	* manual/probes.texi (slowexp_p6): Remove.
	(slowexp_p32): Likewise.
2017-12-19 18:11:37 +00:00
Joseph Myers
6f58c10ded Update x86_64 libm-test-ulps.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2017-12-19 17:38:41 +00:00
H.J. Lu
e1f59bebd8 x86-64: Replace assembly versions of e_expf with generic e_expf.c
This patch replaces x86-64 assembly versions of e_expf with generic
e_expf.c.  For workload-spec2017.wrf, on Nehalem, it improves
performance by:

                           Before            After     Improvement
reciprocal-throughput      36.039           20.7749       73%
latency                    58.8096          40.8715       43%

On Skylake, it improves

                           Before            After     Improvement
reciprocal-throughput      18.4436          11.1693       65%
latency                    47.5162          37.5411       26%

	* sysdeps/x86_64/fpu/e_expf.S: Removed.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.S: Likewise.
	* sysdeps/x86_64/fpu/w_expf.c: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Updated for generic
	e_expf.c.
	* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_expf-fma.c):
	New.
	* sysdeps/x86_64/fpu/multiarch/e_expf-fma.c: New file.
	* sysdeps/x86_64/fpu/multiarch/e_expf.c (__redirect_ieee754_expf):
	Renamed to ...
	(__redirect_expf): This.
	(SYMBOL_NAME): Changed to expf.
	(__ieee754_expf): Renamed to ...
	(__expf): This.
	(__GI___expf): This.
	(__ieee754_expf): Add strong_alias.
	(__expf_finite): Likewise.
	(__expf): New.
	Include <sysdeps/ieee754/flt-32/e_expf.c>.
2017-10-22 07:49:55 -07:00
Joseph Myers
2f92505d20 Update x86_64 libm-test-ulps.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2017-09-29 18:03:48 +00:00
Markus Trippelsdorf
4c03a69680 Update x86_64 ulps for AMD Ryzen.
* sysdeps/x86_64/fpu/libm-test-ulps: Update for AMD Ryzen.
2017-09-08 19:57:12 +00:00
Joseph Myers
5a80d39d0d Obsolete pow10 functions.
This patch obsoletes the pow10, pow10f and pow10l functions (makes
them into compat symbols, not available for new ports or static
linking).  The exp10 names for these functions are standardized (in TS
18661-4) and were added in the same glibc version (2.1) as pow10 so
source code can change to use them without any loss of portability.
Since pow10 is deliberately not provided for _Float128, only exp10,
this slightly simplifies moving to the new wrapper templates in the
!LIBM_SVID_COMPAT case, by avoiding needing to arrange for pow10,
pow10f and pow10l to be defined by those templates.

Tested for x86_64, and with build-many-glibcs.py.

	* manual/math.texi (pow10): Do not document.
	(pow10f): Likewise.
	(pow10l): Likewise.
	* math/bits/mathcalls.h [__USE_GNU] (pow10): Do not declare.
	* math/bits/math-finite.h [__USE_GNU] (pow10): Likewise.
	* math/libm-test-exp10.inc (pow10_test): Remove.
	(do_test): Do not call pow10.
	* math/w_exp10_compat.c (pow10): Make into compat symbol.
	[NO_LONG_DOUBLE] (pow10l): Likewise.
	* math/w_exp10f_compat.c (pow10f): Likewise.
	* math/w_exp10l_compat.c (pow10l): Likewise.
	* sysdeps/ia64/fpu/e_exp10.S: Include <shlib-compat.h>.
	(pow10): Make into compat symbol.
	* sysdeps/ia64/fpu/e_exp10f.S: Include <shlib-compat.h>.
	(pow10f): Make into compat symbol.
	* sysdeps/ia64/fpu/e_exp10l.S: Include <shlib-compat.h>.
	(pow10l): Make into compat symbol.
	* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Remove
	pow10.
	(CFLAGS-nldbl-pow10.c): Remove variable..
	* sysdeps/ieee754/ldbl-opt/nldbl-pow10.c: Remove file.
	* sysdeps/ieee754/ldbl-opt/w_exp10_compat.c (pow10l): Condition on
	[SHLIB_COMPAT (libm, GLIBC_2_1, GLIBC_2_27)].
	* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (compat_symbol):
	Undefine and redefine.
	(pow10l): Make into compat symbol.
	* sysdeps/aarch64/libm-test-ulps: Remove pow10 ulps.
	* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
	* sysdeps/arm/libm-test-ulps: Likewise.
	* sysdeps/hppa/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/microblaze/libm-test-ulps: Likewise.
	* sysdeps/mips/mips32/libm-test-ulps: Likewise.
	* sysdeps/mips/mips64/libm-test-ulps: Likewise.
	* sysdeps/nios2/libm-test-ulps: Likewise.
	* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
	* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
	* sysdeps/s390/fpu/libm-test-ulps: Likewise.
	* sysdeps/sh/libm-test-ulps: Likewise.
	* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
	* sysdeps/tile/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-09-01 21:13:18 +00:00
H.J. Lu
fcaaca412f x86-64: Regenerate libm-test-ulps for AVX512 mathvec tests
Update libm-test-ulps for AVX512 mathvec tests by running
“make regen-ulps” on Intel Xeon processor with AVX512.

	* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
2017-08-23 09:11:55 -07:00
Joseph Myers
c86ed71d63 Add float128 support for x86_64, x86.
This patch enables float128 support for x86_64 and x86.  All GCC
versions that can build glibc provide the required support, but since
GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN
tests and some tests of NaN payloads need to be disabled with such
compilers (this does not affect the generated glibc binaries at all,
just the tests).  bits/floatn.h declares float128 support to be
available for GCC versions that provide the required libgcc support
(4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd);
compilation-only support was present some time before then, but not
really useful without the libgcc functions.

fenv_private.h needed updating to avoid trying to put _Float128 values
in registers.  I make no assertion of optimality of the
math_opt_barrier / math_force_eval definitions for this case; they are
simply intended to be sufficient to work correctly.

Tested for x86_64 and x86, with GCC 7 and GCC 6.  (Testing for x32 was
compilation tests only with build-many-glibcs.py to verify the ABI
baseline updates.  I have not done any testing for Hurd, although the
float128 support is enabled there as for GNU/Linux.)

	* sysdeps/i386/Implies: Add ieee754/float128.
	* sysdeps/x86_64/Implies: Likewise.
	* sysdeps/x86/bits/floatn.h: New file.
	* sysdeps/x86/float128-abi.h: Likewise.
	* manual/math.texi (Mathematics): Document support for _Float128
	on x86_64 and x86.
	* sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>.
	(math_opt_barrier): Do not put _Float128 values in floating-point
	registers.
	(math_force_eval): Likewise.
	[__x86_64__] (SET_RESTORE_ROUNDF128): New macro.
	* sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append
	to Makefile variable.
	* sysdeps/x86/fpu/e_sqrtf128.c: New file.
	* sysdeps/x86/fpu/sfp-machine.h: Likewise.  Based on libgcc.
	* sysdeps/x86/math-tests.h: New file.
	* math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro.
	* math/libm-test-getpayload.inc (getpayload_test_data): Use
	XFAIL_FLOAT128_PAYLOAD.
	* math/libm-test-setpayload.inc (setpayload_test_data): Likewise.
	* math/libm-test-totalorder.inc (totalorder_test_data): Likewise.
	* math/libm-test-totalordermag.inc (totalordermag_test_data):
	Likewise.
	* sysdeps/unix/sysv/linux/i386/libc.abilist: Update.
	* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
	* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
	* sysdeps/i386/fpu/libm-test-ulps: Likewise.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-06-26 22:02:24 +00:00
Joseph Myers
2c51dfd05d Move tests of catan, catanh to auto-libm-test-*.
This patch moves tests of catan and catanh with finite inputs (other
than the divide-by-zero cases producing an exact infinity) to using
the auto-libm-test machinery.  Each of auto-libm-test-out-catan and
auto-libm-test-out-catanh takes about three seconds to generate on my
system (so in fact it wasn't necessary after all to defer the move to
auto-libm-test-* until the output files were split up by function).

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add tests of catan and catanh.
	* math/auto-libm-test-out-catan: New generated file.
	* math/auto-libm-test-out-catanh: Likewise.
	* math/libm-test-catan.inc (catan_test_data): Use AUTO_TESTS_c_c.
	Move tests with finite inputs, except divide-by-zero cases, to
	auto-libm-test-in.
	* math/libm-test-catanh.inc (catanh_test_data): Likewise.
	* math/Makefile (libm-test-funcs-auto): Add catan and catanh.
	(libm-test-funcs-noauto): Remove catan and catanh.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-02-17 18:42:37 +00:00
Joseph Myers
fa2a3dd7a3 Move tests of casin, casinh to auto-libm-test-*.
This patch moves tests of casin and casinh with finite inputs to using
the auto-libm-test machinery.  Each of auto-libm-test-out-casin and
auto-libm-test-out-casinh takes about 38 minutes to generate on my
system because of MPC slowness on special cases that appear in the
tests (with MPC 1.0.3; I don't know to what extent current MPC master
might speed it up).

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add tests of casin and casinh.
	* math/auto-libm-test-out-casin: New generated file.
	* math/auto-libm-test-out-casinh: Likewise.
	* math/libm-test-casin.inc (casin_test_data): Use AUTO_TESTS_c_c.
	Move tests with finite inputs to auto-libm-test-in.
	* math/libm-test-casinh.inc (casinh_test_data): Likewise.
	* math/Makefile (libm-test-funcs-auto): Add casin and casinh.
	(libm-test-funcs-noauto): Remove casin and casinh.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-02-17 18:14:02 +00:00
Joseph Myers
6b8303a383 Move tests of cacos, cacosh to auto-libm-test-*.
This patch moves tests of cacos and cacosh with finite inputs to using
the auto-libm-test machinery.  Each of auto-libm-test-out-cacos and
auto-libm-test-out-cacosh takes about 80 minutes to generate on my
system because of MPC slowness on special cases that appear in the
tests (with MPC 1.0.3; I don't know to what extent current MPC master
might speed it up).

Tested for x86_64 and x86 and ulps updated accordingly.

	* math/auto-libm-test-in: Add tests of cacos and cacosh.
	* math/auto-libm-test-out-cacos: New generated file.
	* math/auto-libm-test-out-cacosh: Likewise.
	* math/libm-test-cacos.inc (cacos_test_data): Use AUTO_TESTS_c_c.
	Move tests with finite inputs to auto-libm-test-in.
	* math/libm-test-cacosh.inc (cacosh_test_data): Likewise.
	* math/Makefile (libm-test-funcs-auto): Add cacos and cacosh.
	(libm-test-funcs-noauto): Remove cacos and cacosh.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2017-02-17 17:44:23 +00:00
Joseph Myers
c898991d8b Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848).
Bug 19848 reports cases where powl on x86 / x86_64 has error
accumulation, for small integer exponents, larger than permitted by
glibc's accuracy goals, at least in some rounding modes.  This patch
further restricts the exponent range for which the
small-integer-exponent logic is used to limit the possible error
accumulation.

Tested for x86_64 and x86 and ulps updated accordingly.

	[BZ #19848]
	* sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value
	from 8 to 4.
	(__ieee754_powl): Compare integer exponent against 4 not 8.
	* sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value
	from 8 to 4.
	(__ieee754_powl): Compare integer exponent against 4 not 8.
	* math/auto-libm-test-in: Add more tests of pow.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2016-03-24 01:32:52 +00:00
Florian Weimer
e39edb0552 x86_64: Regenerate ulps [BZ #19168]
This comes from running “make regen-ulps” on AMD Opteron 6272 CPUs.
2015-10-26 13:15:48 +01:00
Joseph Myers
9d1687b2df Add more libm tests (ilogb, is*, j0, j1, jn, lgamma, log*).
This patch improves the libm test coverage for a few more functions.

Tested for x86_64 and x86.

	* math/auto-libm-test-in: Add more tests of log, log10, log1p and
	log2.
	* math/auto-libm-test-out: Regenerated.
	* math/libm-test.inc (MAX_EXP): New macro.
	(ilogb_test_data): Add more tests.
	(isfinite_test_data): Likewise.
	(isgreater_test_data): Likewise.
	(isgreaterequal_test_data): Likewise.
	(isinf_test_data): Likewise.
	(isless_test_data): Likewise.
	(islessequal_test_data): Likewise.
	(islessgreater_test_data): Likewise.
	(isnan_test_data): Likewise.
	(isnormal_test_data): Likewise.
	(issignaling_test_data): Likewise.
	(isunordered_test_data): Likewise.
	(j0_test_data): Likewise.
	(j1_test_data): Likewise.
	(jn_test_data): Likewise.
	(lgamma_test_data): Likewise.
	(log_test_data): Likewise.
	(log10_test_data): Likewise.
	(log1p_test_data): Likewise.
	(log2_test_data): Likewise.
	(logb_test_data): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2015-10-23 22:46:05 +00:00
Paul Pluzhnikov
57352c2201 sysdeps/x86_64/fpu/libm-test-ulps: Regenerated on Haswell. 2015-10-03 11:31:21 -07:00
Joseph Myers
93e448cbed Improve test coverage of real libm functions [a-e]*.
This patch improves test coverage of the real libm functions [a-e]*,
ensuring that special cases and ranges of input values of potential
significance (such as close to overflow and underflow thresholds) are
more systematically covered.

This is a followup to
<https://sourceware.org/ml/libc-alpha/2013-12/msg00757.html> which
covered [a-c]* (however, I found more weaknesses in the coverage of
those functions when preparing this patch, hence the additional tests
being added for them here).

Addition of a test for acosh (-qNaN) is temporarily deferred, to be
included as part of a fix for bug 19032 which was discovered in the
course of adding these tests (and which illustrates the use of testing
-qNaN as well as +qNaN as input even to functions for which the sign
of a NaN isn't meant to be significant).

Tested for x86_64 and x86.

	* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
	atan, atan2, atanh, cbrt, cos, cosh, erf, erfc, exp, exp10, exp2
	and expm1.
	* math/auto-libm-test-out: Regenerated.
	* math/libm-test.inc (acos_test_data): Add more tests.
	(asin_test_data): Likewise.
	(asinh_test_data): Likewise.
	(atan_test_data): Likewise.
	(atanh_test_data): Likewise.
	(atan2_test_data): Likewise.
	(cbrt_test_data): Likewise.
	(ceil_test_data): Likewise.
	(copysign_test_data): Likewise.
	(cos_test_data): Likewise.
	(cosh_test_data): Likewise.
	(erf_test_data): Likewise.
	(erfc_test_data): Likewise.
	(exp_test_data): Likewise.
	(exp10_test_data): Likewise.
	(exp2_test_data): Likewise.
	(expm1_test_data): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2015-09-30 18:06:02 +00:00
Joseph Myers
a5721ebc68 Fix clog, clog10 inaccuracy (bug 19016).
For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.

However, the thresholds for using that approach still result in log
being used on argument as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9ulp maximum allowed in
glibc libm.  This patch arranges for the approach using log1p to be
used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs.  This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much less.

Tested for x86_64, x86, mips64 and powerpc.  For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear.  Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.

	[BZ #19016]
	* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
	allow more cases with X^2 + Y^2 >= 0.5.
	* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise.  Add -1 as
	normal element in sum instead of special-casing based on values of
	arguments.
	* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
	* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise.  Add
	-1 as normal element in sum instead of special-casing based on
	values of arguments.
	* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
	* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
	(__x2y2m1): Update comment.
	* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise.  Add -1
	as normal element in sum instead of special-casing based on values
	of arguments.
	* math/s_clog.c (__clog): Handle more cases using log1p without
	hypot.
	* math/s_clog10.c (__clog10): Likewise.
	* math/s_clog10f.c (__clog10f): Likewise.
	* math/s_clog10l.c (__clog10l): Likewise.
	* math/s_clogf.c (__clogf): Likewise.
	* math/s_clogl.c (__clogl): Likewise.
	* math/auto-libm-test-in: Add more tests of clog and clog10.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-09-28 22:11:22 +00:00
Joseph Myers
fa752c6981 Fix powf inaccuracy (bug 18956).
The flt-32 version of powf can be inaccurate because of bugs in the
extra-precision calculation of (x-1)/(x+1) or (x-1.5)/(x+1.5) as part
of calculating log(x) with extra precision: a constant used (as part
of adding 1 or 1.5 through integer arithmetic) is incorrect, and then
the code fails to mask a computed high part before using it in
arithmetic that relies on s_h*t_h being exactly representable.  This
patch fixes these bugs.

Tested for x86_64 and x86.  x86_64 ulps for powf removed and
regenerated to reflect reduced ulps from the increased accuracy for
existing tests.

	[BZ #18956]
	* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Add 0x00400000
	not 0x0040000 for high bit of mantissa.  Mask with 0xfffff000 when
	extracting high part.
	* math/auto-libm-test-in: Add another test of pow.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2015-09-26 00:27:06 +00:00
Joseph Myers
a1f99ba28b Add more random libm test inputs (mainly for ldbl-128).
This patch adds more libm test inputs found through random test
generation to increase previously known ulps.  This particular test
generation was run for mips64, so most of the increased ulps are for
ldbl-128 (float and double having been fairly well covered by such
testing for x86_64), but there's the odd ulps increase for other
formats.

Tested for x86_64, x86 and mips64.

	* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
	asinh, atan, atan2, atanh, cabs, carg, cos, csqrt, erfc, exp,
	exp10, exp2, log, log1p, log2, pow, sin, sincos, sinh, tan and
	tanh.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/mips/mips32/libm-test-ulps: Likewise.
	* sysdeps/mips/mips64/libm-test-ulps: Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-09-12 00:01:38 +00:00
Joseph Myers
00a7073c38 Add more randomly-generated libm tests.
This patch adds more libm test inputs found through random test
generation to increase observed ulps on x86_64.

Tested for x86_64 and x86.

	* math/auto-libm-test-in: Add more tests of acosh, atanh, cbrt,
	cosh, csqrt, erfc, expm1 and lgamma.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-09-11 15:03:10 +00:00
Joseph Myers
050f29c188 Fix lgamma (negative) inaccuracy (bug 2542, bug 2543, bug 2558).
The existing implementations of lgamma functions (except for the ia64
versions) use the reflection formula for negative arguments.  This
suffers large inaccuracy from cancellation near zeros of lgamma (near
where the gamma function is +/- 1).

This patch fixes this inaccuracy.  For arguments above -2, there are
no zeros and no large cancellation, while for sufficiently large
negative arguments the zeros are so close to integers that even for
integers +/- 1ulp the log(gamma(1-x)) term dominates and cancellation
is not significant.  Thus, it is only necessary to take special care
about cancellation for arguments around a limited number of zeros.

Accordingly, this patch uses precomputed tables of relevant zeros,
expressed as the sum of two floating-point values.  The log of the
ratio of two sines can be computed accurately using log1p in cases
where log would lose accuracy.  The log of the ratio of two gamma(1-x)
values can be computed using Stirling's approximation (the difference
between two values of that approximation to lgamma being computable
without computing the two values and then subtracting), with
appropriate adjustments (which don't reduce accuracy too much) in
cases where 1-x is too small to use Stirling's approximation directly.

In the interval from -3 to -2, using the ratios of sines and of
gamma(1-x) can still produce too much cancellation between those two
parts of the computation (and that interval is also the worst interval
for computing the ratio between gamma(1-x) values, which computation
becomes more accurate, while being less critical for the final result,
for larger 1-x).  Because this can result in errors slightly above
those accepted in glibc, this interval is instead dealt with by
polynomial approximations.  Separate polynomial approximations to
(|gamma(x)|-1)(x-n)/(x-x0) are used for each interval of length 1/8
from -3 to -2, where n (-3 or -2) is the nearest integer to the
1/8-interval and x0 is the zero of lgamma in the relevant half-integer
interval (-3 to -2.5 or -2.5 to -2).

Together, the two approaches are intended to give sufficient accuracy
for all negative arguments in the problem range.  Outside that range,
the previous implementation continues to be used.

Tested for x86_64, x86, mips64 and powerpc.  The mips64 and powerpc
testing shows up pre-existing problems for ldbl-128 and ldbl-128ibm
with large negative arguments giving spurious "invalid" exceptions
(exposed by newly added tests for cases this patch doesn't affect the
logic for); I'll address those problems separately.

	[BZ #2542]
	[BZ #2543]
	[BZ #2558]
	* sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r): Call
	__lgamma_neg for arguments from -28.0 to -2.0.
	* sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r): Call
	__lgamma_negf for arguments from -15.0 to -2.0.
	* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
	Call __lgamma_negl for arguments from -48.0 or -50.0 to -2.0.
	* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (__ieee754_lgammal_r):
	Call __lgamma_negl for arguments from -33.0 to -2.0.
	* sysdeps/ieee754/dbl-64/lgamma_neg.c: New file.
	* sysdeps/ieee754/dbl-64/lgamma_product.c: Likewise.
	* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
	* sysdeps/ieee754/flt-32/lgamma_productf.c: Likewise.
	* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
	* sysdeps/ieee754/ldbl-128/lgamma_productl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
	* sysdeps/ieee754/ldbl-128ibm/lgamma_productl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
	* sysdeps/ieee754/ldbl-96/lgamma_product.c: Likewise.
	* sysdeps/ieee754/ldbl-96/lgamma_productl.c: Likewise.
	* sysdeps/generic/math_private.h (__lgamma_negf): New prototype.
	(__lgamma_neg): Likewise.
	(__lgamma_negl): Likewise.
	(__lgamma_product): Likewise.
	(__lgamma_productl): Likewise.
	* math/Makefile (libm-calls): Add lgamma_neg and lgamma_product.
	* math/auto-libm-test-in: Add more tests of lgamma.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-09-10 22:27:58 +00:00
Paul Pluzhnikov
db7f8c8fe0 Regenerated sysdeps/x86_64/fpu/libm-test-ulps with AVX2. 2015-08-14 09:59:04 -07:00
Joseph Myers
3ba0ac10fa Add more random libm-test inputs.
This patch adds more test inputs to various libm functions found
through random generation to have larger ulps errors than previously
listed in libm-test-ulp, on at least one of x86_64 and x86.

Tested for x86_64 and x86.

	* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
	asinh, atan, atan2, atanh, cabs, cbrt, cosh, csqrt, erf, erfc,
	exp, exp2, lgamma, log, log1p, log2, pow, sin, sincos, tan, tanh
	and tgamma.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-08-13 23:23:23 +00:00
Joseph Myers
4afe4b20ce Add more tests of various libm functions.
This patch adds more tests of various libm functions found through
random test generation to give increased ulps on 32-bit x86.

Tested for x86_64 and x86.

	* math/auto-libm-test-in: Add more tests of acosh, asin, asinh,
	atanh, cabs, carg, cbrt, cosh, csqrt, erf, erfc, exp, exp10,
	expm1, hypot, log, log10, log1p, log2, pow, sinh, tan and tgamma.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-08-11 00:58:28 +00:00
Joseph Myers
e02920bc02 Improve tgamma accuracy (bug 18613).
In non-default rounding modes, tgamma can be slightly less accurate
than permitted by glibc's accuracy goals.

Part of the problem is error accumulation, addressed in this patch by
setting round-to-nearest for internal computations.  However, there
was also a bug in the code dealing with computing pow (x + n, x + n)
where x + n is not exactly representable, providing another source of
error even in round-to-nearest mode; it was necessary to address both
bugs to get errors for all testcases within glibc's accuracy goals.
Given this second fix, accuracy in round-to-nearest mode is also
improved (hence regeneration of ulps for tgamma should be from scratch
- truncate libm-test-ulps or at least remove existing tgamma entries -
so that the expected ulps can be reduced).

Some additional complications also arose.  Certain tgamma tests should
strictly, according to IEEE semantics, overflow or not depending on
the rounding mode; this is beyond the scope of glibc's accuracy goals
for any function without exactly-determined results, but
gen-auto-libm-tests doesn't handle being lax there as it does for
underflow.  (libm-test.inc also doesn't handle being lax about whether
the result in cases very close to the overflow threshold is infinity
or a finite value close to overflow, but that doesn't cause problems
in this case though I've seen it cause problems with random test
generation for some functions.)  Thus, spurious-overflow markings,
with a comment, are added to auto-libm-test-in (no bug in Bugzilla
because the issue is with the testsuite, not a user-visible bug in
glibc).  And on x86, after the patch I saw ERANGE issues as previously
reported by Carlos (see my commentary in
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>), which
needed addressing by ensuring excess range and precision were
eliminated at various points if FLT_EVAL_METHOD != 0.

I also noticed and fixed a cosmetic issue where 1.0f was used in long
double functions and should have been 1.0L.

This completes the move of all functions to testing in all rounding
modes with ALL_RM_TEST, so gen-libm-have-vector-test.sh is updated to
remove the workaround for some functions not using ALL_RM_TEST.

Tested for x86_64, x86, mips64 and powerpc.

	[BZ #18613]
	* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Take log of
	X_ADJ not X when adjusting exponent.
	(__ieee754_gamma_r): Do intermediate computations in
	round-to-nearest then adjust overflowing and underflowing results
	as needed.
	* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Take log
	of X_ADJ not X when adjusting exponent.
	(__ieee754_gammaf_r): Do intermediate computations in
	round-to-nearest then adjust overflowing and underflowing results
	as needed.
	* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Take
	log of X_ADJ not X when adjusting exponent.
	(__ieee754_gammal_r): Do intermediate computations in
	round-to-nearest then adjust overflowing and underflowing results
	as needed.  Use 1.0L not 1.0f as numerator of division.
	* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Take
	log of X_ADJ not X when adjusting exponent.
	(__ieee754_gammal_r): Do intermediate computations in
	round-to-nearest then adjust overflowing and underflowing results
	as needed.  Use 1.0L not 1.0f as numerator of division.
	* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Take log
	of X_ADJ not X when adjusting exponent.
	(__ieee754_gammal_r): Do intermediate computations in
	round-to-nearest then adjust overflowing and underflowing results
	as needed.  Use 1.0L not 1.0f as numerator of division.
	* math/libm-test.inc (tgamma_test_data): Remove one test.  Moved
	to auto-libm-test-in.
	(tgamma_test): Use ALL_RM_TEST.
	* math/auto-libm-test-in: Add one test of tgamma.  Mark some other
	tests of tgamma with spurious-overflow.
	* math/auto-libm-test-out: Regenerated.
	* math/gen-libm-have-vector-test.sh: Do not check for START.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-06-29 23:29:35 +00:00
Joseph Myers
a8e2112ae3 Use round-to-nearest internally in jn, test with ALL_RM_TEST (bug 18602).
Some existing jn tests, if run in non-default rounding modes, produce
errors above those accepted in glibc, which causes problems for moving
tests of jn to use ALL_RM_TEST.  This patch makes jn set rounding
to-nearest internally, as was done for yn some time ago, then computes
the appropriate underflowing value for results that underflowed to
zero in to-nearest, and moves the tests to ALL_RM_TEST.  It does
nothing about the general inaccuracy of Bessel function
implementations in glibc, though it should make jn more accurate on
average in non-default rounding modes through reduced error
accumulation.  The recomputation of results that underflowed to zero
should as a side-effect fix some cases of bug 16559, where jn just
used an exact zero, but that is *not* the goal of this patch and other
cases of that bug remain unfixed.

(Most of the changes in the patch are reindentation to add new scopes
for SET_RESTORE_ROUND*.)

Tested for x86_64, x86, powerpc and mips64.

	[BZ #16559]
	[BZ #18602]
	* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Set
	round-to-nearest internally then recompute results that
	underflowed to zero in the original rounding mode.
	* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise.
	* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
	* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
	* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise
	* math/libm-test.inc (jn_test): Use ALL_RM_TEST.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-06-25 21:46:02 +00:00
Joseph Myers
a67894c505 Fix cexp, ccos, ccosh, csin, csinh spurious underflows (bug 18594).
cexp, ccos, ccosh, csin and csinh have spurious underflows in cases
where they compute sin of the smallest normal, that produces an
underflow exception (depending on which sin implementation is in use)
but the final result does not underflow.  ctan and ctanh may also have
such underflows, or they may be latent (the issue there is that
e.g. ctan (DBL_MIN) should, rounded upwards, be the next double value
above DBL_MIN, which under glibc's accuracy goals may not have an
underflow exception, but the intermediate computation of sin (DBL_MIN)
would legitimately underflow on before-rounding architectures).

This patch fixes all those functions so they use plain comparisons (>
DBL_MIN etc.) instead of comparing the result of fpclassify with
FP_SUBNORMAL (in all these cases, we already know the number being
compared is finite).  Note that in the case of csin / csinf / csinl,
there is no need for fabs calls in the comparison because the real
part has already been reduced to its absolute value.

As the patch fixes the failures that previously obstructed moving
tests of cexp to use ALL_RM_TEST, those tests are moved to ALL_RM_TEST
by the patch (two functions remain yet to be converted).

Tested for x86_64 and x86 and ulps updated accordingly.

	[BZ #18594]
	* math/s_ccosh.c (__ccosh): Compare with least normal value
	instead of comparing class with FP_SUBNORMAL.
	* math/s_ccoshf.c (__ccoshf): Likewise.
	* math/s_ccoshl.c (__ccoshl): Likewise.
	* math/s_cexp.c (__cexp): Likewise.
	* math/s_cexpf.c (__cexpf): Likewise.
	* math/s_cexpl.c (__cexpl): Likewise.
	* math/s_csin.c (__csin): Likewise.
	* math/s_csinf.c (__csinf): Likewise.
	* math/s_csinh.c (__csinh): Likewise.
	* math/s_csinhf.c (__csinhf): Likewise.
	* math/s_csinhl.c (__csinhl): Likewise.
	* math/s_csinl.c (__csinl): Likewise.
	* math/s_ctan.c (__ctan): Likewise.
	* math/s_ctanf.c (__ctanf): Likewise.
	* math/s_ctanh.c (__ctanh): Likewise.
	* math/s_ctanhf.c (__ctanhf): Likewise.
	* math/s_ctanhl.c (__ctanhl): Likewise.
	* math/s_ctanl.c (__ctanl): Likewise.
	* math/auto-libm-test-in: Add more tests of ccos, ccosh, cexp,
	csin, csinh, ctan and ctanh.
	* math/auto-libm-test-out: Regenerated.
	* math/libm-test.inc (cexp_test): Use ALL_RM_TEST.
	* sysdeps/i386/fpu/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2015-06-24 21:04:51 +00:00
Joseph Myers
ac831b362a Fix csin, csinh overflow in directed rounding modes (bug 18593).
csin and csinh can produce bad results when overflowing in directed
rounding modes, because a multiplication that can overflow is followed
by a possible negation.  This patch fixes this by negating one of the
arguments of the multiplication before the multiplication instead of
negating the result.

The new tests for this issue are added to auto-libm-test-in, starting
use of that file for csin and csinh.  The issue was found in the
course of moving existing tests for csin and csinh (existing tests, by
being enabled in more cases than previously, showed the issue for
float and double but not for long double); that move will now be done
separately.

Tested for x86_64 and x86 and ulps updated accordingly.

	[BZ #18593]
	* math/s_csin.c (__csin): Negate before rather than after possibly
	overflowing multiplication.
	* math/s_csinf.c (__csinf): Likewise.
	* math/s_csinh.c (__csinh): Likewise.
	* math/s_csinhf.c (__csinhf): Likewise.
	* math/s_csinhl.c (__csinhl): Likewise.
	* math/s_csinl.c (__csinl): Likewise.
	* math/auto-libm-test-in: Add some tests of csin and csinh.
	* math/auto-libm-test-out: Regenerated.
	* math/libm-test.inc (csin_test_data): Use AUTO_TESTS_c_c.
	(csinh_test_data): Likewise.
	* sysdeps/x86_64/fpu/libm-test-ulps: Update.
2015-06-24 16:20:48 +00:00
Andrew Senkevich
a6336cc446 Vector sincosf for x86_64 and tests.
Here is implementation of vectorized sincosf containing SSE, AVX,
AVX2 and AVX512 versions according to Vector ABI
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.

    * NEWS: Mention addition of x86_64 vector sincosf.
    * math/test-float-vlen16.h: Added wrapper for sincosf tests.
    * math/test-float-vlen4.h: Likewise.
    * math/test-float-vlen8.h: Likewise.
    * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
    * sysdeps/x86/fpu/bits/math-vector.h: Added sincosf SIMD declaration.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
    * sysdeps/x86_64/fpu/Versions: New versions added.
    * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
    * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines):
    Added build of SSE, AVX2 and AVX512 IFUNC versions.
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
    * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
    * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
    * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
    * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
    * sysdeps/x86_64/fpu/svml_s_sincosf_data.S: New file.
    * sysdeps/x86_64/fpu/svml_s_sincosf_data.h: New file.
    * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 3 argument wrappers.
    * sysdeps/x86_64/fpu/test-float-vlen16.c: : Vector sincosf tests.
    * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
    * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
2015-06-18 20:11:27 +03:00