Commit Graph

548 Commits

Author SHA1 Message Date
Paul Zimmermann
392b3f0971 replace tgammaf by the CORE-MATH implementation
The CORE-MATH implementation is correctly rounded (for any rounding mode).
This can be checked by exhaustive tests in a few minutes since there are
less than 2^32 values to check against for example GNU MPFR.
This patch also adds some bench values for tgammaf.

Tested on x86_64 and x86 (cfarm26).

With the initial GNU libc code it gave on an Intel(R) Core(TM) i7-8700:

      "tgammaf": {
       "": {
        "duration": 3.50188e+09,
        "iterations": 2e+07,
        "max": 602.891,
        "min": 65.1415,
        "mean": 175.094
       }
      }

With the new code:

      "tgammaf": {
       "": {
        "duration": 3.30825e+09,
        "iterations": 5e+07,
        "max": 211.592,
        "min": 32.0325,
        "mean": 66.1649
       }
      }

With the initial GNU libc code it gave on cfarm26 (i686):

  "tgammaf": {
   "": {
    "duration": 3.70505e+09,
    "iterations": 6e+06,
    "max": 2420.23,
    "min": 243.154,
    "mean": 617.509
   }
  }

With the new code:

  "tgammaf": {
   "": {
    "duration": 3.24497e+09,
    "iterations": 1.8e+07,
    "max": 1238.15,
    "min": 101.155,
    "mean": 180.276
   }
  }

Signed-off-by: Alexei Sibidanov <sibid@uvic.ca>
Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>

Changes in v2:
    - include <math.h> (fix the linknamespace failures)
    - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file
    - restored original wrapper code (math/w_tgammaf_compat.c),
      except for the dealing with the sign
    - removed the tgammaf/float entries in all libm-test-ulps files
    - address other comments from Joseph Myers
      (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html)

Changes in v3:
    - pass NULL argument for signgam from w_tgammaf_compat.c
    - use of math_narrow_eval
    - added more comments

Changes in v4:
    - initialize local_signgam to 0 in math/w_tgamma_template.c
    - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file

Changes in v5:
    - do not mention local_signgam any more in math/w_tgammaf_compat.c
    - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c
      and added comment

Changes in v6:
    - pass NULL as 2nd argument of __ieee754_gammaf_r in
      w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c

Changes in v7:
    - added Signed-off-by line for Alexei Sibidanov (author of the code)

Changes in v8:
    - added Signed-off-by line for Paul Zimmermann (submitted of the patch)

Changes in v9:
    - address comments from review by Adhemerval Zanella
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-10-11 11:12:32 +02:00
Szabolcs Nagy
2a9943b4a0 math: Fix exp10 undefined left shift
Left shift of ki is undefined when ki<0, copy the logic from exp,
which uses unsigned arithmetics, to fix it.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-06-04 15:33:26 +01:00
H.J. Lu
4e21cb95e2 nearbyint: Don't define alias when used in IFUNC [BZ #31759]
Fix BZ #31759 by not defining nearbyint aliases when used in IFUNC.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2024-05-20 05:21:41 -07:00
Wilco Dijkstra
08ddd26814 math: remove exp10 wrappers
Remove the error handling wrapper from exp10.  This is very similar to
the changes done to exp and exp2, except that we also need to handle
pow10 and pow10l.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2024-01-12 16:02:12 +00:00
Paul Eggert
dff8da6b3e Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
Joe Ramsay
63d0a35d5f math: Add new exp10 implementation
New implementation is based on the existing exp/exp2, with different
reduction constants and polynomial. Worst-case error in round-to-
nearest is 0.513 ULP.

The exp/exp2 shared table is reused for exp10 - .rodata size of
e_exp_data increases by 64 bytes.

As for exp/exp2, targets with single-instruction rounding/conversion
intrinsics can use them by toggling TOINT_INTRINSICS=1 and adding the
necessary code to their math_private.h.

Improvements on Neoverse V1 compared to current GLIBC master:
exp10 thruput: 3.3x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8]
exp10 latency: 1.8x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8]

Tested on:
    aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and
    x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction)

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-12-04 15:52:11 +00:00
Andreas Schwab
5aa1ddfcb3 Avoid maybe-uninitialized warning in __kernel_rem_pio2
With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized
warning:

../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2':
../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized]
  364 |           y[0] = fq[0]; y[1] = fq[1]; y[2] = fw;
      |                  ~~^~~

This is similar to the warning that is suppressed in the other branch of
the switch.  Help the compiler knowing that the variable is always
initialized, which also makes the suppression obsolete.
2023-10-16 09:59:32 +02:00
H.J. Lu
a8ecb126d4 x86_64: Add log1p with FMA
On Skylake, it changes log1p bench performance by:

        Before       After     Improvement
max     63.349       58.347       8%
min     4.448        5.651        -30%
mean    12.0674      10.336       14%

The minimum code path is

 if (hx < 0x3FDA827A)                          /* x < 0.41422  */
    {
      if (__glibc_unlikely (ax >= 0x3ff00000))           /* x <= -1.0 */
        {
	   ...
        }
      if (__glibc_unlikely (ax < 0x3e200000))           /* |x| < 2**-29 */
        {
          math_force_eval (two54 + x);          /* raise inexact */
          if (ax < 0x3c900000)                  /* |x| < 2**-54 */
            {
	      ...
            }
          else
            return x - x * x * 0.5;

FMA and non-FMA code sequences look similar.  Non-FMA version is slightly
faster.  Since log1p is called by asinh and atanh, it improves asinh
performance by:

        Before       After     Improvement
max     75.645       63.135       16%
min     10.074       10.071       0%
mean    15.9483      14.9089      6%

and improves atanh performance by:

        Before       After     Improvement
max     91.768       75.081       18%
min     15.548       13.883       10%
mean    18.3713      16.8011      8%
2023-08-21 10:44:26 -07:00
H.J. Lu
1b214630ce x86_64: Add expm1 with FMA
On Skylake, it improves expm1 bench performance by:

        Before       After     Improvement
max     70.204       68.054       3%
min     20.709       16.2         22%
mean    22.1221      16.7367      24%

NB: Add

extern long double __expm1l (long double);
extern long double __expm1f128 (long double);

for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is
defined since __expm1 may be expanded in their declarations which
causes the build failure.
2023-08-14 08:14:19 -07:00
Joe Ramsay
aed39a3aa3 aarch64: Add vector implementations of cos routines
Replace the loop-over-scalar placeholder routines with optimised
implementations from Arm Optimized Routines (AOR).

Also add some headers containing utilities for aarch64 libmvec
routines, and update libm-test-ulps.

Data tables for new routines are used via a pointer with a
barrier on it, in order to prevent overly aggressive constant
inlining in GCC. This allows a single adrp, combined with offset
loads, to be used for every constant in the table.

Special-case handlers are marked NOINLINE in order to confine the
save/restore overhead of switching from vector to normal calling
standard. This way we only incur the extra memory access in the
exceptional cases. NOINLINE definitions have been moved to
math_private.h in order to reduce duplication.

AOR exposes a config option, WANT_SIMD_EXCEPT, to enable
selective masking (and later fixing up) of invalid lanes, in
order to trigger fp exceptions correctly (AdvSIMD only). This is
tested and maintained in AOR, however it is configured off at
source level here for performance reasons. We keep the
WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify
the upstreaming process from AOR to glibc.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2023-06-30 09:04:10 +01:00
Paul Pluzhnikov
65cc53fe7c Fix misspellings in sysdeps/ -- BZ 25337 2023-05-30 23:02:29 +00:00
Wilco Dijkstra
76d0f094dd math: Improve fmod(f) performance
Optimize the fast paths (x < y) and (x/y < 2^12).  Delay handling of special
cases to reduce the number of instructions executed before the fast paths.
Performance improvements for fmod:

		Skylake	Zen2	Neoverse V1
subnormals	11.8%	4.2%	11.5%
normal		3.9%	0.01%	-0.5%
close-exponents	6.3%	5.6%	19.4%

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2023-04-17 13:03:10 +01:00
Adhemerval Zanella Netto
16439f419b math: Remove the error handling wrapper from fmod and fmodf
The error handling is moved to sysdeps/ieee754 version with no SVID
support.  The compatibility symbol versions still use the wrapper
with SVID error handling around the new code.  There is no new symbol
version nor compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).

The ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.  For both i686 and m68k,
which provive arch specific implementation, wrappers are added so
no new symbol are added (which would require to change the
implementations).

It shows an small improvement, the results for fmod:

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 12.5049  | 9.40992
  x86_64 (Ryzen 9) | normal          | 296.939  | 296.738
  x86_64 (Ryzen 9) | close-exponents | 16.0244  | 13.119
  aarch64 (N1)     | subnormal       | 6.81778  | 4.33313
  aarch64 (N1)     | normal          | 155.620  | 152.915
  aarch64 (N1)     | close-exponents | 8.21306  | 5.76138
  armhf (N1)       | subnormal       | 15.1083  | 14.5746
  armhf (N1)       | normal          | 244.833  | 241.738
  armhf (N1)       | close-exponents | 21.8182  | 22.457

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:45:27 -03:00
Adhemerval Zanella Netto
34b9f8bc17 math: Improve fmod
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:

   mx * 2^ex == 2 * mx * 2^(ex - 1)

   while (ex > ey)
     {
       mx *= 2;
       --ex;
       mx %= my;
     }

With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 11 (which is sizeo of uint64_t minus
MANTISSA_WIDTH plus the signal bit):

   while (ex > ey)
     {
       mx << 11;
       ex -= 11;
       mx %= my;
     }  */

The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles.  Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.

I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  x86_64 (Ryzen 9) | subnormals      | 19.1584  | 12.5049
  x86_64 (Ryzen 9) | normal          | 1016.51  | 296.939
  x86_64 (Ryzen 9) | close-exponents | 18.4428  | 16.0244
  aarch64 (N1)     | subnormal       | 11.153   | 6.81778
  aarch64 (N1)     | normal          | 528.649  | 155.62
  aarch64 (N1)     | close-exponents | 11.4517  | 8.21306

I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, clz, ctz, and multiplication):

  Architecture     | Input           | master   | patch
  -----------------|-----------------|----------|--------
  armhf (N1)       | subnormal       | 15.908   | 15.1083
  armhf (N1)       | normal          | 837.525  | 244.833
  armhf (N1)       | close-exponents | 16.2111  | 21.8182

Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.

Co-authored-by: kirill <kirill.okhotnikov@gmail.com>

[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2023-04-03 16:36:24 -03:00
Joseph Myers
6d7e8eda9b Update copyright dates with scripts/update-copyrights 2023-01-06 21:14:39 +00:00
Xiaolin Tang
2e2485ce05 Use GCC builtins for logb functions if desired.
This patch is using the corresponding GCC builtin for logbf, logb,
logbl and logbf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins-function.h.

Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
2022-11-29 16:00:28 +08:00
Xiaolin Tang
a1981ecbfd Use GCC builtins for llrint functions if desired.
This patch is using the corresponding GCC builtin for llrintf, llrint,
llrintl and llrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins-function.h.

Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
2022-11-29 16:00:28 +08:00
Xiaolin Tang
2b23ab1fea Use GCC builtins for lrint functions if desired.
This patch is using the corresponding GCC builtin for lrintf, lrint,
lrintl and lrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins-function.h.

Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
2022-11-29 16:00:28 +08:00
Szabolcs Nagy
7363a9a9a0 math: Fix asin and acos invalid exception with old gcc
This works around a gcc issue where it const folded inf/inf into nan,
preventing the invalid exception to be signalled.

(x-x)/(x-x) is more robust against optimizations and works for all
out of bounds values including x==nan.

The gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115
should be fixed on release branches starting from gcc-10, but it is
better to change the code in case glibc is built with older gcc.

Reviewed-by: Wilco Dijkstra  <Wilco.Dijkstra@arm.com>
2022-10-17 08:18:52 +01:00
Andreas Schwab
dc1e5eeb25 x86_64: Optimize sincos where sin/cos is optimized (bug 29193)
The compiler may substitute calls to sin or cos with calls to sincos, thus
we should have the same optimized implementations for sincos.  The
optimized implementations may produce results that differ, that also makes
sure that the sincos call aggrees with the sin and cos calls.
2022-06-01 10:29:52 +02:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Akila Welihinda
3b1402b3fc sysdeps: Simplify sin Taylor Series calculation
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes
of regaining some precision as a function of da. However the
comment says we add the term `-0.5*da*a^2 + 0.5*da` which is
different. This fix updates the comment to reflect the
code and also simplifies the calculation by replacing `a` with `x`
because they always have the same value.

Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu>
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-12-13 15:31:05 +01:00
Adhemerval Zanella
104d2005d5 math: Remove the error handling wrapper from hypot and hypotf
The error handling is moved to sysdeps/ieee754 version with no SVID
support.  The compatibility symbol versions still use the wrapper with
SVID error handling around the new code.  There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).

Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.

Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
2021-12-13 10:08:46 -03:00
Wilco Dijkstra
2f44eef584 math: Use fmin/fmax on hypot
It optimizes for architectures that provides fast builtins.

Checked on aarch64-linux-gnu.
2021-12-13 10:08:46 -03:00
Wilco Dijkstra
ccfa865a82 math: Improve hypot performance with FMA
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency.  The non-fma version has 30% higher throughput and 10%
higher latency.

Max ULP error is 0.949 with fma and 0.792 without fma.

Passes GLIBC testsuite.
2021-12-13 09:02:34 -03:00
Wilco Dijkstra
6c848d7038 math: Use an improved algorithm for hypot (dbl-64)
This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:

 - Handle qNaN and sNaN.
 - Tune the 'widely varying operands' to avoid spurious underflow
   due the multiplication and fix the return value for upwards
   rounding mode.
 - Handle required underflow exception for denormal results.

The main advantage of the new algorithm is its precision: with a
random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc
current implementation shows around 0.34% results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of total (18851).

The performance result are also only slight worse than current
implementation.  On x86_64 (Ryzen 5900X) with gcc 12:

Before:

  "hypot": {
   "workload-random": {
    "duration": 3.73319e+09,
    "iterations": 1.12e+08,
    "reciprocal-throughput": 22.8737,
    "latency": 43.7904,
    "max-throughput": 4.37184e+07,
    "min-throughput": 2.28361e+07
   }
  }

After:

  "hypot": {
   "workload-random": {
    "duration": 3.7597e+09,
    "iterations": 9.8e+07,
    "reciprocal-throughput": 23.7547,
    "latency": 52.9739,
    "max-throughput": 4.2097e+07,
    "min-throughput": 1.88772e+07
   }
  }

Co-Authored-By: Adhemerval Zanella  <adhemerval.zanella@linaro.org>

Checked on x86_64-linux-gnu and aarch64-linux-gnu.

[1] https://arxiv.org/pdf/1904.09481.pdf
2021-12-13 09:02:34 -03:00
Joseph Myers
b3f27d8150 Add narrowing fma functions
This patch adds the narrowing fused multiply-add functions from TS
18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal,
f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x,
f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128,
f64xfmaf128 for configurations with _Float64x and _Float128;
__f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case
(for calls to ffmal and dfmal when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.

The changes are mostly similar to those for the other narrowing
functions previously added, especially that for sqrt, so the
description of those generally applies to this patch as well.  As with
sqrt, I reused the same test inputs in auto-libm-test-in as for
non-narrowing fma rather than adding extra or separate inputs for
narrowing fma.  The tests in libm-test-narrow-fma.inc also follow
those for non-narrowing fma.

The non-narrowing fma has a known bug (bug 6801) that it does not set
errno on errors (overflow, underflow, Inf * 0, Inf - Inf).  Rather
than fixing this or having narrowing fma check for errors when
non-narrowing does not (complicating the cases when narrowing fma can
otherwise be an alias for a non-narrowing function), this patch does
not attempt to check for errors from narrowing fma and set errno; the
CHECK_NARROW_FMA macro is still present, but as a placeholder that
does nothing, and this missing errno setting is considered to be
covered by the existing bug rather than needing a separate open bug.
missing-errno annotations are duly added to many of the
auto-libm-test-in test inputs for fma.

This completes adding all the new functions from TS 18661-1 to glibc,
so will be followed by corresponding stdc-predef.h changes to define
__STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support
for TS 18661-1 will be at a similar level to that for C standard
floating-point facilities up to C11 (pragmas not implemented, but
library functions done).  (There are still further changes to be done
to implement changes to the types of fromfp functions from N2548.)

Tested as followed: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float).  The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
2021-09-22 21:25:31 +00:00
Joseph Myers
1356f38df5 Fix f64xdivf128, f64xmulf128 spurious underflows (bug 28358)
As described in bug 28358, the round-to-odd computations used in the
libm functions that round their results to a narrower format can yield
spurious underflow exceptions in the following circumstances: the
narrowing only narrows the precision of the type and not the exponent
range (i.e., it's narrowing _Float128 to _Float64x on x86_64, x86 or
ia64), the architecture does after-rounding tininess detection (which
applies to all those architectures), the result is inexact, tiny
before rounding but not tiny after rounding (with the chosen rounding
mode) for _Float64x (which is possible for narrowing mul, div and fma,
not for narrowing add, sub or sqrt), so the underflow exception
resulting from the toward-zero computation in _Float128 is spurious
for _Float64x.

Fixed by making ROUND_TO_ODD call feclearexcept (FE_UNDERFLOW) in the
problem cases (as indicated by an extra argument to the macro); there
is never any need to preserve underflow exceptions from this part of
the computation, because the conversion of the round-to-odd value to
the narrower type will underflow in exactly the cases in which the
function should raise that exception, but it may be more efficient to
avoid the extra manipulation of the floating-point environment when
not needed.

Tested for x86_64 and x86, and with build-many-glibcs.py.
2021-09-21 21:54:37 +00:00
Joseph Myers
4b6574a6f6 Redirect fma calls to __fma in libm
include/math.h has a mechanism to redirect internal calls to various
libm functions, that can often be inlined by the compiler, to call
non-exported __* names for those functions in the case when the calls
aren't inlined, with the redirection being disabled when
NO_MATH_REDIRECT.  Add fma to the functions to which this mechanism is
applied.

At present, libm-internal fma calls (generally to __builtin_fma*
functions) are only done when it's known the call will be inlined,
with alternative code not relying on an fma operation being used in
the caller otherwise.  This patch is in preparation for adding the TS
18661 / C2X narrowing fma functions to glibc; it will be natural for
the narrowing function implementations to call the underlying fma
functions unconditionally, with this either being inlined or resulting
in an __fma* call.  (Using two levels of round-to-odd computation like
that, in the case where there isn't an fma hardware instruction, isn't
optimal but is certainly a lot simpler for the initial implementation
than writing different narrowing fma implementations for all the
various pairs of formats.)

Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by the patch (using
<https://sourceware.org/pipermail/libc-alpha/2021-September/130991.html>
to fix installed library stripping in build-many-glibcs.py).  Also
tested for x86_64.
2021-09-15 22:57:35 +00:00
Joseph Myers
abd383584b Add narrowing square root functions
This patch adds the narrowing square root functions from TS 18661-1 /
TS 18661-3 / C2X to glibc's libm: fsqrt, fsqrtl, dsqrtl, f32sqrtf64,
f32sqrtf32x, f32xsqrtf64 for all configurations; f32sqrtf64x,
f32sqrtf128, f64sqrtf64x, f64sqrtf128, f32xsqrtf64x, f32xsqrtf128,
f64xsqrtf128 for configurations with _Float64x and _Float128;
__f32sqrtieee128 and __f64sqrtieee128 aliases in the powerpc64le case
(for calls to fsqrtl and dsqrtl when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.

The changes are mostly similar to those for the other narrowing
functions previously added, so the description of those generally
applies to this patch as well.  However, the not-actually-narrowing
cases (where the two types involved in the function have the same
floating-point format) are aliased to sqrt, sqrtl or sqrtf128 rather
than needing a separately built not-actually-narrowing function such
as was needed for add / sub / mul / div.  Thus, there is no
__nldbl_dsqrtl name for ldbl-opt because no such name was needed
(whereas the other functions needed such a name since the only other
name for that entry point was e.g. f32xaddf64, not reserved by TS
18661-1); the headers are made to arrange for sqrt to be called in
that case instead.

The DIAG_* calls in sysdeps/ieee754/soft-fp/s_dsqrtl.c are because
they were observed to be needed in GCC 7 testing of
riscv32-linux-gnu-rv32imac-ilp32.  The other sysdeps/ieee754/soft-fp/
files added didn't need such DIAG_* in any configuration I tested with
build-many-glibcs.py, but if they do turn out to be needed in more
files with some other configuration / GCC version, they can always be
added there.

I reused the same test inputs in auto-libm-test-in as for
non-narrowing sqrt rather than adding extra or separate inputs for
narrowing sqrt.  The tests in libm-test-narrow-sqrt.inc also follow
those for non-narrowing sqrt.

Tested as followed: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float).  The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
2021-09-10 20:56:22 +00:00
Siddhesh Poyarekar
30891f35fa Remove "Contributed by" lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date.  Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.

Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions.  These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.

The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively.  These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:

https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-09-03 22:06:44 +05:30
Shen-Ta Hsieh
eb9066203f Use GCC builtins for roundeven functions if desired.
This patch is using the corresponding GCC builtin for roundevenf,
roundeven and roundevenl if the USE_FUNCTION_BUILTIN macros are defined
to one in math-use-builtins.h.

These builtin functions is supported since GCC 10.

The code of the generic implementation is not changed.

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
Shen-Ta Hsieh
447954a206 math: redirect roundeven function
This patch redirect roundeven function for futhermore changes.

Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-27 07:56:57 -07:00
Paul Zimmermann
43576de04a Improve the accuracy of tgamma (BZ #26983)
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).

Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-07 13:23:39 +02:00
Wilco Dijkstra
92cfc9ad82 math: Remove mpa files (part 2) [BZ #15267]
Previous commit was missing deleted files in sysdeps/ieee754/dbl-64.

Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 15:45:19 +00:00
Wilco Dijkstra
47ad14d789 math: Remove mpa files [BZ #15267]
Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Wilco Dijkstra
4e1a870b9a math: Remove slow paths from atan2 [BZ #15267]
Remove slow paths from atan2. Add ULP annotations.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Wilco Dijkstra
e898cd1593 math: Remove slow paths from atan [BZ #15267]
Remove slow paths from atan. Add ULP annotations.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Wilco Dijkstra
476d692e8a math: Remove slow paths in tan [BZ #15267]
Remove slow paths in tan. Add ULP annotations. Merge 'number' into 'mynumber'.
Remove unused entries from tan constants.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Wilco Dijkstra
db3f7bb558 math: Remove slow paths from asin and acos [BZ #15267]
This patch series removes all remaining slow paths and related code.
First asin/acos, tan, atan, atan2 implementations are updated, and the final
patch removes the unused mpa files, headers and probes. Passes buildmanyglibc.

Remove slow paths from asin/acos. Add ULP annotations based on previous slow
path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps.

Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
2021-03-11 14:26:36 +00:00
Adhemerval Zanella
bf7db6d369 math: Add BZ#18980 fix back on dbl-64 cosh
It is regression from 9e97f239ea (Remove dbl-64/wordsize-64
(part 2)) where is missed to add the BZ#18980 fix (9e97f239ea).

Checked on i686-linux-gnu.
2021-01-11 16:56:33 -03:00
Wilco Dijkstra
9e97f239ea Remove dbl-64/wordsize-64 (part 2)
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory.  The second patch just moves all wordsize-64 files and removes a
few wordsize-64 uses in comments and Implies files.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-07 15:26:26 +00:00
Wilco Dijkstra
caa884dda7 Remove dbl-64/wordsize-64
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory.  The first patch adds special cases needed for 32-bit targets
(FIX_INT_FP_CONVERT_ZERO and FIX_DBL_LONG_CONVERT_OVERFLOW) to the
wordsize-64 versions.  This has no effect on 64-bit targets since they don't
define these macros.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-07 15:02:51 +00:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Anssi Hannula
69a7ca7705 ieee754: Remove unused __sin32 and __cos32
The __sin32 and __cos32 functions were only used in the now removed slow
path of asin and acos.
2020-12-18 12:10:31 +05:30
Anssi Hannula
f67f9c9af2 ieee754: Remove slow paths from asin and acos
asin and acos have slow paths for rounding the last bit that cause some
calls to be 500-1500x slower than average calls.

These slow paths are rare, a test of a trillion (1.000.000.000.000)
random inputs between -1 and 1 showed 32870 slow calls for acos and 4473
for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0.

The slow paths claim correct rounding and use __sin32() and __cos32()
(which compare two result candidates and return the closest one) as the
final step, with the second result candidate (res1) having a small offset
applied from res. This suggests that res and res1 are intended to be 1
ULP apart (which makes sense for rounding), barring bugs, allowing us to
pick either one and still remain within 1 ULP of the exact result.

Remove the slow paths as the accuracy is better than 1 ULP even without
them, which is enough for glibc.

Also remove code comments claiming correctly rounded results.

After slow path removal, checking the accuracy of 14.400.000.000 random
asin() and acos() inputs showed only three incorrectly rounded
(error > 0.5 ULP) results:
- asin(-0x1.ee2b43286db75p-1) (0.500002 ULP, same as before)
- asin(-0x1.f692ba202abcp-4)  (0.500003 ULP, same as before)
- asin(-0x1.9915e876fc062p-1) (0.50000000001 ULP, previously exact)
The first two had the same error even before this commit, and they did
not use the slow path at all.

Checking 4934 known randomly found previously-slow-path asin inputs
shows 25 calls with incorrectly rounded results, with a maximum error of
0.500000002 ULP (for 0x1.fcd5742999ab8p-1). The previous slow-path code
rounded all these inputs correctly (error < 0.5 ULP).
The observed average speed increase was 130x.

Checking 36240 known randomly found previously-slow-path acos inputs
shows 42 calls with incorrectly rounded results, with a maximum error of
0.500000008 ULP (for 0x1.f63845056f35ep-1). The previous "exact"
slow-path code showed 34 calls with incorrectly rounded results, with the
same maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1).
The observed average speed increase was 130x.

The functions could likely be trimmed more while keeping acceptable
accuracy, but this at least gets rid of the egregiously slow cases.

Tested on x86_64.
2020-12-18 12:09:23 +05:30
Joseph Myers
6c010c5dde Use C2x return value from getpayload of non-NaN (bug 26073).
In TS 18661-1, getpayload had an unspecified return value for a
non-NaN argument, while C2x requires the return value -1 in that case.

This patch implements the return value of -1.  I don't think this is
worth having a new symbol version that's an alias of the old one,
although occasionally we do that in such cases where the new function
semantics are a refinement of the old ones (to avoid programs relying
on the new semantics running on older glibc versions but not behaving
as intended).

Tested for x86_64 and x86; also ran math/ tests for aarch64 and
powerpc.
2020-07-06 16:18:02 +00:00
Vineet Gupta
e93c264336 ieee754/dbl-64: Reduce the scope of temporary storage variables
This came to light when adding hard-flaot support to ARC glibc port
without hardware sqrt support causing glibc build to fail:

| ../sysdeps/ieee754/dbl-64/e_sqrt.c: In function '__ieee754_sqrt':
| ../sysdeps/ieee754/dbl-64/e_sqrt.c:58:54: error: unused variable 'ty' [-Werror=unused-variable]
|   double y, t, del, res, res1, hy, z, zz, p, hx, tx, ty, s;

The reason being EMULV() macro uses the hardware provided
__builtin_fma() variant, leaving temporary variables 'p, hx, tx, hy, ty'
unused hence compiler warning and ensuing error.

The intent of the patch was to fix that error, but EMULV is pervasive
and used fair bit indirectly via othe rmacros, hence this patch.
Functionally it should not result in code gen changes and if at all
those would be better since the scope of those temporaries is greatly
reduced now

Built tested with aarch64-linux-gnu arm-linux-gnueabi arm-linux-gnueabihf hppa-linux-gnu x86_64-linux-gnu arm-linux-gnueabihf riscv64-linux-gnu-rv64imac-lp64 riscv64-linux-gnu-rv64imafdc-lp64 powerpc-linux-gnu microblaze-linux-gnu nios2-linux-gnu hppa-linux-gnu

Also as suggested by Joseph [1] used --strip and compared the libs with
and w/o patch and they are byte-for-byte unchanged (with gcc 9).

| for i in `find . -name libm-2.31.9000.so`;
| do
|    echo $i; diff $i /SCRATCH/vgupta/gnu2/install/glibcs/$i ; echo $?;
| done

| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabi/lib/libm-2.31.9000.so
| 0
| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./powerpc-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./microblaze-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./nios2-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./hppa-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./s390x-linux-gnu/lib64/libm-2.31.9000.so

[1] https://sourceware.org/pipermail/libc-alpha/2019-November/108267.html
2020-06-15 13:09:21 -07:00
Vineet Gupta
628d90c5f9 ieee754: provide gcc builtins based generic fma functions
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-06-03 10:23:28 -07:00
Vineet Gupta
3374868668 ieee754: provide gcc builtins based generic sqrt functions
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-06-03 10:23:22 -07:00