2011-09-08 02:17:33 +00:00
|
|
|
/* Compute x * y + z as ternary operation.
|
2024-01-01 18:12:26 +00:00
|
|
|
Copyright (C) 2011-2024 Free Software Foundation, Inc.
|
2011-09-08 02:17:33 +00:00
|
|
|
This file is part of the GNU C Library.
|
|
|
|
|
|
|
|
The GNU C Library is free software; you can redistribute it and/or
|
|
|
|
modify it under the terms of the GNU Lesser General Public
|
|
|
|
License as published by the Free Software Foundation; either
|
|
|
|
version 2.1 of the License, or (at your option) any later version.
|
|
|
|
|
|
|
|
The GNU C Library is distributed in the hope that it will be useful,
|
|
|
|
but WITHOUT ANY WARRANTY; without even the implied warranty of
|
|
|
|
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
|
|
|
|
Lesser General Public License for more details.
|
|
|
|
|
|
|
|
You should have received a copy of the GNU Lesser General Public
|
2012-02-09 23:18:22 +00:00
|
|
|
License along with the GNU C Library; if not, see
|
Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:
sed -ri '
s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
$(find $(git ls-files) -prune -type f \
! -name '*.po' \
! -name 'ChangeLog*' \
! -path COPYING ! -path COPYING.LIB \
! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
! -path manual/texinfo.tex ! -path scripts/config.guess \
! -path scripts/config.sub ! -path scripts/install-sh \
! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
! -path INSTALL ! -path locale/programs/charmap-kw.h \
! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
! '(' -name configure \
-execdir test -f configure.ac -o -f configure.in ';' ')' \
! '(' -name preconfigure \
-execdir test -f preconfigure.ac ';' ')' \
-print)
and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:
chmod a+x sysdeps/unix/sysv/linux/riscv/configure
# Omit irrelevant whitespace and comment-only changes,
# perhaps from a slightly-different Autoconf version.
git checkout -f \
sysdeps/csky/configure \
sysdeps/hppa/configure \
sysdeps/riscv/configure \
sysdeps/unix/sysv/linux/csky/configure
# Omit changes that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
git checkout -f \
sysdeps/powerpc/powerpc64/ppc-mcount.S \
sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
# Omit change that caused a pre-commit check to fail like this:
# remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 05:40:42 +00:00
|
|
|
<https://www.gnu.org/licenses/>. */
|
2011-09-08 02:17:33 +00:00
|
|
|
|
2021-09-15 22:57:35 +00:00
|
|
|
#define NO_MATH_REDIRECT
|
Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.
This patch replaces it with a genuine fused operation. It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result. Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).
This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs. Various other pieces of
bug 13304 were fixed in past releases over the past several years.
Tested for powerpc.
[BZ #13304]
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
<float.h>, <math_private.h> and <stdlib.h>.
(add_split): New function.
(mul_split): Likewise.
(ext_val): New typedef.
(store_ext_val): New function.
(mul_ext_val): New function.
(compare): New function.
(add_split_ext): New function.
(__fmal): After checking for Inf, NaN and zero, compute result as
an exact sum of scaled double values in round-to-nearest before
adding those up and adjusting for other rounding modes.
* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
tests of fma.
* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
|
|
|
#include <fenv.h>
|
|
|
|
#include <float.h>
|
2011-09-08 02:17:33 +00:00
|
|
|
#include <math.h>
|
2018-05-11 15:11:38 +00:00
|
|
|
#include <math-barriers.h>
|
Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.
This patch replaces it with a genuine fused operation. It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result. Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).
This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs. Various other pieces of
bug 13304 were fixed in past releases over the past several years.
Tested for powerpc.
[BZ #13304]
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
<float.h>, <math_private.h> and <stdlib.h>.
(add_split): New function.
(mul_split): Likewise.
(ext_val): New typedef.
(store_ext_val): New function.
(mul_ext_val): New function.
(compare): New function.
(add_split_ext): New function.
(__fmal): After checking for Inf, NaN and zero, compute result as
an exact sum of scaled double values in round-to-nearest before
adding those up and adjusting for other rounding modes.
* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
tests of fma.
* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
|
|
|
#include <math_private.h>
|
Do not include fenv_private.h in math_private.h.
Continuing the clean-up related to the catch-all math_private.h
header, this patch stops math_private.h from including fenv_private.h.
Instead, fenv_private.h is included directly from those users of
math_private.h that also used interfaces from fenv_private.h. No
attempt is made to remove unused includes of math_private.h, but that
is a natural followup.
(However, since math_private.h sometimes defines optimized versions of
math.h interfaces or __* variants thereof, as well as defining its own
interfaces, I think it might make sense to get all those optimized
versions included from include/math.h, not requiring a separate header
at all, before eliminating unused math_private.h includes - that
avoids a file quietly becoming less-optimized if someone adds a call
to one of those interfaces without restoring a math_private.h include
to that file.)
There is still a pitfall that if code uses plain fe* and __fe*
interfaces, but only includes fenv.h and not fenv_private.h or (before
this patch) math_private.h, it will compile on platforms with
exceptions and rounding modes but not get the optimized versions (and
possibly not compile) on platforms without exception and rounding mode
support, so making it easy to break the build for such platforms
accidentally.
I think it would be most natural to move the inlines / macros for fe*
and __fe* in the case of no exceptions and rounding modes into
include/fenv.h, so that all code including fenv.h with _ISOMAC not
defined automatically gets them. Then fenv_private.h would be purely
the header for the libc_fe*, SET_RESTORE_ROUND etc. internal
interfaces and the risk of breaking the build on other platforms than
the one you tested on because of a missing fenv_private.h include
would be much reduced (and there would be some unused fenv_private.h
includes to remove along with unused math_private.h includes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math_private.h: Do not include <fenv_private.h>.
* math/fromfp.h: Include <fenv_private.h>.
* math/math-narrow.h: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_ilogb_template.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_llogb_template.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_remainder.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/gamma_product.c: Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/x2y2m1.c: Likewise.
* sysdeps/ieee754/float128/float128_private.h: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
2018-09-03 21:09:04 +00:00
|
|
|
#include <fenv_private.h>
|
2018-05-10 00:53:04 +00:00
|
|
|
#include <math-underflow.h>
|
2011-09-08 02:17:33 +00:00
|
|
|
#include <math_ldbl_opt.h>
|
2016-08-08 20:58:28 +00:00
|
|
|
#include <mul_split.h>
|
Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.
This patch replaces it with a genuine fused operation. It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result. Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).
This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs. Various other pieces of
bug 13304 were fixed in past releases over the past several years.
Tested for powerpc.
[BZ #13304]
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
<float.h>, <math_private.h> and <stdlib.h>.
(add_split): New function.
(mul_split): Likewise.
(ext_val): New typedef.
(store_ext_val): New function.
(mul_ext_val): New function.
(compare): New function.
(add_split_ext): New function.
(__fmal): After checking for Inf, NaN and zero, compute result as
an exact sum of scaled double values in round-to-nearest before
adding those up and adjusting for other rounding modes.
* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
tests of fma.
* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
|
|
|
#include <stdlib.h>
|
|
|
|
|
|
|
|
/* Calculate X + Y exactly and store the result in *HI + *LO. It is
|
|
|
|
given that |X| >= |Y| and the values are small enough that no
|
|
|
|
overflow occurs. */
|
|
|
|
|
|
|
|
static void
|
|
|
|
add_split (double *hi, double *lo, double x, double y)
|
|
|
|
{
|
|
|
|
/* Apply Dekker's algorithm. */
|
|
|
|
*hi = x + y;
|
|
|
|
*lo = (x - *hi) + y;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Value with extended range, used in intermediate computations. */
|
|
|
|
typedef struct
|
|
|
|
{
|
|
|
|
/* Value in [0.5, 1), as from frexp, or 0. */
|
|
|
|
double val;
|
|
|
|
/* Exponent of power of 2 it is multiplied by, or 0 for zero. */
|
|
|
|
int exp;
|
|
|
|
} ext_val;
|
|
|
|
|
|
|
|
/* Store D as an ext_val value. */
|
|
|
|
|
|
|
|
static void
|
|
|
|
store_ext_val (ext_val *v, double d)
|
|
|
|
{
|
|
|
|
v->val = __frexp (d, &v->exp);
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Store X * Y as ext_val values *V0 and *V1. */
|
|
|
|
|
|
|
|
static void
|
|
|
|
mul_ext_val (ext_val *v0, ext_val *v1, double x, double y)
|
|
|
|
{
|
|
|
|
int xexp, yexp;
|
|
|
|
x = __frexp (x, &xexp);
|
|
|
|
y = __frexp (y, &yexp);
|
|
|
|
double hi, lo;
|
|
|
|
mul_split (&hi, &lo, x, y);
|
|
|
|
store_ext_val (v0, hi);
|
|
|
|
if (hi != 0)
|
|
|
|
v0->exp += xexp + yexp;
|
|
|
|
store_ext_val (v1, lo);
|
|
|
|
if (lo != 0)
|
|
|
|
v1->exp += xexp + yexp;
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Compare absolute values of ext_val values pointed to by P and Q for
|
|
|
|
qsort. */
|
|
|
|
|
|
|
|
static int
|
|
|
|
compare (const void *p, const void *q)
|
|
|
|
{
|
|
|
|
const ext_val *pe = p;
|
|
|
|
const ext_val *qe = q;
|
|
|
|
if (pe->val == 0)
|
|
|
|
return qe->val == 0 ? 0 : -1;
|
|
|
|
else if (qe->val == 0)
|
|
|
|
return 1;
|
|
|
|
else if (pe->exp < qe->exp)
|
|
|
|
return -1;
|
|
|
|
else if (pe->exp > qe->exp)
|
|
|
|
return 1;
|
|
|
|
else
|
|
|
|
{
|
|
|
|
double pd = fabs (pe->val);
|
|
|
|
double qd = fabs (qe->val);
|
|
|
|
if (pd < qd)
|
|
|
|
return -1;
|
|
|
|
else if (pd == qd)
|
|
|
|
return 0;
|
|
|
|
else
|
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
/* Calculate *X + *Y exactly, storing the high part in *X (rounded to
|
|
|
|
nearest) and the low part in *Y. It is given that |X| >= |Y|. */
|
|
|
|
|
|
|
|
static void
|
|
|
|
add_split_ext (ext_val *x, ext_val *y)
|
|
|
|
{
|
|
|
|
int xexp = x->exp, yexp = y->exp;
|
|
|
|
if (y->val == 0 || xexp - yexp > 53)
|
|
|
|
return;
|
|
|
|
double hi = x->val;
|
|
|
|
double lo = __scalbn (y->val, yexp - xexp);
|
|
|
|
add_split (&hi, &lo, hi, lo);
|
|
|
|
store_ext_val (x, hi);
|
|
|
|
if (hi != 0)
|
|
|
|
x->exp += xexp;
|
|
|
|
store_ext_val (y, lo);
|
|
|
|
if (lo != 0)
|
|
|
|
y->exp += xexp;
|
|
|
|
}
|
2011-09-08 02:17:33 +00:00
|
|
|
|
|
|
|
long double
|
|
|
|
__fmal (long double x, long double y, long double z)
|
|
|
|
{
|
Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.
This patch replaces it with a genuine fused operation. It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result. Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).
This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs. Various other pieces of
bug 13304 were fixed in past releases over the past several years.
Tested for powerpc.
[BZ #13304]
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
<float.h>, <math_private.h> and <stdlib.h>.
(add_split): New function.
(mul_split): Likewise.
(ext_val): New typedef.
(store_ext_val): New function.
(mul_ext_val): New function.
(compare): New function.
(add_split_ext): New function.
(__fmal): After checking for Inf, NaN and zero, compute result as
an exact sum of scaled double values in round-to-nearest before
adding those up and adjusting for other rounding modes.
* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
tests of fma.
* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
|
|
|
double xhi, xlo, yhi, ylo, zhi, zlo;
|
|
|
|
int64_t hx, hy, hz;
|
|
|
|
int xexp, yexp, zexp;
|
|
|
|
double scale_val;
|
|
|
|
int scale_exp;
|
|
|
|
ldbl_unpack (x, &xhi, &xlo);
|
|
|
|
EXTRACT_WORDS64 (hx, xhi);
|
|
|
|
xexp = (hx & 0x7ff0000000000000LL) >> 52;
|
|
|
|
ldbl_unpack (y, &yhi, &ylo);
|
|
|
|
EXTRACT_WORDS64 (hy, yhi);
|
|
|
|
yexp = (hy & 0x7ff0000000000000LL) >> 52;
|
|
|
|
ldbl_unpack (z, &zhi, &zlo);
|
|
|
|
EXTRACT_WORDS64 (hz, zhi);
|
|
|
|
zexp = (hz & 0x7ff0000000000000LL) >> 52;
|
|
|
|
|
|
|
|
/* If z is Inf or NaN, but x and y are finite, avoid any exceptions
|
|
|
|
from computing x * y. */
|
|
|
|
if (zexp == 0x7ff && xexp != 0x7ff && yexp != 0x7ff)
|
|
|
|
return (z + x) + y;
|
|
|
|
|
|
|
|
/* If z is zero and x are y are nonzero, compute the result as x * y
|
|
|
|
to avoid the wrong sign of a zero result if x * y underflows to
|
|
|
|
0. */
|
|
|
|
if (z == 0 && x != 0 && y != 0)
|
|
|
|
return x * y;
|
|
|
|
|
|
|
|
/* If x or y or z is Inf/NaN, or if x * y is zero, compute as x * y
|
|
|
|
+ z. */
|
|
|
|
if (xexp == 0x7ff || yexp == 0x7ff || zexp == 0x7ff
|
|
|
|
|| x == 0 || y == 0)
|
|
|
|
return (x * y) + z;
|
|
|
|
|
|
|
|
{
|
|
|
|
SET_RESTORE_ROUND (FE_TONEAREST);
|
|
|
|
|
|
|
|
ext_val vals[10];
|
|
|
|
store_ext_val (&vals[0], zhi);
|
|
|
|
store_ext_val (&vals[1], zlo);
|
|
|
|
mul_ext_val (&vals[2], &vals[3], xhi, yhi);
|
|
|
|
mul_ext_val (&vals[4], &vals[5], xhi, ylo);
|
|
|
|
mul_ext_val (&vals[6], &vals[7], xlo, yhi);
|
|
|
|
mul_ext_val (&vals[8], &vals[9], xlo, ylo);
|
|
|
|
qsort (vals, 10, sizeof (ext_val), compare);
|
|
|
|
/* Add up the values so that each element of VALS has absolute
|
|
|
|
value at most equal to the last set bit of the next nonzero
|
|
|
|
element. */
|
|
|
|
for (size_t i = 0; i <= 8; i++)
|
|
|
|
{
|
|
|
|
add_split_ext (&vals[i + 1], &vals[i]);
|
|
|
|
qsort (vals + i + 1, 9 - i, sizeof (ext_val), compare);
|
|
|
|
}
|
|
|
|
/* Add up the values in the other direction, so that each element
|
|
|
|
of VALS has absolute value less than 5ulp of the next
|
|
|
|
value. */
|
|
|
|
size_t dstpos = 9;
|
|
|
|
for (size_t i = 1; i <= 9; i++)
|
|
|
|
{
|
|
|
|
if (vals[dstpos].val == 0)
|
|
|
|
{
|
|
|
|
vals[dstpos] = vals[9 - i];
|
|
|
|
vals[9 - i].val = 0;
|
|
|
|
vals[9 - i].exp = 0;
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
add_split_ext (&vals[dstpos], &vals[9 - i]);
|
|
|
|
if (vals[9 - i].val != 0)
|
|
|
|
{
|
|
|
|
if (9 - i < dstpos - 1)
|
|
|
|
{
|
|
|
|
vals[dstpos - 1] = vals[9 - i];
|
|
|
|
vals[9 - i].val = 0;
|
|
|
|
vals[9 - i].exp = 0;
|
|
|
|
}
|
|
|
|
dstpos--;
|
|
|
|
}
|
|
|
|
}
|
|
|
|
}
|
|
|
|
/* If the result is an exact zero, it results from adding two
|
|
|
|
values with opposite signs; recompute in the original rounding
|
|
|
|
mode. */
|
|
|
|
if (vals[9].val == 0)
|
|
|
|
goto zero_out;
|
|
|
|
/* Adding the top three values will now give a result as accurate
|
|
|
|
as the underlying long double arithmetic. */
|
|
|
|
add_split_ext (&vals[9], &vals[8]);
|
|
|
|
if (compare (&vals[8], &vals[7]) < 0)
|
|
|
|
{
|
|
|
|
ext_val tmp = vals[7];
|
|
|
|
vals[7] = vals[8];
|
|
|
|
vals[8] = tmp;
|
|
|
|
}
|
|
|
|
add_split_ext (&vals[8], &vals[7]);
|
|
|
|
add_split_ext (&vals[9], &vals[8]);
|
|
|
|
if (vals[9].exp > DBL_MAX_EXP || vals[9].exp < DBL_MIN_EXP)
|
|
|
|
{
|
|
|
|
/* Overflow or underflow, with the result depending on the
|
|
|
|
original rounding mode, but not on the low part computed
|
|
|
|
here. */
|
|
|
|
scale_val = vals[9].val;
|
|
|
|
scale_exp = vals[9].exp;
|
|
|
|
goto scale_out;
|
|
|
|
}
|
|
|
|
double hi = __scalbn (vals[9].val, vals[9].exp);
|
|
|
|
double lo = __scalbn (vals[8].val, vals[8].exp);
|
|
|
|
/* It is possible that the low part became subnormal and was
|
|
|
|
rounded so that the result is no longer canonical. */
|
|
|
|
ldbl_canonicalize (&hi, &lo);
|
|
|
|
long double ret = ldbl_pack (hi, lo);
|
|
|
|
math_check_force_underflow (ret);
|
|
|
|
return ret;
|
|
|
|
}
|
2011-09-08 02:17:33 +00:00
|
|
|
|
Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.
This patch replaces it with a genuine fused operation. It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result. Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).
This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs. Various other pieces of
bug 13304 were fixed in past releases over the past several years.
Tested for powerpc.
[BZ #13304]
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
<float.h>, <math_private.h> and <stdlib.h>.
(add_split): New function.
(mul_split): Likewise.
(ext_val): New typedef.
(store_ext_val): New function.
(mul_ext_val): New function.
(compare): New function.
(add_split_ext): New function.
(__fmal): After checking for Inf, NaN and zero, compute result as
an exact sum of scaled double values in round-to-nearest before
adding those up and adjusting for other rounding modes.
* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
tests of fma.
* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
|
|
|
scale_out:
|
|
|
|
scale_val = math_opt_barrier (scale_val);
|
|
|
|
scale_val = __scalbn (scale_val, scale_exp);
|
|
|
|
if (fabs (scale_val) == DBL_MAX)
|
2018-09-27 20:04:48 +00:00
|
|
|
return copysignl (LDBL_MAX, scale_val);
|
Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.
This patch replaces it with a genuine fused operation. It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result. Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).
This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs. Various other pieces of
bug 13304 were fixed in past releases over the past several years.
Tested for powerpc.
[BZ #13304]
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
<float.h>, <math_private.h> and <stdlib.h>.
(add_split): New function.
(mul_split): Likewise.
(ext_val): New typedef.
(store_ext_val): New function.
(mul_ext_val): New function.
(compare): New function.
(add_split_ext): New function.
(__fmal): After checking for Inf, NaN and zero, compute result as
an exact sum of scaled double values in round-to-nearest before
adding those up and adjusting for other rounding modes.
* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
tests of fma.
* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
|
|
|
math_check_force_underflow (scale_val);
|
|
|
|
return scale_val;
|
2012-11-22 15:00:35 +00:00
|
|
|
|
Implement proper fmal for ldbl-128ibm (bug 13304).
ldbl-128ibm had an implementation of fmal that just did (x * y) + z in
most cases, with no attempt at actually being a fused operation.
This patch replaces it with a genuine fused operation. It is not
necessarily correctly rounding, but should produce a result at least
as accurate as the long double arithmetic operations in libgcc, which
I think is all that can reasonably be expected for such a non-IEEE
format where arithmetic is approximate rather than rounded according
to any particular rule for determining the exact result. Like the
libgcc arithmetic, it may produce spurious overflow and underflow
results, and it falls back to the libgcc multiplication in the case of
(finite, finite, zero).
This concludes the fixes for bug 13304; any subsequently found fma
issues should go in separate Bugzilla bugs. Various other pieces of
bug 13304 were fixed in past releases over the past several years.
Tested for powerpc.
[BZ #13304]
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Include <fenv.h>,
<float.h>, <math_private.h> and <stdlib.h>.
(add_split): New function.
(mul_split): Likewise.
(ext_val): New typedef.
(store_ext_val): New function.
(mul_ext_val): New function.
(compare): New function.
(add_split_ext): New function.
(__fmal): After checking for Inf, NaN and zero, compute result as
an exact sum of scaled double values in round-to-nearest before
adding those up and adjusting for other rounding modes.
* math/auto-libm-test-in: Remove xfail-rounding:ldbl-128ibm from
tests of fma.
* math/auto-libm-test-out: Regenerated.
2016-05-19 20:10:56 +00:00
|
|
|
zero_out:;
|
|
|
|
double zero = 0.0;
|
|
|
|
zero = math_opt_barrier (zero);
|
|
|
|
return zero - zero;
|
2011-09-08 02:17:33 +00:00
|
|
|
}
|
2014-11-20 15:34:47 +00:00
|
|
|
#if IS_IN (libm)
|
2011-09-08 02:17:33 +00:00
|
|
|
long_double_symbol (libm, __fmal, fmal);
|
|
|
|
#else
|
|
|
|
long_double_symbol (libc, __fmal, fmal);
|
|
|
|
#endif
|