The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper
with SVID error handling around the new code. There is no new symbol
version nor compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).
The ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation. For both i686 and m68k,
which provive arch specific implementation, wrappers are added so
no new symbol are added (which would require to change the
implementations).
It shows an small improvement, the results for fmod:
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 12.5049 | 9.40992
x86_64 (Ryzen 9) | normal | 296.939 | 296.738
x86_64 (Ryzen 9) | close-exponents | 16.0244 | 13.119
aarch64 (N1) | subnormal | 6.81778 | 4.33313
aarch64 (N1) | normal | 155.620 | 152.915
aarch64 (N1) | close-exponents | 8.21306 | 5.76138
armhf (N1) | subnormal | 15.1083 | 14.5746
armhf (N1) | normal | 244.833 | 241.738
armhf (N1) | close-exponents | 21.8182 | 22.457
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
while (ex > ey)
{
mx *= 2;
--ex;
mx %= my;
}
With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 8 (which is sizeof of uint32_t minus
MANTISSA_WIDTH plus the signal bit):
while (ex > ey)
{
mx << 8;
ex -= 8;
mx %= my;
} */
The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles. Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.
I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 17.2549 | 12.0318
x86_64 (Ryzen 9) | normal | 85.4096 | 49.9641
x86_64 (Ryzen 9) | close-exponents | 19.1072 | 15.8224
aarch64 (N1) | subnormal | 10.2182 | 6.81778
aarch64 (N1) | normal | 60.0616 | 20.3667
aarch64 (N1) | close-exponents | 11.5256 | 8.39685
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 11.6662 | 10.8955
armhf (N1) | normal | 69.2759 | 34.1524
armhf (N1) | close-exponents | 13.6472 | 18.2131
Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This uses a new algorithm similar to already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
while (ex > ey)
{
mx *= 2;
--ex;
mx %= my;
}
With mx/my being mantissa of double floating pointer, on each step the
argument reduction can be improved 11 (which is sizeo of uint64_t minus
MANTISSA_WIDTH plus the signal bit):
while (ex > ey)
{
mx << 11;
ex -= 11;
mx %= my;
} */
The implementation uses builtin clz and ctz, along with shifts to
convert hx/hy back to doubles. Different than the original patch,
this path assume modulo/divide operation is slow, so use multiplication
with invert values.
I see the following performance improvements using fmod benchtests
(result only show the 'mean' result):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 19.1584 | 12.5049
x86_64 (Ryzen 9) | normal | 1016.51 | 296.939
x86_64 (Ryzen 9) | close-exponents | 18.4428 | 16.0244
aarch64 (N1) | subnormal | 11.153 | 6.81778
aarch64 (N1) | normal | 528.649 | 155.62
aarch64 (N1) | close-exponents | 11.4517 | 8.21306
I also see similar improvements on arm-linux-gnueabihf when running on
the N1 aarch64 chips, where it a lot of soft-fp implementation (for
modulo, clz, ctz, and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 15.908 | 15.1083
armhf (N1) | normal | 837.525 | 244.833
armhf (N1) | close-exponents | 16.2111 | 21.8182
Instead of using the math_private.h definitions, I used the
math_config.h instead which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
<tgmath.h> implements semantics for integer generic arguments that
handle cases involving _FloatN / _FloatNx types as specified in TS
18661-3 plus some defect fixes.
C2x has further changes to the semantics for <tgmath.h> macros with
such types, which should also be considered defect fixes (although
handled through the integration of TS 18661-3 in C2x rather than
through an issue tracking process). Specifically, the rules were
changed because of problems raised with using the macros with the
evaluation format types such as float_t and _Float32_t: the older
version of the rules didn't allow passing _FloatN / _FloatNx types to
the narrowing macros returning float or double, or passing float /
double / long double to the narrowing macros returning _FloatN /
_FloatNx, which was a problem with the evaluation format types which
could be either kind of type depending on the value of
FLT_EVAL_METHOD.
Thus the new rules allow cases of mixing types which were not allowed
before, and, as part of the changes, the handling of integer arguments
was also changed: if there is any _FloatNx generic argument, integer
generic arguments are treated as _Float32x (not double), while the
rule about treating integer arguments to narrowing macros returning
_FloatN or _FloatNx as _Float64 not double was removed (no longer
needed now double is a valid argument to such macros).
I've implemented the changes in GCC's __builtin_tgmath, which thus
requires updates to glibc's test expectations so that the tests
continue to build with GCC 13 (the test is also updated to test the
argument types that weren't allowed before but are now valid under C2x
rules).
Given those test changes, it's then also necessary to fix the
implementations in <tgmath.h> to have appropriate semantics with older
GCC so that the tests pass with GCC versions before GCC 13 as well.
For some cases (non-narrowing macros with two or three generic
arguments; narrowing macros returning _Float32x), the older version of
__builtin_tgmath doesn't correspond sufficiently well to C2x
semantics, so in those cases <tgmath.h> is adjusted to use the older
macro implementation instead of __builtin_tgmath. The older macro
implementation is itself adjusted to give the desired semantics, with
GCC 7 and later. (It's not possible to get the right semantics in all
cases for the narrowing macros with GCC 6 and before when the _FloatN
/ _FloatNx names are typedefs rather than distinct types.)
Tested as follows: with the full glibc testsuite for x86_64, GCC 6, 7,
11, 13; with execution of the math/tests for aarch64, arm, powerpc and
powerpc64le, GCC 6, 7, 12 and 13 (powerpc64le only with GCC 12 and
13); with build-many-glibcs.py with GCC 6, 7, 12 and 13.
GCC 13 has added more _FloatN and _FloatNx versions of existing
<math.h> and <complex.h> built-in functions, for use in libstdc++-v3.
This breaks the glibc build because of how those functions are defined
as aliases to functions with the same ABI but different types. Add
appropriate -fno-builtin-* options for compiling relevant files, as
already done for the case of long double functions aliasing double
ones and based on the list of files used there.
I fixed some mistakes in that list of double files that I noticed
while implementing this fix, but there may well be more such
(harmless) cases, in this list or the new one (files that don't
actually exist or don't define the named functions as aliases so don't
need the options). I did try to exclude cases where glibc doesn't
define certain functions for _FloatN or _FloatNx types at all from the
new uses of -fno-builtin-* options. As with the options for double
files (see the commit message for commit
49348beafe, "Fix build with GCC 10 when
long double = double."), it's deliberate that the options are used
even if GCC currently doesn't have a built-in version of a given
functions, so providing some level of future-proofing against more
such built-in functions being added in future.
Tested with build-many-glibcs.py for aarch64-linux-gnu
powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu (compilers
and glibcs builds) with GCC mainline.
With GCC 13, _FloatN and _FloatNx types, when they exist, are distinct
types like they are in C with GCC 7 and later, rather than typedefs
for types such as float, double or long double.
This breaks the templated iseqsig implementation for C++ in <math.h>,
when used with types that were previously implemented as aliases. Add
the necessary definitions for _Float32, _Float64, _Float128 (when the
same format as long double), _Float32x and _Float64x in this case, so
that iseqsig can be used with such types in C++ with GCC 13 as it
could with previous GCC versions.
Also add tests for calling iseqsig in C++ with arguments of such types
(more minimal than existing tests, so that they can work with older
GCC versions and without relying on any C++ library support for the
types or on hardcoding details of their formats). The LDBL_MANT_DIG
!= 106 conditionals on some tests are because the type-generic
comparison macros have undefined behavior when neither argument has a
type whose set of values is a subset of those for the type of the
other argument, which applies when one argument is IBM long double and
the other is an IEEE format wider than binary64.
Tested with build-many-glibcs.py glibcs build for aarch64-linux-gnu
i686-linux-gnu mips-linux-gnu mips64-linux-gnu-n32 powerpc-linux-gnu
powerpc64le-linux-gnu x86_64-linux-gnu.
The Z modifier is a nonstandard synonymn for z (that predates z
itself) and compiler might issue an warning for in invalid
conversion specifier.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This patch adds following input to atanh accuracy test.
0x1.f80094p-8
Tested on x86-64 and i686 platforms.
Other platforms may have to regenerate ulps file.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
This patch adds following inputs:
0x1.bcab29da0e947p-54 0x1.bc41f4d2294b8p-54
0x1.a11891ec004d4p-348 0x1.814830510be26p-348
0x1.b836ed678be29p-588 0x1.b7be6f5a03a8cp-588
0x1.a83f842ef3f73p-633 0x1.a799d8a6677ep-633
to atan2 tests and updates x86_64 double atan2 ulps.
This fixes BZ #28765.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Converting double precision constants to float is now affected by the
runtime dynamic rounding mode instead of being evaluated at compile
time with default rounding mode (except static object initializers).
This can change the computed result and cause performance regression.
The known correctness issues (increased ulp errors) are already fixed,
this patch fixes remaining cases of unnecessary runtime conversions.
Add float M_* macros to math.h as new GNU extension API. To avoid
conversions the new M_* macros are used and instead of casting double
literals to float, use float literals (only required if the conversion
is inexact).
The patch was tested on aarch64 where the following symbols had new
spurious conversion instructions that got fixed:
__clog10f
__gammaf_r_finite@GLIBC_2.17
__j0f_finite@GLIBC_2.17
__j1f_finite@GLIBC_2.17
__jnf_finite@GLIBC_2.17
__kernel_casinhf
__lgamma_negf
__log1pf
__y0f_finite@GLIBC_2.17
__y1f_finite@GLIBC_2.17
cacosf
cacoshf
casinhf
catanf
catanhf
clogf
gammaf_positive
Fixes bug 28713.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
Implement vectorized tan/tanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector tan/tanf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized erfc/erfcf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector erfc/erfcf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized asinh/asinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector asinh/asinhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized tanh/tanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector tanh/tanhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized erf/erff containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector erf/erff with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized acosh/acoshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acosh/acoshf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized atanh/atanhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atanh/atanhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized log1p/log1pf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log1p/log1pf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized log2/log2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log2/log2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized log10/log10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector log10/log10f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized atan2/atan2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan2/atan2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized cbrt/cbrtf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cbrt/cbrtf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized sinh/sinhf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector sinh/sinhf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector cosh/coshf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI. It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
The largest errors over the full binary32 range are after this
patch (on x86_64):
RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7
Inputs that were yielding huge errors have been added to "make check".
Reviewed-by: Adhemeral Zanella <adhemerval.zanella@linaro.org>
glibc has had exp10 functions since long before they were
standardized; now they are standardized in TS 18661-4 and C2X, they
are also specified there to have a corresponding type-generic macro.
Add one to <tgmath.h>, so fixing bug 26108.
glibc doesn't have other functions from TS 18661-4 yet, but when
added, it will be natural to add the type-generic macro for each
function family at the same time as the functions.
Tested for x86_64.
At the last WG14 meeting,
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2711.htm> was
accepted, which places more emphasis on the new fmaximum / fminimum
functions and less on the old fmax / fmin functions. Some of the
changes are to examples, notes or otherwise don't require
implementation changes. However, the changes include removing the
_FloatN / _FloatNx versions of the fmax and fmin functions that came
from TS 18661-3.
Thus, those function versions should only be declared under similar
conditions to the _FloatN / _FloatNx versions of fmaxmag and fminmag:
for _GNU_SOURCE and pre-C2X use of __STDC_WANT_IEC_60559_TYPES_EXT__,
but not for C2X without _GNU_SOURCE.
In turn this requires a tgmath.h change so that the corresponding
tgmath.h macros, for C2X with __STDC_WANT_IEC_60559_TYPES_EXT__ but
without _GNU_SOURCE, don't try to use function variants that aren't
declared. (That issue doesn't arise for the tgmath.h macros for
fmaxmag and fminmag, because those aren't defined at all in those
circumstances unless __STDC_WANT_IEC_60559_BFP_EXT__ (from TS 18661-1
and not specified at all by C2X) is also defined, and in that case the
_FloatN / _FloatNx versions of fmaxmag and fminmag get declared - this
is only ever an issue when it's possible for some functions
corresponding to a type-generic-macro to be declared, and for _FloatN
/ _FloatNx functions in general to be declared, but without the
_FloatN / _FloatNx functions corresponding to that particular macro
being declared.)
Tested for x86_64.
C2X does not include fmaxmag and fminmag. When I updated feature test
macro handling accordingly (commit
858045ad1c, "Update floating-point
feature test macro handling for C2X", included in 2.34), I missed
updating tgmath.h so it doesn't define the corresponding type-generic
macros unless __STDC_WANT_IEC_60559_BFP_EXT__ is defined; I've now
reported this as bug 28397. Adjust the conditionals in tgmath.h
accordingly.
Tested for x86_64.
C2X adds new <math.h> functions for floating-point maximum and
minimum, corresponding to the new operations that were added in IEEE
754-2019 because of concerns about the old operations not being
associative in the presence of signaling NaNs. fmaximum and fminimum
handle NaNs like most <math.h> functions (any NaN argument means the
result is a quiet NaN). fmaximum_num and fminimum_num handle both
quiet and signaling NaNs the way fmax and fmin handle quiet NaNs (if
one argument is a number and the other is a NaN, return the number),
but still raise "invalid" for a signaling NaN argument, making them
exceptions to the normal rule that a function with a floating-point
result raising "invalid" also returns a quiet NaN. fmaximum_mag,
fminimum_mag, fmaximum_mag_num and fminimum_mag_num are corresponding
functions returning the argument with greatest or least absolute
value. All these functions also treat +0 as greater than -0. There
are also corresponding <tgmath.h> type-generic macros.
Add these functions to glibc. The implementations use type-generic
templates based on those for fmax, fmin, fmaxmag and fminmag, and test
inputs are based on those for those functions with appropriate
adjustments to the expected results. The RISC-V maintainers might
wish to add optimized versions of fmaximum_num and fminimum_num (for
float and double), since RISC-V (F extension version 2.2 and later)
provides instructions corresponding to those functions - though it
might be at least as useful to add architecture-independent built-in
functions to GCC and teach the RISC-V back end to expand those
functions inline, which is what you generally want for functions that
can be implemented with a single instruction.
Tested for x86_64 and x86, and with build-many-glibcs.py.
This patch adds the narrowing fused multiply-add functions from TS
18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal,
f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x,
f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128,
f64xfmaf128 for configurations with _Float64x and _Float128;
__f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case
(for calls to ffmal and dfmal when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.
The changes are mostly similar to those for the other narrowing
functions previously added, especially that for sqrt, so the
description of those generally applies to this patch as well. As with
sqrt, I reused the same test inputs in auto-libm-test-in as for
non-narrowing fma rather than adding extra or separate inputs for
narrowing fma. The tests in libm-test-narrow-fma.inc also follow
those for non-narrowing fma.
The non-narrowing fma has a known bug (bug 6801) that it does not set
errno on errors (overflow, underflow, Inf * 0, Inf - Inf). Rather
than fixing this or having narrowing fma check for errors when
non-narrowing does not (complicating the cases when narrowing fma can
otherwise be an alias for a non-narrowing function), this patch does
not attempt to check for errors from narrowing fma and set errno; the
CHECK_NARROW_FMA macro is still present, but as a placeholder that
does nothing, and this missing errno setting is considered to be
covered by the existing bug rather than needing a separate open bug.
missing-errno annotations are duly added to many of the
auto-libm-test-in test inputs for fma.
This completes adding all the new functions from TS 18661-1 to glibc,
so will be followed by corresponding stdc-predef.h changes to define
__STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support
for TS 18661-1 will be at a similar level to that for C standard
floating-point facilities up to C11 (pragmas not implemented, but
library functions done). (There are still further changes to be done
to implement changes to the types of fromfp functions from N2548.)
Tested as followed: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float). The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
As described in bug 28358, the round-to-odd computations used in the
libm functions that round their results to a narrower format can yield
spurious underflow exceptions in the following circumstances: the
narrowing only narrows the precision of the type and not the exponent
range (i.e., it's narrowing _Float128 to _Float64x on x86_64, x86 or
ia64), the architecture does after-rounding tininess detection (which
applies to all those architectures), the result is inexact, tiny
before rounding but not tiny after rounding (with the chosen rounding
mode) for _Float64x (which is possible for narrowing mul, div and fma,
not for narrowing add, sub or sqrt), so the underflow exception
resulting from the toward-zero computation in _Float128 is spurious
for _Float64x.
Fixed by making ROUND_TO_ODD call feclearexcept (FE_UNDERFLOW) in the
problem cases (as indicated by an extra argument to the macro); there
is never any need to preserve underflow exceptions from this part of
the computation, because the conversion of the round-to-odd value to
the narrower type will underflow in exactly the cases in which the
function should raise that exception, but it may be more efficient to
avoid the extra manipulation of the floating-point environment when
not needed.
Tested for x86_64 and x86, and with build-many-glibcs.py.