Converting double precision constants to float is now affected by the
runtime dynamic rounding mode instead of being evaluated at compile
time with default rounding mode (except static object initializers).
This can change the computed result and cause performance regression.
The known correctness issues (increased ulp errors) are already fixed,
this patch fixes remaining cases of unnecessary runtime conversions.
Add float M_* macros to math.h as new GNU extension API. To avoid
conversions the new M_* macros are used and instead of casting double
literals to float, use float literals (only required if the conversion
is inexact).
The patch was tested on aarch64 where the following symbols had new
spurious conversion instructions that got fixed:
__clog10f
__gammaf_r_finite@GLIBC_2.17
__j0f_finite@GLIBC_2.17
__j1f_finite@GLIBC_2.17
__jnf_finite@GLIBC_2.17
__kernel_casinhf
__lgamma_negf
__log1pf
__y0f_finite@GLIBC_2.17
__y1f_finite@GLIBC_2.17
cacosf
cacoshf
casinhf
catanf
catanhf
clogf
gammaf_positive
Fixes bug 28713.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
s_cosf.c and s_sinf.c have
if (abstop12 (y) < abstop12 (pio4))
where abstop12 takes a float argument, but pio4 is static const double.
pio4 is used only in calls to abstop12 and never in arithmetic. Apply
-static const double pio4 = 0x1.921FB54442D18p-1;
+static const float pio4 = 0x1.921FB6p-1f;
to fix:
FAIL: math/test-float-cos
FAIL: math/test-float-sin
FAIL: math/test-float-sincos
FAIL: math/test-float32-cos
FAIL: math/test-float32-sin
FAIL: math/test-float32-sincos
when compiling with GCC 12.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Use a more optimized comparison for check for NaN and infinite and
add an inlined issignaling implementation for float. With gcc it
results in 2 FP comparisons.
The file Copyright is also changed to use GPL, the implementation was
completely changed by 7c10fd3515 to use double precision instead of
scaling and this change removes all the GET_FLOAT_WORD usage.
Checked on x86_64-linux-gnu.
The largest errors over the full binary32 range are after this
patch (on x86_64):
RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7
Inputs that were yielding huge errors have been added to "make check".
Reviewed-by: Adhemeral Zanella <adhemerval.zanella@linaro.org>
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dchttps://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch is using the corresponding GCC builtin for roundevenf,
roundeven and roundevenl if the USE_FUNCTION_BUILTIN macros are defined
to one in math-use-builtins.h.
These builtin functions is supported since GCC 10.
The code of the generic implementation is not changed.
Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This patch redirect roundeven function for futhermore changes.
Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.
The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average. Two different algorithms are used:
* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)
* for large inputs, an asymptotic formula from [1] is used
[1] Fast and Accurate Bessel Function Computation,
John Harrison, Proceedings of Arith 19, 2009.
Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).
Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
In TS 18661-1, getpayload had an unspecified return value for a
non-NaN argument, while C2x requires the return value -1 in that case.
This patch implements the return value of -1. I don't think this is
worth having a new symbol version that's an alias of the old one,
although occasionally we do that in such cases where the new function
semantics are a refinement of the old ones (to avoid programs relying
on the new semantics running on older glibc versions but not behaving
as intended).
Tested for x86_64 and x86; also ran math/ tests for aarch64 and
powerpc.
This patch changes the exp10f error handling semantics to only set
errno according to POSIX rules. New symbol version is introduced at
GLIBC_2.32. The old wrappers are kept for compat symbols.
There are some outliers that need special handling:
- ia64 provides an optimized implementation of exp10f that uses ia64
specific routines to set SVID compatibility. The new symbol version
is aliased to the exp10f one.
- m68k also provides an optimized implementation, and the new version
uses it instead of the sysdeps/ieee754/flt32 one.
- riscv and csky uses the generic template implementation that
does not provide SVID support. For both cases a new exp10f
version is not added, but rather the symbols version of the
generic sysdeps/ieee754/flt32 is adjusted instead.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu,
powerpc64le-linux-gnu.
It is inspired by expf and reuses its tables and internal functions.
The error checks are inlined and errno setting is in separate tail
called functions, but the wrappers are kept in this patch to handle
the _LIB_VERSION==_SVID_ case.
Double precision arithmetics is used which is expected to be faster on
most targets (including soft-float) than using single precision and it
is easier to get good precision result with it.
Result for x86_64 (i7-4790K CPU @ 4.00GHz) are:
Before new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.0414e+09,
"iterations": 1.00128e+08,
"reciprocal-throughput": 26.6818,
"latency": 54.043,
"max-throughput": 3.74787e+07,
"min-throughput": 1.85038e+07
}
With new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.11951e+09,
"iterations": 1.23968e+08,
"reciprocal-throughput": 21.0581,
"latency": 45.4028,
"max-throughput": 4.74876e+07,
"min-throughput": 2.20251e+07
}
Result for aarch64 (A72 @ 2GHz) are:
Before new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.62362e+09,
"iterations": 3.3376e+07,
"reciprocal-throughput": 127.698,
"latency": 149.365,
"max-throughput": 7.831e+06,
"min-throughput": 6.69501e+06
}
With new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.29108e+09,
"iterations": 6.6752e+07,
"reciprocal-throughput": 51.2111,
"latency": 77.3568,
"max-throughput": 1.9527e+07,
"min-throughput": 1.29271e+07
}
Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, aarch64-linux-gnu,
and sparc64-linux-gnu.
A recent discussion in bug 14469 notes that a threshold in float
Bessel function implementations, used to determine when to use a
simpler implementation approach, results in substantially inaccurate
results.
As I discussed in
<https://sourceware.org/ml/libc-alpha/2013-03/msg00345.html>, a
heuristic argument suggests 2^(S+P) as the right order of magnitude
for a suitable threshold, where S is the number of significand bits in
the floating-point type and P is the number of significant bits in the
representation of the floating-point type, and the float and ldbl-96
implementations use thresholds that are too small. Some threshold
does need using, there or elsewhere in the implementation, to avoid
spurious underflow and overflow for large arguments.
This patch sets the thresholds in the affected implementations to more
heuristically justifiable values. Results will still be inaccurate
close to zeroes of the functions (thus this patch does *not* fix any
of the bugs for Bessel function inaccuracy); fixing that would require
a different implementation approach, likely along the lines described
in <http://www.cl.cam.ac.uk/~jrh13/papers/bessel.ps.gz>.
So the justification for a change such as this would be statistical
rather than based on particular tests that had excessive errors and no
longer do so (no doubt such tests could be found, but would probably
be too fragile to add to the testsuite, as liable to give large errors
again from very small implementation changes or even from compiler
changes). See
<https://sourceware.org/ml/libc-alpha/2020-02/msg00638.html> for such
statistics of the resulting improvements for float functions.
Tested (glibc testsuite) for x86_64.
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol. It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).
The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.
Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).
Passes buildmanyglibc.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This patch just adjusts the generic implementation regarding code style.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch just adjusts the generic implementation regarding code style.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch just adjusts the generic implementation regarding code style.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch just adjusts the generic implementation regarding code style.
No functional change.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is always using the corresponding GCC builtin for copysignf, copysign,
and is using the builtin for copysignl, copysignf128 if the USE_FUNCTION_BUILTIN
macros are defined to one in math-use-builtins.h.
Altough the long double version is enabled by default we still need
the macro and the alternative implementation as the _Float128 version
of the builtin is not available with all supported GCC versions.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is using the corresponding GCC builtin for roundf, round,
roundl and roundf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins.h.
This is the case for s390 if build with at least --march=z196 --mzarch.
Otherwise the generic implementation is used. The code of the generic
implementation is not changed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is using the corresponding GCC builtin for truncf, trunc,
truncl and truncf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins.h.
This is the case for s390 if build with at least --march=z196 --mzarch.
Otherwise the generic implementation is used. The code of the generic
implementation is not changed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is using the corresponding GCC builtin for ceilf, ceil,
ceill and ceilf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins.h.
This is the case for s390 if build with at least --march=z196 --mzarch.
Otherwise the generic implementation is used. The code of the generic
implementation is not changed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is using the corresponding GCC builtin for floorf, floor,
floorl and floorf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins.h.
This is the case for s390 if build with at least --march=z196 --mzarch.
Otherwise the generic implementation is used. The code of the generic
implementation is not changed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is using the corresponding GCC builtin for rintf, rint,
rintl and rintf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins.h.
This is the case for s390 if build with at least --march=z196 --mzarch.
Otherwise the generic implementation is used. The code of the generic
implementation is not changed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is using the corresponding GCC builtin for nearbyintf, nearbyint,
nearbintl and nearbyintf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins.h.
This is the case for s390 if build with at least --march=z196 --mzarch.
Otherwise the generic implementation is used. The code of the generic
implementation is not changed.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Very recent commit 854e91bf6b enabled
inline of issignalingf() in general (__issignalingf in include/math.h).
There is another implementation for an inline use of issignalingf
(issignalingf_inline in sysdeps/ieee754/flt-32/math_config.h)
which could instead make use of the new enablement.
Replace the use of issignalingf_inline with __issignaling. Using
issignaling (instead of __issignalingf) will allow future enhancements
to the type-generic implementation, issignaling, to be automatically
adopted.
The implementations are slightly different, and compile to slightly
different code, but I measured no significant performance difference.
The second implementation was brought to my attention by:
Suggested-by: Joseph Myers <joseph@codesourcery.com>
Reviewed-by: Joseph Myers <joseph@codesourcery.com>
The resolution of C floating-point Clarification Request 25
<http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2397.htm#dr_25> is
that the totalorder and totalordermag functions should take pointer
arguments, and this has been adopted in C2X (with const added; note
that the integration of this change into C2X is present in the C
standard git repository but postdates the most recent public PDF
draft).
This patch updates glibc accordingly. As a defect resolution, the API
is changed unconditionally rather than supporting any sort of TS
18661-1 mode for compilation with the old version of the API. There
are compat symbols for existing binaries that pass floating-point
arguments directly. As a consequence of changing to pointer
arguments, there are no longer type-generic macros in tgmath.h for
these functions.
Because of the fairly complicated logic for creating libm function
aliases and determining the set of aliases to create in a given glibc
configuration, rather than duplicating all that in individual source
files to create the versioned and compat symbols, the source files for
the various versions of totalorder functions are set up to redefine
weak_alias before using libm_alias_* macros to create the symbols
required. In turn, this requires creating a separate alias for each
symbol version pointing to the same implementation (see binutils bug
<https://sourceware.org/bugzilla/show_bug.cgi?id=23840>), which is
done automatically using __COUNTER__. (As I noted in
<https://sourceware.org/ml/libc-alpha/2018-10/msg00631.html>, it might
well make sense for glibc's symbol versioning macros to do that alias
creation with __COUNTER__ themselves, which would somewhat simplify
the logic in the totalorder source files.)
It is of course desirable to test the compat symbols. I did this with
the generic libm-test machinery, but didn't wish to duplicate the
actual tables of test inputs and outputs, and thought it risky to
attempt to have a single object file refer to both default and compat
versions of the same function in order to test them together. Thus, I
created libm-test-compat_totalorder.inc and
libm-test-compat_totalordermag.inc which include the generated .c
files (with the processed version of those tables of inputs) from the
non-compat tests, and added appropriate dependencies. I think this
provides sufficient test coverage for the compat symbols without also
needing to make the special ldbl-96 and ldbl-128ibm tests (of
peculiarities relating to the representations of those formats that
can't be covered in the generic tests) run for the compat symbols.
Tests of compat symbols need to be internal tests, meaning _ISOMAC is
not defined. Making some libm-test tests into internal tests showed
up two other issues. GCC diagnoses duplicate macro definitions of
__STDC_* macros, including __STDC_WANT_IEC_60559_TYPES_EXT__; I added
an appropriate conditional and filed
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91451> for this issue.
On ia64, include/setjmp.h ends up getting included indirectly from
libm-symbols.h, resulting in conflicting definitions of the STR macro
(also defined in libm-test-driver.c); I renamed the macros in
include/setjmp.h. (It's arguable that we should have common internal
headers used everywhere for stringizing and concatenation macros.)
Tested for x86_64 and x86, and with build-many-glibcs.py.
* math/bits/mathcalls.h
[__GLIBC_USE (IEC_60559_BFP_EXT) || __MATH_DECLARING_FLOATN]
(totalorder): Take pointer arguments.
[__GLIBC_USE (IEC_60559_BFP_EXT) || __MATH_DECLARING_FLOATN]
(totalordermag): Likewise.
* manual/arith.texi (totalorder): Likewise.
(totalorderf): Likewise.
(totalorderl): Likewise.
(totalorderfN): Likewise.
(totalorderfNx): Likewise.
(totalordermag): Likewise.
(totalordermagf): Likewise.
(totalordermagl): Likewise.
(totalordermagfN): Likewise.
(totalordermagfNx): Likewise.
* math/tgmath.h (__TGMATH_BINARY_REAL_RET_ONLY): Remove macro.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (totalorder): Likewise.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (totalordermag): Likewise.
* math/Versions (GLIBC_2.31): Add totalorder, totalorderf,
totalorderl, totalordermag, totalordermagf, totalordermagl,
totalorderf32, totalorderf64, totalorderf32x, totalordermagf32,
totalordermagf64, totalordermagf32x, totalorderf64x,
totalordermagf64x, totalorderf128 and totalordermagf128.
* math/Makefile (libm-test-funcs-noauto): Add compat_totalorder
and compat_totalordermag.
(libm-test-funcs-compat): New variable.
(libm-tests-compat): Likewise.
(tests): Do not include compat tests.
(tests-internal): Add compat tests.
($(foreach t,$(libm-tests-base),
$(objpfx)$(t)-compat_totalorder.o)): Depend
on $(objpfx)libm-test-totalorder.c.
($(foreach t,$(libm-tests-base),
$(objpfx)$(t)-compat_totalordermag.o): Depend on
$(objpfx)libm-test-totalordermag.c.
(tgmath3-macros): Remove totalorder and totalordermag.
* math/libm-test-compat_totalorder.inc: New file.
* math/libm-test-compat_totalordermag.inc: Likewise.
* math/libm-test-driver.c (struct test_ff_i_data): Update comment.
(RUN_TEST_fpfp_b): New macro.
(RUN_TEST_LOOP_fpfp_b): Likewise.
* math/libm-test-totalorder.inc (totalorder_test_data): Use
TEST_fpfp_b.
(totalorder_test): Condition on [!COMPAT_TEST].
(do_test): Likewise.
* math/libm-test-totalordermag.inc (totalordermag_test_data): Use
TEST_fpfp_b.
(totalordermag_test): Condition on [!COMPAT_TEST].
(do_test): Likewise.
* math/gen-tgmath-tests.py (Tests.add_all_tests): Remove
totalorder and totalordermag.
* math/test-tgmath.c (NCALLS): Change to 132.
(F(compile_test)): Do not call totalorder or totalordermag.
(F(totalorder)): Remove.
(F(totalordermag)): Likewise.
* include/float.h (__STDC_WANT_IEC_60559_TYPES_EXT__): Do not
define if [__STDC_WANT_IEC_60559_TYPES_EXT__].
* include/setjmp.h [!_ISOMAC] (STR_HELPER): Rename to
SJSTR_HELPER.
[!_ISOMAC] (STR): Rename to SJSTR. Update call to STR_HELPER.
[!_ISOMAC] (TEST_SIZE): Update call to STR.
[!_ISOMAC] (TEST_ALIGN): Likewise.
[!_ISOMAC] (TEST_OFFSET): Likewise.
* sysdeps/ieee754/dbl-64/s_totalorder.c: Include <shlib-compat.h>
and <first-versions.h>.
(__totalorder): Take pointer arguments. Add symbol versions and
compat symbols.
* sysdeps/ieee754/dbl-64/s_totalordermag.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalordermag): Take pointer arguments. Add symbol versions
and compat symbols.
* sysdeps/ieee754/dbl-64/wordsize-64/s_totalorder.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalorder): Take pointer arguments. Add symbol versions and
compat symbols.
* sysdeps/ieee754/dbl-64/wordsize-64/s_totalordermag.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalordermag): Take pointer arguments. Add symbol versions
and compat symbols.
* sysdeps/ieee754/float128/float128_private.h
(__totalorder_compatl): New macro.
(__totalordermag_compatl): Likewise.
* sysdeps/ieee754/flt-32/s_totalorderf.c: Include <shlib-compat.h>
and <first-versions.h>.
(__totalorderf): Take pointer arguments. Add symbol versions and
compat symbols.
* sysdeps/ieee754/flt-32/s_totalordermagf.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalordermagf): Take pointer arguments. Add symbol versions
and compat symbols.
* sysdeps/ieee754/ldbl-128/s_totalorderl.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalorderl): Take pointer arguments. Add symbol versions and
compat symbols.
* sysdeps/ieee754/ldbl-128/s_totalordermagl.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalordermagl): Take pointer arguments. Add symbol versions
and compat symbols.
* sysdeps/ieee754/ldbl-128ibm/s_totalorderl.c: Include
<shlib-compat.h>.
(__totalorderl): Take pointer arguments. Add symbol versions and
compat symbols.
* sysdeps/ieee754/ldbl-128ibm/s_totalordermagl.c: Include
<shlib-compat.h>.
(__totalordermagl): Take pointer arguments. Add symbol versions
and compat symbols.
* sysdeps/ieee754/ldbl-96/s_totalorderl.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalorderl): Take pointer arguments. Add symbol versions and
compat symbols.
* sysdeps/ieee754/ldbl-96/s_totalordermagl.c: Include
<shlib-compat.h> and <first-versions.h>.
(__totalordermagl): Take pointer arguments. Add symbol versions
and compat symbols.
* sysdeps/ieee754/ldbl-opt/nldbl-totalorder.c (totalorderl): Take
pointer arguments.
* sysdeps/ieee754/ldbl-opt/nldbl-totalordermag.c (totalordermagl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/test-totalorderl-ldbl-128ibm.c
(do_test): Update calls to totalorderl and totalordermagl.
* sysdeps/ieee754/ldbl-96/test-totalorderl-ldbl-96.c (do_test):
Update calls to totalorderl and totalordermagl.
* sysdeps/mach/hurd/i386/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/csky/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized
sincosf_poly. Add x86 sincosf_poly.h to vectorize sincosf_poly. On
Broadwell, bench-sincosf shows:
Before After Improvement
max 160.273 114.198 40%
min 6.25 5.625 11%
mean 13.0325 10.6462 22%
Vectorized sincosf_poly shows
Before After Improvement
max 138.653 114.198 21%
min 5.004 5.625 -11%
mean 11.5934 10.6462 9%
Tested on x86-64 and i686 as well as with build-many-glibcs.py.
* sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>.
(sincos_t, sincosf_poly, sinf_poly): Moved to ...
* sysdeps/ieee754/flt-32/sincosf_poly.h: Here. New file.
* sysdeps/x86/fpu/s_sincosf_data.c: New file.
* sysdeps/x86/fpu/sincosf_poly.h: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include
<sysdeps/ieee754/flt-32/s_sincosf.c>.
The threshold value at which powf overflows depends on the rounding mode
and the current check did not take this into account. So when the result
was rounded away from zero it could become infinity without setting
errno to ERANGE.
Example: pow(0x1.7ac7cp+5, 23) is 0x1.fffffep+127 + 0.1633ulp
If the result goes above 0x1.fffffep+127 + 0.5ulp then errno is set,
which is fine in nearest rounding mode, but
powf(0x1.7ac7cp+5, 23) is inf in upward rounding mode
powf(-0x1.7ac7cp+5, 23) is -inf in downward rounding mode
and the previous implementation did not set errno in these cases.
The fix tries to avoid affecting the common code path or calling a
function that may introduce a stack frame, so float arithmetics is used
to check the rounding mode and the threshold is selected accordingly.
[BZ #23961]
* math/auto-libm-test-in: Add new test case.
* math/auto-libm-test-out-pow: Regenerated.
* sysdeps/ieee754/flt-32/e_powf.c (__powf): Fix overflow check.
After my changes to move various macros, inlines and other content
from math_private.h to more specific headers, many files including
math_private.h no longer need to do so. Furthermore, since the
optimized inlines of various functions have been moved to
include/fenv.h or replaced by use of function names GCC inlines
automatically, a missing math_private.h include where one is
appropriate will reliably cause a build failure rather than possibly
causing code to be less well optimized while still building
successfully. Thus, this patch removes includes of math_private.h
that are now unnecessary. In the case of two RISC-V files, the
include is replaced by one of stdbool.h because the files in question
were relying on math_private.h to get a definition of bool.
Tested for x86_64 and x86, and with build-many-glibcs.py.
* math/fromfp.h: Do not include <math_private.h>.
* math/s_cacosh_template.c: Likewise.
* math/s_casin_template.c: Likewise.
* math/s_casinh_template.c: Likewise.
* math/s_ccos_template.c: Likewise.
* math/s_cproj_template.c: Likewise.
* math/s_fdim_template.c: Likewise.
* math/s_fmaxmag_template.c: Likewise.
* math/s_fminmag_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/s_ldexp_template.c: Likewise.
* math/s_nextdown_template.c: Likewise.
* math/w_log1p_template.c: Likewise.
* math/w_scalbln_template.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/aarch64/fpu/s_llrint.c: Likewise.
* sysdeps/aarch64/fpu/s_llrintf.c: Likewise.
* sysdeps/aarch64/fpu/s_lrint.c: Likewise.
* sysdeps/aarch64/fpu/s_lrintf.c: Likewise.
* sysdeps/i386/fpu/s_atanl.c: Likewise.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/i386/fpu/s_f32xsubf64.c: Likewise.
* sysdeps/i386/fpu/s_fdim.c: Likewise.
* sysdeps/i386/fpu/s_logbl.c: Likewise.
* sysdeps/i386/fpu/s_rintl.c: Likewise.
* sysdeps/i386/fpu/s_significandl.c: Likewise.
* sysdeps/ia64/fpu/s_matherrf.c: Likewise.
* sysdeps/ia64/fpu/s_matherrl.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_cbrt.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/flt-32/s_cbrtf.c: Likewise.
* sysdeps/ieee754/k_standardf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_finitel.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_fpclassifyl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_isinfl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_isnanl.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/s_signbitl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_cbrtl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/s_signgam.c: Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c: Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c: Likewise.
* sysdeps/powerpc/power7/fpu/s_logbf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_rint.c: Include <stdbool.h> instead of
<math_private.h>.
* sysdeps/riscv/rvf/s_rintf.c: Likewise.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __round functions to call the
corresponding round names instead, with asm redirection to __round
when the calls are not inlined.
An additional complication arises in
sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the
result converted to int, gets converted by the compiler to call
lroundl in the case of 32-bit long, so resulting in localplt test
failures. It's logically correct to let the compiler make such an
optimization; an appropriate asm redirection of lroundl to __lroundl
is thus added to that file (it's not needed anywhere else).
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_roundf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_round.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise.
* sysdeps/ieee754/float128/s_roundf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise.
(round): Redirect to __round.
(__roundl): Call round instead of __round.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round):
Remove macro.
[_ARCH_PWR5X] (__roundf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round
functions instead of __round variants.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to
__lroundl.
(__ieee754_expl): Call roundl instead of __roundl.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __rint functions to call the
corresponding rint names instead, with asm redirection to __rint when
the calls are not inlined. The x86_64 math_private.h is removed as no
longer useful after this patch.
This patch is relative to a tree with my floor patch
<https://sourceware.org/ml/libc-alpha/2018-09/msg00148.html> applied,
and much the same considerations arise regarding possibly replacing an
IFUNC call with a direct inline expansion.
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (rint): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_rint.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_rintf.c: Likewise.
* sysdeps/alpha/fpu/s_rint.c: Likewise.
* sysdeps/alpha/fpu/s_rintf.c: Likewise.
* sysdeps/i386/fpu/s_rintl.c: Likewise.
* sysdeps/ieee754/dbl-64/s_rint.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_rint.c: Likewise.
* sysdeps/ieee754/float128/s_rintf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_rintf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/m68k/coldfire/fpu/s_rint.c: Likewise.
* sysdeps/m68k/coldfire/fpu/s_rintf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rint.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rintf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_rintl.c: Likewise.
* sysdeps/powerpc/fpu/s_rint.c: Likewise.
* sysdeps/powerpc/fpu/s_rintf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_rint.c: Likewise.
* sysdeps/riscv/rvf/s_rintf.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rint.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_rintf.c: Likewise.
* sysdeps/x86_64/fpu/math_private.h: Remove file.
* math/e_scalb.c (invalid_fn): Use rint functions instead of
__rint variants.
* math/e_scalbf.c (invalid_fn): Likewise.
* math/e_scalbl.c (invalid_fn): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (__ieee754_gamma_r):
Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (__ieee754_gammaf_r):
Likewise.
* sysdeps/ieee754/k_standard.c (__kernel_standard): Likewise.
* sysdeps/ieee754/k_standardl.c (__kernel_standard_l): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_llrint.c (__llrint): Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_llrintf.c (__llrintf): Likewise.
Similar to the changes that were made to call sqrt functions directly
in glibc, instead of __ieee754_sqrt variants, so that the compiler
could inline them automatically without needing special inline
definitions in lots of math_private.h headers, this patch makes libm
code call floor functions directly instead of __floor variants,
removing the inlines / macros for x86_64 (SSE4.1) and powerpc
(POWER5).
The redirection used to ensure that __ieee754_sqrt does still get
called when the compiler doesn't inline a built-in function expansion
is refactored so it can be applied to other functions; the refactoring
is arranged so it's not limited to unary functions either (it would be
reasonable to use this mechanism for copysign - removing the inline in
math_private_calls.h but also eliminating unnecessary local PLT entry
use in the cases (powerpc soft-float and e500v1, for IBM long double)
where copysign calls don't get inlined).
The point of this change is that more architectures can get floor
calls inlined where they weren't previously (AArch64, for example),
without needing special inline definitions in their math_private.h,
and existing such definitions in math_private.h headers can be
removed.
Note that it's possible that in some cases an inline may be used where
an IFUNC call was previously used - this is the case on x86_64, for
example. I think the direct calls to floor are still appropriate; if
there's any significant performance cost from inline SSE2 floor
instead of an IFUNC call ending up with SSE4.1 floor, that indicates
that either the function should be doing something else that's faster
than using floor at all, or it should itself have IFUNC variants, or
that the compiler choice of inlining for generic tuning should change
to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
might be called at runtime - but not that glibc should avoid calling
floor internally. (After all, all the same considerations would apply
to any user program calling floor, where it might either be inlined or
left as an out-of-line call allowing for a possible IFUNC.)
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (MATH_REDIRECT):
New macro.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_LDBL): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_F128): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (MATH_REDIRECT_UNARY_ARGS): Likewise.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (sqrt): Redirect using MATH_REDIRECT.
[!_ISOMAC && !(__FINITE_MATH_ONLY__ && __FINITE_MATH_ONLY__ > 0)
&& !NO_MATH_REDIRECT] (floor): Likewise.
* sysdeps/aarch64/fpu/s_floor.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_floorf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_floor.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_floor.c: Likewise.
* sysdeps/ieee754/float128/s_floorf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_floorf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_floorl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_floor_template.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/sparc/sparc64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floor.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_floorf.c: Likewise.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__floor):
Remove macro.
[_ARCH_PWR5X] (__floorf): Likewise.
* sysdeps/x86_64/fpu/math_private.h [__SSE4_1__] (__floor): Remove
inline function.
[__SSE4_1__] (__floorf): Likewise.
* math/w_lgamma_main.c (LGFUNC (__lgamma)): Use floor functions
instead of __floor variants.
* math/w_lgamma_r_compat.c (__lgamma_r): Likewise.
* math/w_lgammaf_main.c (LGFUNC (__lgammaf)): Likewise.
* math/w_lgammaf_r_compat.c (__lgammaf_r): Likewise.
* math/w_lgammal_main.c (LGFUNC (__lgammal)): Likewise.
* math/w_lgammal_r_compat.c (__lgammal_r): Likewise.
* math/w_tgamma_compat.c (__tgamma): Likewise.
* math/w_tgamma_template.c (M_DECL_FUNC (__tgamma)): Likewise.
* math/w_tgammaf_compat.c (__tgammaf): Likewise.
* math/w_tgammal_compat.c (__tgammal): Likewise.
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (sin_pi): Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c (__lgamma_negl):
Likewise.
* sysdeps/ieee754/ldbl-128/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c (__ieee754_lgammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c (__lgamma_negl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c (__truncl): Likewise.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (sin_pi): Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c (__lgamma_negl): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modf.c (__modf): Likewise.
* sysdeps/powerpc/power5+/fpu/s_modff.c (__modff): Likewise.
<fenv_private.h> has inline versions of various <fenv.h> functions,
and their __fe* variants, for systems (generally soft-float) without
support for floating-point exceptions, rounding modes or both.
Having these inlines in a separate header introduces a risk of a
source file including <fenv.h> and compiling OK on x86_64, but failing
to compile (because the feraiseexcept inline is actually a macro that
discards its argument, to avoid the need for #ifdef FE_INVALID
conditionals), or not being properly optimized, on systems without the
exceptions and rounding modes support (when these inlines were in
math_private.h, we had a few cases where this broke the build because
there was no obvious reason for a file to need math_private.h and it
didn't need that header on x86_64). By moving those inlines to
include/fenv.h, this risk can be avoided, and fenv_private.h becomes
more clearly defined as specifically the header for the internal
libc_fe* and SET_RESTORE_ROUND* interfaces.
This patch makes that move, removing fenv_private.h includes that are
no longer needed (or replacing them by fenv.h includes in a few cases
that didn't already have such an include).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/generic/fenv_private.h [FE_ALL_EXCEPT == 0]: Move this
code ....
[!FE_HAVE_ROUNDING_MODES]: And this code ....
* include/fenv.h [!_ISOMAC]: ... to here.
* math/fraiseexcpt.c (__feraiseexcept): Undefine as macro.
(feraiseexcept): Likewise.
* math/fromfp.h: Do not include <fenv_private.h>.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* math/w_ilogb_template.c: Include <fenv.h> instead of
<fenv_private.h>.
* math/w_llogb_template.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
Continuing the clean-up related to the catch-all math_private.h
header, this patch stops math_private.h from including fenv_private.h.
Instead, fenv_private.h is included directly from those users of
math_private.h that also used interfaces from fenv_private.h. No
attempt is made to remove unused includes of math_private.h, but that
is a natural followup.
(However, since math_private.h sometimes defines optimized versions of
math.h interfaces or __* variants thereof, as well as defining its own
interfaces, I think it might make sense to get all those optimized
versions included from include/math.h, not requiring a separate header
at all, before eliminating unused math_private.h includes - that
avoids a file quietly becoming less-optimized if someone adds a call
to one of those interfaces without restoring a math_private.h include
to that file.)
There is still a pitfall that if code uses plain fe* and __fe*
interfaces, but only includes fenv.h and not fenv_private.h or (before
this patch) math_private.h, it will compile on platforms with
exceptions and rounding modes but not get the optimized versions (and
possibly not compile) on platforms without exception and rounding mode
support, so making it easy to break the build for such platforms
accidentally.
I think it would be most natural to move the inlines / macros for fe*
and __fe* in the case of no exceptions and rounding modes into
include/fenv.h, so that all code including fenv.h with _ISOMAC not
defined automatically gets them. Then fenv_private.h would be purely
the header for the libc_fe*, SET_RESTORE_ROUND etc. internal
interfaces and the risk of breaking the build on other platforms than
the one you tested on because of a missing fenv_private.h include
would be much reduced (and there would be some unused fenv_private.h
includes to remove along with unused math_private.h includes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math_private.h: Do not include <fenv_private.h>.
* math/fromfp.h: Include <fenv_private.h>.
* math/math-narrow.h: Likewise.
* math/s_cexp_template.c: Likewise.
* math/s_csin_template.c: Likewise.
* math/s_csinh_template.c: Likewise.
* math/s_ctan_template.c: Likewise.
* math/s_ctanh_template.c: Likewise.
* math/s_iseqsig_template.c: Likewise.
* math/w_acos_compat.c: Likewise.
* math/w_acosf_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_ilogb_template.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_llogb_template.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* sysdeps/aarch64/fpu/feholdexcpt.c: Likewise.
* sysdeps/aarch64/fpu/fesetround.c: Likewise.
* sysdeps/aarch64/fpu/fgetexcptflg.c: Likewise.
* sysdeps/aarch64/fpu/ftestexcept.c: Likewise.
* sysdeps/ieee754/dbl-64/e_atan2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp2.c: Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c: Likewise.
* sysdeps/ieee754/dbl-64/e_jn.c: Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_remainder.c: Likewise.
* sysdeps/ieee754/dbl-64/e_sqrt.c: Likewise.
* sysdeps/ieee754/dbl-64/gamma_product.c: Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: Likewise.
* sysdeps/ieee754/dbl-64/s_atan.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fma.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fmaf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_llround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c: Likewise.
* sysdeps/ieee754/dbl-64/s_sincos.c: Likewise.
* sysdeps/ieee754/dbl-64/s_tan.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_nearbyint.c: Likewise.
* sysdeps/ieee754/dbl-64/x2y2m1.c: Likewise.
* sysdeps/ieee754/float128/float128_private.h: Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c: Likewise.
* sysdeps/ieee754/flt-32/e_j1f.c: Likewise.
* sysdeps/ieee754/flt-32/e_jnf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_llroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c: Likewise.
* sysdeps/ieee754/flt-32/s_lroundf.c: Likewise.
* sysdeps/ieee754/flt-32/s_nearbyintf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c: Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c: Likewise.
* sysdeps/ieee754/ldbl-96/gamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fma.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_fmal.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c: Likewise.
* sysdeps/powerpc/fpu/e_sqrtf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_ceil.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_floor.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_nearbyint.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_roundeven.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_trunc.c: Likewise.
* sysdeps/riscv/rvd/s_finite.c: Likewise.
* sysdeps/riscv/rvd/s_fmax.c: Likewise.
* sysdeps/riscv/rvd/s_fmin.c: Likewise.
* sysdeps/riscv/rvd/s_fpclassify.c: Likewise.
* sysdeps/riscv/rvd/s_isinf.c: Likewise.
* sysdeps/riscv/rvd/s_isnan.c: Likewise.
* sysdeps/riscv/rvd/s_issignaling.c: Likewise.
* sysdeps/riscv/rvf/fegetround.c: Likewise.
* sysdeps/riscv/rvf/feholdexcpt.c: Likewise.
* sysdeps/riscv/rvf/fesetenv.c: Likewise.
* sysdeps/riscv/rvf/fesetround.c: Likewise.
* sysdeps/riscv/rvf/feupdateenv.c: Likewise.
* sysdeps/riscv/rvf/fgetexcptflg.c: Likewise.
* sysdeps/riscv/rvf/ftestexcept.c: Likewise.
* sysdeps/riscv/rvf/s_ceilf.c: Likewise.
* sysdeps/riscv/rvf/s_finitef.c: Likewise.
* sysdeps/riscv/rvf/s_floorf.c: Likewise.
* sysdeps/riscv/rvf/s_fmaxf.c: Likewise.
* sysdeps/riscv/rvf/s_fminf.c: Likewise.
* sysdeps/riscv/rvf/s_fpclassifyf.c: Likewise.
* sysdeps/riscv/rvf/s_isinff.c: Likewise.
* sysdeps/riscv/rvf/s_isnanf.c: Likewise.
* sysdeps/riscv/rvf/s_issignalingf.c: Likewise.
* sysdeps/riscv/rvf/s_nearbyintf.c: Likewise.
* sysdeps/riscv/rvf/s_roundevenf.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/riscv/rvf/s_truncf.c: Likewise.
Speedup tanf range reduction by using the new sincosf range
reduction algorithm. Overall code quality is improved due to
inlining, so there is a speedup even if no range reduction is
required.
tanf throughput gains on Cortex-A72:
* |x| < M_PI_4 : 1.1x
* |x| < M_PI_2 : 1.2x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0 : 1.6x
* |x| < Inf : 12.1x
* sysdeps/ieee754/flt-32/s_tanf.c (__tanf): Use fast range reduction.
The internal functions __kernel_sinf and __kernel_cosf are used only by
lgammaf_r. Removing the internal functions and using the generic sinf
and cosf is better overall. Benchmarking on Cortex-A72 shows the generic
sinf and cosf are 1.4x and 2.3x faster in the range |x| < PI/4, and 0.66x
and 1.1x for |x| < PI/2, so it should make lgammaf_r faster on average.
GLIBC regression tests pass on AArch64.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Use __sinf/__cosf.
* sysdeps/ieee754/flt-32/k_cosf.c (__kernel_cosf): Remove all code.
* sysdeps/ieee754/flt-32/k_sinf.c (__kernel_sinf): Likewise.
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials. The returned values are identical to sincosf
for the same input. ULP definitions for AArch64 and x64 are updated.
sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4 : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0 : 2.3x
* |x| < Inf : 3.0x
* NEWS: Mention sinf, cosf, sincosf.
* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
constants rather than including generic sincosf.h.
* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
(reduced_cos): Remove.
(sinf_poly): New function.
* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.