This patch replaces i386 assembly versions of e_exp2f with generic
e_exp2f.c. For workload-spec2017.wrf, on Nehalem, it improves
performance by:
Before After Improvement
reciprocal-throughput 112.996 40.0454 182%
latency 126.581 54.4479 132%
On Skylake, it improves performance by:
Before After Improvement
reciprocal-throughput 113.14 39.447 186%
latency 136.068 55.684 144%
On IvyBridge with --disable-multi-arch, it improves performance by:
Before After Improvement
reciprocal-throughput 132.521 40.3759 228%
latency 145.791 58.4587 149%
* sysdeps/i386/fpu/e_exp2f.S: Removed.
* sysdeps/i386/fpu/w_exp2f.c: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_exp2f.c.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
Add e_exp2f-sse2.
(CFLAGS-e_exp2f-sse2.c): New.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f-sse2.c: New file.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Likewise.
The new generic logf, log2f and powf code don't need wrappers any more,
they set errno inline so only use the wrappers on targets that need it.
* sysdeps/ieee754/flt-32/e_log2f.c (__log2f): Define without wrapper.
* sysdeps/ieee754/flt-32/e_logf.c (__logf): Likewise
* sysdeps/ieee754/flt-32/e_powf.c (__powf): Likewise
* sysdeps/ieee754/flt-32/w_log2f.c: New file.
* sysdeps/ieee754/flt-32/w_logf.c: New file.
* sysdeps/ieee754/flt-32/w_powf.c: New file.
* sysdeps/i386/fpu/w_log2f.c: New file.
* sysdeps/i386/fpu/w_logf.c: New file.
* sysdeps/i386/fpu/w_powf.c: New file.
* sysdeps/m68k/m680x0/fpu/w_log2f.c: New file.
* sysdeps/m68k/m680x0/fpu/w_logf.c: New file.
* sysdeps/m68k/m680x0/fpu/w_powf.c: New file.
The new generic expf and exp2f code don't need wrappers any more, they
set errno inline, so only use the wrappers on targets that need it.
(If the wrapper is needed, then the top level wrapper code is included,
otherwise empty w_exp*f.c is used to suppress the wrapper.)
A powerpc64 expf implementation includes the expf c code directly which
needed some changes.
* sysdeps/ieee754/flt-32/e_exp2f.c (__exp2f): Define without wrapper.
* sysdeps/ieee754/flt-32/e_expf.c (__expf): Likewise
* sysdeps/ieee754/flt-32/w_exp2f.c: New file.
* sysdeps/ieee754/flt-32/w_expf.c: New file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Update for
the new expf code.
* sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: New file.
* sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: New file.
* sysdeps/m68k/m680x0/fpu/w_exp2f.c: New file.
* sysdeps/m68k/m680x0/fpu/w_expf.c: New file.
* sysdeps/i386/fpu/w_exp2f.c: New file.
* sysdeps/i386/fpu/w_expf.c: New file.
* sysdeps/i386/i686/fpu/multiarch/w_expf.c: New file.
* sysdeps/x86_64/fpu/w_expf.c: New file.
without wrapper on aarch64:
powf reciprocal-throughput: 4.2x faster
powf latency: 2.6x faster
old worst-case error: 1.11 ulp
new worst-case error: 0.82 ulp
aarch64 .text size: -780 bytes
aarch64 .rodata size: +144 bytes
powf(x,y) is implemented as exp2(y*log2(x)) with the same algorithms
that are used in exp2f and log2f, except that the log2f polynomial is
larger for extra precision and its output (and exp2f input) may be
scaled by a power of 2 (POWF_SCALE) to simplify the argument reduction
step of exp2 (possible when efficient round and convert toint operation
is available).
The special case handling tries to minimize the checks in the hot path.
When the input of exp2_inline is checked, int arithmetics is used as
that was faster on the tested aarch64 cores.
* math/Makefile (type-float-routines): Add e_powf_log2_data.
* sysdeps/ieee754/flt-32/e_powf.c: New implementation.
* sysdeps/ieee754/flt-32/e_powf_log2_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h (__powf_log2_data): Define.
(issignalingf_inline): Likewise.
(POWF_LOG2_TABLE_BITS): Likewise.
(POWF_LOG2_POLY_ORDER): Likewise.
(POWF_SCALE_BITS): Likewise.
(POWF_SCALE): Likewise.
* sysdeps/i386/fpu/e_powf_log2_data.c: New file.
* sysdeps/ia64/fpu/e_powf_log2_data.c: New file.
* sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c: New file.
Similar to the new logf: double precision arithmetics and a small
lookup table is used. The argument reduction step is the same as in
the new logf.
without wrapper on aarch64:
log2f reciprocal-throughput: 2.3x faster
log2f latency: 2.1x faster
old worst case error: 1.72 ulp
new worst case error: 0.75 ulp
aarch64 .text size: -252 bytes
aarch64 .rodata size: +244 bytes
* math/Makefile (type-float-routines): Add e_log2f_data.
* sysdeps/ieee754/flt-32/e_log2f.c: New implementation.
* sysdeps/ieee754/flt-32/e_log2f_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h (__log2f_data): Define.
(LOG2F_TABLE_BITS, LOG2F_POLY_ORDER): Define.
* sysdeps/i386/fpu/e_log2f_data.c: New file.
* sysdeps/ia64/fpu/e_log2f_data.c: New file.
* sysdeps/m68k/m680x0/fpu/e_log2f_data.c: New file.
without wrapper on aarch64:
logf reciprocal-throughput: 2.2x faster
logf latency: 1.9x faster
old worst case error: 0.89 ulp
new worst case error: 0.82 ulp
aarch64 .text size: -356 bytes
aarch64 .rodata size: +240 bytes
Uses double precision arithmetics and a lookup table to allow smaller
polynomial and avoid the use of division.
Data is in a separate translation unit with fixed layout to prevent the
compiler generating suboptimal literal access.
Errors are handled inline according to POSIX rules, but this patch
keeps the wrapper with SVID compatible error handling.
Needs libm-test-ulps adjustment for clogf in non-nearest rounding mode.
* math/Makefile (type-float-routines): Add e_logf_data.
* sysdeps/ieee754/flt-32/e_logf.c: New implementation.
* sysdeps/ieee754/flt-32/e_logf_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h (__logf_data): Define.
(LOGF_TABLE_BITS, LOGF_POLY_ORDER): Define.
* sysdeps/i386/fpu/e_logf_data.c: New file.
* sysdeps/ia64/fpu/e_logf_data.c: New file.
* sysdeps/m68k/m680x0/fpu/e_logf_data.c: New file.
Based on new expf and exp2f code from
https://github.com/ARM-software/optimized-routines/
with wrapper on aarch64:
expf reciprocal-throughput: 2.3x faster
expf latency: 1.7x faster
without wrapper on aarch64:
expf reciprocal-throughput: 3.3x faster
expf latency: 1.7x faster
without wrapper on aarch64:
exp2f reciprocal-throughput: 2.8x faster
exp2f latency: 1.3x faster
libm.so size on aarch64:
.text size: -152 bytes
.rodata size: -1740 bytes
expf/exp2f worst case nearest rounding error: 0.502 ulp
worst case non-nearest rounding error: 1 ulp
Error checks are inline and errno setting is in separate tail called
functions, but the wrappers are kept in this patch to handle the
_LIB_VERSION==_SVID_ case. (So e.g. errno is set twice for expf calls
and once for __expf_finite calls on targets where the new code is used.)
Double precision arithmetics is used which is expected to be faster on
most targets (including soft-float) than using single precision and it
is easier to get good precision result with it.
Const data is kept in a separate translation unit which complicates
maintenance a bit, but is expected to give good code for literal loads
on most targets and allows sharing data across expf, exp2f and powf.
(This data is disabled on i386, m68k and ia64 which have their own
expf, exp2f and powf code.)
Some details may need target specific tweaks:
- best convert and round to int operation in the arg reduction may be
different across targets.
- code was optimized on fma target, optimal polynomial eval may be
different without fma.
- gcc does not always generate good code for fp bit representation
access via unions or it may be inherently slow on some targets.
The libm-test-ulps will need adjustment because..
- The argument reduction ideally uses nearest rounded rint, but that is
not efficient on most targets, so the polynomial can get evaluated on a
wider interval in non-nearest rounding mode making 1 ulp errors common
in that case.
- The polynomial is evaluated such that it may have 1 ulp error on
negative tiny inputs with upward rounding.
* math/Makefile (type-float-routines): Add math_errf and e_exp2f_data.
* sysdeps/aarch64/fpu/math_private.h (TOINT_INTRINSICS): Define.
(roundtoint, converttoint): Likewise.
* sysdeps/ieee754/flt-32/e_expf.c: New implementation.
* sysdeps/ieee754/flt-32/e_exp2f.c: New implementation.
* sysdeps/ieee754/flt-32/e_exp2f_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h: New file.
* sysdeps/ieee754/flt-32/math_errf.c: New file.
* sysdeps/ieee754/flt-32/t_exp2f.h: Remove.
* sysdeps/i386/fpu/e_exp2f_data.c: New file.
* sysdeps/i386/fpu/math_errf.c: New file.
* sysdeps/ia64/fpu/e_exp2f_data.c: New file.
* sysdeps/ia64/fpu/math_errf.c: New file.
* sysdeps/m68k/m680x0/fpu/e_exp2f_data.c: New file.
* sysdeps/m68k/m680x0/fpu/math_errf.c: New file.
The initial obsoletion of SVID libm error handling left the old
wrappers and __kernel_standard still being used for new ports and
static linking, just with macro definitions of _LIB_VERSION and
matherr that meant symbols with those names were never actually used
and the code for different error handling variants could be optimized
out.
This patch cleans things up further by eliminating the
__kernel_standard use for new ports and static linking. Now, the old
wrappers no longer generate any code in the !LIBM_SVID_COMPAT case,
while the new errno-only wrappers that were added for float128 support
are now also used for float, double and long double in that case.
The changes are generally straightforward. The w_scalb*_compat
wrappers continue to be used (scalb is obsolescent in the sense of not
being supported for float128, but is present in supported standards -
the 2001 edition of POSIX and earlier XSI versions - so remains
supported for static linking and new ports, as do the float and long
double variants that are existing GNU extensions). Those wrappers
would only call __kernel_standard in the _LIB_VERSION == _SVID_ case.
Since we would like to be able to compile most of glibc without
optimization, relying on a static function whose only use is under an
if (0) condition being optimized away to avoid an undefined
__kernel_standard reference may not be a good idea. Thus, the
relevant code in the scalb wrappers has LIBM_SVID_COMPAT conditionals
added to guarantee it's not built at all in the case where
__kernel_standard does not exist.
Just as i386 has its own w_sqrt_compat.c, so w_sqrt.c is also added.
ia64 gets dummy w_*.c to prevent those files being built where they
would conflict with the ia64 libm, as with its existing w_*_compat.c.
Conditions disabling code for !LIBM_SVID_COMPAT are needed in both the
math/ wrappers and in the long double wrappers in ldbl-opt (to avoid
them setting up aliases and symbol versions for undefined symbols). I
hope that future cleanups to how libm function aliases and symbol
versioning are done will eliminate the need for most of the ldbl-opt
wrappers.
Tested for x86_64 and x86, and with build-many-glibcs.py.
* sysdeps/generic/math-type-macros-double.h: Include
<math-svid-compat.h>.
(__USE_WRAPPER_TEMPLATE): Define to !LIBM_SVID_COMPAT.
* sysdeps/generic/math-type-macros-float.h: Include
<math-svid-compat.h>.
(__USE_WRAPPER_TEMPLATE): Define to !LIBM_SVID_COMPAT.
* sysdeps/generic/math-type-macros-ldouble.h: Include
<math-svid-compat.h>.
(__USE_WRAPPER_TEMPLATE): Define to !LIBM_SVID_COMPAT.
* math/lgamma-compat.h (BUILD_LGAMMA): Include LIBM_SVID_COMPAT
condition.
* math/w_acos_compat.c: Condition contents on [LIBM_SVID_COMPAT].
* math/w_acosf_compat.c: Likewise.
* math/w_acosh_compat.c: Likewise.
* math/w_acoshf_compat.c: Likewise.
* math/w_acoshl_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_atan2_compat.c: Likewise.
* math/w_atan2f_compat.c: Likewise.
* math/w_atan2l_compat.c: Likewise.
* math/w_atanh_compat.c: Likewise.
* math/w_atanhf_compat.c: Likewise.
* math/w_atanhl_compat.c: Likewise.
* math/w_cosh_compat.c: Likewise.
* math/w_coshf_compat.c: Likewise.
* math/w_coshl_compat.c: Likewise.
* math/w_exp10_compat.c: Likewise.
* math/w_exp10f_compat.c: Likewise.
* math/w_exp10l_compat.c: Likewise.
* math/w_exp2_compat.c: Likewise.
* math/w_exp2f_compat.c: Likewise.
* math/w_exp2l_compat.c: Likewise.
* math/w_fmod_compat.c: Likewise.
* math/w_fmodf_compat.c: Likewise.
* math/w_fmodl_compat.c: Likewise.
* math/w_hypot_compat.c: Likewise.
* math/w_hypotf_compat.c: Likewise.
* math/w_hypotl_compat.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_jnl_compat.c: Likewise.
* math/w_lgamma_r_compat.c: Likewise.
* math/w_lgammaf_r_compat.c: Likewise.
* math/w_lgammal_r_compat.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* math/w_pow_compat.c: Likewise.
* math/w_powf_compat.c: Likewise.
* math/w_powl_compat.c: Likewise.
* math/w_remainder_compat.c: Likewise.
* math/w_remainderf_compat.c: Likewise.
* math/w_remainderl_compat.c: Likewise.
* math/w_sinh_compat.c: Likewise.
* math/w_sinhf_compat.c: Likewise.
* math/w_sinhl_compat.c: Likewise.
* math/w_sqrt_compat.c: Likewise.
* math/w_sqrtf_compat.c: Likewise.
* math/w_sqrtl_compat.c: Likewise.
* math/w_tgamma_compat.c: Likewise.
* math/w_tgammaf_compat.c: Likewise.
* math/w_tgammal_compat.c: Likewise.
* math/w_scalb_compat.c (sysv_scalb): Condition definition on
[LIBM_SVID_COMPAT].
(__scalb): Condition call to sysv_scalb on [LIBM_SVID_COMPAT].
* math/w_scalbf_compat.c (sysv_scalbf): Condition definition on
[LIBM_SVID_COMPAT].
(__scalbf): Condition call to sysv_scalbf on [LIBM_SVID_COMPAT].
* math/w_scalbl_compat.c (sysv_scalbl): Condition definition on
[LIBM_SVID_COMPAT].
(__scalbl): Condition call to sysv_scalbl on [LIBM_SVID_COMPAT].
* sysdeps/i386/fpu/w_sqrt.c: New file.
* sysdeps/ia64/fpu/w_acos.c: Likewise.
* sysdeps/ia64/fpu/w_acosf.c: Likewise.
* sysdeps/ia64/fpu/w_acosh.c: Likewise.
* sysdeps/ia64/fpu/w_acoshf.c: Likewise.
* sysdeps/ia64/fpu/w_acoshl.c: Likewise.
* sysdeps/ia64/fpu/w_acosl.c: Likewise.
* sysdeps/ia64/fpu/w_asin.c: Likewise.
* sysdeps/ia64/fpu/w_asinf.c: Likewise.
* sysdeps/ia64/fpu/w_asinl.c: Likewise.
* sysdeps/ia64/fpu/w_atan2.c: Likewise.
* sysdeps/ia64/fpu/w_atan2f.c: Likewise.
* sysdeps/ia64/fpu/w_atan2l.c: Likewise.
* sysdeps/ia64/fpu/w_atanh.c: Likewise.
* sysdeps/ia64/fpu/w_atanhf.c: Likewise.
* sysdeps/ia64/fpu/w_atanhl.c: Likewise.
* sysdeps/ia64/fpu/w_cosh.c: Likewise.
* sysdeps/ia64/fpu/w_coshf.c: Likewise.
* sysdeps/ia64/fpu/w_coshl.c: Likewise.
* sysdeps/ia64/fpu/w_exp.c: Likewise.
* sysdeps/ia64/fpu/w_exp10.c: Likewise.
* sysdeps/ia64/fpu/w_exp10f.c: Likewise.
* sysdeps/ia64/fpu/w_exp10l.c: Likewise.
* sysdeps/ia64/fpu/w_exp2.c: Likewise.
* sysdeps/ia64/fpu/w_exp2f.c: Likewise.
* sysdeps/ia64/fpu/w_exp2l.c: Likewise.
* sysdeps/ia64/fpu/w_expf.c: Likewise.
* sysdeps/ia64/fpu/w_expl.c: Likewise.
* sysdeps/ia64/fpu/w_fmod.c: Likewise.
* sysdeps/ia64/fpu/w_fmodf.c: Likewise.
* sysdeps/ia64/fpu/w_fmodl.c: Likewise.
* sysdeps/ia64/fpu/w_hypot.c: Likewise.
* sysdeps/ia64/fpu/w_hypotf.c: Likewise.
* sysdeps/ia64/fpu/w_hypotl.c: Likewise.
* sysdeps/ia64/fpu/w_lgamma_r.c: Likewise.
* sysdeps/ia64/fpu/w_lgammaf_r.c: Likewise.
* sysdeps/ia64/fpu/w_lgammal_r.c: Likewise.
* sysdeps/ia64/fpu/w_log.c: Likewise.
* sysdeps/ia64/fpu/w_log10.c: Likewise.
* sysdeps/ia64/fpu/w_log10f.c: Likewise.
* sysdeps/ia64/fpu/w_log10l.c: Likewise.
* sysdeps/ia64/fpu/w_log2.c: Likewise.
* sysdeps/ia64/fpu/w_log2f.c: Likewise.
* sysdeps/ia64/fpu/w_log2l.c: Likewise.
* sysdeps/ia64/fpu/w_logf.c: Likewise.
* sysdeps/ia64/fpu/w_logl.c: Likewise.
* sysdeps/ia64/fpu/w_pow.c: Likewise.
* sysdeps/ia64/fpu/w_powf.c: Likewise.
* sysdeps/ia64/fpu/w_powl.c: Likewise.
* sysdeps/ia64/fpu/w_remainder.c: Likewise.
* sysdeps/ia64/fpu/w_remainderf.c: Likewise.
* sysdeps/ia64/fpu/w_remainderl.c: Likewise.
* sysdeps/ia64/fpu/w_sinh.c: Likewise.
* sysdeps/ia64/fpu/w_sinhf.c: Likewise.
* sysdeps/ia64/fpu/w_sinhl.c: Likewise.
* sysdeps/ia64/fpu/w_sqrt.c: Likewise.
* sysdeps/ia64/fpu/w_sqrtf.c: Likewise.
* sysdeps/ia64/fpu/w_sqrtl.c: Likewise.
* sysdeps/ia64/fpu/w_tgamma.c: Likewise.
* sysdeps/ia64/fpu/w_tgammaf.c: Likewise.
* sysdeps/ia64/fpu/w_tgammal.c: Likewise.
* sysdeps/ieee754/dbl-64/w_exp_compat.c: Condition contents on
[LIBM_SVID_COMPAT].
* sysdeps/ieee754/flt-32/w_expf_compat.c: Likewise.
* sysdeps/ieee754/k_standard.c: Likewise.
* sysdeps/ieee754/k_standardf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/w_expl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/w_expl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-96/w_expl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/w_expl_compat.c: Condition
long_double_symbol call on [LIBM_SVID_COMPAT].
* sysdeps/ieee754/ldbl-opt/w_acoshl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_acosl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_asinl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_atan2l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_atanhl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_coshl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_fmodl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_hypotl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_j0l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_j1l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_jnl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_lgammal_r_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_log10l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_log2l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_logl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_powl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_remainderl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_sinhl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_sqrtl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_tgammal_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c: Condition
long_double_symbol and compat_symbol calls on [LIBM_SVID_COMPAT].
This patch obsoletes the pow10, pow10f and pow10l functions (makes
them into compat symbols, not available for new ports or static
linking). The exp10 names for these functions are standardized (in TS
18661-4) and were added in the same glibc version (2.1) as pow10 so
source code can change to use them without any loss of portability.
Since pow10 is deliberately not provided for _Float128, only exp10,
this slightly simplifies moving to the new wrapper templates in the
!LIBM_SVID_COMPAT case, by avoiding needing to arrange for pow10,
pow10f and pow10l to be defined by those templates.
Tested for x86_64, and with build-many-glibcs.py.
* manual/math.texi (pow10): Do not document.
(pow10f): Likewise.
(pow10l): Likewise.
* math/bits/mathcalls.h [__USE_GNU] (pow10): Do not declare.
* math/bits/math-finite.h [__USE_GNU] (pow10): Likewise.
* math/libm-test-exp10.inc (pow10_test): Remove.
(do_test): Do not call pow10.
* math/w_exp10_compat.c (pow10): Make into compat symbol.
[NO_LONG_DOUBLE] (pow10l): Likewise.
* math/w_exp10f_compat.c (pow10f): Likewise.
* math/w_exp10l_compat.c (pow10l): Likewise.
* sysdeps/ia64/fpu/e_exp10.S: Include <shlib-compat.h>.
(pow10): Make into compat symbol.
* sysdeps/ia64/fpu/e_exp10f.S: Include <shlib-compat.h>.
(pow10f): Make into compat symbol.
* sysdeps/ia64/fpu/e_exp10l.S: Include <shlib-compat.h>.
(pow10l): Make into compat symbol.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Remove
pow10.
(CFLAGS-nldbl-pow10.c): Remove variable..
* sysdeps/ieee754/ldbl-opt/nldbl-pow10.c: Remove file.
* sysdeps/ieee754/ldbl-opt/w_exp10_compat.c (pow10l): Condition on
[SHLIB_COMPAT (libm, GLIBC_2_1, GLIBC_2_27)].
* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (compat_symbol):
Undefine and redefine.
(pow10l): Make into compat symbol.
* sysdeps/aarch64/libm-test-ulps: Remove pow10 ulps.
* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
* sysdeps/arm/libm-test-ulps: Likewise.
* sysdeps/hppa/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/microblaze/libm-test-ulps: Likewise.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/nios2/libm-test-ulps: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
* sysdeps/s390/fpu/libm-test-ulps: Likewise.
* sysdeps/sh/libm-test-ulps: Likewise.
* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
* sysdeps/tile/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch enables float128 support for x86_64 and x86. All GCC
versions that can build glibc provide the required support, but since
GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN
tests and some tests of NaN payloads need to be disabled with such
compilers (this does not affect the generated glibc binaries at all,
just the tests). bits/floatn.h declares float128 support to be
available for GCC versions that provide the required libgcc support
(4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd);
compilation-only support was present some time before then, but not
really useful without the libgcc functions.
fenv_private.h needed updating to avoid trying to put _Float128 values
in registers. I make no assertion of optimality of the
math_opt_barrier / math_force_eval definitions for this case; they are
simply intended to be sufficient to work correctly.
Tested for x86_64 and x86, with GCC 7 and GCC 6. (Testing for x32 was
compilation tests only with build-many-glibcs.py to verify the ABI
baseline updates. I have not done any testing for Hurd, although the
float128 support is enabled there as for GNU/Linux.)
* sysdeps/i386/Implies: Add ieee754/float128.
* sysdeps/x86_64/Implies: Likewise.
* sysdeps/x86/bits/floatn.h: New file.
* sysdeps/x86/float128-abi.h: Likewise.
* manual/math.texi (Mathematics): Document support for _Float128
on x86_64 and x86.
* sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>.
(math_opt_barrier): Do not put _Float128 values in floating-point
registers.
(math_force_eval): Likewise.
[__x86_64__] (SET_RESTORE_ROUNDF128): New macro.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append
to Makefile variable.
* sysdeps/x86/fpu/e_sqrtf128.c: New file.
* sysdeps/x86/fpu/sfp-machine.h: Likewise. Based on libgcc.
* sysdeps/x86/math-tests.h: New file.
* math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro.
* math/libm-test-getpayload.inc (getpayload_test_data): Use
XFAIL_FLOAT128_PAYLOAD.
* math/libm-test-setpayload.inc (setpayload_test_data): Likewise.
* math/libm-test-totalorder.inc (totalorder_test_data): Likewise.
* math/libm-test-totalordermag.inc (totalordermag_test_data):
Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Update.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Testing with GCC 7 for 32-bit x86 showed some ulps differences,
presumably from variation in when values with excess precision get
spilled to the stack and so lose that precision. This patch updates
the libm-test-ulps files accordingly.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
This patch moves tests of catan and catanh with finite inputs (other
than the divide-by-zero cases producing an exact infinity) to using
the auto-libm-test machinery. Each of auto-libm-test-out-catan and
auto-libm-test-out-catanh takes about three seconds to generate on my
system (so in fact it wasn't necessary after all to defer the move to
auto-libm-test-* until the output files were split up by function).
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of catan and catanh.
* math/auto-libm-test-out-catan: New generated file.
* math/auto-libm-test-out-catanh: Likewise.
* math/libm-test-catan.inc (catan_test_data): Use AUTO_TESTS_c_c.
Move tests with finite inputs, except divide-by-zero cases, to
auto-libm-test-in.
* math/libm-test-catanh.inc (catanh_test_data): Likewise.
* math/Makefile (libm-test-funcs-auto): Add catan and catanh.
(libm-test-funcs-noauto): Remove catan and catanh.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch moves tests of casin and casinh with finite inputs to using
the auto-libm-test machinery. Each of auto-libm-test-out-casin and
auto-libm-test-out-casinh takes about 38 minutes to generate on my
system because of MPC slowness on special cases that appear in the
tests (with MPC 1.0.3; I don't know to what extent current MPC master
might speed it up).
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of casin and casinh.
* math/auto-libm-test-out-casin: New generated file.
* math/auto-libm-test-out-casinh: Likewise.
* math/libm-test-casin.inc (casin_test_data): Use AUTO_TESTS_c_c.
Move tests with finite inputs to auto-libm-test-in.
* math/libm-test-casinh.inc (casinh_test_data): Likewise.
* math/Makefile (libm-test-funcs-auto): Add casin and casinh.
(libm-test-funcs-noauto): Remove casin and casinh.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch moves tests of cacos and cacosh with finite inputs to using
the auto-libm-test machinery. Each of auto-libm-test-out-cacos and
auto-libm-test-out-cacosh takes about 80 minutes to generate on my
system because of MPC slowness on special cases that appear in the
tests (with MPC 1.0.3; I don't know to what extent current MPC master
might speed it up).
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of cacos and cacosh.
* math/auto-libm-test-out-cacos: New generated file.
* math/auto-libm-test-out-cacosh: Likewise.
* math/libm-test-cacos.inc (cacos_test_data): Use AUTO_TESTS_c_c.
Move tests with finite inputs to auto-libm-test-in.
* math/libm-test-cacosh.inc (cacosh_test_data): Likewise.
* math/Makefile (libm-test-funcs-auto): Add cacos and cacosh.
(libm-test-funcs-noauto): Remove cacos and cacosh.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This commit moves one step towards the deprecation of wrappers that
use _LIB_VERSION / matherr / __kernel_standard functionality, by
adding the suffix '_compat' to their filenames and adjusting Makefiles
and #includes accordingly.
New template wrappers that do not use such functionality will be added
by future patches and will be first used by the float128 wrappers.
When testing changes to i386 libm functions (that are shadowed for
i686 builds by i686 versions) recently, I saw that the plain i386
libm-test-ulps (as opposed to the i686 multiarch version) needed
updating for tests that had been added since it was last updated.
This patch updates it accordingly.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Various fmax and fmin function implementations mishandle sNaN
arguments:
(a) When both arguments are NaNs, the return value should be a qNaN,
but sometimes it is an sNaN if at least one argument is an sNaN.
(b) Under TS 18661-1 semantics, if either argument is an sNaN then the
result should be a qNaN (whereas if one argument is a qNaN and the
other is not a NaN, the result should be the non-NaN argument).
Various implementations treat sNaNs like qNaNs here.
This patch fixes the x86 and x86_64 versions (ignoring float and
double for 32-bit x86 given the inability to reliably avoid the sNaN
turning into a qNaN before it gets to the called function). Tests of
sNaN inputs to these functions are added.
Note on architecture versions I haven't changed for this issue:
AArch64 already gets this right (it uses a hardware instruction with
the correct semantics for both quiet and signaling NaNs) and does not
need changes. It's possible Alpha, IA64, SPARC might need changes
(this would be shown by the testsuite if so).
Tested for x86_64 and x86 (both i686 and i586 builds, to cover the
different x86 implementations).
[BZ #20947]
* sysdeps/i386/fpu/s_fmaxl.S (__fmaxl): Add the arguments when
either is a signaling NaN.
* sysdeps/i386/fpu/s_fminl.S (__fminl): Likewise. Make code
follow fmaxl more closely.
* sysdeps/i386/i686/fpu/s_fmaxl.S (__fmaxl): Add the arguments
when either is a signaling NaN.
* sysdeps/i386/i686/fpu/s_fminl.S (__fminl): Likewise.
* sysdeps/x86_64/fpu/s_fmax.S (__fmax): Likewise.
* sysdeps/x86_64/fpu/s_fmaxf.S (__fmaxf): Likewise.
* sysdeps/x86_64/fpu/s_fmaxl.S (__fmaxl): Likewise.
* sysdeps/x86_64/fpu/s_fmin.S (__fmin): Likewise.
* sysdeps/x86_64/fpu/s_fminf.S (__fminf): Likewise.
* sysdeps/x86_64/fpu/s_fminl.S (__fminl): Likewise.
* math/libm-test.inc (fmax_test_data): Add tests of sNaN inputs.
(fmin_test_data): Likewise.
The x86_64/x86 powl implementations mishandle sNaN arguments, both by
returning sNaN in some cases (instead of doing arithmetic on the
arguments to produce the result when NaN arguments result in NaN
results) and by treating sNaN the same as qNaN for arguments (1, sNaN)
and (sNaN, 0), contrary to TS 18661-1 which requires those cases to
return qNaN instead of 1.
This patch makes the x86_64/x86 powl implementations follow TS 18661-1
semantics for sNaN arguments; sNaN tests are also added for pow.
Given the problems with testing float and double sNaN arguments on
32-bit x86 (sNaN tests disabled because the compiler may convert
unnecessarily to a qNaN when passing arguments), no changes are made
to the powf and pow implementations there.
Tested for x86_64 and x86.
[BZ #20916]
* sysdeps/i386/fpu/e_powl.S (__ieee754_powl): Do not return 1 for
arguments (sNaN, 0) or (1, sNaN). Do arithmetic on NaN arguments
to compute result.
* sysdeps/x86_64/fpu/e_powl.S (__ieee754_powl): Likewise.
* math/libm-test.inc (pow_test_data): Add tests of sNaN arguments.
manual/libm-err-tab.pl hardcodes a list of names for particular
platforms (mapping from sysdeps directory name to friendly name for
the manual). This goes against the principle of keeping information
about individual platforms in their corresponding sysdeps directory,
and the list is also very out-of-date regarding supported platforms
and their corresponding sysdeps directories.
This patch fixes this by adding a libm-test-ulps-name file alongside
each libm-test-ulps file. The script then gets the friendly name from
that file, which is required to exist, so it no longer needs to allow
for the mapping being missing.
Tested for x86_64.
[BZ #14139]
* manual/libm-err-tab.pl (%pplatforms): Initialize to empty.
(find_files): Obtain platform name from libm-test-ulps-name and
store in %pplatforms.
(canonicalize_platform): Remove.
(print_platforms): Use $pplatforms directly.
(by_platforms): Do not allow for platforms missing from
%pplatforms.
* sysdeps/aarch64/libm-test-ulps-name: New file.
* sysdeps/alpha/fpu/libm-test-ulps-name: Likewise.
* sysdeps/arm/libm-test-ulps-name: Likewise.
* sysdeps/generic/libm-test-ulps-name: Likewise.
* sysdeps/hppa/fpu/libm-test-ulps-name: Likewise.
* sysdeps/i386/fpu/libm-test-ulps-name: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps-name: Likewise.
* sysdeps/ia64/fpu/libm-test-ulps-name: Likewise.
* sysdeps/m68k/coldfire/fpu/libm-test-ulps-name: Likewise.
* sysdeps/m68k/m680x0/fpu/libm-test-ulps-name: Likewise.
* sysdeps/microblaze/libm-test-ulps-name: Likewise.
* sysdeps/mips/mips32/libm-test-ulps-name: Likewise.
* sysdeps/mips/mips64/libm-test-ulps-name: Likewise.
* sysdeps/nios2/libm-test-ulps-name: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps-name: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps-name: Likewise.
* sysdeps/s390/fpu/libm-test-ulps-name: Likewise.
* sysdeps/sh/libm-test-ulps-name: Likewise.
* sysdeps/sparc/fpu/libm-test-ulps-name: Likewise.
* sysdeps/tile/libm-test-ulps-name: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps-name: Likewise.
TS 18661-1 defines a type femode_t to represent the set of dynamic
floating-point control modes (such as the rounding mode and trap
enablement modes), and functions fegetmode and fesetmode to manipulate
those modes (without affecting other state such as the raised
exception flags) and a corresponding macro FE_DFL_MODE.
This patch series implements those interfaces for glibc. This first
patch adds the architecture-independent pieces, the x86 and x86_64
implementations, and the <bits/fenv.h> and ABI baseline updates for
all architectures so glibc keeps building and passing the ABI tests on
all architectures. Subsequent patches add the fegetmode and fesetmode
implementations for other architectures.
femode_t is generally an integer type - the same type as fenv_t, or as
the single element of fenv_t where fenv_t is a structure containing a
single integer (or the single relevant element, where it has elements
for both status and control registers) - except where architecture
properties or consistency with the fenv_t implementation indicate
otherwise. FE_DFL_MODE follows FE_DFL_ENV in whether it's a magic
pointer value (-1 cast to const femode_t *), a value that can be
distinguished from valid pointers by its high bits but otherwise
contains a representation of the desired register contents, or a
pointer to a constant variable (the powerpc case; __fe_dfl_mode is
added as an exported constant object, an alias to __fe_dfl_env).
Note that where architectures (that share a register between control
and status bits) gain definitions of new floating-point control or
status bits in future, the implementations of fesetmode for those
architectures may need updating (depending on whether the new bits are
control or status bits and what the implementation does with
previously unknown bits), just like existing implementations of
<fenv.h> functions that take care not to touch reserved bits may need
updating when the set of reserved bits changes. (As any new bits are
outside the scope of ISO C, that's just a quality-of-implementation
issue for supporting them, not a conformance issue.)
As with fenv_t, femode_t should properly include any software DFP
rounding mode (and for both fenv_t and femode_t I'd consider that
fragment of DFP support appropriate for inclusion in glibc even in the
absence of the rest of libdfp; hardware DFP rounding modes should
already be included if the definitions of which bits are status /
control bits are correct).
Tested for x86_64, x86, mips64 (hard float, and soft float to test the
fallback version), arm (hard float) and powerpc (hard float, soft
float and e500). Other architecture versions are untested.
* math/fegetmode.c: New file.
* math/fesetmode.c: Likewise.
* sysdeps/i386/fpu/fegetmode.c: Likewise.
* sysdeps/i386/fpu/fesetmode.c: Likewise.
* sysdeps/x86_64/fpu/fegetmode.c: Likewise.
* sysdeps/x86_64/fpu/fesetmode.c: Likewise.
* math/fenv.h: Update comment on inclusion of <bits/fenv.h>.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (fegetmode): New function
declaration.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (fesetmode): Likewise.
* bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New
typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/m68k/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/microblaze/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (__fe_dfl_mode): New variable
declaration.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/tile/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* manual/arith.texi (FE_DFL_MODE): Document macro.
(fegetmode): Document function.
(fesetmode): Likewise.
* math/Versions (fegetmode): New libm symbol at version
GLIBC_2.25.
(fesetmode): Likewise.
* math/Makefile (libm-support): Add fegetmode and fesetmode.
(tests): Add test-femode and test-femode-traps.
* math/test-femode-traps.c: New file.
* math/test-femode.c: Likewise.
* sysdeps/powerpc/fpu/fenv_const.c (__fe_dfl_mode): Declare as
alias for __fe_dfl_env.
* sysdeps/powerpc/nofpu/fenv_const.c (__fe_dfl_mode): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/fenv_const.c
(__fe_dfl_mode): Likewise.
* sysdeps/powerpc/Versions (__fe_dfl_mode): New libm symbol at
version GLIBC_2.25.
* sysdeps/nacl/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
This is only used for the float and double variants.
Instead, just add it to the type specific list of files,
and remove all stubs, and remove the declaration from
math_private.h.
I verified x86_64, i486, ia64, m68k, and ppc64 build.
TS 18661-1 defines an fesetexcept function for setting floating-point
exception flags without the side-effect of causing enabled traps to be
taken.
This patch series implements this function for glibc. The present
patch adds the fallback stub implementation, x86 and x86_64
implementations, documentation, tests and ABI baseline updates. The
remaining patches, some of them untested, add implementations for
other architectures. The implementations generally follow those of
the fesetexceptflag function.
As for fesetexceptflag, the approach taken for architectures where
setting flags causes enabled traps to be taken is to set the flags
(and potentially cause traps) rather than refusing to set the flags
and returning an error. Since ISO C and TS 18661 provide no way to
enable traps, this is formally in accordance with the standards.
The NEWS entry should be considered a placeholder, since this patch
series is intended to be followed by further such series adding other
TS 18661-1 features, so that the NEWS entry would end up looking more
like
* New <fenv.h> features from TS 18661-1:2014 are added to libm: the
fesetexcept, fetestexceptflag, fegetmode and fesetmode functions,
the femode_t type and the FE_DFL_MODE macro.
with hopefully more such entries for other features, rather than
having an entry for a single function in the end.
I believe we have consensus for adding TS 18661-1 interfaces as per
<https://sourceware.org/ml/libc-alpha/2016-06/msg00421.html>.
Tested for x86_64, x86, mips64 (hard float, and soft float to test the
fallback version), arm (hard float) and powerpc (hard float, soft
float and e500).
* math/fesetexcept.c: New file.
* sysdeps/i386/fpu/fesetexcept.c: Likewise.
* sysdeps/x86_64/fpu/fesetexcept.c: Likewise.
* math/fenv.h: Define
__GLIBC_INTERNAL_STARTING_HEADER_IMPLEMENTATION and include
<bits/libc-header-start.h> instead of including <features.h>.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (fesetexcept): New function
declaration.
* manual/arith.texi (fesetexcept): Document function.
* math/Versions (fesetexcept): New libm symbol at version
GLIBC_2.25.
* math/Makefile (libm-support): Add fesetexcept.
(tests): Add test-fesetexcept and test-fesetexcept-traps.
* math/test-fesetexcept.c: New file.
* math/test-fesetexcept-traps.c: Likewise.
* sysdeps/nacl/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line trunc function implementations
to avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_trunc.S (__trunc): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_truncf.S (__truncf): Likewise.
* sysdeps/i386/fpu/s_truncl.S (__truncl): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_truncl.S (__truncl): Likewise.
* math/libm-test.inc (trunc_test_data): Do not allow spurious
"inexact" exceptions.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line floor function implementations
to avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_floor.S (__floor): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_floorf.S (__floorf): Likewise.
* sysdeps/i386/fpu/s_floorl.S (__floorl): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_floorl.S (__floorl): Likewise.
* math/libm-test.inc (floor_test_data): Do not allow spurious
"inexact" exceptions.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line ceil function implementations to
avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_ceil.S (__ceil): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_ceilf.S (__ceilf): Likewise.
* sysdeps/i386/fpu/s_ceill.S (__ceill): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_ceill.S (__ceill): Likewise.
* math/libm-test.inc (ceil_test_data): Do not allow spurious
"inexact" exceptions.
The x86_64 and i386 versions of scalbl return sNaN for some cases of
sNaN input and are missing "invalid" exceptions for other cases. This
results from overly complicated code that either returns a NaN input,
or discards both inputs when one is NaN and loads a NaN from memory.
This patch fixes this by simplifying the code to add the arguments
when either one is NaN.
Tested for x86_64 and x86.
[BZ #20296]
* sysdeps/i386/fpu/e_scalbl.S (__ieee754_scalbl): Add arguments
when either argument is a NaN.
* sysdeps/x86_64/fpu/e_scalbl.S (__ieee754_scalbl): Likewise.
* math/libm-test.inc (scalb_test_data): Add sNaN tests.
The i386 implementations of nearbyint functions, and x86_64
nearbyintl, contain code to mask the "inexact" exception. However,
the fnstenv instruction has the effect of masking all exceptions, so
this masking code has been redundant since fnstenv was added to those
implementations (by commit 846d9a4a3acdb4939ca7bf6aed48f9f6f26911be;
commit 71d1b0166b added the test
math/test-nearbyint-except-2.c that verifies these functions do work
when called with "inexact" traps enabled); this patch removes the
redundant code.
Tested for x86_64 and x86.
* sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Do not mask
"inexact" exceptions after fnstenv.
* sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
* sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
fdim suffers from double rounding on i386 because subtracting two
double values can produce an inexact long double value exactly half
way between two double values. This patch fixes this by creating an
i386-specific version of fdim - C, based on the generic version,
unlike the previous .S version - which sets the x87 precision control
to double precision for the subtraction and then restores it
afterwards. As noted in the comment added, there are no issues of
double rounding for subnormals (a case that setting precision control
does not address) because subtraction cannot produce an inexact result
in the subnormal range.
Tested for x86_64 and x86.
[BZ #20255]
* sysdeps/i386/fpu/s_fdim.c: New file. Based on math/s_fdim.c.
* math/libm-test.inc (fdim_test_data): Add another test.
Some architectures have their own versions of fdim functions, which
are missing errno setting (bug 6796) and may also return sNaN instead
of qNaN for sNaN input, in the case of the x86 / x86_64 long double
versions (bug 20256).
These versions are not actually doing anything that a compiler
couldn't generate, just straightforward comparisons / arithmetic (and,
in the x86 / x86_64 case, testing for NaNs with fxam, which isn't
actually needed once you use an unordered comparison and let the NaNs
pass through the same subtraction as non-NaN inputs). This patch
removes the x86 / x86_64 / powerpc versions, so that those
architectures use the generic C versions, which correctly handle
setting errno and deal properly with sNaN inputs. This seems better
than dealing with setting errno in lots of .S versions.
The i386 versions also return results with excess range and precision,
which is not appropriate for a function exactly defined by reference
to IEEE operations. For errno setting to work correctly on overflow,
it's necessary to remove excess range with math_narrow_eval, which
this patch duly does in the float and double versions so that the
tests can reliably pass on x86. For float, this avoids any double
rounding issues as the long double precision is more than twice that
of float. For double, double rounding issues will need to be
addressed separately, so this patch does not fully fix bug 20255.
Tested for x86_64, x86 and powerpc.
[BZ #6796]
[BZ #20255]
[BZ #20256]
* math/s_fdim.c: Include <math_private.h>.
(__fdim): Use math_narrow_eval on result.
* math/s_fdimf.c: Include <math_private.h>.
(__fdimf): Use math_narrow_eval on result.
* sysdeps/i386/fpu/s_fdim.S: Remove file.
* sysdeps/i386/fpu/s_fdimf.S: Likewise.
* sysdeps/i386/fpu/s_fdiml.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdim.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdimf.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdiml.S: Likewise.
* sysdeps/powerpc/fpu/s_fdim.c: Likewise.
* sysdeps/powerpc/fpu/s_fdimf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_fdim.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_fdim.c: Likewise.
* sysdeps/x86_64/fpu/s_fdiml.S: Likewise.
* math/libm-test.inc (fdim_test_data): Expect errno setting on
overflow. Add sNaN tests.
Various implementations of frexp functions return sNaN for sNaN
input. This patch fixes them to add such arguments to themselves so
that qNaN is returned.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #20250]
* sysdeps/i386/fpu/s_frexpl.S (__frexpl): Add non-finite input to
itself.
* sysdeps/ieee754/dbl-64/s_frexp.c (__frexp): Add non-finite or
zero input to itself.
* sysdeps/ieee754/dbl-64/wordsize-64/s_frexp.c (__frexp):
Likewise.
* sysdeps/ieee754/flt-32/s_frexpf.c (__frexpf): Likewise.
* sysdeps/ieee754/ldbl-128/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-96/s_frexpl.c (__frexpl): Likewise.
* math/libm-test.inc (frexp_test_data): Add sNaN tests.
The i386/x86_64 versions of log2l return sNaN for sNaN input. This
patch fixes them to add NaN inputs to themselves so that qNaN is
returned in this case.
Tested for x86_64 and x86.
[BZ #20235]
* sysdeps/i386/fpu/e_log2l.S (__ieee754_log2l): Add NaN input to
itself.
* sysdeps/x86_64/fpu/e_log2l.S (__ieee754_log2l): Likewise.
* math/libm-test.inc (log2_test_data): Add sNaN tests.
The i386/x86_64 versions of log1pl return sNaN for sNaN input. This
patch fixes them to add a NaN input to itself so that qNaN is returned
in this case.
Tested for x86_64 and x86.
[BZ #20229]
* sysdeps/i386/fpu/s_log1pl.S (__log1pl): Add NaN input to itself.
* sysdeps/x86_64/fpu/s_log1pl.S (__log1pl): Likewise.
* math/libm-test.inc (log1p_test_data): Add sNaN tests.
The i386/x86_64 versions of log10l return sNaN for sNaN input. This
patch fixes them to add a NaN input to itself so that qNaN is returned
in this case.
Tested for x86_64 and x86.
[BZ #20228]
* sysdeps/i386/fpu/e_log10l.S (__ieee754_log10l): Add NaN input to
itself.
* sysdeps/x86_64/fpu/e_log10l.S (__ieee754_log10l): Likewise.
* math/libm-test.inc (log10_test_data): Add sNaN tests.
The i386/x86_64 versions of logl return sNaN for sNaN input. This
patch fixes them to add a NaN input to itself so that qNaN is returned
in this case.
Tested for x86_64 and x86 (including a build for i586 to cover the
non-i686 logl version).
[BZ #20227]
* sysdeps/i386/fpu/e_logl.S (__ieee754_logl): Add NaN input to
itself.
* sysdeps/i386/i686/fpu/e_logl.S (__ieee754_logl): Likewise.
* sysdeps/x86_64/fpu/e_logl.S (__ieee754_logl): Likewise.
* math/libm-test.inc (log_test_data): Add sNaN tests.
The i386 and x86_64 implementations of expl, exp10l and expm1l (code
shared between the functions) return sNaN for sNaN input. This patch
fixes them to add NaN inputs to themselves so that qNaN is returned in
this case.
Tested for x86_64 and x86.
[BZ #20226]
* sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL): Add NaN argument to
itself.
* sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL): Likewise.
* math/libm-test.inc (exp_test_data): Add sNaN tests.
(exp10_test_data): Likewise.
(expm1_test_data): Likewise.
The i386 version of cbrtl returns sNaN (without raising any
exceptions) for sNaN input. This patch fixes it to add non-finite
arguments to themselves (the code path in question is also reached for
zero arguments, for which adding them to themselves is also harmless),
so that "invalid" is raised and qNaN returned.
Tested for x86_64 and x86.
[BZ #20224]
* sysdeps/i386/fpu/s_cbrtl.S (__cbrtl): Add non-finite or zero
argument to itself.
* math/libm-test.inc (cbrt_test_data): Add sNaN tests.
The i386 version of atanhl returns sNaN for sNaN input. This patch
fixes it to add NaN arguments to themselves so it returns qNaN in this
case.
Tested for x86_64 and x86.
[BZ #20219]
* sysdeps/i386/fpu/e_atanhl.S (__ieee754_atanhl): Add NaN argument
to itself.
* math/libm-test.inc (atanh_test_data): Add sNaN tests.
The i386 version of asinhl returns sNaN (without raising any
exceptions) for sNaN input. This patch fixes it to add non-finite
arguments to themselves, so that "invalid" is raised and qNaN
returned.
Tested for x86_64 and x86.
[BZ #20218]
* sysdeps/i386/fpu/s_asinhl.S (__asinhl): Add non-finite argument
to itself.
* math/libm-test.inc (asinh_test_data): Add sNaN tests.
The x86 / x86_64 implementation of nextafterl (also used for
nexttowardl) produces incorrect results (NaNs) when negative
subnormals, the low 32 bits of whose mantissa are zero, are
incremented towards zero. This patch fixes this by disabling the
logic to decrement the exponent in that case.
Tested for x86_64 and x86.
[BZ #20205]
* sysdeps/i386/fpu/s_nextafterl.c (__nextafterl): Do not adjust
exponent when incrementing negative subnormal with low mantissa
word zero.
* math/libm-test.inc (nextafter_test_data) [TEST_COND_intel96]:
Add another test.
Bug 19848 reports cases where powl on x86 / x86_64 has error
accumulation, for small integer exponents, larger than permitted by
glibc's accuracy goals, at least in some rounding modes. This patch
further restricts the exponent range for which the
small-integer-exponent logic is used to limit the possible error
accumulation.
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #19848]
* sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value
from 8 to 4.
(__ieee754_powl): Compare integer exponent against 4 not 8.
* sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value
from 8 to 4.
(__ieee754_powl): Compare integer exponent against 4 not 8.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The i386 ULPs are actually the i686/multiarch ones. The i686/multiarch
float ULPs are more precise as the SSE2 version (when available) uses
double for the cosf and sinf functions.
On the other hand the higher precision of the x86 FPU improves the
precision for a few other math functions.
* sysdeps/i386/fpu/libm-test-ulps: Move to ....
* sysdeps/i386/i686/multiarch/fpu/libm-test-ulps: ...here.
* sysdeps/i386/fpu/libm-test-ulps: Regenerate.
Various math_private.h headers are guarded by "#ifndef
_MATH_PRIVATE_H", but never define the macro. Nothing else defines
the macro either (the generic math_private.h that they include defines
a different macro, _MATH_PRIVATE_H_), so those guards are ineffective.
With the recent inclusion of s_sin.c in s_sincos.c, this breaks the
build for MIPS, since the build of s_sincos.c ends up including
<math_private.h> twice and the MIPS version defines inline functions
such as libc_feholdexcept_mips, without a separate fenv_private.h
header with its own guards such as some architectures have.
This patch fixes all the problem headers to use architecture-specific
guard macro names, and to define those macros in the headers they
guard, just as some architectures already do.
Tested for x86 (testsuite, and that installed shared libraries are
unchanged by the patch), and for mips64 (that it fixes the build).
* sysdeps/arm/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!ARM_MATH_PRIVATE_H].
[!ARM_MATH_PRIVATE_H] (ARM_MATH_PRIVATE_H): Define macro.
* sysdeps/hppa/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!HPPA_MATH_PRIVATE_H].
[!HPPA_MATH_PRIVATE_H] (HPPA_MATH_PRIVATE_H): Define macro.
* sysdeps/i386/fpu/math_private.h [!_MATH_PRIVATE_H]: Change guard
to [!I386_MATH_PRIVATE_H].
[!I386_MATH_PRIVATE_H] (I386_MATH_PRIVATE_H): Define macro.
* sysdeps/m68k/m680x0/fpu/math_private.h [!_MATH_PRIVATE_H]:
Change guard to [!M68K_MATH_PRIVATE_H].
[!M68K_MATH_PRIVATE_H] (M68K_MATH_PRIVATE_H): Define macro.
* sysdeps/microblaze/math_private.h [!_MATH_PRIVATE_H]: Change
guard to [!MICROBLAZE_MATH_PRIVATE_H].
[!MICROBLAZE_MATH_PRIVATE_H] (MICROBLAZE_MATH_PRIVATE_H): Define
macro.
* sysdeps/mips/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!MIPS_MATH_PRIVATE_H].
[!MIPS_MATH_PRIVATE_H] (MIPS_MATH_PRIVATE_H): Define macro.
* sysdeps/nios2/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!NIO2_MATH_PRIVATE_H].
[!NIO2_MATH_PRIVATE_H] (NIO2_MATH_PRIVATE_H): Define macro.
* sysdeps/tile/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!TILE_MATH_PRIVATE_H].
[!TILE_MATH_PRIVATE_H] (TILE_MATH_PRIVATE_H): Define macro.
For the -ffinite-math-only versions of various x86_64 and x86 log*
functions, a zero result from log* (1) is returned with incorrect sign
in round-downward mode. This patch fixes this in a similar way to the
previous fixes for the non-*_finite versions of the functions.
Tested for x86_64 and x86 (including an i586 build), together with a
patch that will be applied separately to enable the main libm-test.inc
tests for the finite-math-only functions.
[BZ #19213]
* sysdeps/i386/fpu/e_log.S (__log_finite): Ensure +0 is always
returned for argument 1.
* sysdeps/i386/fpu/e_logf.S (__logf_finite): Likewise.
* sysdeps/i386/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/i386/i686/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/x86_64/fpu/e_log10l.S (__log10l_finite): Likewise.
* sysdeps/x86_64/fpu/e_log2l.S (__log2l_finite): Likewise.
* sysdeps/x86_64/fpu/e_logl.S (__logl_finite): Likewise.
nextafter and nexttoward fail to set errno on overflow and underflow.
This patch makes them do so in cases that should include all the cases
where such errno setting is required by glibc's goals for when to set
errno (but not all cases of underflow where the result is nonzero and
so glibc's goals do not require errno setting).
Tested for x86_64, x86, mips64 and powerpc.
[BZ #6799]
* math/s_nextafter.c: Include <errno.h>.
(__nextafter): Set errno on overflow and underflow.
* math/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/i386/fpu/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* sysdeps/i386/fpu/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/i386/fpu/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/flt-32/s_nextafterf.c: Include <errno.h>.
(__nextafterf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c: Include <errno.h>.
(__nldbl_nexttowardf): Set errno on overflow and underflow.
* sysdeps/m68k/m680x0/fpu/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* math/libm-test.inc (nextafter_test_data): Do not allow errno
setting to be missing on overflow. Add more tests.
(nexttoward_test_data): Likewise.
fenv_t should include architecture-specific floating-point modes and
status flags. i386 and x86_64 fesetenv limit which bits they use from
the x87 status and control words, when using saved state, and limit
which parts of the state they set to fixed values, when using
FE_DFL_ENV / FE_NOMASK_ENV. The following should be included but are
excluded in at least some cases: status and masking for the "denormal
operand" exception (which isn't part of FE_ALL_EXCEPT); precision
control (explicitly mentioned in Annex F as something that counts as
part of the floating-point environment); MXCSR FZ and DAZ bits (for
FE_DFL_ENV and FE_NOMASK_ENV). This patch arranges for this extra
state to be handled by fesetenv (and thereby by feupdateenv, which
calls fesetenv).
(Note that glibc functions using floating point are not generally
expected to work correctly with non-default values of this state,
especially precision control, but it is still logically part of the
floating-point environment and should be handled as such by fesetenv.
Changes to the state relating to subnormals ought generally to work
with libm functions when the arguments aren't subnormal and neither
are the expected results; that's a consequence of functions avoiding
spurious internal underflows.)
A question arising from this is whether FE_NOMASK_ENV should or should
not mask the "denormal operand" exception. I decided it should mask
that exception. This is the status quo - previously that exception
could only be unmasked by direct manipulation of control registers
(possibly via <fpu_control.h>). In addition, it means that use of
FE_NOMASK_ENV leaves a floating-point environment the same as could be
obtained by fesetenv (FE_DFL_ENV); feenableexcept (FE_ALL_EXCEPT);,
rather than an environment in which an exception is unmasked that
could only be masked again by using fesetenv with FE_DFL_ENV (or a
previously saved environment) - this exception not being usable with
other <fenv.h> functions because it's outside FE_ALL_EXCEPT.
Tested for x86_64 and x86.
[BZ #16068]
* sysdeps/i386/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86_64/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86/fpu/test-fenv-sse-2.c: New file.
* sysdeps/x86/fpu/test-fenv-x87.c: Likewise.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-x87 and test-fenv-sse-2.
[$(subdir) = math] (CFLAGS-test-fenv-sse-2.c): New variable.
The i386 and x86_64 versions of fesetenv, when called with FE_DFL_ENV
or FE_NOMASK_ENV as argument, do not clear SSE exceptions raised in
MXCSR. These arguments should, like other fenv_t values, represent
the whole of the floating-point state, so such exceptions should be
cleared; this patch adds the required clearing. (Discovered while
working on bug 16068.)
Tested for x86_64 and x86.
[BZ #19181]
* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Clear already-raised
SSE exceptions when argument is FE_DFL_ENV or FE_NOMASK_ENV.
* sysdeps/x86_64/fpu/fesetenv.c (__fesetenv): Likewise.
* math/test-fenv-clear-main.c: New file.
* math/test-fenv-clear.c: Likewise.
* math/Makefile (tests): Add test-fenv-clear.
* sysdeps/x86/fpu/test-fenv-clear-sse.c: New file.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-clear-sse.
[$(subdir) = math] (CFLAGS-test-fenv-clear-sse.c): New variable.
The implementations of nearbyint functions using x87 floating point
(i386 all versions, x86_64 long double only) use the fclex
instruction, which clears any exceptions that were raised before the
function was called. These functions must not clear exceptions that
were raised before they were called.
This patch fixes these functions to save and restore the whole
floating-point environment (fnstenv / fldenv) as the way of avoiding
raising "inexact" (recall that there isn't an x87 instruction for
loading just the status word, so the whole environment has to be saved
and loaded instead - the code already saved and loaded the control
word, which is now obtained from the saved environment after this
patch, to disable traps on "inexact"). In the case of the long double
functions, any "invalid" exception from frndint (applied to a
signaling NaN) needs merging into the saved state; this issue doesn't
apply to the float and double functions because that exception would
have been raised when the argument is loaded, before the environment
is saved.
[BZ #15491]
* sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Save and restore
floating-point environment instead of clearing all exceptions.
* sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise,
merging in "invalid" exceptions from frndint.
* sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
* math/test-nearbyint-except.c: New file.
* math/Makefile (tests): Add test-nearbyint-except.
ISO C requires overflowing results from nexttoward to be the
appropriate infinity independent of the rounding mode, but some
implementations use a rounding-mode-dependent result (this is the same
issue as was fixed for nextafter in bug 16677). This patch fixes the
problem by making the nexttoward implementations discard the result
from the floating-point computation that forced an overflow exception
and then return the infinity previously computed with integer
arithmetic.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #19059]
* math/s_nexttowardf.c (__nexttowardf): Do not return value from
overflowing computation.
* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
Likewise.
* math/libm-test.inc (nexttoward_test_data): Add more tests.
The i386 versions of acoshf and acosh raise a spurious "invalid"
exception for an argument that is a quiet NaN with the sign bit set.
The integer arithmetic to detect arguments < 1 also detects -NaN, and
then the computation 0 / 0 in that case raises the exception. This
patch fixes this by using (x - x) / (x - x) as the computation in that
case instead, which will always raise the exception for non-NaN
arguments reaching that code, but not for quiet NaN arguments.
Tested for x86_64 and x86.
[BZ #19032]
* sysdeps/i386/fpu/e_acosh.S (__ieee754_acosh): For arguments < 1,
compute result as (x - x) / (x - x) not as 0 / 0.
* sysdeps/i386/fpu/e_acoshf.S (__ieee754_acoshf): Likewise.
* math/libm-test.inc (acosh_test_data): Add another test of acosh.
For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.
However, the thresholds for using that approach still result in log
being used on argument as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9ulp maximum allowed in
glibc libm. This patch arranges for the approach using log1p to be
used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs. This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much less.
Tested for x86_64, x86, mips64 and powerpc. For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear. Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.
[BZ #19016]
* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
allow more cases with X^2 + Y^2 >= 0.5.
* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise. Add -1 as
normal element in sum instead of special-casing based on values of
arguments.
* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise. Add
-1 as normal element in sum instead of special-casing based on
values of arguments.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
(__x2y2m1): Update comment.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise. Add -1
as normal element in sum instead of special-casing based on values
of arguments.
* math/s_clog.c (__clog): Handle more cases using log1p without
hypot.
* math/s_clog10.c (__clog10): Likewise.
* math/s_clog10f.c (__clog10f): Likewise.
* math/s_clog10l.c (__clog10l): Likewise.
* math/s_clogf.c (__clogf): Likewise.
* math/s_clogl.c (__clogl): Likewise.
* math/auto-libm-test-in: Add more tests of clog and clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Similar to various other bugs in this area, pow functions can fail to
raise the underflow exception when the result is tiny and inexact but
one or more low bits of the intermediate result that is scaled down
(or, in the i386 case, converted from a wider evaluation format) are
zero. This patch forces the exception in a similar way to previous
fixes, thereby concluding the fixes for known bugs with missing
underflow exceptions currently filed in Bugzilla.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18825]
* sysdeps/i386/fpu/i386-math-asm.h (FLT_NARROW_EVAL_UFLOW_NONNAN):
New macro.
(DBL_NARROW_EVAL_UFLOW_NONNAN): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
* sysdeps/i386/fpu/e_pow.S: Use DEFINE_DBL_MIN.
(__ieee754_pow): Use DBL_NARROW_EVAL_UFLOW_NONNAN instead of
DBL_NARROW_EVAL, reloading the PIC register as needed.
* sysdeps/i386/fpu/e_powf.S: Use DEFINE_FLT_MIN.
(__ieee754_powf): Use FLT_NARROW_EVAL_UFLOW_NONNAN instead of
FLT_NARROW_EVAL. Use separate return path for case when first
argument is NaN.
* sysdeps/i386/fpu/e_powl.S: Include <i386-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN, reloading the
PIC register.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Force
underflow for subnormal result.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Use
math_check_force_underflow_nonneg.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Use
math_check_force_underflow.
* sysdeps/x86_64/fpu/x86_64-math-asm.h
(LDBL_CHECK_FORCE_UFLOW_NONNAN): New macro.
* sysdeps/x86_64/fpu/e_powl.S: Include <x86_64-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, hypot functions can fail
to raise the underflow exception when the result is tiny and inexact
but one or more low bits of the intermediate result that is scaled
down (or, in the i386 case, converted from a wider evaluation format)
are zero. This patch forces the exception in a similar way to
previous fixes.
Note that this issue cannot arise for implementations of hypotf using
double (or wider) for intermediate evaluation (if hypotf should
underflow, that means the double square root is being computed of some
number of the form N*2^-298, for 0 < N < 2^46, which is exactly
represented as a double, and whatever the rounding mode such a square
root cannot have a mantissa with all zeroes after the initial 23
bits). Thus no changes are made to hypotf implementations in this
patch, only to hypot and hypotl.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18803]
* sysdeps/i386/fpu/e_hypot.S: Use DEFINE_DBL_MIN.
(MO): New macro.
(__ieee754_hypot) [PIC]: Load PIC register.
(__ieee754_hypot): Use DBL_NARROW_EVAL_UFLOW_NONNEG instead of
DBL_NARROW_EVAL.
* sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Use
math_check_force_underflow_nonneg in case where result might be
tiny.
* sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/powerpc/fpu/e_hypot.c (__ieee754_hypot): Likewise.
* math/auto-libm-test-in: Add more tests of hypot.
* math/auto-libm-test-out: Regenerated.
sysdeps/i386/fpu/e_atanh.S, unlike all other functions in that
directory, loads the PIC register with its own code using
_GLOBAL_OFFSET_TABLE_, rather than with the LOAD_PIC_REG macro. I see
no good reason for the difference; this patch makes it use the common
macro.
Tested for x86.
* sysdeps/i386/fpu/e_atanh.S (__ieee754_atanh) [PIC]: Use
LOAD_PIC_REG.
This patch refactors code in sysdeps/i386/fpu that forces underflow
exceptions to use common macros for that purpose as far as possible.
(Although some of the macros end up used in only one place, I think
it's cleanest to define all these macros together so that all the code
forcing underflow uses such macros. Some more uses of such macros
will also be introduced when fixing remaining bugs about missing
underflow exceptions, and it would be possible to do further
refactoring of the macros in i386-math-asm.h to share more code by
using other macros internally. Places that test for underflow by
examining the representation of the argument with integer operations,
rather that using floating-point comparisons on the argument or
result, are unchanged by this patch.)
Most of this code uses a macro MO to abstract away the differences
between PIC and non-PIC memory references to constants. log1p
functions, however, hardcoded PIC conditionals for this. Because the
common macros rely on the use of MO, I changed the log1p functions to
use the normal style here, and, for consistency, also made that change
to log1pl which is otherwise unaffected by this patch.
Tested for x86.
* sysdeps/i386/fpu/i386-math-asm.h (DEFINE_LDBL_MIN): New macro.
(FLT_CHECK_FORCE_UFLOW): Likewise.
(DBL_CHECK_FORCE_UFLOW): Likewise.
(FLT_CHECK_FORCE_UFLOW_NARROW): Likewise.
(DBL_CHECK_FORCE_UFLOW_NARROW): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN): Likewise.
(FLT_CHECK_FORCE_UFLOW_NONNAN): Likewise.
(DBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
(FLT_CHECK_FORCE_UFLOW_NONNEG): Likewise.
(DBL_CHECK_FORCE_UFLOW_NONNEG): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNEG): Likewise.
* sysdeps/i386/fpu/e_asin.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_asin): Use DBL_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/e_asinf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_asinf): Use FLT_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/e_atan2.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_atan2): Use DBL_CHECK_FORCE_UFLOW_NARROW.
* sysdeps/i386/fpu/e_atan2f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_atan2f): Use FLT_CHECK_FORCE_UFLOW_NARROW.
* sysdeps/i386/fpu/e_atanh.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_atanh): Use DBL_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_atanhf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_atanhf): Use FLT_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_exp2l.S: Include <i386-math-asm.h>.
(ldbl_min): Replace with use of DEFINE_LDBL_MIN.
(__ieee754_exp2l): Use LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_expl.S: Include <i386-math-asm.h>.
[!USE_AS_EXPM1L] (cmin): Replace with use of DEFINE_LDBL_MIN.
(IEEE754_EXPL): Use LDBL_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/s_atan.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__atan): Use DBL_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/s_atanf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__atanf): Use FLT_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/s_expm1.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__expm1): Use DBL_CHECK_FORCE_UFLOW. Move underflow check after
main computation.
* sysdeps/i386/fpu/s_expm1f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__expm1f): Use FLT_CHECK_FORCE_UFLOW. Move underflow check after
main computation.
* sysdeps/i386/fpu/s_log1p.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(MO): New macro.
(__log1p): Use MO. Use DBL_CHECK_FORCE_UFLOW_NONNAN.
* sysdeps/i386/fpu/s_log1pf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(MO): New macro.
(__log1pf): Use MO. Use FLT_CHECK_FORCE_UFLOW_NONNAN.
* sysdeps/i386/fpu/s_log1pl.S (MO): New macro.
(__log1pl): Use MO.
Where glibc code needs to avoid excess range and precision in
floating-point arithmetic, code variously uses either asms or volatile
to force the results of that arithmetic to memory; mostly this is
conditional on FLT_EVAL_METHOD, but in the case of lrint / llrint
functions some use of volatile is unconditional (and is present
unnecessarily in versions for long double). This patch make such code
use the recently-added math_narrow_eval macro consistently, removing
the unnecessary uses of volatile in long double lrint / llrint
implementations completely.
Tested for x86_64, x86, mips64 and powerpc.
* math/s_nexttowardf.c (__nexttowardf): Use math_narrow_eval.
* stdlib/strtod_l.c: Include <math_private.h>.
(overflow_value): Use math_narrow_eval.
(underflow_value): Likewise.
* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Likewise.
(__ieee754_gamma_r): Likewise.
* sysdeps/ieee754/dbl-64/gamma_productf.c (__gamma_productf):
Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c (__erfc): Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c (__llrint): Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c (__lrint): Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
(__ieee754_gammaf_r): Likewise.
* sysdeps/ieee754/flt-32/k_rem_pio2f.c (__kernel_rem_pio2f):
Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
* sysdeps/ieee754/flt-32/s_erff.c (__erfcf): Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c (__llrintf): Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c (__lrintf): Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c (__llrintl): Do not use
volatile.
* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward): Use
math_narrow_eval.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-96/gamma_product.c (__gamma_product):
Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c (__llrintl): Do not use
volatile.
* sysdeps/ieee754/ldbl-96/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Use
math_narrow_eval.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
Likewise.
i386 exp, hypot and pow functions can return overflowing and
underflowing values with excess range and precision; ; Wilco
Dijkstra's patches to make isfinite etc. expand inline cause this
pre-existing issue to result in test failures.
This patch fixes those functions to avoid excess range and precision
in their return values. Appropriate macros are added for the repeated
code sequences; in future I'll add more such macros and refactor
existing code forcing underflow (with or without also eliminating
excess range and precision from the return value) to use such macros.
Tested for x86. If, after this patch, you still see x86 libm test
failures with excess range or precision, please file bugs in Bugzilla.
[BZ #18980]
* sysdeps/i386/fpu/i386-math-asm.h (DEFINE_FLT_MIN): New macro.
(DEFINE_DBL_MIN): Likewise.
(FLT_NARROW_EVAL_UFLOW_NONNEG_NAN): Likewise.
(DBL_NARROW_EVAL_UFLOW_NONNEG_NAN): Likewise.
(FLT_NARROW_EVAL_UFLOW_NONNEG): Likewise.
(DBL_NARROW_EVAL_UFLOW_NONNEG): Likewise.
* sysdeps/i386/fpu/e_exp.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
(__exp_finite): Use DBL_NARROW_EVAL_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_exp10.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp10): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp10f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_exp10f): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp2.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp2): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp2f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_exp2f): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_expf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_expf): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
(__expf_finite): Use FLT_NARROW_EVAL_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_hypot.S: Include <i386-math-asm.h>.
(__ieee754_hypot): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_hypotf.S: Include <i386-math-asm.h>.
(__ieee754_hypotf): Use FLT_NARROW_EVAL.
* sysdeps/i386/fpu/e_pow.S: Include <i386-math-asm.h>.
(__ieee754_pow): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_powf.S: Include <i386-math-asm.h>.
(__ieee754_powf): Use FLT_NARROW_EVAL.
* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S
(__ieee754_expf_sse2): Convert double-precision result to single
precision.
* sysdeps/i386/fpu/libm-test-ulps: Update.
i386 scalb / scalbn / scalbln (and thus ldexp) functions for float and
double can return results with excess range (and consequently excess
precision for subnormal results). As the results of these functions
are fully determined by reference to IEEE 754 operations, this is
unambiguously a bug, apart from the testsuite failures it causes.
This patch makes those functions store their results on the stack and
load them back to eliminate the excess range. Double rounding is not
a problem, as the only cases where it could occur are when the result
overflows or underflows for extended precision, and then the
double-rounded results are the same as the single-rounded results.
The new macros will be used for more functions, more such macros
added, and existing code refactored to use such macros, in subsequent
patches.
Tested for x86. Committed.
[BZ #18981]
* sysdeps/i386/fpu/i386-math-asm.h: New file.
* sysdeps/i386/fpu/e_scalb.S: Include <i386-math-asm.h>.
(__ieee754_scalb): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_scalbf.S: Include <i386-math-asm.h>.
(__ieee754_scalbf): Use FLT_NARROW_EVAL.
* sysdeps/i386/fpu/s_scalbn.S: Include <i386-math-asm.h>.
(__scalbn): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/s_scalbnf.S: Include <i386-math-asm.h>.
(__scalbnf): Use FLT_NARROW_EVAL.
As noted in bug 6803, scalbn fails to set errno on overflow and
underflow. This patch fixes this by making scalbn an alias of ldexp,
which has exactly the same semantics (for floating-point types with
radix 2) and already has wrappers that deal with setting errno,
instead of an alias of the internal __scalbn (which ldexp calls).
Notes:
* Where compat symbols were defined for scalbn functions, I didn't
change what they point to (to keep the patch minimal), so such
compat symbols continue to go directly to the non-errno-setting
functions.
* Mike, I didn't do anything with the IA64 versions of these
functions, where I think both the ldexp and scalbn functions already
deal with setting errno. As a cleanup (not needed to fix this bug)
however you might want to make those functions into aliases for
IA64; there is no need for them to be separate function
implementations at all.
* This concludes the fix for bug 6803 since the scalb and scalbln
cases of that bug were fixed some time ago.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #6803]
* math/s_ldexp.c (scalbn): Define as weak alias of __ldexp.
[NO_LONG_DOUBLE] (scalbnl): Define as weak alias of __ldexp.
* math/s_ldexpf.c (scalbnf): Define as weak alias of __ldexpf.
* math/s_ldexpl.c (scalbnl): Define as weak alias of __ldexpl.
* sysdeps/i386/fpu/s_scalbn.S (scalbn): Remove alias.
* sysdeps/i386/fpu/s_scalbnf.S (scalbnf): Likewise.
* sysdeps/i386/fpu/s_scalbnl.S (scalbnl): Likewise.
* sysdeps/ieee754/dbl-64/s_scalbn.c (scalbn): Likewise.
[NO_LONG_DOUBLE] (scalbnl): Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c (scalbn):
Likewise.
[NO_LONG_DOUBLE] (scalbnl): Likewise.
* sysdeps/ieee754/flt-32/s_scalbnf.c (scalbnf): Likewise.
* sysdeps/ieee754/ldbl-128/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (scalbnl): Remove
long_double_symbol calls.
* sysdeps/ieee754/ldbl-64-128/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/ieee754/ldbl-opt/s_ldexpl.c (__ldexpl_2): Define as
strong alias of __ldexpl.
(scalbnl): Define using long_double_symbol.
* sysdeps/m68k/m680x0/fpu/s_scalbn.c (__CONCATX(scalbn,suffix)):
Remove alias.
* sysdeps/sparc/sparc64/soft-fp/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/x86_64/fpu/s_scalbnl.S (scalbnl): Likewise.
* math/libm-test.inc (scalbn_test_data): Add errno expectations.
(scalbln_test_data): Add more errno expectations.
On i386, the double version of exp10 can miss underflow exceptions if
the result is in the subnormal range for double but the last 11 bits
of the 64-bit extended-precision mantissa happen to be zero. This
patch forces the exception in a similar way to previous fixes.
As with the exp2 and exp fixes, the exp10f changes may in fact not be
needed to ensure underflow exceptions, but are included for
consistency and to fix the exp10 part of bug 18875 by ensuring that
excess range and precision is removed from underflowing return values.
Tested for x86_64 and x86.
[BZ #18875]
[BZ #18966]
* sysdeps/i386/fpu/e_exp10.S (dbl_min): New object.
(MO): New macro.
(__ieee754_exp10): For small results, force underflow exception
and remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp10f.S (flt_min): New object.
(MO): New macro.
(__ieee754_exp10f): For small results, force underflow exception
and remove excess range and precision from return value.
* math/auto-libm-test-in: Add more tests of exp10.
* math/auto-libm-test-out: Regenerated.
On i386, the double version of exp can miss underflow exceptions if
the result is in the subnormal range for double but the last 11 bits
of the 64-bit extended-precision mantissa happen to be zero. This
patch forces the exception in a similar way to previous fixes.
As with the exp2 fixes, the expf changes may in fact not be needed to
ensure underflow exceptions, but are included for consistency and to
fix the exp part of bug 18875 by ensuring that excess range and
precision is removed from underflowing return values.
Tested for x86_64 and x86.
[BZ #18875]
[BZ #18961]
* sysdeps/i386/fpu/e_exp.S (dbl_min): New object.
(MO): New macro.
(__ieee754_exp): For small results, force underflow exception and
remove excess range and precision from return value.
(__exp_finite): Likewise.
* sysdeps/i386/fpu/e_expf.S (flt_min): New object.
(MO): New macro.
(__ieee754_expf): For small results, force underflow exception and
remove excess range and precision from return value.
(__expf_finite): Likewise.
* math/auto-libm-test-in: Add more tests of exp.
* math/auto-libm-test-out: Regenerated.
Various exp2 implementations in glibc can miss underflow exceptions
when the scaling down part of the calculation is exact (or, in the x86
case, when the conversion from extended precision to the target
precision is exact). This patch forces the exception in a similar way
to previous fixes.
The x86 exp2f changes may in fact not be needed for this purpose -
it's likely to be the case that no argument of type float has an exp2
result so close to an exact subnormal float value that it equals that
value when rounded to 64 bits (even taking account of variation
between different x86 implementations). However, they are included
for consistency with the changes to exp2 and so as to fix the exp2f
part of bug 18875 by ensuring that excess range and precision is
removed from underflowing return values.
Tested for x86_64, x86 and mips64.
[BZ #16521]
[BZ #18875]
* math/e_exp2l.c (__ieee754_exp2l): Force underflow exception for
small results.
* sysdeps/i386/fpu/e_exp2.S (dbl_min): New object.
(MO): New macro.
(__ieee754_exp2): For small results, force underflow exception and
remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp2f.S (flt_min): New object.
(MO): New macro.
(__ieee754_exp2f): For small results, force underflow exception
and remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp2l.S (ldbl_min): New object.
(MO): New macro.
(__ieee754_exp2l): Force underflow exception for small results.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise.
* sysdeps/x86_64/fpu/e_exp2l.S (ldbl_min): New object.
(MO): New macro.
(__ieee754_exp2l): Force underflow exception for small results.
* math/auto-libm-test-in: Add more tests or exp2.
* math/auto-libm-test-out: Regenerated.
This patch adds more libm test inputs found through random test
generation to increase previously known ulps. This particular test
generation was run for mips64, so most of the increased ulps are for
ldbl-128 (float and double having been fairly well covered by such
testing for x86_64), but there's the odd ulps increase for other
formats.
Tested for x86_64, x86 and mips64.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
asinh, atan, atan2, atanh, cabs, carg, cos, csqrt, erfc, exp,
exp10, exp2, log, log1p, log2, pow, sin, sincos, sinh, tan and
tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more libm test inputs found through random test
generation to increase observed ulps on x86_64.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acosh, atanh, cbrt,
cosh, csqrt, erfc, expm1 and lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The existing implementations of lgamma functions (except for the ia64
versions) use the reflection formula for negative arguments. This
suffers large inaccuracy from cancellation near zeros of lgamma (near
where the gamma function is +/- 1).
This patch fixes this inaccuracy. For arguments above -2, there are
no zeros and no large cancellation, while for sufficiently large
negative arguments the zeros are so close to integers that even for
integers +/- 1ulp the log(gamma(1-x)) term dominates and cancellation
is not significant. Thus, it is only necessary to take special care
about cancellation for arguments around a limited number of zeros.
Accordingly, this patch uses precomputed tables of relevant zeros,
expressed as the sum of two floating-point values. The log of the
ratio of two sines can be computed accurately using log1p in cases
where log would lose accuracy. The log of the ratio of two gamma(1-x)
values can be computed using Stirling's approximation (the difference
between two values of that approximation to lgamma being computable
without computing the two values and then subtracting), with
appropriate adjustments (which don't reduce accuracy too much) in
cases where 1-x is too small to use Stirling's approximation directly.
In the interval from -3 to -2, using the ratios of sines and of
gamma(1-x) can still produce too much cancellation between those two
parts of the computation (and that interval is also the worst interval
for computing the ratio between gamma(1-x) values, which computation
becomes more accurate, while being less critical for the final result,
for larger 1-x). Because this can result in errors slightly above
those accepted in glibc, this interval is instead dealt with by
polynomial approximations. Separate polynomial approximations to
(|gamma(x)|-1)(x-n)/(x-x0) are used for each interval of length 1/8
from -3 to -2, where n (-3 or -2) is the nearest integer to the
1/8-interval and x0 is the zero of lgamma in the relevant half-integer
interval (-3 to -2.5 or -2.5 to -2).
Together, the two approaches are intended to give sufficient accuracy
for all negative arguments in the problem range. Outside that range,
the previous implementation continues to be used.
Tested for x86_64, x86, mips64 and powerpc. The mips64 and powerpc
testing shows up pre-existing problems for ldbl-128 and ldbl-128ibm
with large negative arguments giving spurious "invalid" exceptions
(exposed by newly added tests for cases this patch doesn't affect the
logic for); I'll address those problems separately.
[BZ #2542]
[BZ #2543]
[BZ #2558]
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r): Call
__lgamma_neg for arguments from -28.0 to -2.0.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r): Call
__lgamma_negf for arguments from -15.0 to -2.0.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -48.0 or -50.0 to -2.0.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -33.0 to -2.0.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: New file.
* sysdeps/ieee754/dbl-64/lgamma_product.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_productf.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_product.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_productl.c: Likewise.
* sysdeps/generic/math_private.h (__lgamma_negf): New prototype.
(__lgamma_neg): Likewise.
(__lgamma_negl): Likewise.
(__lgamma_product): Likewise.
(__lgamma_productl): Likewise.
* math/Makefile (libm-calls): Add lgamma_neg and lgamma_product.
* math/auto-libm-test-in: Add more tests of lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The csqrt implementations in glibc can miss underflow exceptions when
the real or imaginary part of the result becomes tiny in the course of
scaling down (in particular, multiplication by 0.5) and that scaling
is exact although the relevant part of the mathematical result isn't.
This patch forces the exception in a similar way to previous fixes.
Tested for x86_64 and x86.
[BZ #18370]
* math/s_csqrt.c (__csqrt): Force underflow exception for results
whose real or imaginary part has small absolute value.
* math/s_csqrtf.c (__csqrtf): Likewise.
* math/s_csqrtl.c (__csqrtl): Likewise.
* math/auto-libm-test-in: Add more tests of csqrt.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
This patch adds more test inputs to various libm functions found
through random generation to have larger ulps errors than previously
listed in libm-test-ulp, on at least one of x86_64 and x86.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
asinh, atan, atan2, atanh, cabs, cbrt, cosh, csqrt, erf, erfc,
exp, exp2, lgamma, log, log1p, log2, pow, sin, sincos, tan, tanh
and tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Similar to various other bugs in this area, some tanh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #16520]
* sysdeps/ieee754/dbl-64/s_tanh.c: Include <float.h>.
(__tanh): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/flt-32/s_tanhf.c: Include <float.h>.
(__tanhf): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_tanhl.c: Include <float.h>.
(__tanhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c: Include <float.h>.
(__tanhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-96/s_tanhl.c: Include <float.h>.
(__tanhl): Force underflow exception for arguments with small
absolute value.
* math/auto-libm-test-in: Add more tests of tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
This patch adds more tests of various libm functions found through
random test generation to give increased ulps on 32-bit x86.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acosh, asin, asinh,
atanh, cabs, carg, cbrt, cosh, csqrt, erf, erfc, exp, exp10,
expm1, hypot, log, log10, log1p, log2, pow, sinh, tan and tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
ldbl-128ibm tanhl uses a too-small threshold to decide when to return
+/-1, resulting in large errors. This patch changes it to a more
appropriate threshold (the requirement is for 2*exp(-2|x|) to be small
in terms of ulps of 1).
Tested for x86_64, x86 and powerpc.
[BZ #18790]
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c (__tanhl): Increase
threshold for returning +/- 1.
* math/auto-libm-test-in: Add more tests of tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
ldbl-128ibm sinhl uses a too-big threshold to decide when to return
the argument, resulting in large errors. This patch fixes it to use a
more appropriate threshold.
Tested for x86_64, x86 and powerpc.
[BZ #18789]
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c (__ieee754_sinhl): Use
smaller threshold for returning the argument.
* math/auto-libm-test-in: Add more tests of sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Similar to various other bugs in this area, some sinh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #16519]
* sysdeps/ieee754/dbl-64/e_sinh.c: Include <float.h>.
(__ieee754_sinh): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/flt-32/e_sinhf.c: Include <float.h>.
(__ieee754_sinhf): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/ldbl-128/e_sinhl.c: Include <float.h>.
(__ieee754_sinhl): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c: Include <float.h>.
(__ieee754_sinhl): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/ldbl-96/e_sinhl.c: Include <float.h>.
(__ieee754_sinhl): Force underflow exception for arguments with
small absolute value.
* math/auto-libm-test-in: Add more tests of sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
In non-default rounding modes, tgamma can be slightly less accurate
than permitted by glibc's accuracy goals.
Part of the problem is error accumulation, addressed in this patch by
setting round-to-nearest for internal computations. However, there
was also a bug in the code dealing with computing pow (x + n, x + n)
where x + n is not exactly representable, providing another source of
error even in round-to-nearest mode; it was necessary to address both
bugs to get errors for all testcases within glibc's accuracy goals.
Given this second fix, accuracy in round-to-nearest mode is also
improved (hence regeneration of ulps for tgamma should be from scratch
- truncate libm-test-ulps or at least remove existing tgamma entries -
so that the expected ulps can be reduced).
Some additional complications also arose. Certain tgamma tests should
strictly, according to IEEE semantics, overflow or not depending on
the rounding mode; this is beyond the scope of glibc's accuracy goals
for any function without exactly-determined results, but
gen-auto-libm-tests doesn't handle being lax there as it does for
underflow. (libm-test.inc also doesn't handle being lax about whether
the result in cases very close to the overflow threshold is infinity
or a finite value close to overflow, but that doesn't cause problems
in this case though I've seen it cause problems with random test
generation for some functions.) Thus, spurious-overflow markings,
with a comment, are added to auto-libm-test-in (no bug in Bugzilla
because the issue is with the testsuite, not a user-visible bug in
glibc). And on x86, after the patch I saw ERANGE issues as previously
reported by Carlos (see my commentary in
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>), which
needed addressing by ensuring excess range and precision were
eliminated at various points if FLT_EVAL_METHOD != 0.
I also noticed and fixed a cosmetic issue where 1.0f was used in long
double functions and should have been 1.0L.
This completes the move of all functions to testing in all rounding
modes with ALL_RM_TEST, so gen-libm-have-vector-test.sh is updated to
remove the workaround for some functions not using ALL_RM_TEST.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18613]
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Take log of
X_ADJ not X when adjusting exponent.
(__ieee754_gamma_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Take log
of X_ADJ not X when adjusting exponent.
(__ieee754_gammaf_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Take
log of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Take
log of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Take log
of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* math/libm-test.inc (tgamma_test_data): Remove one test. Moved
to auto-libm-test-in.
(tgamma_test): Use ALL_RM_TEST.
* math/auto-libm-test-in: Add one test of tgamma. Mark some other
tests of tgamma with spurious-overflow.
* math/auto-libm-test-out: Regenerated.
* math/gen-libm-have-vector-test.sh: Do not check for START.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Some existing jn tests, if run in non-default rounding modes, produce
errors above those accepted in glibc, which causes problems for moving
tests of jn to use ALL_RM_TEST. This patch makes jn set rounding
to-nearest internally, as was done for yn some time ago, then computes
the appropriate underflowing value for results that underflowed to
zero in to-nearest, and moves the tests to ALL_RM_TEST. It does
nothing about the general inaccuracy of Bessel function
implementations in glibc, though it should make jn more accurate on
average in non-default rounding modes through reduced error
accumulation. The recomputation of results that underflowed to zero
should as a side-effect fix some cases of bug 16559, where jn just
used an exact zero, but that is *not* the goal of this patch and other
cases of that bug remain unfixed.
(Most of the changes in the patch are reindentation to add new scopes
for SET_RESTORE_ROUND*.)
Tested for x86_64, x86, powerpc and mips64.
[BZ #16559]
[BZ #18602]
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Set
round-to-nearest internally then recompute results that
underflowed to zero in the original rounding mode.
* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise
* math/libm-test.inc (jn_test): Use ALL_RM_TEST.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
cexp, ccos, ccosh, csin and csinh have spurious underflows in cases
where they compute sin of the smallest normal, that produces an
underflow exception (depending on which sin implementation is in use)
but the final result does not underflow. ctan and ctanh may also have
such underflows, or they may be latent (the issue there is that
e.g. ctan (DBL_MIN) should, rounded upwards, be the next double value
above DBL_MIN, which under glibc's accuracy goals may not have an
underflow exception, but the intermediate computation of sin (DBL_MIN)
would legitimately underflow on before-rounding architectures).
This patch fixes all those functions so they use plain comparisons (>
DBL_MIN etc.) instead of comparing the result of fpclassify with
FP_SUBNORMAL (in all these cases, we already know the number being
compared is finite). Note that in the case of csin / csinf / csinl,
there is no need for fabs calls in the comparison because the real
part has already been reduced to its absolute value.
As the patch fixes the failures that previously obstructed moving
tests of cexp to use ALL_RM_TEST, those tests are moved to ALL_RM_TEST
by the patch (two functions remain yet to be converted).
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #18594]
* math/s_ccosh.c (__ccosh): Compare with least normal value
instead of comparing class with FP_SUBNORMAL.
* math/s_ccoshf.c (__ccoshf): Likewise.
* math/s_ccoshl.c (__ccoshl): Likewise.
* math/s_cexp.c (__cexp): Likewise.
* math/s_cexpf.c (__cexpf): Likewise.
* math/s_cexpl.c (__cexpl): Likewise.
* math/s_csin.c (__csin): Likewise.
* math/s_csinf.c (__csinf): Likewise.
* math/s_csinh.c (__csinh): Likewise.
* math/s_csinhf.c (__csinhf): Likewise.
* math/s_csinhl.c (__csinhl): Likewise.
* math/s_csinl.c (__csinl): Likewise.
* math/s_ctan.c (__ctan): Likewise.
* math/s_ctanf.c (__ctanf): Likewise.
* math/s_ctanh.c (__ctanh): Likewise.
* math/s_ctanhf.c (__ctanhf): Likewise.
* math/s_ctanhl.c (__ctanhl): Likewise.
* math/s_ctanl.c (__ctanl): Likewise.
* math/auto-libm-test-in: Add more tests of ccos, ccosh, cexp,
csin, csinh, ctan and ctanh.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (cexp_test): Use ALL_RM_TEST.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The csqrt implementations in glibc can cause spurious underflows in
some cases as a side-effect of the scaling for large arguments (when
underflow is correct for the square root of the argument that was
scaled down to avoid overflow, but not for the original argument).
This patch arranges to avoid the underflowing intermediate computation
(eliminating a multiplication in 0.5 in the problem cases where a
subsequent scaling by 2 would follow).
Tested for x86_64 and x86 and ulps updated accordingly (only needed
for x86).
[BZ #18371]
* math/s_csqrt.c (__csqrt): Avoid multiplication by 0.5 where
intermediate but not final result might underflow.
* math/s_csqrtf.c (__csqrtf): Likewise.
* math/s_csqrtl.c (__csqrtl): Likewise.
* math/auto-libm-test-in: Add more tests of csqrt.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Similar to various other bugs in this area, some expm1 implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
(The issue does not apply to the ldbl-* implementations or to those
for x86 / x86_64 long double. The change to
sysdeps/ieee754/dbl-64/wordsize-64/e_cosh.c is one I missed when
previously fixing bug 16354; the bug in that implementation was
previously latent, but the expm1 fixes stopped it being latent and so
required it to be fixed to avoid spurious underflows from cosh.)
Tested for x86_64 and x86.
[BZ #16353]
* sysdeps/i386/fpu/s_expm1.S (dbl_min): New object.
(__expm1): Force underflow exception for arguments with small
absolute value.
* sysdeps/i386/fpu/s_expm1f.S (flt_min): New object.
(__expm1f): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/dbl-64/s_expm1.c: Include <float.h>.
(__expm1): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/flt-32/s_expm1f.c: Include <float.h>.
(__expm1f): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/dbl-64/wordsize-64/e_cosh.c (__ieee754_cosh):
Check for small arguments before calling __expm1.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 16353.
* math/auto-libm-test-out: Regenerated.
In the x86 / x86_64 implementations of expm1l, when expm1l's result
should underflow to 0 (argument minus the least subnormal, in some
rounding modes), it can be a zero of the wrong sign. This patch fixes
this by returning the argument with underflow forced in that case
(this is a 1ulp error relative to the correctly rounded result of -0,
which is OK in terms of the documented accuracy goals, whereas a
result with the wrong sign never is).
Tested for x86_64 and x86.
[BZ #18569]
* sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]: Force
underflow and return argument in case of subnormal argument.
* sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]:
Likewise.
* math/auto-libm-test-in: Add more tests of expm1.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, the x86 and x86_64
implementations of expl / exp10l can fail to produce underflow
exceptions when the unscaled result has trailing 0 bits so the scaling
down to subnormal precision is exact. This patch fixes this by
forcing the exception in the case of tiny results.
Tested for x86_64 and x86.
[BZ #16361]
* sysdeps/i386/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object.
[!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for
tiny results.
* sysdeps/x86_64/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object.
[!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for
tiny results.
* math/auto-libm-test-in: Add more tests of exp and exp10. Do not
mark underflow exceptions as possibly missing for bug 16361.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, some asinh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
Tested for x86_64, x86 and mips64.
[BZ #16350]
* sysdeps/i386/fpu/s_asinh.S (__asinh): Force underflow exception
for arguments with small absolute value.
* sysdeps/i386/fpu/s_asinhf.S (__asinhf): Likewise.
* sysdeps/i386/fpu/s_asinhl.S (__asinhl): Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c: Include <float.h>.
(__asinh): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/flt-32/s_asinhf.c: Include <float.h>.
(__asinhf): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_asinhl.c: Include <float.h>.
(__asinhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c: Include <float.h>.
(__asinhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-96/s_asinhl.c: Include <float.h>.
(__asinhl): Force underflow exception for arguments with small
absolute value.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 16350.
* math/auto-libm-test-out: Regenerated.
The i386 implementation of atanhl, for small arguments, does a
calculation that involves computing twice the square of the argument,
resulting in spurious underflows for some arguments. This patch fixes
this by just returning the argument when its exponent is below -32,
with underflow being forced as needed for subnormal arguments.
Tested for x86 and x86_64.
[BZ #18049]
* sysdeps/i386/fpu/e_atanhl.S (__ieee754_atanhl): For exponents
below -32, return the argument, with underflow if subnormal.
* math/auto-libm-test-in: Add more tests of atanh.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, some atanh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes. (No change in this regard is needed
for the i386 implementation; special handling to force underflows in
these cases will only be needed there when the spurious underflows,
bug 18049, get fixed.)
Tested for x86_64, x86, powerpc and mips64.
[BZ #16352]
* sysdeps/i386/fpu/e_atanh.S (dbl_min): New object.
(__ieee754_atanh): Force underflow exception for results with
small absolute value.
* sysdeps/i386/fpu/e_atanhf.S (flt_min): New object.
(__ieee754_atanhf): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/dbl-64/e_atanh.c: Include <float.h>.
(__ieee754_atanh): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/flt-32/e_atanhf.c: Include <float.h>.
(__ieee754_atanhf): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-128/e_atanhl.c: Include <float.h>.
(__ieee754_atanhl): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c: Include <float.h>.
(__ieee754_atanhl): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-96/e_atanhl.c: Include <float.h>.
(__ieee754_atanhl): Force underflow exception for results with
small absolute value.
* math/auto-libm-test-in: Do not allow missing underflow
exceptions from atanh.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, some log1p implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes. (The ldbl-128ibm implementation
doesn't currently need any change as it already generates this
exception, albeit through code that would generate spurious exceptions
in other cases; special code for this issue will only be needed there
when fixing the spurious exceptions.)
Tested for x86_64, x86, powerpc and mips64.
[BZ #16339]
* sysdeps/i386/fpu/s_log1p.S (dbl_min): New object.
(__log1p): Force underflow exception for results with small
absolute value.
* sysdeps/i386/fpu/s_log1pf.S (flt_min): New object.
(__log1pf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/dbl-64/s_log1p.c: Include <float.h>.
(__log1p): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/flt-32/s_log1pf.c: Include <float.h>.
(__log1pf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_log1pl.c: Include <float.h>.
(__log1pl): Force underflow exception for results with small
absolute value.
* math/auto-libm-test-in: Do not allow missing underflow
exceptions from log1p.
* math/auto-libm-test-out: Regenerated.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of csqrt, lgamma, log10
and sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of acosh, atanh, cos,
csqrt, erfc, sin and sincos.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64. (This process
must eventually converge, when my random test generation stops finding
inputs that increase the listed ulps, except maybe for any cases
uncovered where the errors exceed the maximum allowed 9ulp error and
so indicate actual libm bugs needing fixing.)
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of acosh, atanh, clog,
clog10, csqrt, erfc, exp2, expm1, log10, log2 and sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of atan, clog, clog10,
cos, csqrt, erf, erfc, exp2, lgamma, log1p, sin, sincos, tanh and
tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of tgamma that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of tanh that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of tan that are observed
to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of tan.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of cos, sin and sincos
that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of cos, sin and sincos.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of lgamma that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.