Bug 19848 reports cases where powl on x86 / x86_64 has error
accumulation, for small integer exponents, larger than permitted by
glibc's accuracy goals, at least in some rounding modes. This patch
further restricts the exponent range for which the
small-integer-exponent logic is used to limit the possible error
accumulation.
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #19848]
* sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value
from 8 to 4.
(__ieee754_powl): Compare integer exponent against 4 not 8.
* sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value
from 8 to 4.
(__ieee754_powl): Compare integer exponent against 4 not 8.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Since libmvec_nonshared.a may be linked into shared objects, ALIAS_IMPL
should use PIC relocation.
[BZ #19590]
* sysdeps/x86_64/fpu/svml_finite_alias.S (ALIAS_IMPL): Use PIC
relocation.
Old workaround based on assembly aliases can lead to link fail (bug 19058).
This patch makes workaround in another way to avoid it.
[BZ #19058]
* math/Makefile ($(inst_libdir)/libm.so): Added libmvec_nonshared.a
to AS_NEEDED.
* sysdeps/x86/fpu/bits/math-vector.h: Removed code with old workaround.
* sysdeps/x86_64/fpu/Makefile (libmvec-support,
libmvec-static-only-routines): Added new file.
* sysdeps/x86_64/fpu/svml_finite_alias.S: New file.
For the -ffinite-math-only versions of various x86_64 and x86 log*
functions, a zero result from log* (1) is returned with incorrect sign
in round-downward mode. This patch fixes this in a similar way to the
previous fixes for the non-*_finite versions of the functions.
Tested for x86_64 and x86 (including an i586 build), together with a
patch that will be applied separately to enable the main libm-test.inc
tests for the finite-math-only functions.
[BZ #19213]
* sysdeps/i386/fpu/e_log.S (__log_finite): Ensure +0 is always
returned for argument 1.
* sysdeps/i386/fpu/e_logf.S (__logf_finite): Likewise.
* sysdeps/i386/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/i386/i686/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/x86_64/fpu/e_log10l.S (__log10l_finite): Likewise.
* sysdeps/x86_64/fpu/e_log2l.S (__log2l_finite): Likewise.
* sysdeps/x86_64/fpu/e_logl.S (__logl_finite): Likewise.
fenv_t should include architecture-specific floating-point modes and
status flags. i386 and x86_64 fesetenv limit which bits they use from
the x87 status and control words, when using saved state, and limit
which parts of the state they set to fixed values, when using
FE_DFL_ENV / FE_NOMASK_ENV. The following should be included but are
excluded in at least some cases: status and masking for the "denormal
operand" exception (which isn't part of FE_ALL_EXCEPT); precision
control (explicitly mentioned in Annex F as something that counts as
part of the floating-point environment); MXCSR FZ and DAZ bits (for
FE_DFL_ENV and FE_NOMASK_ENV). This patch arranges for this extra
state to be handled by fesetenv (and thereby by feupdateenv, which
calls fesetenv).
(Note that glibc functions using floating point are not generally
expected to work correctly with non-default values of this state,
especially precision control, but it is still logically part of the
floating-point environment and should be handled as such by fesetenv.
Changes to the state relating to subnormals ought generally to work
with libm functions when the arguments aren't subnormal and neither
are the expected results; that's a consequence of functions avoiding
spurious internal underflows.)
A question arising from this is whether FE_NOMASK_ENV should or should
not mask the "denormal operand" exception. I decided it should mask
that exception. This is the status quo - previously that exception
could only be unmasked by direct manipulation of control registers
(possibly via <fpu_control.h>). In addition, it means that use of
FE_NOMASK_ENV leaves a floating-point environment the same as could be
obtained by fesetenv (FE_DFL_ENV); feenableexcept (FE_ALL_EXCEPT);,
rather than an environment in which an exception is unmasked that
could only be masked again by using fesetenv with FE_DFL_ENV (or a
previously saved environment) - this exception not being usable with
other <fenv.h> functions because it's outside FE_ALL_EXCEPT.
Tested for x86_64 and x86.
[BZ #16068]
* sysdeps/i386/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86_64/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86/fpu/test-fenv-sse-2.c: New file.
* sysdeps/x86/fpu/test-fenv-x87.c: Likewise.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-x87 and test-fenv-sse-2.
[$(subdir) = math] (CFLAGS-test-fenv-sse-2.c): New variable.
The i386 and x86_64 versions of fesetenv, when called with FE_DFL_ENV
or FE_NOMASK_ENV as argument, do not clear SSE exceptions raised in
MXCSR. These arguments should, like other fenv_t values, represent
the whole of the floating-point state, so such exceptions should be
cleared; this patch adds the required clearing. (Discovered while
working on bug 16068.)
Tested for x86_64 and x86.
[BZ #19181]
* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Clear already-raised
SSE exceptions when argument is FE_DFL_ENV or FE_NOMASK_ENV.
* sysdeps/x86_64/fpu/fesetenv.c (__fesetenv): Likewise.
* math/test-fenv-clear-main.c: New file.
* math/test-fenv-clear.c: Likewise.
* math/Makefile (tests): Add test-fenv-clear.
* sysdeps/x86/fpu/test-fenv-clear-sse.c: New file.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-clear-sse.
[$(subdir) = math] (CFLAGS-test-fenv-clear-sse.c): New variable.
The implementations of nearbyint functions using x87 floating point
(i386 all versions, x86_64 long double only) use the fclex
instruction, which clears any exceptions that were raised before the
function was called. These functions must not clear exceptions that
were raised before they were called.
This patch fixes these functions to save and restore the whole
floating-point environment (fnstenv / fldenv) as the way of avoiding
raising "inexact" (recall that there isn't an x87 instruction for
loading just the status word, so the whole environment has to be saved
and loaded instead - the code already saved and loaded the control
word, which is now obtained from the saved environment after this
patch, to disable traps on "inexact"). In the case of the long double
functions, any "invalid" exception from frndint (applied to a
signaling NaN) needs merging into the saved state; this issue doesn't
apply to the float and double functions because that exception would
have been raised when the argument is loaded, before the environment
is saved.
[BZ #15491]
* sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Save and restore
floating-point environment instead of clearing all exceptions.
* sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise,
merging in "invalid" exceptions from frndint.
* sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
* math/test-nearbyint-except.c: New file.
* math/Makefile (tests): Add test-nearbyint-except.
The x86_64 versions of lrint/lrintf/ lrintl are aliases for the long
long versions which isn't correct for x32, where exceptions must respect
overflow for 32-bit long. Separate versions of the long functions for
x32 that convert to 32-bit long and raise the right exceptions for that
conversion, while keeping the aliases in the non-x32 case.
Tested on x86_64 and x32. There are no code changes in libm.so on
x86_64.
* sysdeps/x86_64/fpu/s_llrint.S (__lrint): Add alias only if
__ILP32__ isn't defined.
(lrint): Likewise.
* sysdeps/x86_64/fpu/s_llrintf.S (__lrintf): Likewise.
(lrintf): Likewise.
* sysdeps/x86_64/fpu/s_llrintl.S (__lrintl): Likewise.
(lrintl): Likewise.
* sysdeps/x86_64/x32/fpu/s_lrint.S: New file.
* sysdeps/x86_64/x32/fpu/s_lrintf.S: Likewise.
* sysdeps/x86_64/x32/fpu/s_lrintl.S: Likewise.
This patch improves test coverage of the real libm functions [a-e]*,
ensuring that special cases and ranges of input values of potential
significance (such as close to overflow and underflow thresholds) are
more systematically covered.
This is a followup to
<https://sourceware.org/ml/libc-alpha/2013-12/msg00757.html> which
covered [a-c]* (however, I found more weaknesses in the coverage of
those functions when preparing this patch, hence the additional tests
being added for them here).
Addition of a test for acosh (-qNaN) is temporarily deferred, to be
included as part of a fix for bug 19032 which was discovered in the
course of adding these tests (and which illustrates the use of testing
-qNaN as well as +qNaN as input even to functions for which the sign
of a NaN isn't meant to be significant).
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
atan, atan2, atanh, cbrt, cos, cosh, erf, erfc, exp, exp10, exp2
and expm1.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (acos_test_data): Add more tests.
(asin_test_data): Likewise.
(asinh_test_data): Likewise.
(atan_test_data): Likewise.
(atanh_test_data): Likewise.
(atan2_test_data): Likewise.
(cbrt_test_data): Likewise.
(ceil_test_data): Likewise.
(copysign_test_data): Likewise.
(cos_test_data): Likewise.
(cosh_test_data): Likewise.
(erf_test_data): Likewise.
(erfc_test_data): Likewise.
(exp_test_data): Likewise.
(exp10_test_data): Likewise.
(exp2_test_data): Likewise.
(expm1_test_data): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.
However, the thresholds for using that approach still result in log
being used on argument as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9ulp maximum allowed in
glibc libm. This patch arranges for the approach using log1p to be
used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs. This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much less.
Tested for x86_64, x86, mips64 and powerpc. For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear. Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.
[BZ #19016]
* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
allow more cases with X^2 + Y^2 >= 0.5.
* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise. Add -1 as
normal element in sum instead of special-casing based on values of
arguments.
* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise. Add
-1 as normal element in sum instead of special-casing based on
values of arguments.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
(__x2y2m1): Update comment.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise. Add -1
as normal element in sum instead of special-casing based on values
of arguments.
* math/s_clog.c (__clog): Handle more cases using log1p without
hypot.
* math/s_clog10.c (__clog10): Likewise.
* math/s_clog10f.c (__clog10f): Likewise.
* math/s_clog10l.c (__clog10l): Likewise.
* math/s_clogf.c (__clogf): Likewise.
* math/s_clogl.c (__clogl): Likewise.
* math/auto-libm-test-in: Add more tests of clog and clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The flt-32 version of powf can be inaccurate because of bugs in the
extra-precision calculation of (x-1)/(x+1) or (x-1.5)/(x+1.5) as part
of calculating log(x) with extra precision: a constant used (as part
of adding 1 or 1.5 through integer arithmetic) is incorrect, and then
the code fails to mask a computed high part before using it in
arithmetic that relies on s_h*t_h being exactly representable. This
patch fixes these bugs.
Tested for x86_64 and x86. x86_64 ulps for powf removed and
regenerated to reflect reduced ulps from the increased accuracy for
existing tests.
[BZ #18956]
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Add 0x00400000
not 0x0040000 for high bit of mantissa. Mask with 0xfffff000 when
extracting high part.
* math/auto-libm-test-in: Add another test of pow.
* math/auto-libm-test-out: Regenerated.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
Similar to various other bugs in this area, pow functions can fail to
raise the underflow exception when the result is tiny and inexact but
one or more low bits of the intermediate result that is scaled down
(or, in the i386 case, converted from a wider evaluation format) are
zero. This patch forces the exception in a similar way to previous
fixes, thereby concluding the fixes for known bugs with missing
underflow exceptions currently filed in Bugzilla.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18825]
* sysdeps/i386/fpu/i386-math-asm.h (FLT_NARROW_EVAL_UFLOW_NONNAN):
New macro.
(DBL_NARROW_EVAL_UFLOW_NONNAN): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
* sysdeps/i386/fpu/e_pow.S: Use DEFINE_DBL_MIN.
(__ieee754_pow): Use DBL_NARROW_EVAL_UFLOW_NONNAN instead of
DBL_NARROW_EVAL, reloading the PIC register as needed.
* sysdeps/i386/fpu/e_powf.S: Use DEFINE_FLT_MIN.
(__ieee754_powf): Use FLT_NARROW_EVAL_UFLOW_NONNAN instead of
FLT_NARROW_EVAL. Use separate return path for case when first
argument is NaN.
* sysdeps/i386/fpu/e_powl.S: Include <i386-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN, reloading the
PIC register.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Force
underflow for subnormal result.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Use
math_check_force_underflow_nonneg.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Use
math_check_force_underflow.
* sysdeps/x86_64/fpu/x86_64-math-asm.h
(LDBL_CHECK_FORCE_UFLOW_NONNAN): New macro.
* sysdeps/x86_64/fpu/e_powl.S: Include <x86_64-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
This patch refactors code in sysdeps/x86_64/fpu that forces underflow
exceptions and closely follows corresponding i386 code to use common
macros in x86_64-math-asm.h for that purpose. This is mainly about
keeping the code similar to the i386 code as far as possible, since
each macro apart from DEFINE_LDBL_MIN ends up used only once. It
would be possible to do a further refactoring to share these macros
between i386 and x86_64 (with i386 using the fcomip / fucomip versions
when building for i686 and above), but I have no immediate plans to do
so.
Tested for x86_64.
* sysdeps/x86_64/fpu/x86_64-math-asm.h: New file.
* sysdeps/x86_64/fpu/e_exp2l.S: Include <x86_64-math-asm.h>.
(ldbl_min): Replace with use of DEFINE_LDBL_MIN.
(__ieee754_exp2l): Use LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN.
* sysdeps/x86_64/fpu/e_expl.S: Include <x86_64-math-asm.h>.
[!USE_AS_EXPM1L] (cmin): Replace with use of DEFINE_LDBL_MIN.
(IEEE754_EXPL): Use LDBL_CHECK_FORCE_UFLOW_NONNEG.
The x86_64 fma4 version of pow fails to disable contraction of
operations other than those explicitly intended to use fma
instructions, so resulting in large ulps errors on processors with
fma4 instructions, as in bug 18104 (165ulp for the test added for that
bug; error originally reported by "blaaa" on #glibc). This patch adds
$(config-cflags-nofma) for e_pow-fma4.c, corresponding to the use for
e_pow.c in sysdeps/ieee754/dbl-64/Makefile.
Tested for x86_64 on a processor with fma4.
[BZ #19003]
* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma4.c): Add
$(config-cflags-nofma).
As noted in bug 6803, scalbn fails to set errno on overflow and
underflow. This patch fixes this by making scalbn an alias of ldexp,
which has exactly the same semantics (for floating-point types with
radix 2) and already has wrappers that deal with setting errno,
instead of an alias of the internal __scalbn (which ldexp calls).
Notes:
* Where compat symbols were defined for scalbn functions, I didn't
change what they point to (to keep the patch minimal), so such
compat symbols continue to go directly to the non-errno-setting
functions.
* Mike, I didn't do anything with the IA64 versions of these
functions, where I think both the ldexp and scalbn functions already
deal with setting errno. As a cleanup (not needed to fix this bug)
however you might want to make those functions into aliases for
IA64; there is no need for them to be separate function
implementations at all.
* This concludes the fix for bug 6803 since the scalb and scalbln
cases of that bug were fixed some time ago.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #6803]
* math/s_ldexp.c (scalbn): Define as weak alias of __ldexp.
[NO_LONG_DOUBLE] (scalbnl): Define as weak alias of __ldexp.
* math/s_ldexpf.c (scalbnf): Define as weak alias of __ldexpf.
* math/s_ldexpl.c (scalbnl): Define as weak alias of __ldexpl.
* sysdeps/i386/fpu/s_scalbn.S (scalbn): Remove alias.
* sysdeps/i386/fpu/s_scalbnf.S (scalbnf): Likewise.
* sysdeps/i386/fpu/s_scalbnl.S (scalbnl): Likewise.
* sysdeps/ieee754/dbl-64/s_scalbn.c (scalbn): Likewise.
[NO_LONG_DOUBLE] (scalbnl): Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c (scalbn):
Likewise.
[NO_LONG_DOUBLE] (scalbnl): Likewise.
* sysdeps/ieee754/flt-32/s_scalbnf.c (scalbnf): Likewise.
* sysdeps/ieee754/ldbl-128/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (scalbnl): Remove
long_double_symbol calls.
* sysdeps/ieee754/ldbl-64-128/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/ieee754/ldbl-opt/s_ldexpl.c (__ldexpl_2): Define as
strong alias of __ldexpl.
(scalbnl): Define using long_double_symbol.
* sysdeps/m68k/m680x0/fpu/s_scalbn.c (__CONCATX(scalbn,suffix)):
Remove alias.
* sysdeps/sparc/sparc64/soft-fp/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/x86_64/fpu/s_scalbnl.S (scalbnl): Likewise.
* math/libm-test.inc (scalbn_test_data): Add errno expectations.
(scalbln_test_data): Add more errno expectations.
Various exp2 implementations in glibc can miss underflow exceptions
when the scaling down part of the calculation is exact (or, in the x86
case, when the conversion from extended precision to the target
precision is exact). This patch forces the exception in a similar way
to previous fixes.
The x86 exp2f changes may in fact not be needed for this purpose -
it's likely to be the case that no argument of type float has an exp2
result so close to an exact subnormal float value that it equals that
value when rounded to 64 bits (even taking account of variation
between different x86 implementations). However, they are included
for consistency with the changes to exp2 and so as to fix the exp2f
part of bug 18875 by ensuring that excess range and precision is
removed from underflowing return values.
Tested for x86_64, x86 and mips64.
[BZ #16521]
[BZ #18875]
* math/e_exp2l.c (__ieee754_exp2l): Force underflow exception for
small results.
* sysdeps/i386/fpu/e_exp2.S (dbl_min): New object.
(MO): New macro.
(__ieee754_exp2): For small results, force underflow exception and
remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp2f.S (flt_min): New object.
(MO): New macro.
(__ieee754_exp2f): For small results, force underflow exception
and remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp2l.S (ldbl_min): New object.
(MO): New macro.
(__ieee754_exp2l): Force underflow exception for small results.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise.
* sysdeps/x86_64/fpu/e_exp2l.S (ldbl_min): New object.
(MO): New macro.
(__ieee754_exp2l): Force underflow exception for small results.
* math/auto-libm-test-in: Add more tests or exp2.
* math/auto-libm-test-out: Regenerated.
This patch adds more libm test inputs found through random test
generation to increase previously known ulps. This particular test
generation was run for mips64, so most of the increased ulps are for
ldbl-128 (float and double having been fairly well covered by such
testing for x86_64), but there's the odd ulps increase for other
formats.
Tested for x86_64, x86 and mips64.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
asinh, atan, atan2, atanh, cabs, carg, cos, csqrt, erfc, exp,
exp10, exp2, log, log1p, log2, pow, sin, sincos, sinh, tan and
tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more libm test inputs found through random test
generation to increase observed ulps on x86_64.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acosh, atanh, cbrt,
cosh, csqrt, erfc, expm1 and lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The existing implementations of lgamma functions (except for the ia64
versions) use the reflection formula for negative arguments. This
suffers large inaccuracy from cancellation near zeros of lgamma (near
where the gamma function is +/- 1).
This patch fixes this inaccuracy. For arguments above -2, there are
no zeros and no large cancellation, while for sufficiently large
negative arguments the zeros are so close to integers that even for
integers +/- 1ulp the log(gamma(1-x)) term dominates and cancellation
is not significant. Thus, it is only necessary to take special care
about cancellation for arguments around a limited number of zeros.
Accordingly, this patch uses precomputed tables of relevant zeros,
expressed as the sum of two floating-point values. The log of the
ratio of two sines can be computed accurately using log1p in cases
where log would lose accuracy. The log of the ratio of two gamma(1-x)
values can be computed using Stirling's approximation (the difference
between two values of that approximation to lgamma being computable
without computing the two values and then subtracting), with
appropriate adjustments (which don't reduce accuracy too much) in
cases where 1-x is too small to use Stirling's approximation directly.
In the interval from -3 to -2, using the ratios of sines and of
gamma(1-x) can still produce too much cancellation between those two
parts of the computation (and that interval is also the worst interval
for computing the ratio between gamma(1-x) values, which computation
becomes more accurate, while being less critical for the final result,
for larger 1-x). Because this can result in errors slightly above
those accepted in glibc, this interval is instead dealt with by
polynomial approximations. Separate polynomial approximations to
(|gamma(x)|-1)(x-n)/(x-x0) are used for each interval of length 1/8
from -3 to -2, where n (-3 or -2) is the nearest integer to the
1/8-interval and x0 is the zero of lgamma in the relevant half-integer
interval (-3 to -2.5 or -2.5 to -2).
Together, the two approaches are intended to give sufficient accuracy
for all negative arguments in the problem range. Outside that range,
the previous implementation continues to be used.
Tested for x86_64, x86, mips64 and powerpc. The mips64 and powerpc
testing shows up pre-existing problems for ldbl-128 and ldbl-128ibm
with large negative arguments giving spurious "invalid" exceptions
(exposed by newly added tests for cases this patch doesn't affect the
logic for); I'll address those problems separately.
[BZ #2542]
[BZ #2543]
[BZ #2558]
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r): Call
__lgamma_neg for arguments from -28.0 to -2.0.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r): Call
__lgamma_negf for arguments from -15.0 to -2.0.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -48.0 or -50.0 to -2.0.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -33.0 to -2.0.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: New file.
* sysdeps/ieee754/dbl-64/lgamma_product.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_productf.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_product.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_productl.c: Likewise.
* sysdeps/generic/math_private.h (__lgamma_negf): New prototype.
(__lgamma_neg): Likewise.
(__lgamma_negl): Likewise.
(__lgamma_product): Likewise.
(__lgamma_productl): Likewise.
* math/Makefile (libm-calls): Add lgamma_neg and lgamma_product.
* math/auto-libm-test-in: Add more tests of lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The change in 0b5395f052 replaced calls
to __get_cpu_features@plt followed by a mov from rax to rdx, with a
single macro LOAD_RTLD_GLOBAL_RO_RDX. It is pretty clear that there
was a typo in s_floorf and __nearbyint due to which the (now incorrect)
mov was not removed. This patch removes that mov.
* sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf): Remove
unnecessary movq.
* sysdeps/x86_64/fpu/multiarch/s_nearbyint.S (__nearbyint):
Likewise.
This patch adds more test inputs to various libm functions found
through random generation to have larger ulps errors than previously
listed in libm-test-ulp, on at least one of x86_64 and x86.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
asinh, atan, atan2, atanh, cabs, cbrt, cosh, csqrt, erf, erfc,
exp, exp2, lgamma, log, log1p, log2, pow, sin, sincos, tan, tanh
and tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more tests of various libm functions found through
random test generation to give increased ulps on 32-bit x86.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acosh, asin, asinh,
atanh, cabs, carg, cbrt, cosh, csqrt, erf, erfc, exp, exp10,
expm1, hypot, log, log10, log1p, log2, pow, sinh, tan and tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
In non-default rounding modes, tgamma can be slightly less accurate
than permitted by glibc's accuracy goals.
Part of the problem is error accumulation, addressed in this patch by
setting round-to-nearest for internal computations. However, there
was also a bug in the code dealing with computing pow (x + n, x + n)
where x + n is not exactly representable, providing another source of
error even in round-to-nearest mode; it was necessary to address both
bugs to get errors for all testcases within glibc's accuracy goals.
Given this second fix, accuracy in round-to-nearest mode is also
improved (hence regeneration of ulps for tgamma should be from scratch
- truncate libm-test-ulps or at least remove existing tgamma entries -
so that the expected ulps can be reduced).
Some additional complications also arose. Certain tgamma tests should
strictly, according to IEEE semantics, overflow or not depending on
the rounding mode; this is beyond the scope of glibc's accuracy goals
for any function without exactly-determined results, but
gen-auto-libm-tests doesn't handle being lax there as it does for
underflow. (libm-test.inc also doesn't handle being lax about whether
the result in cases very close to the overflow threshold is infinity
or a finite value close to overflow, but that doesn't cause problems
in this case though I've seen it cause problems with random test
generation for some functions.) Thus, spurious-overflow markings,
with a comment, are added to auto-libm-test-in (no bug in Bugzilla
because the issue is with the testsuite, not a user-visible bug in
glibc). And on x86, after the patch I saw ERANGE issues as previously
reported by Carlos (see my commentary in
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>), which
needed addressing by ensuring excess range and precision were
eliminated at various points if FLT_EVAL_METHOD != 0.
I also noticed and fixed a cosmetic issue where 1.0f was used in long
double functions and should have been 1.0L.
This completes the move of all functions to testing in all rounding
modes with ALL_RM_TEST, so gen-libm-have-vector-test.sh is updated to
remove the workaround for some functions not using ALL_RM_TEST.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18613]
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Take log of
X_ADJ not X when adjusting exponent.
(__ieee754_gamma_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Take log
of X_ADJ not X when adjusting exponent.
(__ieee754_gammaf_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Take
log of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Take
log of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Take log
of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* math/libm-test.inc (tgamma_test_data): Remove one test. Moved
to auto-libm-test-in.
(tgamma_test): Use ALL_RM_TEST.
* math/auto-libm-test-in: Add one test of tgamma. Mark some other
tests of tgamma with spurious-overflow.
* math/auto-libm-test-out: Regenerated.
* math/gen-libm-have-vector-test.sh: Do not check for START.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Some existing jn tests, if run in non-default rounding modes, produce
errors above those accepted in glibc, which causes problems for moving
tests of jn to use ALL_RM_TEST. This patch makes jn set rounding
to-nearest internally, as was done for yn some time ago, then computes
the appropriate underflowing value for results that underflowed to
zero in to-nearest, and moves the tests to ALL_RM_TEST. It does
nothing about the general inaccuracy of Bessel function
implementations in glibc, though it should make jn more accurate on
average in non-default rounding modes through reduced error
accumulation. The recomputation of results that underflowed to zero
should as a side-effect fix some cases of bug 16559, where jn just
used an exact zero, but that is *not* the goal of this patch and other
cases of that bug remain unfixed.
(Most of the changes in the patch are reindentation to add new scopes
for SET_RESTORE_ROUND*.)
Tested for x86_64, x86, powerpc and mips64.
[BZ #16559]
[BZ #18602]
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Set
round-to-nearest internally then recompute results that
underflowed to zero in the original rounding mode.
* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise
* math/libm-test.inc (jn_test): Use ALL_RM_TEST.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
cexp, ccos, ccosh, csin and csinh have spurious underflows in cases
where they compute sin of the smallest normal, that produces an
underflow exception (depending on which sin implementation is in use)
but the final result does not underflow. ctan and ctanh may also have
such underflows, or they may be latent (the issue there is that
e.g. ctan (DBL_MIN) should, rounded upwards, be the next double value
above DBL_MIN, which under glibc's accuracy goals may not have an
underflow exception, but the intermediate computation of sin (DBL_MIN)
would legitimately underflow on before-rounding architectures).
This patch fixes all those functions so they use plain comparisons (>
DBL_MIN etc.) instead of comparing the result of fpclassify with
FP_SUBNORMAL (in all these cases, we already know the number being
compared is finite). Note that in the case of csin / csinf / csinl,
there is no need for fabs calls in the comparison because the real
part has already been reduced to its absolute value.
As the patch fixes the failures that previously obstructed moving
tests of cexp to use ALL_RM_TEST, those tests are moved to ALL_RM_TEST
by the patch (two functions remain yet to be converted).
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #18594]
* math/s_ccosh.c (__ccosh): Compare with least normal value
instead of comparing class with FP_SUBNORMAL.
* math/s_ccoshf.c (__ccoshf): Likewise.
* math/s_ccoshl.c (__ccoshl): Likewise.
* math/s_cexp.c (__cexp): Likewise.
* math/s_cexpf.c (__cexpf): Likewise.
* math/s_cexpl.c (__cexpl): Likewise.
* math/s_csin.c (__csin): Likewise.
* math/s_csinf.c (__csinf): Likewise.
* math/s_csinh.c (__csinh): Likewise.
* math/s_csinhf.c (__csinhf): Likewise.
* math/s_csinhl.c (__csinhl): Likewise.
* math/s_csinl.c (__csinl): Likewise.
* math/s_ctan.c (__ctan): Likewise.
* math/s_ctanf.c (__ctanf): Likewise.
* math/s_ctanh.c (__ctanh): Likewise.
* math/s_ctanhf.c (__ctanhf): Likewise.
* math/s_ctanhl.c (__ctanhl): Likewise.
* math/s_ctanl.c (__ctanl): Likewise.
* math/auto-libm-test-in: Add more tests of ccos, ccosh, cexp,
csin, csinh, ctan and ctanh.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (cexp_test): Use ALL_RM_TEST.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
csin and csinh can produce bad results when overflowing in directed
rounding modes, because a multiplication that can overflow is followed
by a possible negation. This patch fixes this by negating one of the
arguments of the multiplication before the multiplication instead of
negating the result.
The new tests for this issue are added to auto-libm-test-in, starting
use of that file for csin and csinh. The issue was found in the
course of moving existing tests for csin and csinh (existing tests, by
being enabled in more cases than previously, showed the issue for
float and double but not for long double); that move will now be done
separately.
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #18593]
* math/s_csin.c (__csin): Negate before rather than after possibly
overflowing multiplication.
* math/s_csinf.c (__csinf): Likewise.
* math/s_csinh.c (__csinh): Likewise.
* math/s_csinhf.c (__csinhf): Likewise.
* math/s_csinhl.c (__csinhl): Likewise.
* math/s_csinl.c (__csinl): Likewise.
* math/auto-libm-test-in: Add some tests of csin and csinh.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (csin_test_data): Use AUTO_TESTS_c_c.
(csinh_test_data): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
In the x86 / x86_64 implementations of expm1l, when expm1l's result
should underflow to 0 (argument minus the least subnormal, in some
rounding modes), it can be a zero of the wrong sign. This patch fixes
this by returning the argument with underflow forced in that case
(this is a 1ulp error relative to the correctly rounded result of -0,
which is OK in terms of the documented accuracy goals, whereas a
result with the wrong sign never is).
Tested for x86_64 and x86.
[BZ #18569]
* sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]: Force
underflow and return argument in case of subnormal argument.
* sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]:
Likewise.
* math/auto-libm-test-in: Add more tests of expm1.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, the x86 and x86_64
implementations of expl / exp10l can fail to produce underflow
exceptions when the unscaled result has trailing 0 bits so the scaling
down to subnormal precision is exact. This patch fixes this by
forcing the exception in the case of tiny results.
Tested for x86_64 and x86.
[BZ #16361]
* sysdeps/i386/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object.
[!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for
tiny results.
* sysdeps/x86_64/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object.
[!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for
tiny results.
* math/auto-libm-test-in: Add more tests of exp and exp10. Do not
mark underflow exceptions as possibly missing for bug 16361.
* math/auto-libm-test-out: Regenerated.