The x86_64 versions of lrint/lrintf/ lrintl are aliases for the long
long versions which isn't correct for x32, where exceptions must respect
overflow for 32-bit long. Separate versions of the long functions for
x32 that convert to 32-bit long and raise the right exceptions for that
conversion, while keeping the aliases in the non-x32 case.
Tested on x86_64 and x32. There are no code changes in libm.so on
x86_64.
* sysdeps/x86_64/fpu/s_llrint.S (__lrint): Add alias only if
__ILP32__ isn't defined.
(lrint): Likewise.
* sysdeps/x86_64/fpu/s_llrintf.S (__lrintf): Likewise.
(lrintf): Likewise.
* sysdeps/x86_64/fpu/s_llrintl.S (__lrintl): Likewise.
(lrintl): Likewise.
* sysdeps/x86_64/x32/fpu/s_lrint.S: New file.
* sysdeps/x86_64/x32/fpu/s_lrintf.S: Likewise.
* sysdeps/x86_64/x32/fpu/s_lrintl.S: Likewise.
This patch sets lseek/llseek for 64-bit, MIPS n32, and x86_32 as non-
cancelable. This make it consistant with 32-bit platform.
Tested on i686, x86_64, and x32.
* sysdeps/unix/sysv/linux/mips/mips64/syscalls.list (lseek): Set as
non-cancelable.
* sysdeps/unix/sysv/linux/wordsize-64/syscalls.list (llseek): Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/lseek.S (__libc_lseek64):
Likewise.
GCC added support for -mno-vzeroupper in version 4.6. Thus the
configure tests for this support are obsolete, and this patch removes
them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/Makefile [$(config-cflags-novzeroupper) = yes]:
Make code unconditional.
The dbl-64 implementation of lrint produces incorrect results for some
arguments with 64-bit long because a 32-bit (unsigned) low part of the
mantissa is shifted left, losing high bits in the process. This patch
fixes this by casting to long int before shifting, as in lround (as
this case only applies for 64-bit long, there are no issues with
sign-extension).
Tested for mips64 (n64).
[BZ #19095]
* sysdeps/ieee754/dbl-64/s_lrint.c (__lrint): Cast low part of
mantissa to long int before shifting left.
The dbl-64, ldbl-96 and ldbl-128 implementations of lrint and llrint
fail to produce "invalid" exceptions in cases where the rounded result
overflows the target type, but truncating the floating-point argument
to the next integer towards zero does not overflow it (so in
particular casts do not produce such exceptions). (This issue cannot
arise for float, or for double with 64-bit target type, or for ldbl-96
with 64-bit target type and negative arguments, because of
insufficient precision in the floating-point type for arguments with
the relevant property to exist. It also obviously cannot arise in
FE_TOWARDZERO mode.)
This patch fixes these problems by inserting checks for the special
cases that can occur in each implementation, and explicitly raising
FE_INVALID (and avoiding the cast if it might raise spurious
FE_INEXACT, while raising FE_INEXACT explicitly in the cases where it
is needed; unlike lround and llround, FE_INEXACT is required, not
optional, for these functions for a within-range inexact result).
The fixes are conditional on FE_INVALID or FE_INEXACT being defined.
If any future architecture supports one but not both of those
exceptions, the code will fail to compile and need fixing to handle
that case (this seemed better than conditioning on both macros being
defined, resulting in code that would compile but quietly miss
exceptions on such a system).
Tested for x86_64, x86 and mips64. Tested the ldbl-96 changes (only
relevant for ia64, it appears) on x86_64 by removing the x86_64
versions of lrintl / llrintl.
[BZ #19094]
* sysdeps/ieee754/dbl-64/s_lrint.c: Include <fenv.h> and
<limits.h>.
(__lrint) [FE_INVALID || FE_INEXACT]: Force FE_INVALID exception
when result overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-128/s_llrintl.c: Include <fenv.h> and
<limits.h>.
(__llrintl) [FE_INVALID || FE_INEXACT]: Force FE_INVALID exception
when result overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-128/s_lrintl.c: Include <fenv.h> and
<limits.h>.
(__lrintl) [FE_INVALID || FE_INEXACT]: Force FE_INVALID exception
when result overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-96/s_llrintl.c: Include <fenv.h> and
<limits.h>.
(__llrintl) [FE_INVALID || FE_INEXACT]: Force FE_INVALID exception
when result overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-96/s_lrintl.c: Include <fenv.h> and
<limits.h>.
(__lrintl) [FE_INVALID || FE_INEXACT]: Force FE_INVALID exception
when result overflows but exception would not result from cast.
* math/libm-test.inc (lrint_test_data): Add more tests.
(llrint_test_data): Likewise.
The dbl-64, ldbl-96 and ldbl-128 implementations of lround and llround
fail to produce "invalid" exceptions in cases where the rounded result
overflows the target type, but truncating the floating-point argument
to the next integer towards zero does not overflow it (so in
particular casts do not produce such exceptions). (This issue cannot
arise for float, or for double with 64-bit target type, or for ldbl-96
with 64-bit target type and negative arguments, because of
insufficient precision in the floating-point type for arguments with
the relevant property to exist.)
This patch fixes these problems by inserting checks for the special
cases that can occur in each implementation, and explicitly raising
FE_INVALID (and avoiding the cast if it might raise spurious
FE_INEXACT).
Tested for x86_64, x86 and mips64.
[BZ #19088]
* sysdeps/ieee754/dbl-64/s_lround.c: Include <fenv.h> and
<limits.h>.
(__lround) [FE_INVALID]: Force FE_INVALID exception when result
overflows but exception would not result from cast.
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Include <fenv.h>
and <limits.h>.
(__lround) [FE_INVALID]: Force FE_INVALID exception when result
overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-128/s_llroundl.c: Include <fenv.h> and
<limits.h>.
(__llroundl) [FE_INVALID]: Force FE_INVALID exception when result
overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-128/s_lroundl.c: Include <fenv.h> and
<limits.h>.
(__lroundl) [FE_INVALID]: Force FE_INVALID exception when result
overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-96/s_llroundl.c: Include <fenv.h> and
<limits.h>.
(__llroundl) [FE_INVALID]: Force FE_INVALID exception when result
overflows but exception would not result from cast.
* sysdeps/ieee754/ldbl-96/s_lroundl.c: Include <fenv.h> and
<limits.h>.
(__lroundl) [FE_INVALID]: Force FE_INVALID exception when result
overflows but exception would not result from cast.
* math/libm-test.inc (lround_test_data): Add more tests.
(llround_test_data): Likewise.
The ldbl-128 implementations of lrintl and lroundl miss "invalid"
exceptions on systems with 32-bit long for arguments that overflow
long but have exponent below 48. This patch fixes this by rearranging
the sequence of tests in the code so the exponent < 48 case is only
used for exponents that don't overflow long.
Tested for mips64 (n32 and n64).
[BZ #19085]
* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl): Move test for
exponent below 48 inside case for non-overflowing exponent.
* sysdeps/ieee754/ldbl-128/s_lroundl.c (__lroundl): Likewise.
This patch enables use of sysdeps/ieee754/dbl-64/wordsize-64 for
MIPS64 (both n64 and n32), removing a #error in one case now that case
has been tested and found to work.
Tested for mips64 (n64 and n32).
* sysdeps/mips/mips64/Implies: Use ieee754/dbl-64/wordsize-64.
* sysdeps/ieee754/dbl-64/wordsize-64/s_issignaling.c
(__issignaling) [HIGH_ORDER_BIT_IS_SET_FOR_SNAN]: Remove #error.
The implementation of lround in dbl-64/wordsize-64 as an alias or
wrapper for llround is always incorrect when long is not 64-bit,
because it misses required exceptions in overflow cases, as shown by
my recently added tests. This patch removes that alias / wrapper in
the non-LP64 case, together with the REGISTER_CAST_INT32_TO_INT64
macro, restoring the previous version of lround for dbl-64/wordsize-64
(newly conditioned on !_LP64).
Tested for x86_64, and for mips64 with use of dbl-64/wordsize-64
enabled.
[BZ #19079]
* sysdeps/ieee754/dbl-64/wordsize-64/s_lround.c: Restore previous
file, conditioned on [!_LP64].
* sysdeps/ieee754/dbl-64/wordsize-64/s_llround.c
[!_LP64] (__lround): Do not define as function or alias.
[!_LP64] (lround): Likewise.
[!_LP64] (__lroundl): Likewise.
[!_LP64] (lroundl): Likewise.
* sysdeps/tile/sysdep.h (REGISTER_CAST_INT32_TO_INT64): Remove
macro.
* sysdeps/x86_64/x32/sysdep.h (REGISTER_CAST_INT32_TO_INT64):
Likewise.
GCC added support for -msse4 in version 4.3. Thus the configure tests
for it are obsolete, and this patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_sse4): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/i386/i686/multiarch/Makefile
[$(config-cflags-sse4) = yes]: Make code unconditional.
* sysdeps/i386/i686/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/i386/i686/multiarch/strspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/x86_64/configure.ac (libc_cv_cc_sse4): Remove configure
test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/multiarch/Makefile [$(config-cflags-sse4) = yes]:
Make code unconditional.
* sysdeps/x86_64/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/x86_64/multiarch/strspn.S [HAVE_SSE4_SUPPORT]: Likewise.
* config.h.in (HAVE_SSE4_SUPPORT): Remove #undef.
The ldbl-128ibm expl wrapper checks the argument to determine when to
call __kernel_standard_l, thereby overriding overflowing results from
__ieee754_expl that could otherwise (given appropriately patched
libgcc) be correct for the rounding mode. This patch changes it to
check the result of __ieee754_expl instead, as other versions of this
wrapper do.
Tested for powerpc.
[BZ #19078]
* sysdeps/ieee754/ldbl-128ibm/w_expl.c (o_thres): Remove variable.
(u_thres): Likewise.
(__expl): Determine whether to call __kernel_standard_l based on
value of result, not argument.
The ldbl-128ibm implementation of logl produces a zero with the wrong
sign for logl (1) in FE_DOWNWARD mode. This patch makes it explicitly
return 0.0L in that case, as in e.g. the ldbl-128 implementation.
Tested for powerpc.
[BZ #19077]
* sysdeps/ieee754/ldbl-128ibm/e_logl.c (__ieee754_logl): Return
0.0L for argument 1.0L.
The ldbl-128ibm implementation of log1pl produces an infinity with the
wrong sign for log1pl (-1) in FE_DOWNWARD mode. This patch fixes this
by changing a division (-1.0L / (x - x)) (incorrect in FE_DOWNWARD
mode) to (-1.0L / 0.0L) (correct in all rounding modes).
Tested for powerpc.
[BZ #19076]
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (__log1pl): Divide by
constant 0.0L when computing infinite result.
The ldbl-96 version of lroundl is incorrect for systems with 64-bit
long when the argument's absolute value is just below a power of 2,
2^32 or more, and rounds up to the next integer; in such cases, it
returns 0. The problem is incrementing the high part of the mantissa
loses the high bit of the value (which is not an issue for any other
floating-point format, and is handled specially in lround when the bit
corresponding to 0.5 was in the high part rather than the low part).
This patch fixes this in a similar way to that used in llroundl:
storing the high part in an unsigned long variable before incrementing
it, so problems cannot occur in the case when this code is reachable.
I improved test coverage for both lround and llround by making them
use the same test inputs (appropriately conditioned on the size of
long in the lround case) - complete with the same comments, to make
comparison as easy as possible. (This test coverage improvement was
how I found the lroundl bug.)
Tested for x86_64 and x86.
[BZ #19071]
* sysdeps/ieee754/ldbl-96/s_lroundl.c (__lroundl): Use unsigned
long int variable to store possibly incremented high part of
mantissa.
* math/libm-test.inc (lround_test_data): Add tests used for
llround. Use [LONG_MAX > 0x7fffffff] consistently as condition
for tests requiring 64-bit long. Do not condition tests on
[TEST_FLOAT] unnecessarily.
(llround_test_data): Add tests used for lround. Add another
expectation for the "inexact" exception. Do not condition tests
on [TEST_FLOAT] unnecessarily.
On powerpc32 hard-float, older processors (ones where fcfid is not
available for 32-bit code), GCC generates conversions from integers to
floating point that wrongly convert integer 0 to -0 instead of +0 in
FE_DOWNWARD mode. This in turn results in logb and a few other
functions wrongly returning -0 when they should return +0.
This patch works around this issue in glibc as I proposed in
<https://sourceware.org/ml/libc-alpha/2015-09/msg00728.html>, so that
the affected functions can be correct and the affected tests pass in
the absence of a GCC fix for this longstanding issue (GCC bug 67771 -
if fixed, of course we can put in GCC version conditionals, and
eventually phase out the workarounds). A new macro
FIX_INT_FP_CONVERT_ZERO is added in a new sysdeps header
fix-int-fp-convert-zero.h, and the powerpc32/fpu version of that
header defines the macro based on the results of a configure test for
whether such conversions use the fcfid instruction.
Tested for x86_64 (that installed stripped shared libraries are
unchanged by the patch) and powerpc (that HAVE_PPC_FCFID comes out to
0 as expected and that the relevant tests are fixed). Also tested a
build with GCC configured for -mcpu=power4 and verified that
HAVE_PPC_FCFID comes out to 1 in that case.
There are still some other issues to fix to get test-float and
test-double passing cleanly for older powerpc32 processors (apart from
the need for an ulps regeneration for powerpc). (test-ldouble will be
harder to get passing cleanly, but with a combination of selected
fixes to ldbl-128ibm code that don't involve significant performance
issues, allowing spurious underflow and inexact exceptions for that
format, and lots of XFAILing for the default case of unpatched libgcc,
it should be doable.)
[BZ #887]
[BZ #19049]
[BZ #19050]
* sysdeps/generic/fix-int-fp-convert-zero.h: New file.
* sysdeps/ieee754/dbl-64/e_log10.c: Include
<fix-int-fp-convert-zero.h>.
(__ieee754_log10): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/dbl-64/e_log2.c: Include
<fix-int-fp-convert-zero.h>.
(__ieee754_log2): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/dbl-64/s_erf.c: Include
<fix-int-fp-convert-zero.h>.
(__erfc): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/dbl-64/s_logb.c: Include
<fix-int-fp-convert-zero.h>.
(__logb): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/flt-32/e_log10f.c: Include
<fix-int-fp-convert-zero.h>.
(__ieee754_log10f): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/flt-32/e_log2f.c: Include
<fix-int-fp-convert-zero.h>.
(__ieee754_log2f): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/flt-32/s_erff.c: Include
<fix-int-fp-convert-zero.h>.
(__erfcf): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/flt-32/s_logbf.c: Include
<fix-int-fp-convert-zero.h>.
(__logbf): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c: Include
<fix-int-fp-convert-zero.h>.
(__erfcl): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/ieee754/ldbl-128ibm/s_logbl.c: Include
<fix-int-fp-convert-zero.h>.
(__logbl): Adjust signs as needed if FIX_INT_FP_CONVERT_ZERO.
* sysdeps/powerpc/powerpc32/fpu/configure.ac: New file.
* sysdeps/powerpc/powerpc32/fpu/configure: New generated file.
* sysdeps/powerpc/powerpc32/fpu/fix-int-fp-convert-zero.h: New
file.
* config.h.in [_LIBC] (HAVE_PPC_FCFID): New macro.
ISO C requires overflowing results from nexttoward to be the
appropriate infinity independent of the rounding mode, but some
implementations use a rounding-mode-dependent result (this is the same
issue as was fixed for nextafter in bug 16677). This patch fixes the
problem by making the nexttoward implementations discard the result
from the floating-point computation that forced an overflow exception
and then return the infinity previously computed with integer
arithmetic.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #19059]
* math/s_nexttowardf.c (__nexttowardf): Do not return value from
overflowing computation.
* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
Likewise.
* math/libm-test.inc (nexttoward_test_data): Add more tests.
The file sysdeps/powerpc/sysdeps.h defines aliases for register operands,
which add the letter 'r' as a prefix to a register name. E.g.: register 20
can be written as 'r20', instead of '20'. On the one hand, this increases
readability, as it makes it easier for readers to know whether the operand is a
register or an immediate. On the other hand, this permits that immediate
operands be written as if they were registers, and vice-versa, thus reducing
the readability of the code.
This commit removes some of these unintentional misuses.
This commit also increases readability of the code by adding the prefix 'cr' to
some uses of the control register.
Both changes have no effect on the final code. Checked with objdump.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Remove or add register
prefix from operands.
The ldbl-128 / ldbl-128ibm implementation of lgamma has problems with
its handling of large arguments. It has an overflow threshold that is
correct only for ldbl-128, despite being used for both types - with
diagnostic control macros as a temporary measure to disable warnings
about that constant overflowing for ldbl-128ibm - and it has a
calculation that's roughly x * log(x) - x, resulting in overflows for
arguments that are roughly at most a factor 1/log(threshold) below the
overflow threshold.
This patch fixes both issues, using an overflow threshold appropriate
for the type in question and adding another case for large arguments
that avoids the possible intermediate overflow.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #16347]
[BZ #19046]
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c: Do not include
<libc-internal.h>.
(MAXLGM): Do not use diagnostic control macros.
[LDBL_MANT_DIG == 106] (MAXLGM): Change value to overflow
threshold for ldbl-128ibm.
(__ieee754_lgammal_r): For large arguments, multiply by log - 1
instead of multiplying by log then subtracting.
* math/auto-libm-test-in: Add more tests of lgamma.
* math/auto-libm-test-out: Regenerated.
The ldbl-128ibm implementation of exp10l uses a version of log(10)
split into high and low parts - but the low part is negative, so
causing spurious overflows from __ieee754_expl (exp_high) in cases
close to the overflow threshold (I added relevant tests close to the
overflow threshold to the testsuite earlier today). The same issue
applies close to the underflow threshold as well (except that spurious
underflows in IBM long double arithmetic are harder to fix than the
other deficiencies, so we might end up permitting those for IBM long
double in the libm testsuite, as permitted by ISO C).
This patch fixes it to use a low part rounded downward to 48 bits
instead. (The choice of 48 instead of 53 bits is to make it more
obviously safe even when the low part of the argument is negative.)
Tested for powerpc. (Note that because of libgcc bugs with
multiplication very close to LDBL_MAX, libgcc also needs patching for
all the problem cases to be fixed, but this patch is still safe and
correct in the absence of such libgcc fixes.)
[BZ #16620]
* sysdeps/ieee754/ldbl-128ibm/e_exp10l.c (log10_high): Use value
of log (10) rounded downward to 48 bits.
(log10_low): Use corresponding low part of log (10).
The i386 versions of acoshf and acosh raise a spurious "invalid"
exception for an argument that is a quiet NaN with the sign bit set.
The integer arithmetic to detect arguments < 1 also detects -NaN, and
then the computation 0 / 0 in that case raises the exception. This
patch fixes this by using (x - x) / (x - x) as the computation in that
case instead, which will always raise the exception for non-NaN
arguments reaching that code, but not for quiet NaN arguments.
Tested for x86_64 and x86.
[BZ #19032]
* sysdeps/i386/fpu/e_acosh.S (__ieee754_acosh): For arguments < 1,
compute result as (x - x) / (x - x) not as 0 / 0.
* sysdeps/i386/fpu/e_acoshf.S (__ieee754_acoshf): Likewise.
* math/libm-test.inc (acosh_test_data): Add another test of acosh.
This patch improves test coverage of the real libm functions [a-e]*,
ensuring that special cases and ranges of input values of potential
significance (such as close to overflow and underflow thresholds) are
more systematically covered.
This is a followup to
<https://sourceware.org/ml/libc-alpha/2013-12/msg00757.html> which
covered [a-c]* (however, I found more weaknesses in the coverage of
those functions when preparing this patch, hence the additional tests
being added for them here).
Addition of a test for acosh (-qNaN) is temporarily deferred, to be
included as part of a fix for bug 19032 which was discovered in the
course of adding these tests (and which illustrates the use of testing
-qNaN as well as +qNaN as input even to functions for which the sign
of a NaN isn't meant to be significant).
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
atan, atan2, atanh, cbrt, cos, cosh, erf, erfc, exp, exp10, exp2
and expm1.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (acos_test_data): Add more tests.
(asin_test_data): Likewise.
(asinh_test_data): Likewise.
(atan_test_data): Likewise.
(atanh_test_data): Likewise.
(atan2_test_data): Likewise.
(cbrt_test_data): Likewise.
(ceil_test_data): Likewise.
(copysign_test_data): Likewise.
(cos_test_data): Likewise.
(cosh_test_data): Likewise.
(erf_test_data): Likewise.
(erfc_test_data): Likewise.
(exp_test_data): Likewise.
(exp10_test_data): Likewise.
(exp2_test_data): Likewise.
(expm1_test_data): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.
However, the thresholds for using that approach still result in log
being used on argument as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9ulp maximum allowed in
glibc libm. This patch arranges for the approach using log1p to be
used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs. This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much less.
Tested for x86_64, x86, mips64 and powerpc. For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear. Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.
[BZ #19016]
* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
allow more cases with X^2 + Y^2 >= 0.5.
* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise. Add -1 as
normal element in sum instead of special-casing based on values of
arguments.
* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise. Add
-1 as normal element in sum instead of special-casing based on
values of arguments.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
(__x2y2m1): Update comment.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise. Add -1
as normal element in sum instead of special-casing based on values
of arguments.
* math/s_clog.c (__clog): Handle more cases using log1p without
hypot.
* math/s_clog10.c (__clog10): Likewise.
* math/s_clog10f.c (__clog10f): Likewise.
* math/s_clog10l.c (__clog10l): Likewise.
* math/s_clogf.c (__clogf): Likewise.
* math/s_clogl.c (__clogl): Likewise.
* math/auto-libm-test-in: Add more tests of clog and clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The flt-32 version of powf can be inaccurate because of bugs in the
extra-precision calculation of (x-1)/(x+1) or (x-1.5)/(x+1.5) as part
of calculating log(x) with extra precision: a constant used (as part
of adding 1 or 1.5 through integer arithmetic) is incorrect, and then
the code fails to mask a computed high part before using it in
arithmetic that relies on s_h*t_h being exactly representable. This
patch fixes these bugs.
Tested for x86_64 and x86. x86_64 ulps for powf removed and
regenerated to reflect reduced ulps from the increased accuracy for
existing tests.
[BZ #18956]
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Add 0x00400000
not 0x0040000 for high bit of mantissa. Mask with 0xfffff000 when
extracting high part.
* math/auto-libm-test-in: Add another test of pow.
* math/auto-libm-test-out: Regenerated.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
Similar to various other bugs in this area, pow functions can fail to
raise the underflow exception when the result is tiny and inexact but
one or more low bits of the intermediate result that is scaled down
(or, in the i386 case, converted from a wider evaluation format) are
zero. This patch forces the exception in a similar way to previous
fixes, thereby concluding the fixes for known bugs with missing
underflow exceptions currently filed in Bugzilla.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18825]
* sysdeps/i386/fpu/i386-math-asm.h (FLT_NARROW_EVAL_UFLOW_NONNAN):
New macro.
(DBL_NARROW_EVAL_UFLOW_NONNAN): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
* sysdeps/i386/fpu/e_pow.S: Use DEFINE_DBL_MIN.
(__ieee754_pow): Use DBL_NARROW_EVAL_UFLOW_NONNAN instead of
DBL_NARROW_EVAL, reloading the PIC register as needed.
* sysdeps/i386/fpu/e_powf.S: Use DEFINE_FLT_MIN.
(__ieee754_powf): Use FLT_NARROW_EVAL_UFLOW_NONNAN instead of
FLT_NARROW_EVAL. Use separate return path for case when first
argument is NaN.
* sysdeps/i386/fpu/e_powl.S: Include <i386-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN, reloading the
PIC register.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Force
underflow for subnormal result.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Use
math_check_force_underflow_nonneg.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Use
math_check_force_underflow.
* sysdeps/x86_64/fpu/x86_64-math-asm.h
(LDBL_CHECK_FORCE_UFLOW_NONNAN): New macro.
* sysdeps/x86_64/fpu/e_powl.S: Include <x86_64-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
Systems without floating-point exceptions and rounding modes should
use the soft-fp versions of fmaf and fma, not the sysdeps/ieee754
versions that rely on setting rounding to zero and testing for the
"inexact" exception; this has been noted on
<https://sourceware.org/glibc/wiki/PortStatus> for some time. This
patch makes no-FPU ColdFire use the soft-fp files; sfp-machine.h is
made to include the nios2 version of sfp-machine.h which seems
sufficiently generic for 32-bit systems.
[BZ #13304]
* sysdeps/m68k/coldfire/nofpu/s_fma.c: New file.
* sysdeps/m68k/coldfire/nofpu/s_fmaf.c: Likewise.
* sysdeps/m68k/coldfire/nofpu/sfp-machine.h: Likewise.
Systems without floating-point exceptions and rounding modes should
use the soft-fp versions of fmaf and fma, not the sysdeps/ieee754
versions that rely on setting rounding to zero and testing for the
"inexact" exception; this has been noted on
<https://sourceware.org/glibc/wiki/PortStatus> for some time. This
patch makes MicroBlaze use the soft-fp files; sfp-machine.h is made to
include the nios2 version of sfp-machine.h which seems sufficiently
generic for 32-bit systems.
[BZ #13304]
* sysdeps/microblaze/s_fma.c: New file.
* sysdeps/microblaze/s_fmaf.c: Likewise.
* sysdeps/microblaze/sfp-machine.h: Likewise.
Similar to various other bugs in this area, hypot functions can fail
to raise the underflow exception when the result is tiny and inexact
but one or more low bits of the intermediate result that is scaled
down (or, in the i386 case, converted from a wider evaluation format)
are zero. This patch forces the exception in a similar way to
previous fixes.
Note that this issue cannot arise for implementations of hypotf using
double (or wider) for intermediate evaluation (if hypotf should
underflow, that means the double square root is being computed of some
number of the form N*2^-298, for 0 < N < 2^46, which is exactly
represented as a double, and whatever the rounding mode such a square
root cannot have a mantissa with all zeroes after the initial 23
bits). Thus no changes are made to hypotf implementations in this
patch, only to hypot and hypotl.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18803]
* sysdeps/i386/fpu/e_hypot.S: Use DEFINE_DBL_MIN.
(MO): New macro.
(__ieee754_hypot) [PIC]: Load PIC register.
(__ieee754_hypot): Use DBL_NARROW_EVAL_UFLOW_NONNEG instead of
DBL_NARROW_EVAL.
* sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Use
math_check_force_underflow_nonneg in case where result might be
tiny.
* sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/powerpc/fpu/e_hypot.c (__ieee754_hypot): Likewise.
* math/auto-libm-test-in: Add more tests of hypot.
* math/auto-libm-test-out: Regenerated.
This patch refactors code in sysdeps/x86_64/fpu that forces underflow
exceptions and closely follows corresponding i386 code to use common
macros in x86_64-math-asm.h for that purpose. This is mainly about
keeping the code similar to the i386 code as far as possible, since
each macro apart from DEFINE_LDBL_MIN ends up used only once. It
would be possible to do a further refactoring to share these macros
between i386 and x86_64 (with i386 using the fcomip / fucomip versions
when building for i686 and above), but I have no immediate plans to do
so.
Tested for x86_64.
* sysdeps/x86_64/fpu/x86_64-math-asm.h: New file.
* sysdeps/x86_64/fpu/e_exp2l.S: Include <x86_64-math-asm.h>.
(ldbl_min): Replace with use of DEFINE_LDBL_MIN.
(__ieee754_exp2l): Use LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN.
* sysdeps/x86_64/fpu/e_expl.S: Include <x86_64-math-asm.h>.
[!USE_AS_EXPM1L] (cmin): Replace with use of DEFINE_LDBL_MIN.
(IEEE754_EXPL): Use LDBL_CHECK_FORCE_UFLOW_NONNEG.
sysdeps/i386/fpu/e_atanh.S, unlike all other functions in that
directory, loads the PIC register with its own code using
_GLOBAL_OFFSET_TABLE_, rather than with the LOAD_PIC_REG macro. I see
no good reason for the difference; this patch makes it use the common
macro.
Tested for x86.
* sysdeps/i386/fpu/e_atanh.S (__ieee754_atanh) [PIC]: Use
LOAD_PIC_REG.
This patch refactors code in sysdeps/i386/fpu that forces underflow
exceptions to use common macros for that purpose as far as possible.
(Although some of the macros end up used in only one place, I think
it's cleanest to define all these macros together so that all the code
forcing underflow uses such macros. Some more uses of such macros
will also be introduced when fixing remaining bugs about missing
underflow exceptions, and it would be possible to do further
refactoring of the macros in i386-math-asm.h to share more code by
using other macros internally. Places that test for underflow by
examining the representation of the argument with integer operations,
rather that using floating-point comparisons on the argument or
result, are unchanged by this patch.)
Most of this code uses a macro MO to abstract away the differences
between PIC and non-PIC memory references to constants. log1p
functions, however, hardcoded PIC conditionals for this. Because the
common macros rely on the use of MO, I changed the log1p functions to
use the normal style here, and, for consistency, also made that change
to log1pl which is otherwise unaffected by this patch.
Tested for x86.
* sysdeps/i386/fpu/i386-math-asm.h (DEFINE_LDBL_MIN): New macro.
(FLT_CHECK_FORCE_UFLOW): Likewise.
(DBL_CHECK_FORCE_UFLOW): Likewise.
(FLT_CHECK_FORCE_UFLOW_NARROW): Likewise.
(DBL_CHECK_FORCE_UFLOW_NARROW): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN): Likewise.
(FLT_CHECK_FORCE_UFLOW_NONNAN): Likewise.
(DBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
(FLT_CHECK_FORCE_UFLOW_NONNEG): Likewise.
(DBL_CHECK_FORCE_UFLOW_NONNEG): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNEG): Likewise.
* sysdeps/i386/fpu/e_asin.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_asin): Use DBL_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/e_asinf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_asinf): Use FLT_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/e_atan2.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_atan2): Use DBL_CHECK_FORCE_UFLOW_NARROW.
* sysdeps/i386/fpu/e_atan2f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_atan2f): Use FLT_CHECK_FORCE_UFLOW_NARROW.
* sysdeps/i386/fpu/e_atanh.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_atanh): Use DBL_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_atanhf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_atanhf): Use FLT_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_exp2l.S: Include <i386-math-asm.h>.
(ldbl_min): Replace with use of DEFINE_LDBL_MIN.
(__ieee754_exp2l): Use LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_expl.S: Include <i386-math-asm.h>.
[!USE_AS_EXPM1L] (cmin): Replace with use of DEFINE_LDBL_MIN.
(IEEE754_EXPL): Use LDBL_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/s_atan.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__atan): Use DBL_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/s_atanf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__atanf): Use FLT_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/s_expm1.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__expm1): Use DBL_CHECK_FORCE_UFLOW. Move underflow check after
main computation.
* sysdeps/i386/fpu/s_expm1f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__expm1f): Use FLT_CHECK_FORCE_UFLOW. Move underflow check after
main computation.
* sysdeps/i386/fpu/s_log1p.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(MO): New macro.
(__log1p): Use MO. Use DBL_CHECK_FORCE_UFLOW_NONNAN.
* sysdeps/i386/fpu/s_log1pf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(MO): New macro.
(__log1pf): Use MO. Use FLT_CHECK_FORCE_UFLOW_NONNAN.
* sysdeps/i386/fpu/s_log1pl.S (MO): New macro.
(__log1pl): Use MO.
The x86_64 fma4 version of pow fails to disable contraction of
operations other than those explicitly intended to use fma
instructions, so resulting in large ulps errors on processors with
fma4 instructions, as in bug 18104 (165ulp for the test added for that
bug; error originally reported by "blaaa" on #glibc). This patch adds
$(config-cflags-nofma) for e_pow-fma4.c, corresponding to the use for
e_pow.c in sysdeps/ieee754/dbl-64/Makefile.
Tested for x86_64 on a processor with fma4.
[BZ #19003]
* sysdeps/x86_64/fpu/multiarch/Makefile (CFLAGS-e_pow-fma4.c): Add
$(config-cflags-nofma).
sysdeps/ieee754/flt-32/e_exp2f.c declares two variable as "static
const volatile float". Maybe this use of "volatile" was originally
intended to inhibit optimization of underflowing / overflowing
operations such as TWOM100 * TWOM100; in any case, it's not currently
needed, as given -frounding-math constant folding of such expressions
is properly disabled when it would be unsafe. This patch removes the
unnecessary use of "volatile".
Tested for x86_64.
* sysdeps/ieee754/flt-32/e_exp2f.c (TWOM100): Remove volatile.
(TWO127): Likewise.
Where glibc code needs to avoid excess range and precision in
floating-point arithmetic, code variously uses either asms or volatile
to force the results of that arithmetic to memory; mostly this is
conditional on FLT_EVAL_METHOD, but in the case of lrint / llrint
functions some use of volatile is unconditional (and is present
unnecessarily in versions for long double). This patch make such code
use the recently-added math_narrow_eval macro consistently, removing
the unnecessary uses of volatile in long double lrint / llrint
implementations completely.
Tested for x86_64, x86, mips64 and powerpc.
* math/s_nexttowardf.c (__nexttowardf): Use math_narrow_eval.
* stdlib/strtod_l.c: Include <math_private.h>.
(overflow_value): Use math_narrow_eval.
(underflow_value): Likewise.
* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Likewise.
(__ieee754_gamma_r): Likewise.
* sysdeps/ieee754/dbl-64/gamma_productf.c (__gamma_productf):
Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c (__erfc): Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c (__llrint): Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c (__lrint): Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
(__ieee754_gammaf_r): Likewise.
* sysdeps/ieee754/flt-32/k_rem_pio2f.c (__kernel_rem_pio2f):
Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
* sysdeps/ieee754/flt-32/s_erff.c (__erfcf): Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c (__llrintf): Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c (__lrintf): Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c (__llrintl): Do not use
volatile.
* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward): Use
math_narrow_eval.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-96/gamma_product.c (__gamma_product):
Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c (__llrintl): Do not use
volatile.
* sysdeps/ieee754/ldbl-96/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Use
math_narrow_eval.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
Likewise.
i386 exp, hypot and pow functions can return overflowing and
underflowing values with excess range and precision; ; Wilco
Dijkstra's patches to make isfinite etc. expand inline cause this
pre-existing issue to result in test failures.
This patch fixes those functions to avoid excess range and precision
in their return values. Appropriate macros are added for the repeated
code sequences; in future I'll add more such macros and refactor
existing code forcing underflow (with or without also eliminating
excess range and precision from the return value) to use such macros.
Tested for x86. If, after this patch, you still see x86 libm test
failures with excess range or precision, please file bugs in Bugzilla.
[BZ #18980]
* sysdeps/i386/fpu/i386-math-asm.h (DEFINE_FLT_MIN): New macro.
(DEFINE_DBL_MIN): Likewise.
(FLT_NARROW_EVAL_UFLOW_NONNEG_NAN): Likewise.
(DBL_NARROW_EVAL_UFLOW_NONNEG_NAN): Likewise.
(FLT_NARROW_EVAL_UFLOW_NONNEG): Likewise.
(DBL_NARROW_EVAL_UFLOW_NONNEG): Likewise.
* sysdeps/i386/fpu/e_exp.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
(__exp_finite): Use DBL_NARROW_EVAL_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_exp10.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp10): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp10f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_exp10f): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp2.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp2): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp2f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_exp2f): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_expf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_expf): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
(__expf_finite): Use FLT_NARROW_EVAL_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_hypot.S: Include <i386-math-asm.h>.
(__ieee754_hypot): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_hypotf.S: Include <i386-math-asm.h>.
(__ieee754_hypotf): Use FLT_NARROW_EVAL.
* sysdeps/i386/fpu/e_pow.S: Include <i386-math-asm.h>.
(__ieee754_pow): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_powf.S: Include <i386-math-asm.h>.
(__ieee754_powf): Use FLT_NARROW_EVAL.
* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S
(__ieee754_expf_sse2): Convert double-precision result to single
precision.
* sysdeps/i386/fpu/libm-test-ulps: Update.
i386 scalb / scalbn / scalbln (and thus ldexp) functions for float and
double can return results with excess range (and consequently excess
precision for subnormal results). As the results of these functions
are fully determined by reference to IEEE 754 operations, this is
unambiguously a bug, apart from the testsuite failures it causes.
This patch makes those functions store their results on the stack and
load them back to eliminate the excess range. Double rounding is not
a problem, as the only cases where it could occur are when the result
overflows or underflows for extended precision, and then the
double-rounded results are the same as the single-rounded results.
The new macros will be used for more functions, more such macros
added, and existing code refactored to use such macros, in subsequent
patches.
Tested for x86. Committed.
[BZ #18981]
* sysdeps/i386/fpu/i386-math-asm.h: New file.
* sysdeps/i386/fpu/e_scalb.S: Include <i386-math-asm.h>.
(__ieee754_scalb): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_scalbf.S: Include <i386-math-asm.h>.
(__ieee754_scalbf): Use FLT_NARROW_EVAL.
* sysdeps/i386/fpu/s_scalbn.S: Include <i386-math-asm.h>.
(__scalbn): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/s_scalbnf.S: Include <i386-math-asm.h>.
(__scalbnf): Use FLT_NARROW_EVAL.
Various i386 libm functions return values with excess range and
precision; Wilco Dijkstra's patches to make isfinite etc. expand
inline cause this pre-existing issue to result in test failures (when
e.g. a result that overflows float but not long double gets counted as
overflowing for some purposes but not others).
This patch addresses those cases arising from functions defined in C,
adding a math_narrow_eval macro that forces values to memory to
eliminate excess precision if FLT_EVAL_METHOD indicates this is
needed, and is a no-op otherwise. I'll convert existing uses of
volatile and asm for this purpose to use the new macro later, once
i386 has clean test results again (which requires fixes for .S files
as well).
Tested for x86_64 and x86. Committed.
[BZ #18980]
* sysdeps/generic/math_private.h: Include <float.h>.
(math_narrow_eval): New macro.
[FLT_EVAL_METHOD != 0] (excess_precision): Likewise.
* sysdeps/ieee754/dbl-64/e_cosh.c (__ieee754_cosh): Use
math_narrow_eval on overflowing return value.
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r):
Likewise.
* sysdeps/ieee754/dbl-64/e_sinh.c (__ieee754_sinh): Likewise.
* sysdeps/ieee754/flt-32/e_coshf.c (__ieee754_coshf): Likewise.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r):
Likewise.
* sysdeps/ieee754/flt-32/e_sinhf.c (__ieee754_sinhf): Likewise.
The logic in setjmp/__longjmp incorrectly uses "PIC" to figure out
whether the code is going into a shared library when it should be
using "SHARED". If you build glibc with a gcc version that has PIE
enabled by default, then the code will try to use symbols that are
only in the shared library.
URL: https://bugs.gentoo.org/336914