This patch fixes the __copysignf optimized macro meant to internal libm
usage when used with constant value. Without the explicit cast to
float, if it is used with const double value (for instance, on
s_casinhf.c) double constants will be used and it may lead to precision
issues in some algorithms.
It fixes the following failures on PPC64/POWER7:
Failure: Test: Real part of: cacos_downward (inf + 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf - 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf + 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_downward (inf - 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf + 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf - 0 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf + 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
Failure: Test: Real part of: cacos_towardzero (inf - 0.5 i)
Result:
is: 1.19209289550781250000e-07 0x1.00000000000000000000p-23
should be: 0.00000000000000000000e+00 0x0.00000000000000000000p+0
This patch optimizes the FPSCR update on exception and rounding change
functions by just updating its value if new value if different from
current one. It also optimizes fedisableexcept and feenableexcept by
removing an unecessary FPSCR read.
This patch fixes the powerpc32 optimized nearbyint/nearbyintf bogus
results for FE_DOWNWARD rounding mode. This is due wrong instructions
sequence used in the rounding calculation (two subtractions instead of
adition and a subtraction).
Fixes BZ#16815.
As recently discussed
<https://sourceware.org/ml/libc-alpha/2014-02/msg00670.html>, it
doesn't seem particularly useful for libm-test-ulps files to contain
huge amounts of data on ulps for individual tests; just the global
maximum observed ulps for each function, together with the
verification of exceptions, errno and special results such as
infinities and NaNs for each test, suffices to verify that a
function's behavior on the given test inputs is within the expected
accuracy. Removing this data reduces source tree churn caused by
updates to these files when libm tests are added, and reduces the
frequency with which testsuite additions actually need libm-test-ulps
changes at all.
Accordingly, this patch removes that data, so that individual tests
get checked against the global bounds for the given function and only
generate an error if those are exceeded. Tested x86_64 (including
verifying that if an ulps value is artificially reduced, the tests do
indeed fail as they should and "make regen-ulps" generates the
expected changes).
* math/libm-test.inc (struct ulp_data): Don't refer to ulps for
individual tests in comment.
(libm-test-ulps.h): Don't refer to test_ulps in #include comment.
(prev_max_error): New variable.
(prev_real_max_error): Likewise.
(prev_imag_max_error): Likewise.
(compare_ulp_data): Don't refer to test names in comment.
(find_test_ulps): Remove function.
(find_function_ulps): Likewise.
(find_complex_function_ulps): Likewise.
(init_max_error): Take function name as argument. Look up ulps
for that function.
(print_ulps): Remove function.
(print_max_error): Use prev_max_error instead of calling
find_function_ulps.
(print_complex_max_error): Use prev_real_max_error and
prev_imag_max_error instead of calling find_complex_function_ulps.
(check_float_internal): Take max_ulp parameter instead of calling
find_test_ulps. Don't call print_ulps.
(check_float): Update call to check_float_internal.
(check_complex): Update calls to check_float_internal.
(START): Pass argument to init_max_error.
* math/gen-libm-test.pl (%results): Don't include "kind"
information.
(parse_ulps): Don't handle ulps of individual tests.
(print_ulps_file): Likewise.
(output_ulps): Likewise.
* math/README.libm-test: Update.
* manual/libm-err-tab.pl (parse_ulps): Don't handle ulps of
individual tests.
* sysdeps/aarch64/libm-test-ulps: Remove individual test ulps.
* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
* sysdeps/arm/libm-test-ulps: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/ia64/fpu/libm-test-ulps: Likewise.
* sysdeps/m68k/coldfire/fpu/libm-test-ulps: Likewise.
* sysdeps/m68k/m680x0/fpu/libm-test-ulps: Likewise.
* sysdeps/microblaze/libm-test-ulps: Likewise.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
* sysdeps/s390/fpu/libm-test-ulps: Likewise.
* sysdeps/sh/libm-test-ulps: Likewise.
* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
* sysdeps/tile/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
* sysdeps/hppa/fpu/libm-test-ulps: Remove individual test ulps.
For PPC64, all the wrappers at sysdeps are superfluous: they are
basically the same implementation from math/w_sqrt.c with the
'#ifdef _IEEE_LIBM'. And the power4 version just force the 'fsqrt'
instruction utilization with an inline assembly, which is already
handled by math_private.h __ieee754_sqrt implementation.
This patch helps some math functions performance by adding the libc_fexxx
variant of inline functions to handle both FPU round and exception set/restore
and by using them on the libc_fexxx_ctx functions. It is based on already coded
fexxx family functions for PPC with fpu.
Here is the summary of performance improvements due this patch (measured on a
POWER7 machine):
Before:
cos(): ITERS:9.5895e+07: TOTAL:5116.03Mcy, MAX:77.6cy, MIN:49.792cy, 18744 calls/Mcy
exp(): ITERS:2.827e+07: TOTAL:5187.15Mcy, MAX:494.018cy, MIN:38.422cy, 5450.01 calls/Mcy
pow(): ITERS:6.1705e+07: TOTAL:5144.26Mcy, MAX:171.95cy, MIN:29.935cy, 11994.9 calls/Mcy
sin(): ITERS:8.6898e+07: TOTAL:5117.06Mcy, MAX:83.841cy, MIN:46.582cy, 16982 calls/Mcy
tan(): ITERS:2.9473e+07: TOTAL:5115.39Mcy, MAX:191.017cy, MIN:172.352cy, 5761.63 calls/Mcy
After:
cos(): ITERS:2.05265e+08: TOTAL:5111.37Mcy, MAX:78.754cy, MIN:24.196cy, 40158.5 calls/Mcy
exp(): ITERS:3.341e+07: TOTAL:5170.84Mcy, MAX:476.317cy, MIN:15.574cy, 6461.23 calls/Mcy
pow(): ITERS:7.6153e+07: TOTAL:5129.1Mcy, MAX:147.5cy, MIN:30.916cy, 14847.2 calls/Mcy
sin(): ITERS:1.58816e+08: TOTAL:5115.11Mcy, MAX:1490.39cy, MIN:22.341cy, 31048.4 calls/Mcy
tan(): ITERS:3.4964e+07: TOTAL:5114.18Mcy, MAX:177.422cy, MIN:146.115cy, 6836.68 calls/Mcy
http://sourceware.org/ml/libc-alpha/2013-07/msg00201.html
These two functions oddly test x+1>0 when a double x is >= 0.0, and
similarly when x is negative. I don't see the point of that since the
test should always be true. I also don't see any need to convert x+1
to integer rather than simply using xr+1. Note that the standard
allows these functions to return any value when the input is outside
the range of long long, but it's not too hard to prevent xr+1
overflowing so that's what I've done.
(With rounding mode FE_UPWARD, x+1 can be a lot more than what you
might naively expect, but perhaps that situation was covered by the
x - xrf < 1.0 test.)
* sysdeps/powerpc/fpu/s_llround.c (__llround): Rewrite.
* sysdeps/powerpc/fpu/s_llroundf.c (__llroundf): Rewrite.
http://sourceware.org/ml/libc-alpha/2013-07/msg00200.html
This works around the fact that vsx is disabled in current
little-endian gcc. Also, float constants take 4 bytes in memory
vs. 16 bytes for vector constants, and we don't need to write one lot
of masks for double (register format) and another for float (mem
format).
* sysdeps/powerpc/fpu/s_float_bitwise.h (__float_and_test28): Don't
use vector int constants.
(__float_and_test24, __float_and8, __float_get_exp): Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00083.html
Further replacement of ieee854 macros and unions. These files also
have some optimisations for comparison against 0.0L, infinity and nan.
Since the ABI specifies that the high double of an IBM long double
pair is the value rounded to double, a high double of 0.0 means the
low double must also be 0.0. The ABI also says that infinity and
nan are encoded in the high double, with the low double unspecified.
This means that tests for 0.0L, +/-Infinity and +/-NaN need only check
the high double.
* sysdeps/ieee754/ldbl-128ibm/e_atan2l.c (__ieee754_atan2l): Rewrite
all uses of ieee854 long double macros and unions. Simplify tests
for long doubles that are fully specified by the high double.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_ilogbl.c (__ieee754_ilogbl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_log10l.c (__ieee754_log10l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_logl.c (__ieee754_logl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c (__kernel_tanl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_isinf_nsl.c (__isinf_nsl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_isinfl.c (___isinfl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (__log1pl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_modfl.c (__modfl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c (__nextafterl): Likewise.
Comment on variable precision.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c (__scalblnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (__scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c (__tanhl): Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Adjust tan_towardzero ulps.
Adjustments for libm ulps added with commit d8b82cad1b,
495fd99f3a, and 5ba3cc691c.
I also adjusted some exp10 ulps definition that was higher than needed.