This patch helps some math functions performance by adding the libc_fexxx
variant of inline functions to handle both FPU round and exception set/restore
and by using them on the libc_fexxx_ctx functions. It is based on already coded
fexxx family functions for PPC with fpu.
Here is the summary of performance improvements due this patch (measured on a
POWER7 machine):
Before:
cos(): ITERS:9.5895e+07: TOTAL:5116.03Mcy, MAX:77.6cy, MIN:49.792cy, 18744 calls/Mcy
exp(): ITERS:2.827e+07: TOTAL:5187.15Mcy, MAX:494.018cy, MIN:38.422cy, 5450.01 calls/Mcy
pow(): ITERS:6.1705e+07: TOTAL:5144.26Mcy, MAX:171.95cy, MIN:29.935cy, 11994.9 calls/Mcy
sin(): ITERS:8.6898e+07: TOTAL:5117.06Mcy, MAX:83.841cy, MIN:46.582cy, 16982 calls/Mcy
tan(): ITERS:2.9473e+07: TOTAL:5115.39Mcy, MAX:191.017cy, MIN:172.352cy, 5761.63 calls/Mcy
After:
cos(): ITERS:2.05265e+08: TOTAL:5111.37Mcy, MAX:78.754cy, MIN:24.196cy, 40158.5 calls/Mcy
exp(): ITERS:3.341e+07: TOTAL:5170.84Mcy, MAX:476.317cy, MIN:15.574cy, 6461.23 calls/Mcy
pow(): ITERS:7.6153e+07: TOTAL:5129.1Mcy, MAX:147.5cy, MIN:30.916cy, 14847.2 calls/Mcy
sin(): ITERS:1.58816e+08: TOTAL:5115.11Mcy, MAX:1490.39cy, MIN:22.341cy, 31048.4 calls/Mcy
tan(): ITERS:3.4964e+07: TOTAL:5114.18Mcy, MAX:177.422cy, MIN:146.115cy, 6836.68 calls/Mcy
http://sourceware.org/ml/libc-alpha/2013-07/msg00001.html
This patch starts the process of supporting powerpc64 little-endian
long double in glibc. IBM long double is an array of two ieee
doubles, so making union ibm_extended_long_double reflect this fact is
the correct way to access fields of the doubles.
* sysdeps/ieee754/ldbl-128ibm/ieee754.h
(union ibm_extended_long_double): Define as an array of ieee754_double.
(IBM_EXTENDED_LONG_DOUBLE_BIAS): Delete.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c: Update all references
to ibm_extended_long_double and IBM_EXTENDED_LONG_DOUBLE_BIAS.
* sysdeps/ieee754/ldbl-128ibm/e_exp10l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/ldbl2mpn.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h: Likewise.
* sysdeps/ieee754/ldbl-128ibm/mpn2ldbl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/strtold_l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
The patch increase the high value to check if expl overflows. Current
high mark value is not really correct, the algorithm accepts high values.
It also adds a correct wrapper function to check for overflow and underflow.
__fe_nomask_env.
* sysdeps/powerpc/fpu/fe_nomask.c: Add libm_hidden_def.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/fe_nomask.c: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/fpu/fe_nomask.c: Likewise.
* sysdeps/powerpc/bits/fenv.h: Make safe for C++.
* sysdeps/unix/sysv/linux/powerpc/bits/mathinline.h: New file.
* sysdeps/powerpc/fpu/fegetexcept.c (__fegetexcept): Rename
function from fegetexcept and make old name weak alias.
* include/fenv.h: Declare __fegetexcept.
* sysdeps/powerpc/fpu/fedisblxcpt.c: Use __fegetexcept instead of
fegetexcept.
* sysdeps/powerpc/fpu/feenablxcpt.c: Likewise.
* sysdeps/powerpc/fpu/fraiseexcpt.c (__feraiseexcept): Avoid call
to fetestexcept.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (__log1pl): Use __frexpl
instead of frexpl to avoid local PLT.
* math/s_significandl.c (__significandl): Use __ilogbl instead of
ilogbl to avoid local PLT.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Use __ldexpl
instead of ldexpl to avoid local PLT.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c (__ieee754_expl): Use
__roundl not roundl to avoid local PLT.
* sysdeps/ieee754/ldbl-128/e_j0l.c: Use function names which avoid
local PLTs. Use __sincosl instead of separate sinl and cosl
calls.
* sysdeps/ieee754/ldbl-128/e_j1l.c: Likewise.
Jakub Jelinek <jakub@redhat.com>
Roland McGrath <roland@redhat.com>
Steven Munroe <sjmunroe@us.ibm.com>
Alan Modra <amodra@bigpond.net.au>
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S: Comment fix.
* sysdeps/powerpc/powerpc32/fpu/s_truncf.S: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_llroundf.S: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* math/libm-test.inc (check_float_internal): Allow ulp <= 0.5.
(erfc_test): Don't run erfcl (27.0L) test if erfcl (27.0L) is
denormal.
[TEST_LDOUBLE] (ceil_test, floor_test, llrint_test, llround_test,
rint_test, round_test, trunc_test): Add new tests.
* sysdeps/powerpc/powerpc32/fpu/s_copysignl.S: New file.
* sysdeps/powerpc/powerpc32/fpu/s_fabs.S: New file.
* sysdeps/powerpc/powerpc32/fpu/s_fabsl.S: New file.
* sysdeps/powerpc/powerpc32/fpu/s_fdim.c: New file.
* sysdeps/powerpc/powerpc32/fpu/s_fmax.S: New file.
* sysdeps/powerpc/powerpc32/fpu/s_fmin.S: New file.
* sysdeps/powerpc/powerpc32/fpu/s_isnan.c: New file.
* sysdeps/powerpc/powerpc64/fpu/s_ceill.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_copysignl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_fabs.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_fabsl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_fdim.c: New file.
* sysdeps/powerpc/powerpc64/fpu/s_floorl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_fmax.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_fmin.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_isnan.c: New file.
* sysdeps/powerpc/powerpc64/fpu/s_llrintl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_llroundl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_lrintl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_lroundl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyintl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_rintl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_roundl.S: New file.
* sysdeps/powerpc/powerpc64/fpu/s_truncl.S: New file.
* sysdeps/unix/sysv/linux/powerpc/Implies: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/fpu/Implies: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/Implies: New file.
* sysdeps/unix/sysv/linux/powerpc/configure.in: New file.
* sysdeps/unix/sysv/linux/powerpc/configure: New file.
* sysdeps/unix/sysv/linux/powerpc/bits/wordsize.h
(__LONG_DOUBLE_MATH_OPTIONAL): Define.
(__NO_LONG_DOUBLE_MATH): Define.
* sysdeps/unix/sysv/linux/powerpc/nldbl-abi.h: New file.
* sysdeps/powerpc/fpu/s_isnan.c: Include math_ldbl_opt.h.
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (ceill): Add compatibility symbols.
* sysdeps/powerpc/powerpc64/fpu/s_copysign.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (copysignl): Add compatibility symbols.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (floorl): Add compatibility symbols.
* sysdeps/powerpc/powerpc64/fpu/s_llrint.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (llrintl, lrintl): Add compatibility symbols.
* sysdeps/powerpc/powerpc64/fpu/s_llround.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (llroundl, lroundl): Add compatibility symbols.
* sysdeps/powerpc/powerpc64/fpu/s_rint.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (rintl): Add compatibility symbols.
* sysdeps/powerpc/powerpc64/fpu/s_round.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (roundl): Add compatibility symbols.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (truncl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_ceil.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (ceill): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_copysign.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (copysignl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_floor.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (floorl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_lrint.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (lrintl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_llrint.c: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (llrintl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_lround.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (lroundl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_rint.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (rintl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_round.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (roundl): Add compatibility symbols.
* sysdeps/powerpc/powerpc32/fpu/s_trunc.S: Include math_ldbl_opt.h.
[LONG_DOUBLE_COMPAT] (truncl): Add compatibility symbols.
* misc/qefgcvt_r.c [LDBL_MIN_10_EXP == -291] (FLOAT_MIN_10_NORM): New.
* sysdeps/powerpc/fpu/bits/mathdef.h (__NO_LONG_DOUBLE_MATH): Remove.
* sysdeps/powerpc/Implies: Add ieee754/ldbl-128ibm.
* sysdeps/powerpc/powerpc32/Implies: Remove powerpc/soft-fp.
* sysdeps/ieee754/ldbl-128ibm/Makefile: New file.
* sysdeps/ieee754/ldbl-128ibm/e_acoshl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_acosl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_asinl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_atan2l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_coshl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_fmodl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_j0l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_j1l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_lgammal_r.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_log10l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_log2l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_logl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_rem_pio2l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_remainderl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/e_sqrtl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/ieee754.h: New file.
* sysdeps/ieee754/ldbl-128ibm/k_cosl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/k_sincosl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/k_sinl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/ldbl2mpn.c: New file.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h: New file.
* sysdeps/ieee754/ldbl-128ibm/mpn2ldbl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_cbrtl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_cosl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_fabsl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_finitel.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_fpclassifyl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_ilogbl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_isinfl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_isnanl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_logbl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_modfl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_rintl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_signbitl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_sincosl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_sinl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_tanl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_truncl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/strtold_l.c: New file.
* sysdeps/ieee754/ldbl-128ibm/t_sincosl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/w_expl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_copysignl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_floorl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_llrintl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_llroundl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_ceill.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_lrintl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/s_lroundl.c: New file.
* sysdeps/ieee754/ldbl-128/e_powl.c: Fix old comment.