glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-24 22:10:13 +00:00

Author	SHA1	Message	Date
Paul Zimmermann	392b3f0971	replace tgammaf by the CORE-MATH implementation The CORE-MATH implementation is correctly rounded (for any rounding mode). This can be checked by exhaustive tests in a few minutes since there are less than 2^32 values to check against for example GNU MPFR. This patch also adds some bench values for tgammaf. Tested on x86_64 and x86 (cfarm26). With the initial GNU libc code it gave on an Intel(R) Core(TM) i7-8700: "tgammaf": { "": { "duration": 3.50188e+09, "iterations": 2e+07, "max": 602.891, "min": 65.1415, "mean": 175.094 } } With the new code: "tgammaf": { "": { "duration": 3.30825e+09, "iterations": 5e+07, "max": 211.592, "min": 32.0325, "mean": 66.1649 } } With the initial GNU libc code it gave on cfarm26 (i686): "tgammaf": { "": { "duration": 3.70505e+09, "iterations": 6e+06, "max": 2420.23, "min": 243.154, "mean": 617.509 } } With the new code: "tgammaf": { "": { "duration": 3.24497e+09, "iterations": 1.8e+07, "max": 1238.15, "min": 101.155, "mean": 180.276 } } Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Changes in v2: - include <math.h> (fix the linknamespace failures) - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file - restored original wrapper code (math/w_tgammaf_compat.c), except for the dealing with the sign - removed the tgammaf/float entries in all libm-test-ulps files - address other comments from Joseph Myers (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html) Changes in v3: - pass NULL argument for signgam from w_tgammaf_compat.c - use of math_narrow_eval - added more comments Changes in v4: - initialize local_signgam to 0 in math/w_tgamma_template.c - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file Changes in v5: - do not mention local_signgam any more in math/w_tgammaf_compat.c - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c and added comment Changes in v6: - pass NULL as 2nd argument of __ieee754_gammaf_r in w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c Changes in v7: - added Signed-off-by line for Alexei Sibidanov (author of the code) Changes in v8: - added Signed-off-by line for Paul Zimmermann (submitted of the patch) Changes in v9: - address comments from review by Adhemerval Zanella Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-10-11 11:12:32 +02:00
Szabolcs Nagy	2a9943b4a0	math: Fix exp10 undefined left shift Left shift of ki is undefined when ki<0, copy the logic from exp, which uses unsigned arithmetics, to fix it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-04 15:33:26 +01:00
H.J. Lu	4e21cb95e2	nearbyint: Don't define alias when used in IFUNC [BZ #31759 ] Fix BZ #31759 by not defining nearbyint aliases when used in IFUNC. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-05-20 05:21:41 -07:00
Wilco Dijkstra	08ddd26814	math: remove exp10 wrappers Remove the error handling wrapper from exp10. This is very similar to the changes done to exp and exp2, except that we also need to handle pow10 and pow10l. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-12 16:02:12 +00:00
Paul Eggert	dff8da6b3e	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
Joe Ramsay	63d0a35d5f	math: Add new exp10 implementation New implementation is based on the existing exp/exp2, with different reduction constants and polynomial. Worst-case error in round-to- nearest is 0.513 ULP. The exp/exp2 shared table is reused for exp10 - .rodata size of e_exp_data increases by 64 bytes. As for exp/exp2, targets with single-instruction rounding/conversion intrinsics can use them by toggling TOINT_INTRINSICS=1 and adding the necessary code to their math_private.h. Improvements on Neoverse V1 compared to current GLIBC master: exp10 thruput: 3.3x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] exp10 latency: 1.8x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] Tested on: aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-12-04 15:52:11 +00:00
Andreas Schwab	5aa1ddfcb3	Avoid maybe-uninitialized warning in __kernel_rem_pio2 With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized warning: ../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2': ../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized] 364 \| y[0] = fq[0]; y[1] = fq[1]; y[2] = fw; \| ~~^~~ This is similar to the warning that is suppressed in the other branch of the switch. Help the compiler knowing that the variable is always initialized, which also makes the suppression obsolete.	2023-10-16 09:59:32 +02:00
H.J. Lu	a8ecb126d4	x86_64: Add log1p with FMA On Skylake, it changes log1p bench performance by: Before After Improvement max 63.349 58.347 8% min 4.448 5.651 -30% mean 12.0674 10.336 14% The minimum code path is if (hx < 0x3FDA827A) /* x < 0.41422 / { if (__glibc_unlikely (ax >= 0x3ff00000)) / x <= -1.0 / { ... } if (__glibc_unlikely (ax < 0x3e200000)) / \|x\| < 2*-29 / { math_force_eval (two54 + x); /* raise inexact / if (ax < 0x3c900000) / \|x\| < 2*-54 / { ... } else return x - x * x * 0.5; FMA and non-FMA code sequences look similar. Non-FMA version is slightly faster. Since log1p is called by asinh and atanh, it improves asinh performance by: Before After Improvement max 75.645 63.135 16% min 10.074 10.071 0% mean 15.9483 14.9089 6% and improves atanh performance by: Before After Improvement max 91.768 75.081 18% min 15.548 13.883 10% mean 18.3713 16.8011 8%	2023-08-21 10:44:26 -07:00
H.J. Lu	1b214630ce	x86_64: Add expm1 with FMA On Skylake, it improves expm1 bench performance by: Before After Improvement max 70.204 68.054 3% min 20.709 16.2 22% mean 22.1221 16.7367 24% NB: Add extern long double __expm1l (long double); extern long double __expm1f128 (long double); for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is defined since __expm1 may be expanded in their declarations which causes the build failure.	2023-08-14 08:14:19 -07:00
Joe Ramsay	aed39a3aa3	aarch64: Add vector implementations of cos routines Replace the loop-over-scalar placeholder routines with optimised implementations from Arm Optimized Routines (AOR). Also add some headers containing utilities for aarch64 libmvec routines, and update libm-test-ulps. Data tables for new routines are used via a pointer with a barrier on it, in order to prevent overly aggressive constant inlining in GCC. This allows a single adrp, combined with offset loads, to be used for every constant in the table. Special-case handlers are marked NOINLINE in order to confine the save/restore overhead of switching from vector to normal calling standard. This way we only incur the extra memory access in the exceptional cases. NOINLINE definitions have been moved to math_private.h in order to reduce duplication. AOR exposes a config option, WANT_SIMD_EXCEPT, to enable selective masking (and later fixing up) of invalid lanes, in order to trigger fp exceptions correctly (AdvSIMD only). This is tested and maintained in AOR, however it is configured off at source level here for performance reasons. We keep the WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify the upstreaming process from AOR to glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-06-30 09:04:10 +01:00
Paul Pluzhnikov	65cc53fe7c	Fix misspellings in sysdeps/ -- BZ 25337	2023-05-30 23:02:29 +00:00
Wilco Dijkstra	76d0f094dd	math: Improve fmod(f) performance Optimize the fast paths (x < y) and (x/y < 2^12). Delay handling of special cases to reduce the number of instructions executed before the fast paths. Performance improvements for fmod: Skylake Zen2 Neoverse V1 subnormals 11.8% 4.2% 11.5% normal 3.9% 0.01% -0.5% close-exponents 6.3% 5.6% 19.4% Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-04-17 13:03:10 +01:00
Adhemerval Zanella Netto	16439f419b	math: Remove the error handling wrapper from fmod and fmodf The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). The ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. For both i686 and m68k, which provive arch specific implementation, wrappers are added so no new symbol are added (which would require to change the implementations). It shows an small improvement, the results for fmod: Architecture \| Input \| master \| patch -----------------\|-----------------\|----------\|-------- x86_64 (Ryzen 9) \| subnormals \| 12.5049 \| 9.40992 x86_64 (Ryzen 9) \| normal \| 296.939 \| 296.738 x86_64 (Ryzen 9) \| close-exponents \| 16.0244 \| 13.119 aarch64 (N1) \| subnormal \| 6.81778 \| 4.33313 aarch64 (N1) \| normal \| 155.620 \| 152.915 aarch64 (N1) \| close-exponents \| 8.21306 \| 5.76138 armhf (N1) \| subnormal \| 15.1083 \| 14.5746 armhf (N1) \| normal \| 244.833 \| 241.738 armhf (N1) \| close-exponents \| 21.8182 \| 22.457 Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-04-03 16:45:27 -03:00
Adhemerval Zanella Netto	34b9f8bc17	math: Improve fmod This uses a new algorithm similar to already proposed earlier [1]. With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers), the simplest implementation is: mx * 2^ex == 2 * mx * 2^(ex - 1) while (ex > ey) { mx = 2; --ex; mx %= my; } With mx/my being mantissa of double floating pointer, on each step the argument reduction can be improved 11 (which is sizeo of uint64_t minus MANTISSA_WIDTH plus the signal bit): while (ex > ey) { mx << 11; ex -= 11; mx %= my; } / The implementation uses builtin clz and ctz, along with shifts to convert hx/hy back to doubles. Different than the original patch, this path assume modulo/divide operation is slow, so use multiplication with invert values. I see the following performance improvements using fmod benchtests (result only show the 'mean' result): Architecture \| Input \| master \| patch -----------------\|-----------------\|----------\|-------- x86_64 (Ryzen 9) \| subnormals \| 19.1584 \| 12.5049 x86_64 (Ryzen 9) \| normal \| 1016.51 \| 296.939 x86_64 (Ryzen 9) \| close-exponents \| 18.4428 \| 16.0244 aarch64 (N1) \| subnormal \| 11.153 \| 6.81778 aarch64 (N1) \| normal \| 528.649 \| 155.62 aarch64 (N1) \| close-exponents \| 11.4517 \| 8.21306 I also see similar improvements on arm-linux-gnueabihf when running on the N1 aarch64 chips, where it a lot of soft-fp implementation (for modulo, clz, ctz, and multiplication): Architecture \| Input \| master \| patch -----------------\|-----------------\|----------\|-------- armhf (N1) \| subnormal \| 15.908 \| 15.1083 armhf (N1) \| normal \| 837.525 \| 244.833 armhf (N1) \| close-exponents \| 16.2111 \| 21.8182 Instead of using the math_private.h definitions, I used the math_config.h instead which is used on newer math implementations. Co-authored-by: kirill <kirill.okhotnikov@gmail.com> [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-04-03 16:36:24 -03:00
Joseph Myers	6d7e8eda9b	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
Xiaolin Tang	2e2485ce05	Use GCC builtins for logb functions if desired. This patch is using the corresponding GCC builtin for logbf, logb, logbl and logbf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>	2022-11-29 16:00:28 +08:00
Xiaolin Tang	a1981ecbfd	Use GCC builtins for llrint functions if desired. This patch is using the corresponding GCC builtin for llrintf, llrint, llrintl and llrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>	2022-11-29 16:00:28 +08:00
Xiaolin Tang	2b23ab1fea	Use GCC builtins for lrint functions if desired. This patch is using the corresponding GCC builtin for lrintf, lrint, lrintl and lrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>	2022-11-29 16:00:28 +08:00
Szabolcs Nagy	7363a9a9a0	math: Fix asin and acos invalid exception with old gcc This works around a gcc issue where it const folded inf/inf into nan, preventing the invalid exception to be signalled. (x-x)/(x-x) is more robust against optimizations and works for all out of bounds values including x==nan. The gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115 should be fixed on release branches starting from gcc-10, but it is better to change the code in case glibc is built with older gcc. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2022-10-17 08:18:52 +01:00
Andreas Schwab	dc1e5eeb25	x86_64: Optimize sincos where sin/cos is optimized (bug 29193) The compiler may substitute calls to sin or cos with calls to sincos, thus we should have the same optimized implementations for sincos. The optimized implementations may produce results that differ, that also makes sure that the sincos call aggrees with the sin and cos calls.	2022-06-01 10:29:52 +02:00
Paul Eggert	581c785bf3	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 7061 files FOO. I then removed trailing white space from math/tgmath.h, support/tst-support-open-dev-null-range.c, and sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following obscure pre-commit check failure diagnostics from Savannah. I don't know why I run into these diagnostics whereas others evidently do not. remote: * 912-#endif remote: * 913: remote: * 914- remote: * error: lines with trailing whitespace found ... remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines	2022-01-01 11:40:24 -08:00
Akila Welihinda	3b1402b3fc	sysdeps: Simplify sin Taylor Series calculation The macro TAYLOR_SIN adds the term `-0.5daa^2 + da` in hopes of regaining some precision as a function of da. However the comment says we add the term `-0.5daa^2 + 0.5*da` which is different. This fix updates the comment to reflect the code and also simplifies the calculation by replacing `a` with `x` because they always have the same value. Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu> Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-12-13 15:31:05 +01:00
Adhemerval Zanella	104d2005d5	math: Remove the error handling wrapper from hypot and hypotf The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). Only ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Wilco Dijkstra	2f44eef584	math: Use fmin/fmax on hypot It optimizes for architectures that provides fast builtins. Checked on aarch64-linux-gnu.	2021-12-13 10:08:46 -03:00
Wilco Dijkstra	ccfa865a82	math: Improve hypot performance with FMA Improve hypot performance significantly by using fma when available. The fma version has twice the throughput of the previous version and 70% of the latency. The non-fma version has 30% higher throughput and 10% higher latency. Max ULP error is 0.949 with fma and 0.792 without fma. Passes GLIBC testsuite.	2021-12-13 09:02:34 -03:00
Wilco Dijkstra	6c848d7038	math: Use an improved algorithm for hypot (dbl-64) This implementation is based on the 'An Improved Algorithm for hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the following changes: - Handle qNaN and sNaN. - Tune the 'widely varying operands' to avoid spurious underflow due the multiplication and fix the return value for upwards rounding mode. - Handle required underflow exception for denormal results. The main advantage of the new algorithm is its precision: with a random 1e9 input pairs in the range of [DBL_MIN, DBL_MAX], glibc current implementation shows around 0.34% results with an error of 1 ulp (3424869 results) while the new implementation only shows 0.002% of total (18851). The performance result are also only slight worse than current implementation. On x86_64 (Ryzen 5900X) with gcc 12: Before: "hypot": { "workload-random": { "duration": 3.73319e+09, "iterations": 1.12e+08, "reciprocal-throughput": 22.8737, "latency": 43.7904, "max-throughput": 4.37184e+07, "min-throughput": 2.28361e+07 } } After: "hypot": { "workload-random": { "duration": 3.7597e+09, "iterations": 9.8e+07, "reciprocal-throughput": 23.7547, "latency": 52.9739, "max-throughput": 4.2097e+07, "min-throughput": 1.88772e+07 } } Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org> Checked on x86_64-linux-gnu and aarch64-linux-gnu. [1] https://arxiv.org/pdf/1904.09481.pdf	2021-12-13 09:02:34 -03:00
Joseph Myers	b3f27d8150	Add narrowing fma functions This patch adds the narrowing fused multiply-add functions from TS 18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal, f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x, f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128, f64xfmaf128 for configurations with _Float64x and _Float128; __f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case (for calls to ffmal and dfmal when long double is IEEE binary128). Corresponding tgmath.h macro support is also added. The changes are mostly similar to those for the other narrowing functions previously added, especially that for sqrt, so the description of those generally applies to this patch as well. As with sqrt, I reused the same test inputs in auto-libm-test-in as for non-narrowing fma rather than adding extra or separate inputs for narrowing fma. The tests in libm-test-narrow-fma.inc also follow those for non-narrowing fma. The non-narrowing fma has a known bug (bug 6801) that it does not set errno on errors (overflow, underflow, Inf * 0, Inf - Inf). Rather than fixing this or having narrowing fma check for errors when non-narrowing does not (complicating the cases when narrowing fma can otherwise be an alias for a non-narrowing function), this patch does not attempt to check for errors from narrowing fma and set errno; the CHECK_NARROW_FMA macro is still present, but as a placeholder that does nothing, and this missing errno setting is considered to be covered by the existing bug rather than needing a separate open bug. missing-errno annotations are duly added to many of the auto-libm-test-in test inputs for fma. This completes adding all the new functions from TS 18661-1 to glibc, so will be followed by corresponding stdc-predef.h changes to define __STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support for TS 18661-1 will be at a similar level to that for C standard floating-point facilities up to C11 (pragmas not implemented, but library functions done). (There are still further changes to be done to implement changes to the types of fromfp functions from N2548.) Tested as followed: natively with the full glibc testsuite for x86_64 (GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC 11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32 hard float, mips64 (all three ABIs, both hard and soft float). The different GCC versions are to cover the different cases in tgmath.h and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in glibc headers, GCC 7 has proper _Float* support, GCC 8 adds __builtin_tgmath).	2021-09-22 21:25:31 +00:00
Joseph Myers	1356f38df5	Fix f64xdivf128, f64xmulf128 spurious underflows (bug 28358) As described in bug 28358, the round-to-odd computations used in the libm functions that round their results to a narrower format can yield spurious underflow exceptions in the following circumstances: the narrowing only narrows the precision of the type and not the exponent range (i.e., it's narrowing _Float128 to _Float64x on x86_64, x86 or ia64), the architecture does after-rounding tininess detection (which applies to all those architectures), the result is inexact, tiny before rounding but not tiny after rounding (with the chosen rounding mode) for _Float64x (which is possible for narrowing mul, div and fma, not for narrowing add, sub or sqrt), so the underflow exception resulting from the toward-zero computation in _Float128 is spurious for _Float64x. Fixed by making ROUND_TO_ODD call feclearexcept (FE_UNDERFLOW) in the problem cases (as indicated by an extra argument to the macro); there is never any need to preserve underflow exceptions from this part of the computation, because the conversion of the round-to-odd value to the narrower type will underflow in exactly the cases in which the function should raise that exception, but it may be more efficient to avoid the extra manipulation of the floating-point environment when not needed. Tested for x86_64 and x86, and with build-many-glibcs.py.	2021-09-21 21:54:37 +00:00
Joseph Myers	4b6574a6f6	Redirect fma calls to __fma in libm include/math.h has a mechanism to redirect internal calls to various libm functions, that can often be inlined by the compiler, to call non-exported __* names for those functions in the case when the calls aren't inlined, with the redirection being disabled when NO_MATH_REDIRECT. Add fma to the functions to which this mechanism is applied. At present, libm-internal fma calls (generally to __builtin_fma* functions) are only done when it's known the call will be inlined, with alternative code not relying on an fma operation being used in the caller otherwise. This patch is in preparation for adding the TS 18661 / C2X narrowing fma functions to glibc; it will be natural for the narrowing function implementations to call the underlying fma functions unconditionally, with this either being inlined or resulting in an __fma* call. (Using two levels of round-to-odd computation like that, in the case where there isn't an fma hardware instruction, isn't optimal but is certainly a lot simpler for the initial implementation than writing different narrowing fma implementations for all the various pairs of formats.) Tested with build-many-glibcs.py that installed stripped shared libraries are unchanged by the patch (using <https://sourceware.org/pipermail/libc-alpha/2021-September/130991.html> to fix installed library stripping in build-many-glibcs.py). Also tested for x86_64.	2021-09-15 22:57:35 +00:00
Joseph Myers	abd383584b	Add narrowing square root functions This patch adds the narrowing square root functions from TS 18661-1 / TS 18661-3 / C2X to glibc's libm: fsqrt, fsqrtl, dsqrtl, f32sqrtf64, f32sqrtf32x, f32xsqrtf64 for all configurations; f32sqrtf64x, f32sqrtf128, f64sqrtf64x, f64sqrtf128, f32xsqrtf64x, f32xsqrtf128, f64xsqrtf128 for configurations with _Float64x and _Float128; __f32sqrtieee128 and __f64sqrtieee128 aliases in the powerpc64le case (for calls to fsqrtl and dsqrtl when long double is IEEE binary128). Corresponding tgmath.h macro support is also added. The changes are mostly similar to those for the other narrowing functions previously added, so the description of those generally applies to this patch as well. However, the not-actually-narrowing cases (where the two types involved in the function have the same floating-point format) are aliased to sqrt, sqrtl or sqrtf128 rather than needing a separately built not-actually-narrowing function such as was needed for add / sub / mul / div. Thus, there is no __nldbl_dsqrtl name for ldbl-opt because no such name was needed (whereas the other functions needed such a name since the only other name for that entry point was e.g. f32xaddf64, not reserved by TS 18661-1); the headers are made to arrange for sqrt to be called in that case instead. The DIAG_* calls in sysdeps/ieee754/soft-fp/s_dsqrtl.c are because they were observed to be needed in GCC 7 testing of riscv32-linux-gnu-rv32imac-ilp32. The other sysdeps/ieee754/soft-fp/ files added didn't need such DIAG_* in any configuration I tested with build-many-glibcs.py, but if they do turn out to be needed in more files with some other configuration / GCC version, they can always be added there. I reused the same test inputs in auto-libm-test-in as for non-narrowing sqrt rather than adding extra or separate inputs for narrowing sqrt. The tests in libm-test-narrow-sqrt.inc also follow those for non-narrowing sqrt. Tested as followed: natively with the full glibc testsuite for x86_64 (GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC 11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32 hard float, mips64 (all three ABIs, both hard and soft float). The different GCC versions are to cover the different cases in tgmath.h and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in glibc headers, GCC 7 has proper _Float* support, GCC 8 adds __builtin_tgmath).	2021-09-10 20:56:22 +00:00
Siddhesh Poyarekar	30891f35fa	Remove "Contributed by" lines We stopped adding "Contributed by" or similar lines in sources in 2012 in favour of git logs and keeping the Contributors section of the glibc manual up to date. Removing these lines makes the license header a bit more consistent across files and also removes the possibility of error in attribution when license blocks or files are copied across since the contributed-by lines don't actually reflect reality in those cases. Move all "Contributed by" and similar lines (Written by, Test by, etc.) into a new file CONTRIBUTED-BY to retain record of these contributions. These contributors are also mentioned in manual/contrib.texi, so we just maintain this additional record as a courtesy to the earlier developers. The following scripts were used to filter a list of files to edit in place and to clean up the CONTRIBUTED-BY file respectively. These were not added to the glibc sources because they're not expected to be of any use in future given that this is a one time task: https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02 Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2021-09-03 22:06:44 +05:30
Shen-Ta Hsieh	eb9066203f	Use GCC builtins for roundeven functions if desired. This patch is using the corresponding GCC builtin for roundevenf, roundeven and roundevenl if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins.h. These builtin functions is supported since GCC 10. The code of the generic implementation is not changed. Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-06-27 07:56:57 -07:00
Shen-Ta Hsieh	447954a206	math: redirect roundeven function This patch redirect roundeven function for futhermore changes. Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com> Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2021-06-27 07:56:57 -07:00
Paul Zimmermann	43576de04a	Improve the accuracy of tgamma (BZ #26983 ) With this patch, the maximal known error for tgamma is now reduced to 9 ulps for dbl-64, for all rounding modes. Since exhaustive testing is not possible for dbl-64, it might be that there are still cases with an error larger than 9 ulps, but all known cases are fixed (intensive tests were done to find cases with large errors). Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm, s390x, sparc, and i686). Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-04-07 13:23:39 +02:00
Wilco Dijkstra	92cfc9ad82	math: Remove mpa files (part 2) [BZ #15267 ] Previous commit was missing deleted files in sysdeps/ieee754/dbl-64. Finally remove all mpa related files, headers, declarations, probes, unused tables and update makefiles. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 15:45:19 +00:00
Wilco Dijkstra	47ad14d789	math: Remove mpa files [BZ #15267 ] Finally remove all mpa related files, headers, declarations, probes, unused tables and update makefiles. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Wilco Dijkstra	4e1a870b9a	math: Remove slow paths from atan2 [BZ #15267 ] Remove slow paths from atan2. Add ULP annotations. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Wilco Dijkstra	e898cd1593	math: Remove slow paths from atan [BZ #15267 ] Remove slow paths from atan. Add ULP annotations. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Wilco Dijkstra	476d692e8a	math: Remove slow paths in tan [BZ #15267 ] Remove slow paths in tan. Add ULP annotations. Merge 'number' into 'mynumber'. Remove unused entries from tan constants. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Wilco Dijkstra	db3f7bb558	math: Remove slow paths from asin and acos [BZ #15267 ] This patch series removes all remaining slow paths and related code. First asin/acos, tan, atan, atan2 implementations are updated, and the final patch removes the unused mpa files, headers and probes. Passes buildmanyglibc. Remove slow paths from asin/acos. Add ULP annotations based on previous slow path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps. Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>	2021-03-11 14:26:36 +00:00
Adhemerval Zanella	bf7db6d369	math: Add BZ#18980 fix back on dbl-64 cosh It is regression from `9e97f239ea` (Remove dbl-64/wordsize-64 (part 2)) where is missed to add the BZ#18980 fix (`9e97f239ea`). Checked on i686-linux-gnu.	2021-01-11 16:56:33 -03:00
Wilco Dijkstra	9e97f239ea	Remove dbl-64/wordsize-64 (part 2) Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The second patch just moves all wordsize-64 files and removes a few wordsize-64 uses in comments and Implies files. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-07 15:26:26 +00:00
Wilco Dijkstra	caa884dda7	Remove dbl-64/wordsize-64 Remove the wordsize-64 implementations by merging them into the main dbl-64 directory. The first patch adds special cases needed for 32-bit targets (FIX_INT_FP_CONVERT_ZERO and FIX_DBL_LONG_CONVERT_OVERFLOW) to the wordsize-64 versions. This has no effect on 64-bit targets since they don't define these macros. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2021-01-07 15:02:51 +00:00
Paul Eggert	2b778ceb40	Update copyright dates with scripts/update-copyrights I used these shell commands: ../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright (cd ../glibc && git commit -am"[this commit message]") and then ignored the output, which consisted lines saying "FOO: warning: copyright statement not found" for each of 6694 files FOO. I then removed trailing white space from benchtests/bench-pthread-locks.c and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this diagnostic from Savannah: remote: * pre-commit check failed ... remote: * error: lines with trailing whitespace found remote: error: hook declined to update refs/heads/master	2021-01-02 12:17:34 -08:00
Anssi Hannula	69a7ca7705	ieee754: Remove unused __sin32 and __cos32 The __sin32 and __cos32 functions were only used in the now removed slow path of asin and acos.	2020-12-18 12:10:31 +05:30
Anssi Hannula	f67f9c9af2	ieee754: Remove slow paths from asin and acos asin and acos have slow paths for rounding the last bit that cause some calls to be 500-1500x slower than average calls. These slow paths are rare, a test of a trillion (1.000.000.000.000) random inputs between -1 and 1 showed 32870 slow calls for acos and 4473 for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0. The slow paths claim correct rounding and use __sin32() and __cos32() (which compare two result candidates and return the closest one) as the final step, with the second result candidate (res1) having a small offset applied from res. This suggests that res and res1 are intended to be 1 ULP apart (which makes sense for rounding), barring bugs, allowing us to pick either one and still remain within 1 ULP of the exact result. Remove the slow paths as the accuracy is better than 1 ULP even without them, which is enough for glibc. Also remove code comments claiming correctly rounded results. After slow path removal, checking the accuracy of 14.400.000.000 random asin() and acos() inputs showed only three incorrectly rounded (error > 0.5 ULP) results: - asin(-0x1.ee2b43286db75p-1) (0.500002 ULP, same as before) - asin(-0x1.f692ba202abcp-4) (0.500003 ULP, same as before) - asin(-0x1.9915e876fc062p-1) (0.50000000001 ULP, previously exact) The first two had the same error even before this commit, and they did not use the slow path at all. Checking 4934 known randomly found previously-slow-path asin inputs shows 25 calls with incorrectly rounded results, with a maximum error of 0.500000002 ULP (for 0x1.fcd5742999ab8p-1). The previous slow-path code rounded all these inputs correctly (error < 0.5 ULP). The observed average speed increase was 130x. Checking 36240 known randomly found previously-slow-path acos inputs shows 42 calls with incorrectly rounded results, with a maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1). The previous "exact" slow-path code showed 34 calls with incorrectly rounded results, with the same maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1). The observed average speed increase was 130x. The functions could likely be trimmed more while keeping acceptable accuracy, but this at least gets rid of the egregiously slow cases. Tested on x86_64.	2020-12-18 12:09:23 +05:30
Joseph Myers	6c010c5dde	Use C2x return value from getpayload of non-NaN (bug 26073). In TS 18661-1, getpayload had an unspecified return value for a non-NaN argument, while C2x requires the return value -1 in that case. This patch implements the return value of -1. I don't think this is worth having a new symbol version that's an alias of the old one, although occasionally we do that in such cases where the new function semantics are a refinement of the old ones (to avoid programs relying on the new semantics running on older glibc versions but not behaving as intended). Tested for x86_64 and x86; also ran math/ tests for aarch64 and powerpc.	2020-07-06 16:18:02 +00:00
Vineet Gupta	e93c264336	ieee754/dbl-64: Reduce the scope of temporary storage variables This came to light when adding hard-flaot support to ARC glibc port without hardware sqrt support causing glibc build to fail: \| ../sysdeps/ieee754/dbl-64/e_sqrt.c: In function '__ieee754_sqrt': \| ../sysdeps/ieee754/dbl-64/e_sqrt.c:58:54: error: unused variable 'ty' [-Werror=unused-variable] \| double y, t, del, res, res1, hy, z, zz, p, hx, tx, ty, s; The reason being EMULV() macro uses the hardware provided __builtin_fma() variant, leaving temporary variables 'p, hx, tx, hy, ty' unused hence compiler warning and ensuing error. The intent of the patch was to fix that error, but EMULV is pervasive and used fair bit indirectly via othe rmacros, hence this patch. Functionally it should not result in code gen changes and if at all those would be better since the scope of those temporaries is greatly reduced now Built tested with aarch64-linux-gnu arm-linux-gnueabi arm-linux-gnueabihf hppa-linux-gnu x86_64-linux-gnu arm-linux-gnueabihf riscv64-linux-gnu-rv64imac-lp64 riscv64-linux-gnu-rv64imafdc-lp64 powerpc-linux-gnu microblaze-linux-gnu nios2-linux-gnu hppa-linux-gnu Also as suggested by Joseph [1] used --strip and compared the libs with and w/o patch and they are byte-for-byte unchanged (with gcc 9). \| for i in `find . -name libm-2.31.9000.so`; \| do \| echo $i; diff $i /SCRATCH/vgupta/gnu2/install/glibcs/$i ; echo $?; \| done \| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so \| 0 \| ./arm-linux-gnueabi/lib/libm-2.31.9000.so \| 0 \| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so \| 0 \| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so \| 0 \| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so \| 0 \| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so \| 0 \| ./powerpc-linux-gnu/lib/libm-2.31.9000.so \| 0 \| ./microblaze-linux-gnu/lib/libm-2.31.9000.so \| 0 \| ./nios2-linux-gnu/lib/libm-2.31.9000.so \| 0 \| ./hppa-linux-gnu/lib/libm-2.31.9000.so \| 0 \| ./s390x-linux-gnu/lib64/libm-2.31.9000.so [1] https://sourceware.org/pipermail/libc-alpha/2019-November/108267.html	2020-06-15 13:09:21 -07:00
Vineet Gupta	628d90c5f9	ieee754: provide gcc builtins based generic fma functions Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-03 10:23:28 -07:00
Vineet Gupta	3374868668	ieee754: provide gcc builtins based generic sqrt functions Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2020-06-03 10:23:22 -07:00

1 2 3 4 5 ...

548 Commits