glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-22 04:50:07 +00:00

Author	SHA1	Message	Date
Paul Zimmermann	392b3f0971	replace tgammaf by the CORE-MATH implementation The CORE-MATH implementation is correctly rounded (for any rounding mode). This can be checked by exhaustive tests in a few minutes since there are less than 2^32 values to check against for example GNU MPFR. This patch also adds some bench values for tgammaf. Tested on x86_64 and x86 (cfarm26). With the initial GNU libc code it gave on an Intel(R) Core(TM) i7-8700: "tgammaf": { "": { "duration": 3.50188e+09, "iterations": 2e+07, "max": 602.891, "min": 65.1415, "mean": 175.094 } } With the new code: "tgammaf": { "": { "duration": 3.30825e+09, "iterations": 5e+07, "max": 211.592, "min": 32.0325, "mean": 66.1649 } } With the initial GNU libc code it gave on cfarm26 (i686): "tgammaf": { "": { "duration": 3.70505e+09, "iterations": 6e+06, "max": 2420.23, "min": 243.154, "mean": 617.509 } } With the new code: "tgammaf": { "": { "duration": 3.24497e+09, "iterations": 1.8e+07, "max": 1238.15, "min": 101.155, "mean": 180.276 } } Signed-off-by: Alexei Sibidanov <sibid@uvic.ca> Signed-off-by: Paul Zimmermann <Paul.Zimmermann@inria.fr> Changes in v2: - include <math.h> (fix the linknamespace failures) - restored original benchtests/strcoll-inputs/filelist#en_US.UTF-8 file - restored original wrapper code (math/w_tgammaf_compat.c), except for the dealing with the sign - removed the tgammaf/float entries in all libm-test-ulps files - address other comments from Joseph Myers (https://sourceware.org/pipermail/libc-alpha/2024-July/158736.html) Changes in v3: - pass NULL argument for signgam from w_tgammaf_compat.c - use of math_narrow_eval - added more comments Changes in v4: - initialize local_signgam to 0 in math/w_tgamma_template.c - replace sysdeps/ieee754/dbl-64/gamma_productf.c by dummy file Changes in v5: - do not mention local_signgam any more in math/w_tgammaf_compat.c - initialize local_signgam to 1 instead of 0 in w_tgamma_template.c and added comment Changes in v6: - pass NULL as 2nd argument of 
__ieee754_gammaf_r in w_tgammaf_compat.c, and check for NULL in e_gammaf_r.c Changes in v7: - added Signed-off-by line for Alexei Sibidanov (author of the code) Changes in v8: - added Signed-off-by line for Paul Zimmermann (submitter of the patch) Changes in v9: - address comments from review by Adhemerval Zanella Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-10-11 11:12:32 +02:00
Wilco Dijkstra	44fa9c1080	math: Improve layout of expf data GCC aligns global data to 16 bytes if their size is >= 16 bytes. This patch changes the exp2f_data struct slightly so that the fields are better aligned. As a result on targets that support them, load-pair instructions accessing poly_scaled and invln2_scaled are now 16-byte aligned. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-10-01 13:39:26 +01:00
Andreas K. Hüttel	98ffc1bfeb	Convert to autoconf 2.72 (vanilla release, no distribution patches) As discussed at the patch review meeting Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org> Reviewed-by: Simon Chopin <simon.chopin@canonical.com>	2024-06-17 21:15:28 +02:00
Joseph Myers	7ec903e028	Implement C23 exp2m1, exp10m1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the exp2m1 and exp10m1 functions (exp2(x)-1 and exp10(x)-1, like expm1). As with other such functions, these use type-generic templates that could be replaced with faster and more accurate type-specific implementations in future. Test inputs are copied from those for expm1, plus some additions close to the overflow threshold (copied from exp2 and exp10) and also some near the underflow threshold. exp2m1 has the unusual property of having an input (M_MAX_EXP) where whether the function overflows (under IEEE semantics) depends on the rounding mode. Although these could reasonably be XFAILed in the testsuite (as we do in some cases for arguments very close to a function's overflow threshold when an error of a few ulps in the implementation can result in the implementation not agreeing with an ideal one on whether overflow takes place - the testsuite isn't smart enough to handle this automatically), since these functions aren't required to be correctly rounding, I made the implementation check for and handle this case specially. The Makefile ordering expected by lint-makefiles for the new functions is a bit peculiar, but I implemented it in this patch so that the test passes; I don't know why log2 also needed moving in one Makefile variable setting when it didn't in my previous patches, but the failure showed a different place was expected for that function as well. 
The powerpc64le IFUNC setup seems not to be as self-contained as one might hope; it shouldn't be necessary to add IFUNCs for new functions such as these simply to get them building, but without setting up IFUNCs for the new functions, there were undefined references to __GI___expm1f128 (that IFUNC machinery results in no such function being defined, but doesn't stop include/math.h from doing the redirection resulting in the exp2m1f128 and exp10m1f128 implementations expecting to call it). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-06-17 16:31:49 +00:00
Joseph Myers	55eb99e9a9	Implement C23 log10p1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the log10p1 functions (log10(1+x): like log1p, but for base-10 logarithms). This is directly analogous to the log2p1 implementation (except that whereas log2p1 has a smaller underflow range than log1p, log10p1 has a larger underflow range). The test inputs are copied from those for log1p and log2p1, plus a few more inputs in that wider underflow range. Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-06-17 13:48:13 +00:00
Joseph Myers	bb014f50c4	Implement C23 logp1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the logp1 functions (aliases for log1p functions - the name is intended to be more consistent with the new log2p1 and log10p1, where clearly it would have been very confusing to name those functions log21p and log101p). As aliases rather than new functions, the content of this patch is somewhat different from those actually adding new functions. Tests are shared with log1p, so this patch does mechanically update all affected libm-test-ulps files to expect the same errors for both functions. The vector versions of log1p on aarch64 and x86_64 are not updated to have logp1 aliases (and thus there are no corresponding header, tests, abilist or ulps changes for vector functions either). It would be reasonable for such vector aliases and corresponding changes to other files to be made separately. For now, the log1p tests instead avoid testing logp1 in the vector case (a Makefile change is needed to avoid problems with grep, used in generating the .c files for vector function tests, matching more than one ALL_RM_TEST line in a file testing multiple functions with the same inputs, when it assumes that the .inc file only has a single such line). Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-06-17 13:47:09 +00:00
Szabolcs Nagy	2a9943b4a0	math: Fix exp10 undefined left shift Left shift of ki is undefined when ki<0, copy the logic from exp, which uses unsigned arithmetics, to fix it. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-06-04 15:33:26 +01:00
H.J. Lu	23c60af6dc	sysdeps/ieee754/ldbl-opt/Makefile: Split and sort libnldbl-calls Put each item on a separate line and sort libnldbl-calls. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-24 10:25:40 -07:00
H.J. Lu	639c143db3	sysdeps/ieee754/ldbl-opt/Makefile: Remove test-nldbl-redirect-static Remove $(objpfx)test-nldbl-redirect-static checked in by accident. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-24 06:36:18 -07:00
H.J. Lu	acfb169b3c	sysdeps/ieee754/ldbl-opt/Makefile: Split and sort tests Put each test on a separate line and sort tests. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-24 06:31:49 -07:00
Adhemerval Zanella	eaa8113bf0	math: Provide missing math symbols on libc.a (BZ 31781) The libc.a for alpha, s390, and sparcv9 does not provide copysignf64x, copysignf128, frexpf64x, frexpf128, modff64x, and modff128. Checked with a static build for the affected ABIs. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-23 09:36:08 -03:00
H.J. Lu	43d41ae6d7	Don't provide XXXf128_do_not_use aliases [BZ #31757] Don't provide __nexttowardf128_do_not_use, nexttowardf128_do_not_use, finitef128_do_not_use, isinff128_do_not_use and isnanf128_do_not_use. This fixes BZ #31757. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-05-22 06:12:17 -07:00
Adhemerval Zanella	5d4999e519	math: Fix isnanf128 static build (BZ 31774) Some static implementation of float128 routines might call __isnanf128, which is not provided by the static object. Checked on x86_64-linux-gnu. Reviewed-by: H.J. Lu <hjl.tools@gmail.com>	2024-05-21 16:53:27 -03:00
Adhemerval Zanella	0b716305df	math: Fix i386 and m68k fmod/fmodf on static build (BZ 31488) The commit `16439f419b` removed the static fmod/fmodf on i386 and m68k with an empty w_fmod.c (required for the ABIs that use the new implementation). This patch fixes it by adding the required symbols to the arch-specific w_fmod{f}_compat.c implementation. Statically building fmod also fails on some ABIs (alpha, s390, sparc) because they do not export ldexpf128; this is also fixed by this patch. Checked on i686-linux-gnu and with a build for m68k-linux-gnu. Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Tested-by: Aurelien Jarno <aurelien@aurel32.net>	2024-05-21 13:43:39 -03:00
Joseph Myers	79c52daf47	Implement C23 log2p1 C23 adds various <math.h> function families originally defined in TS 18661-4. Add the log2p1 functions (log2(1+x): like log1p, but for base-2 logarithms). This illustrates the intended structure of implementations of all these function families: define them initially with a type-generic template implementation. If someone wishes to add type-specific implementations, it is likely such implementations can be both faster and more accurate than the type-generic one and can then override it for types for which they are implemented (adding benchmarks would be desirable in such cases to demonstrate that a new implementation is indeed faster). The test inputs are copied from those for log1p. Note that these changes make gen-auto-libm-tests depend on MPFR 4.2 (or later). The bulk of the changes are fairly generic for any such new function. (sysdeps/powerpc/nofpu/Makefile only needs changing for those type-generic templates that use fabs.) Tested for x86_64 and x86, and with build-many-glibcs.py.	2024-05-20 13:41:39 +00:00
H.J. Lu	4e21cb95e2	nearbyint: Don't define alias when used in IFUNC [BZ #31759] Fix BZ #31759 by not defining nearbyint aliases when used in IFUNC. Signed-off-by: H.J. Lu <hjl.tools@gmail.com> Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>	2024-05-20 05:21:41 -07:00
Joseph Myers	83d8d289b2	Rename c2x / gnu2x tests to c23 / gnu23 Complete the internal renaming from "C2X" and related names in GCC by renaming -c2x and -gnu2x tests to -c23 and -gnu23. Tested for x86_64, and with build-many-glibcs.py for powerpc64le.	2024-02-01 17:55:57 +00:00
Joseph Myers	42cc619dfb	Refer to C23 in place of C2X in glibc WG14 decided to use the name C23 as the informal name of the next revision of the C standard (notwithstanding the publication date in 2024). Update references to C2X in glibc to use the C23 name. This is intended to update everything except where it involves renaming files (the changes involving renaming tests are intended to be done separately). In the case of the _ISOC2X_SOURCE feature test macro - the only user-visible interface involved - support for that macro is kept for backwards compatibility, while adding _ISOC23_SOURCE. Tested for x86_64.	2024-02-01 11:02:01 +00:00
Wilco Dijkstra	08ddd26814	math: remove exp10 wrappers Remove the error handling wrapper from exp10. This is very similar to the changes done to exp and exp2, except that we also need to handle pow10 and pow10l. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2024-01-12 16:02:12 +00:00
Paul Eggert	dff8da6b3e	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
Joe Ramsay	63d0a35d5f	math: Add new exp10 implementation New implementation is based on the existing exp/exp2, with different reduction constants and polynomial. Worst-case error in round-to-nearest is 0.513 ULP. The exp/exp2 shared table is reused for exp10 - .rodata size of e_exp_data increases by 64 bytes. As for exp/exp2, targets with single-instruction rounding/conversion intrinsics can use them by toggling TOINT_INTRINSICS=1 and adding the necessary code to their math_private.h. Improvements on Neoverse V1 compared to current GLIBC master: exp10 thruput: 3.3x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] exp10 latency: 1.8x in [-0x1.439b746e36b52p+8 0x1.34413509f79ffp+8] Tested on: aarch64-linux-gnu (TOINT_INTRINSICS, fma contraction) and x86_64-linux-gnu (!TOINT_INTRINSICS, no fma contraction) Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-12-04 15:52:11 +00:00
Andreas Schwab	5aa1ddfcb3	Avoid maybe-uninitialized warning in __kernel_rem_pio2 With GCC 14 on 32-bit x86 the compiler emits a maybe-uninitialized warning: ../sysdeps/ieee754/dbl-64/k_rem_pio2.c: In function '__kernel_rem_pio2': ../sysdeps/ieee754/dbl-64/k_rem_pio2.c:364:20: error: 'fq' may be used uninitialized [-Werror=maybe-uninitialized] 364 | y[0] = fq[0]; y[1] = fq[1]; y[2] = fw; | ~~^~~ This is similar to the warning that is suppressed in the other branch of the switch. Help the compiler knowing that the variable is always initialized, which also makes the suppression obsolete.	2023-10-16 09:59:32 +02:00
H.J. Lu	a8ecb126d4	x86_64: Add log1p with FMA On Skylake, it changes log1p bench performance by: Before After Improvement max 63.349 58.347 8% min 4.448 5.651 -30% mean 12.0674 10.336 14% The minimum code path is if (hx < 0x3FDA827A) /* x < 0.41422 */ { if (__glibc_unlikely (ax >= 0x3ff00000)) /* x <= -1.0 */ { ... } if (__glibc_unlikely (ax < 0x3e200000)) /* |x| < 2**-29 */ { math_force_eval (two54 + x); /* raise inexact */ if (ax < 0x3c900000) /* |x| < 2**-54 */ { ... } else return x - x * x * 0.5; FMA and non-FMA code sequences look similar. Non-FMA version is slightly faster. Since log1p is called by asinh and atanh, it improves asinh performance by: Before After Improvement max 75.645 63.135 16% min 10.074 10.071 0% mean 15.9483 14.9089 6% and improves atanh performance by: Before After Improvement max 91.768 75.081 18% min 15.548 13.883 10% mean 18.3713 16.8011 8%	2023-08-21 10:44:26 -07:00
H.J. Lu	1b214630ce	x86_64: Add expm1 with FMA On Skylake, it improves expm1 bench performance by: Before After Improvement max 70.204 68.054 3% min 20.709 16.2 22% mean 22.1221 16.7367 24% NB: Add extern long double __expm1l (long double); extern long double __expm1f128 (long double); for __typeof (__expm1l) and __typeof (__expm1f128) when __expm1 is defined since __expm1 may be expanded in their declarations which causes the build failure.	2023-08-14 08:14:19 -07:00
Siddhesh Poyarekar	c6cb8783b5	configure: Use autoconf 2.71 Bump autoconf requirement to 2.71 to allow regenerating configure on more recent distributions. autoconf 2.71 has been in Fedora since F36 and is the current version in Debian stable (bookworm). It appears to be current in Gentoo as well. All sysdeps configure and preconfigure scripts have also been regenerated; all changes are trivial transformations that do not affect functionality. Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-07-17 10:08:10 -04:00
Frédéric Bérat	02261d1bd9	sysdeps/ieee754/ldbl-128ibm-compat: Fix warn unused result Return value from scanf and asprintf routines are now properly checked in test-scanf-ldbl-compat-template.c and test-printf-ldbl-compat.c. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-07-05 16:59:48 +02:00
Frédéric Bérat	ba745eff46	misc/bits/syslog.h: Clearly separate declaration from definition This allows to include bits/syslog-decl.h in include/sys/syslog.h and therefore be able to create the libc_hidden_builtin_proto (__syslog_chk) prototype. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-07-05 16:59:48 +02:00
Frédéric Bérat	505c884aeb	stdio: Ensure _chk routines have their hidden builtin definition available If libc_hidden_builtin_{def,proto} isn't properly set for _chk routines, there are unwanted PLT entries in libc.so. There is a special case with __asprintf_chk: If ldbl_* macros are used for asprintf, ABI gets broken on s390x, if it isn't, ppc64le isn't building due to multiple asm redirections. This is due to the inclusion of bits/stdio-lbdl.h for ppc64le whereas it isn't for s390x. This header creates redirections, which are not compatible with the ones generated using libc_hidden_def. Yet, we can't use libc_hidden_ldbl_proto on s390x since it will not create a simple strong alias (e.g. as done on x86_64), but a versioned alias, leading to ABI breakage. This results in errors on s390x: /usr/bin/ld: glibc/iconv/../libio/bits/stdio2.h:137: undefined reference to `__asprintf_chk' Original __asprintf_chk symbols: 00000000001395b0 T __asprintf_chk 0000000000177e90 T __nldbl___asprintf_chk __asprintf_chk symbols with ldbl_* macros: 000000000012d590 t ___asprintf_chk 000000000012d590 t __asprintf_chk@@GLIBC_2.4 000000000012d590 t __GI___asprintf_chk 000000000012d590 t __GL____asprintf_chk___asprintf_chk 0000000000172240 T __nldbl___asprintf_chk __asprintf_chk symbols with the patch: 000000000012d590 t ___asprintf_chk 000000000012d590 T __asprintf_chk 000000000012d590 t __GI___asprintf_chk 0000000000172240 T __nldbl___asprintf_chk Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-07-05 16:59:48 +02:00
Frédéric Bérat	ba96ff24b2	sysdeps: Ensure ieee128_chk routines to be properly named The _chk routines naming doesn't match the name that would be generated using libc_hidden_ldbl_proto. Since the macro is needed for some of these _chk functions for _FORTIFY_SOURCE to be enabled, that needed to be fixed. While at it, all the _chk function get renamed appropriately for consistency, even if not strictly necessary. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org> Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>	2023-07-05 16:59:48 +02:00
Frédéric Bérat	20c894d21e	Exclude routines from fortification Since the _FORTIFY_SOURCE feature uses some routines of Glibc, they need to be excluded from the fortification. On top of that: - some tests explicitly verify that some level of fortification works appropriately, we therefore shouldn't modify the level set for them. - some objects need to be built with optimization disabled, which prevents _FORTIFY_SOURCE from being used for them. Assembler files that implement architecture specific versions of the fortified routines were not excluded from _FORTIFY_SOURCE as there is no C header included that would impact their behavior. Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>	2023-07-05 16:59:48 +02:00
Joe Ramsay	aed39a3aa3	aarch64: Add vector implementations of cos routines Replace the loop-over-scalar placeholder routines with optimised implementations from Arm Optimized Routines (AOR). Also add some headers containing utilities for aarch64 libmvec routines, and update libm-test-ulps. Data tables for new routines are used via a pointer with a barrier on it, in order to prevent overly aggressive constant inlining in GCC. This allows a single adrp, combined with offset loads, to be used for every constant in the table. Special-case handlers are marked NOINLINE in order to confine the save/restore overhead of switching from vector to normal calling standard. This way we only incur the extra memory access in the exceptional cases. NOINLINE definitions have been moved to math_private.h in order to reduce duplication. AOR exposes a config option, WANT_SIMD_EXCEPT, to enable selective masking (and later fixing up) of invalid lanes, in order to trigger fp exceptions correctly (AdvSIMD only). This is tested and maintained in AOR, however it is configured off at source level here for performance reasons. We keep the WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify the upstreaming process from AOR to glibc. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>	2023-06-30 09:04:10 +01:00
Paul Pluzhnikov	65cc53fe7c	Fix misspellings in sysdeps/ -- BZ 25337	2023-05-30 23:02:29 +00:00
Sachin Monga	1a57ab0c92	Added Redirects to longdouble error functions [BZ #29033] This patch redirects the error functions to the appropriate longdouble variants which enables the compiler to optimize for the abi ieeelongdouble. Signed-off-by: Sachin Monga <smonga@linux.ibm.com>	2023-05-10 13:59:48 -05:00
Wilco Dijkstra	76d0f094dd	math: Improve fmod(f) performance Optimize the fast paths (x < y) and (x/y < 2^12). Delay handling of special cases to reduce the number of instructions executed before the fast paths. Performance improvements for fmod: Skylake Zen2 Neoverse V1 subnormals 11.8% 4.2% 11.5% normal 3.9% 0.01% -0.5% close-exponents 6.3% 5.6% 19.4% Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2023-04-17 13:03:10 +01:00
Adhemerval Zanella Netto	16439f419b	math: Remove the error handling wrapper from fmod and fmodf The error handling is moved to sysdeps/ieee754 version with no SVID support. The compatibility symbol versions still use the wrapper with SVID error handling around the new code. There is no new symbol version nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv). The ia64 is unchanged, since it still uses the arch specific __libm_error_region on its implementation. For both i686 and m68k, which provide arch specific implementations, wrappers are added so no new symbols are added (which would require changing the implementations). It shows a small improvement, the results for fmod: Architecture | Input | master | patch -----------------|-----------------|----------|-------- x86_64 (Ryzen 9) | subnormals | 12.5049 | 9.40992 x86_64 (Ryzen 9) | normal | 296.939 | 296.738 x86_64 (Ryzen 9) | close-exponents | 16.0244 | 13.119 aarch64 (N1) | subnormal | 6.81778 | 4.33313 aarch64 (N1) | normal | 155.620 | 152.915 aarch64 (N1) | close-exponents | 8.21306 | 5.76138 armhf (N1) | subnormal | 15.1083 | 14.5746 armhf (N1) | normal | 244.833 | 241.738 armhf (N1) | close-exponents | 21.8182 | 22.457 Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-04-03 16:45:27 -03:00
Adhemerval Zanella Netto	cf9cf33199	math: Improve fmodf This uses a new algorithm similar to one already proposed earlier [1]. With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers), the simplest implementation is: mx * 2^ex == 2 * mx * 2^(ex - 1) while (ex > ey) { mx *= 2; --ex; mx %= my; } With mx/my being mantissas of a float, on each step the argument reduction can be improved by 8 (which is the size of uint32_t minus MANTISSA_WIDTH plus the sign bit): while (ex > ey) { mx <<= 8; ex -= 8; mx %= my; } The implementation uses builtin clz and ctz, along with shifts to convert hx/hy back to floats. Unlike the original patch, this patch assumes the modulo/divide operation is slow, so it uses multiplication by inverted values. I see the following performance improvements using fmod benchtests (results show only the 'mean' value): Architecture | Input | master | patch -----------------|-----------------|----------|-------- x86_64 (Ryzen 9) | subnormals | 17.2549 | 12.0318 x86_64 (Ryzen 9) | normal | 85.4096 | 49.9641 x86_64 (Ryzen 9) | close-exponents | 19.1072 | 15.8224 aarch64 (N1) | subnormal | 10.2182 | 6.81778 aarch64 (N1) | normal | 60.0616 | 20.3667 aarch64 (N1) | close-exponents | 11.5256 | 8.39685 I also see similar improvements on arm-linux-gnueabihf when running on the N1 aarch64 chips, where it uses a lot of soft-fp implementations (for modulo and multiplication): Architecture | Input | master | patch -----------------|-----------------|----------|-------- armhf (N1) | subnormal | 11.6662 | 10.8955 armhf (N1) | normal | 69.2759 | 34.1524 armhf (N1) | close-exponents | 13.6472 | 18.2131 Instead of using the math_private.h definitions, I used math_config.h, which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com> [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-04-03 16:45:18 -03:00
Adhemerval Zanella Netto	34b9f8bc17	math: Improve fmod This uses a new algorithm similar to one already proposed earlier [1]. With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers), the simplest implementation is: mx * 2^ex == 2 * mx * 2^(ex - 1) while (ex > ey) { mx *= 2; --ex; mx %= my; } With mx/my being mantissas of a double, on each step the argument reduction can be improved by 11 (which is the size of uint64_t minus MANTISSA_WIDTH plus the sign bit): while (ex > ey) { mx <<= 11; ex -= 11; mx %= my; } The implementation uses builtin clz and ctz, along with shifts to convert hx/hy back to doubles. Unlike the original patch, this patch assumes the modulo/divide operation is slow, so it uses multiplication by inverted values. I see the following performance improvements using fmod benchtests (results show only the 'mean' value): Architecture | Input | master | patch -----------------|-----------------|----------|-------- x86_64 (Ryzen 9) | subnormals | 19.1584 | 12.5049 x86_64 (Ryzen 9) | normal | 1016.51 | 296.939 x86_64 (Ryzen 9) | close-exponents | 18.4428 | 16.0244 aarch64 (N1) | subnormal | 11.153 | 6.81778 aarch64 (N1) | normal | 528.649 | 155.62 aarch64 (N1) | close-exponents | 11.4517 | 8.21306 I also see similar improvements on arm-linux-gnueabihf when running on the N1 aarch64 chips, where it uses a lot of soft-fp implementations (for modulo, clz, ctz, and multiplication): Architecture | Input | master | patch -----------------|-----------------|----------|-------- armhf (N1) | subnormal | 15.908 | 15.1083 armhf (N1) | normal | 837.525 | 244.833 armhf (N1) | close-exponents | 16.2111 | 21.8182 Instead of using the math_private.h definitions, I used math_config.h, which is used on newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com> [1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2023-04-03 16:36:24 -03:00
Adhemerval Zanella Netto	88677348b4	Move libc_freeres_ptrs and libc_subfreeres to hidden/weak functions They are both used by __libc_freeres to free all library malloc allocated resources to help tooling like mtrace or valgrind with memory leak tracking. The current scheme uses assembly markers and linker script entries to consolidate the free routine function pointers in the RELRO segment and to be freed buffers in BSS. This patch changes it to use specific free functions for libc_freeres_ptrs buffers and call the function pointer array directly with call_function_static_weak. It allows the removal of both the internal macros and the linker script sections. Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-03-27 13:57:55 -03:00
Joseph Myers	dee2bea048	C2x scanf binary constant handling C2x adds binary integer constants starting with 0b or 0B, and supports those constants for the %i scanf format (in addition to the %b format, which isn't yet implemented for scanf in glibc). Implement that scanf support for glibc. As with the strtol support, this is incompatible with previous C standard versions, in that such an input string starting with 0b or 0B was previously required to be parsed as 0 (with the rest of the input potentially matching subsequent parts of the scanf format string). Thus this patch adds 12 new __isoc23_* functions per long double format (12, 24 or 36 depending on how many long double formats the glibc configuration supports), with appropriate header redirection support (generally very closely following that for the __isoc99_* scanf functions - note that __GLIBC_USE (DEPRECATED_SCANF) takes precedence over __GLIBC_USE (C2X_STRTOL), so the case of GNU extensions to C89 continues to get old-style GNU %a and does not get this new feature). The function names would remain as __isoc23_* even if C2x ends up published in 2024 rather than 2023. When scanf %b support is added, I think it will be appropriate for all versions of scanf to follow C2x rules for inputs to the %b format (given that there are no compatibility concerns for a new format). Tested for x86_64 (full glibc testsuite). The first version was also tested for powerpc (32-bit) and powerpc64le (stdio-common/ and wcsmbs/ tests), and with build-many-glibcs.py.	2023-03-02 19:10:37 +00:00
Adhemerval Zanella	30546ac2d1	math: Suppress -O0 warnings for soft-fp fsqrt [BZ #19444] The patch suppresses the same warnings as `87c266d758`, which show up as issues for microblaze, mips soft-fp, nios2, and or1k. Reviewed-by: Carlos O'Donell <carlos@redhat.com>	2023-01-11 17:50:51 -03:00
Joseph Myers	6d7e8eda9b	Update copyright dates with scripts/update-copyrights	2023-01-06 21:14:39 +00:00
Joseph Myers	8f27dc1af5	Fix ldbl-128 built-in function use Fix the following issues with built-in function use in sysdeps/ieee754/ldbl-128 and sysdeps/ieee754/float128: * fabsl used __builtin_fabsf128 unconditionally, breaking the build with GCC 6 for several architectures; it should use __builtin_fabsl with an appropriate redirection in float128_private.h. (I'm not particularly concerned with building glibc with GCC 6; rather, I want to be able to run the tgmath.h tests with GCC 6, which is a significantly different case for tgmath.h compared to GCC 7 and later because of the lack of _FloatN / _FloatNx support in the compiler, and at present running the tests with a compiler means building glibc with that compiler.) * Some (conditional) uses of built-in functions had been added to ldbl-128 without appropriate float128_private.h remapping (there was remapping for the macros controlling whether the built-in functions are used, just not for the functions themselves). * s_llrintl.c called __builtin_round not __builtin_llrintl, which is obviously wrong. Tested with build-many-glibcs.py for aarch64-linux-gnu, GCC 6 (where it fixes the glibc build) and GCC 12, and with the glibc testsuite for x86_64.	2023-01-05 00:02:54 +00:00
Florian Weimer	e88b9f0e5c	stdio-common: Convert vfprintf and related functions to buffers vfprintf is entangled with vfwprintf (of course), __printf_fp, __printf_fphex, __vstrfmon_l_internal, and the strfrom family of functions. The latter use the internal snprintf functionality, so vsnprintf is converted as well. The simplest conversion is __printf_fphex, followed by __vstrfmon_l_internal and __printf_fp, and finally __vfprintf_internal and __vfwprintf_internal. __vsnprintf_internal and strfrom* are mostly consuming the new interfaces, so they are comparatively simple. __printf_fp is a public symbol, so the FILE -based interface had to be preserved. The __printf_fp rewrite does not change the actual binary-to-decimal conversion algorithm, and digits are still not emitted directly to the target buffer. However, the staging buffer now uses bytes instead of wide characters, and one buffer copy is eliminated. The changes are at least performance-neutral in my testing. Floating point printing and snprintf improved measurably, so that this Lua script for i=1,5000000 do print(i, i * math.pi) end runs about 5% faster for me. To preserve fprintf performance for a simple "%d" format, this commit has some logic changes under LABEL (unsigned_number) to avoid additional function calls. There are certainly some very easy performance improvements here: binary, octal and hexadecimal formatting can easily avoid the temporary work buffer (the number of digits can be computed ahead-of-time using one of the __builtin_clz* built-ins). Decimal formatting can use a specialized version of _itoa_word for base 10. The existing (inconsistent) width handling between strfmon and printf is preserved here. __print_fp_buffer_1 would have to use __translated_number_width to achieve ISO conformance for printf. Test expectations in libio/tst-vtables-common.c are adjusted because the internal staging buffer merges all virtual function calls into one.
In general, stack buffer usage is greatly reduced, particularly for unbuffered input streams. __printf_fp can still use a large buffer in binary128 mode for %g, though. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>	2022-12-19 18:56:54 +01:00
Xiaolin Tang	2e2485ce05	Use GCC builtins for logb functions if desired. This patch is using the corresponding GCC builtin for logbf, logb, logbl and logbf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>	2022-11-29 16:00:28 +08:00
Xiaolin Tang	a1981ecbfd	Use GCC builtins for llrint functions if desired. This patch is using the corresponding GCC builtin for llrintf, llrint, llrintl and llrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>	2022-11-29 16:00:28 +08:00
Xiaolin Tang	2b23ab1fea	Use GCC builtins for lrint functions if desired. This patch is using the corresponding GCC builtin for lrintf, lrint, lrintl and lrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one in math-use-builtins-function.h. Co-Authored-By: Xi Ruoyao <xry111@xry111.site>	2022-11-29 16:00:28 +08:00
Joseph Myers	f66780ba46	Fix build with GCC 13 _FloatN, _FloatNx built-in functions GCC 13 has added more _FloatN and _FloatNx versions of existing <math.h> and <complex.h> built-in functions, for use in libstdc++-v3. This breaks the glibc build because of how those functions are defined as aliases to functions with the same ABI but different types. Add appropriate -fno-builtin-* options for compiling relevant files, as already done for the case of long double functions aliasing double ones and based on the list of files used there. I fixed some mistakes in that list of double files that I noticed while implementing this fix, but there may well be more such (harmless) cases, in this list or the new one (files that don't actually exist or don't define the named functions as aliases so don't need the options). I did try to exclude cases where glibc doesn't define certain functions for _FloatN or _FloatNx types at all from the new uses of -fno-builtin-* options. As with the options for double files (see the commit message for commit `49348beafe`, "Fix build with GCC 10 when long double = double."), it's deliberate that the options are used even if GCC currently doesn't have a built-in version of a given function, so providing some level of future-proofing against more such built-in functions being added in future. Tested with build-many-glibcs.py for aarch64-linux-gnu powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu (compilers and glibcs builds) with GCC mainline.	2022-10-31 23:20:08 +00:00
Aurelien Jarno	2b5478569e	Avoid undefined behaviour in ibm128 implementation of llroundl (BZ #29488) Detecting an overflow edge case depended on signed overflow of a long long. Replace the additions and the overflow checks by __builtin_add_overflow(). Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2022-10-24 20:48:02 +02:00
Michael Hudson-Doyle	b6e37b7805	Fix BZ #29463 in the ibm128 implementation of y1l too Avoid moving code across SET_RESTORE_ROUNDL in order to fix [BZ #29463]. Tested-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>	2022-10-24 10:59:20 -03:00
Szabolcs Nagy	7363a9a9a0	math: Fix asin and acos invalid exception with old gcc This works around a gcc issue where it const folded inf/inf into nan, preventing the invalid exception from being signalled. (x-x)/(x-x) is more robust against optimizations and works for all out of bounds values including x==nan. The gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115 should be fixed on release branches starting from gcc-10, but it is better to change the code in case glibc is built with older gcc. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>	2022-10-17 08:18:52 +01:00
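
The llroundl fix listed above (commit 2b5478569e) replaces additions whose overflow check relied on signed wraparound with __builtin_add_overflow(). A minimal sketch of that general pattern follows. It is not the glibc code: the helper name checked_add_ll is hypothetical, and the sketch assumes a compiler that provides the built-in (GCC 5 or later, or Clang).

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper, not taken from glibc.  Adds A and B without
   relying on signed overflow, which is undefined behaviour in C.
   Returns true if the mathematical sum does not fit in long long;
   otherwise returns false and stores the sum in *SUM.  */
bool
checked_add_ll (long long a, long long b, long long *sum)
{
  /* The built-in computes the sum as if with infinite precision,
     stores the (possibly wrapped) result in *SUM and reports whether
     the true result was out of range.  */
  return __builtin_add_overflow (a, b, sum);
}
```

A caller would test the return value before using the result, for example treating `checked_add_ll (LLONG_MAX, 1, &s)` returning true as the out-of-range case instead of comparing a wrapped sum against its operands.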


1215 Commits