The *_chk routines' naming doesn't match the names that would be generated
using libc_hidden_ldbl_proto. Since the macro is needed for some of
these *_chk functions in order to enable _FORTIFY_SOURCE, that needed to
be fixed.
While at it, all the *_chk functions are renamed appropriately for
consistency, even where not strictly necessary.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Paul E. Murphy <murphyp@linux.ibm.com>
Since the _FORTIFY_SOURCE feature itself uses some glibc routines, those
need to be excluded from fortification.
On top of that:
- some tests explicitly verify that a given level of fortification works
appropriately, so we shouldn't modify the level set for them.
- some objects need to be built with optimization disabled, which
prevents _FORTIFY_SOURCE from being used for them.
Assembler files that implement architecture specific versions of the
fortified routines were not excluded from _FORTIFY_SOURCE as there is no
C header included that would impact their behavior.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Replace the loop-over-scalar placeholder routines with optimised
implementations from Arm Optimized Routines (AOR).
Also add some headers containing utilities for aarch64 libmvec
routines, and update libm-test-ulps.
Data tables for new routines are used via a pointer with a
barrier on it, in order to prevent overly aggressive constant
inlining in GCC. This allows a single adrp, combined with offset
loads, to be used for every constant in the table.
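As an illustration of the idiom (the table contents and macro name below
are invented for the example, not the AOR source):

  #include <stdio.h>

  /* The empty asm hides the table's address from constant propagation, so
     the compiler keeps one base pointer (a single adrp on aarch64) and
     uses offset loads instead of materialising each constant separately.  */
  #define ptr_barrier(x) \
    ({ __typeof (x) __p = (x); __asm__ ("" : "+r" (__p)); __p; })

  static const struct { double poly[4]; } data = { { 1.0, 0.5, 0.25, 0.125 } };

  int
  main (void)
  {
    const __typeof (data) *d = ptr_barrier (&data);
    printf ("%g\n", d->poly[1]);
    return 0;
  }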
Special-case handlers are marked NOINLINE in order to confine the
save/restore overhead of switching from vector to normal calling
standard. This way we only incur the extra memory access in the
exceptional cases. NOINLINE definitions have been moved to
math_private.h in order to reduce duplication.
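A minimal sketch of the annotation (names assumed for illustration, not
the exact math_private.h text):

  #define NOINLINE __attribute__ ((noinline))

  /* Kept out of line so the vector-to-scalar save/restore cost is only
     paid when a lane actually takes the slow path.  */
  double NOINLINE
  special_case (double x)
  {
    return x;   /* stand-in for the scalar special-case handling */
  }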
AOR exposes a config option, WANT_SIMD_EXCEPT, to enable
selective masking (and later fixing up) of invalid lanes, in
order to trigger fp exceptions correctly (AdvSIMD only). This is
tested and maintained in AOR, however it is configured off at
source level here for performance reasons. We keep the
WANT_SIMD_EXCEPT blocks in routine sources to greatly simplify
the upstreaming process from AOR to glibc.
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
This patch redirects the error functions to the appropriate
long double variants, which enables the compiler to optimize
for the ieeelongdouble ABI.
Signed-off-by: Sachin Monga <smonga@linux.ibm.com>
Optimize the fast paths (x < y) and (x/y < 2^12). Delay handling of special
cases to reduce the number of instructions executed before the fast paths.
Performance improvements for fmod:
                  Skylake   Zen2    Neoverse V1
subnormals         11.8%    4.2%    11.5%
normal              3.9%    0.01%   -0.5%
close-exponents     6.3%    5.6%    19.4%
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper
with SVID error handling around the new code. There is no new symbol
version nor compatibility code on !LIBM_SVID_COMPAT targets
(e.g. riscv).
The ia64 implementation is unchanged, since it still uses the arch-specific
__libm_error_region. For both i686 and m68k, which provide arch-specific
implementations, wrappers are added so that no new symbols are introduced
(which would require changing the implementations).
It shows a small improvement; the results for fmod:
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 12.5049 | 9.40992
x86_64 (Ryzen 9) | normal | 296.939 | 296.738
x86_64 (Ryzen 9) | close-exponents | 16.0244 | 13.119
aarch64 (N1) | subnormal | 6.81778 | 4.33313
aarch64 (N1) | normal | 155.620 | 152.915
aarch64 (N1) | close-exponents | 8.21306 | 5.76138
armhf (N1) | subnormal | 15.1083 | 14.5746
armhf (N1) | normal | 244.833 | 241.738
armhf (N1) | close-exponents | 21.8182 | 22.457
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This uses a new algorithm similar to the one already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
    while (ex > ey)
      {
        mx *= 2;
        --ex;
        mx %= my;
      }
With mx and my being the mantissas of binary32 floating-point numbers, on
each step the argument reduction can be improved by 8 bits (the size in
bits of uint32_t minus MANTISSA_WIDTH minus the sign bit):
    while (ex > ey)
      {
        mx <<= 8;
        ex -= 8;
        mx %= my;
      }
The implementation uses the clz and ctz built-ins, along with shifts to
convert hx/hy back to doubles. Unlike the original patch, this version
assumes the modulo/divide operation is slow, so it uses multiplication
by the inverted value instead.
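A minimal C sketch of the shift-by-8 reduction on the extracted binary32
significands (names simplified and the multiply-by-inverse trick omitted;
this is illustrative, not the committed code):

  #include <stdint.h>

  /* Reduce mx (scaled by 2^ex) modulo my (scaled by 2^ey), 8 bits at a
     time.  mx and my are 24-bit significands with the implicit bit set;
     the caller still has to normalise the result and rebuild a float.  */
  static uint32_t
  reduce (uint32_t mx, int ex, uint32_t my, int ey)
  {
    while (ex > ey)
      {
        int shift = ex - ey > 8 ? 8 : ex - ey;
        mx <<= shift;
        ex -= shift;
        mx %= my;
      }
    return mx;
  }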
I see the following performance improvements using the fmod benchtests
(the results only show the 'mean' value):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 17.2549 | 12.0318
x86_64 (Ryzen 9) | normal | 85.4096 | 49.9641
x86_64 (Ryzen 9) | close-exponents | 19.1072 | 15.8224
aarch64 (N1) | subnormal | 10.2182 | 6.81778
aarch64 (N1) | normal | 60.0616 | 20.3667
aarch64 (N1) | close-exponents | 11.5256 | 8.39685
I also see similar improvements on arm-linux-gnueabihf when running on
an N1 aarch64 chip, where it uses a lot of soft-fp routines (for
modulo and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 11.6662 | 10.8955
armhf (N1) | normal | 69.2759 | 34.1524
armhf (N1) | close-exponents | 13.6472 | 18.2131
Instead of using the math_private.h definitions, I used math_config.h,
which is used by the newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
This uses a new algorithm similar to the one already proposed earlier [1].
With x = mx * 2^ex and y = my * 2^ey (mx, my, ex, ey being integers),
the simplest implementation is:
mx * 2^ex == 2 * mx * 2^(ex - 1)
    while (ex > ey)
      {
        mx *= 2;
        --ex;
        mx %= my;
      }
With mx and my being the mantissas of double floating-point numbers, on
each step the argument reduction can be improved by 11 bits (the size in
bits of uint64_t minus MANTISSA_WIDTH minus the sign bit):
    while (ex > ey)
      {
        mx <<= 11;
        ex -= 11;
        mx %= my;
      }
The implementation uses the clz and ctz built-ins, along with shifts to
convert hx/hy back to doubles. Unlike the original patch, this version
assumes the modulo/divide operation is slow, so it uses multiplication
by the inverted value instead.
I see the following performance improvements using the fmod benchtests
(the results only show the 'mean' value):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
x86_64 (Ryzen 9) | subnormals | 19.1584 | 12.5049
x86_64 (Ryzen 9) | normal | 1016.51 | 296.939
x86_64 (Ryzen 9) | close-exponents | 18.4428 | 16.0244
aarch64 (N1) | subnormal | 11.153 | 6.81778
aarch64 (N1) | normal | 528.649 | 155.62
aarch64 (N1) | close-exponents | 11.4517 | 8.21306
I also see similar improvements on arm-linux-gnueabihf when running on
an N1 aarch64 chip, where it uses a lot of soft-fp routines (for
modulo, clz, ctz, and multiplication):
Architecture | Input | master | patch
-----------------|-----------------|----------|--------
armhf (N1) | subnormal | 15.908 | 15.1083
armhf (N1) | normal | 837.525 | 244.833
armhf (N1) | close-exponents | 16.2111 | 21.8182
Instead of using the math_private.h definitions, I used math_config.h,
which is used by the newer math implementations.
Co-authored-by: kirill <kirill.okhotnikov@gmail.com>
[1] https://sourceware.org/pipermail/libc-alpha/2020-November/119794.html
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
They are both used by __libc_freeres to free all malloc-allocated library
resources, to help tooling like mtrace or valgrind with memory-leak
tracking.
The current scheme uses assembly markers and linker script entries
to consolidate the free routine function pointers in the RELRO segment
and the to-be-freed buffers in BSS.
This patch changes it to use specific free functions for
libc_freeres_ptrs buffers and call the function pointer array directly
with call_function_static_weak.
It allows the removal of both the internal macros and the linker
script sections.
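The weak-call pattern that replaces the linker-script magic looks roughly
like this (the symbol name is illustrative; glibc's
call_function_static_weak macro wraps the same idea):

  /* If the malloc free-all routine was linked in, its weak reference is
     non-null and gets called; otherwise the call is skipped entirely.  */
  extern void __libc_example_freeres (void) __attribute__ ((weak));

  void
  freeres_sketch (void)
  {
    if (__libc_example_freeres != NULL)
      __libc_example_freeres ();
  }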
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
C2x adds binary integer constants starting with 0b or 0B, and supports
those constants for the %i scanf format (in addition to the %b format,
which isn't yet implemented for scanf in glibc). Implement that scanf
support for glibc.
As with the strtol support, this is incompatible with previous C
standard versions, in that such an input string starting with 0b or 0B
was previously required to be parsed as 0 (with the rest of the input
potentially matching subsequent parts of the scanf format string).
Thus this patch adds 12 new __isoc23_* functions per long double
format (12, 24 or 36 depending on how many long double formats the
glibc configuration supports), with appropriate header redirection
support (generally very closely following that for the __isoc99_*
scanf functions - note that __GLIBC_USE (DEPRECATED_SCANF) takes
precedence over __GLIBC_USE (C2X_STRTOL), so the case of GNU
extensions to C89 continues to get old-style GNU %a and does not get
this new feature). The function names would remain as __isoc23_* even
if C2x ends up published in 2024 rather than 2023.
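For example, the incompatibility described above shows up directly in a
call such as the following (assuming a toolchain where the C2X %i
semantics and the __isoc23_* redirection are in effect):

  #include <stdio.h>

  int
  main (void)
  {
    int n = -1;
    /* Under C2X rules "0b101" is a binary constant, so n becomes 5; under
       older C rules the same call stores 0 and leaves "b101" unread.  */
    if (sscanf ("0b101", "%i", &n) == 1)
      printf ("%d\n", n);
    return 0;
  }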
When scanf %b support is added, I think it will be appropriate for all
versions of scanf to follow C2x rules for inputs to the %b format
(given that there are no compatibility concerns for a new format).
Tested for x86_64 (full glibc testsuite). The first version was also
tested for powerpc (32-bit) and powerpc64le (stdio-common/ and wcsmbs/
tests), and with build-many-glibcs.py.
The patch suppresses the same warnings as 87c266d758,
which showed issues for microblaze, mips soft-fp, nios2, and or1k.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Fix the following issues with built-in function use in
sysdeps/ieee754/ldbl-128 and sysdeps/ieee754/float128:
* fabsl used __builtin_fabsf128 unconditionally, breaking the build
with GCC 6 for several architectures; it should use __builtin_fabsl
with an appropriate redirection in float128_private.h. (I'm not
particularly concerned with building glibc with GCC 6; rather, I
want to be able to run the tgmath.h tests with GCC 6, which is a
significantly different case for tgmath.h compared to GCC 7 and
later because of the lack of _FloatN / _FloatNx support in the
compiler, and at present running the tests with a compiler means
building glibc with that compiler.)
* Some (conditional) uses of built-in functions had been added to
ldbl-128 without appropriate float128_private.h remapping (there was
remapping for the macros controlling whether the built-in functions
are used, just not for the functions themselves).
* s_llrintl.c called __builtin_round not __builtin_llrintl, which is
obviously wrong.
Tested with build-many-glibcs.py for aarch64-linux-gnu, GCC 6 (where
it fixes the glibc build) and GCC 12, and with the glibc testsuite for
x86_64.
vfprintf is entangled with vfwprintf (of course), __printf_fp,
__printf_fphex, __vstrfmon_l_internal, and the strfrom family of
functions. The latter use the internal snprintf functionality,
so vsnprintf is converted as well.
The simplest conversion is __printf_fphex, followed by
__vstrfmon_l_internal and __printf_fp, and finally
__vfprintf_internal and __vfwprintf_internal. __vsnprintf_internal
and strfrom* are mostly consuming the new interfaces, so they
are comparatively simple.
__printf_fp is a public symbol, so the FILE *-based interface
had to be preserved.
The __printf_fp rewrite does not change the actual binary-to-decimal
conversion algorithm, and digits are still not emitted directly to
the target buffer. However, the staging buffer now uses bytes
instead of wide characters, and one buffer copy is eliminated.
The changes are at least performance-neutral in my testing.
Floating point printing and snprintf improved measurably, so that
this Lua script
  for i=1,5000000 do
    print(i, i * math.pi)
  end
runs about 5% faster for me. To preserve fprintf performance for
a simple "%d" format, this commit has some logic changes under
LABEL (unsigned_number) to avoid additional function calls. There
are certainly some very easy performance improvements here: binary,
octal and hexadecimal formatting can easily avoid the temporary work
buffer (the number of digits can be computed ahead-of-time using one
of the __builtin_clz* built-ins). Decimal formatting can use a
specialized version of _itoa_word for base 10.
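For instance, the number of hexadecimal digits of a value can be computed
up front with one clz, along these lines (a sketch, not the committed
code):

  /* Digits needed to print v in base 16, without a digit-by-digit loop
     into a temporary buffer.  */
  static unsigned int
  hex_digits (unsigned long int v)
  {
    if (v == 0)
      return 1;
    unsigned int bits = 8 * sizeof v - __builtin_clzl (v);
    return (bits + 3) / 4;
  }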
The existing (inconsistent) width handling between strfmon and printf
is preserved here. __print_fp_buffer_1 would have to use
__translated_number_width to achieve ISO conformance for printf.
Test expectations in libio/tst-vtables-common.c are adjusted because
the internal staging buffer merges all virtual function calls into
one.
In general, stack buffer usage is greatly reduced, particularly for
unbuffered input streams. __printf_fp can still use a large buffer
in binary128 mode for %g, though.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This patch is using the corresponding GCC builtin for logbf, logb,
logbl and logbf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins-function.h.
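The scheme is roughly the following sketch (the macro and function names
here merely illustrate the pattern; the real per-architecture macro lives
in math-use-builtins-function.h):

  #include <math.h>

  #ifndef USE_LOGB_BUILTIN
  # define USE_LOGB_BUILTIN 0   /* an architecture header may define it to 1 */
  #endif

  double
  logb_sketch (double x)
  {
  #if USE_LOGB_BUILTIN
    return __builtin_logb (x);
  #else
    return logb (x);            /* stand-in for the unchanged generic code */
  #endif
  }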
Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
This patch is using the corresponding GCC builtin for llrintf, llrint,
llrintl and llrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins-function.h.
Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
This patch is using the corresponding GCC builtin for lrintf, lrint,
lrintl and lrintf128 if the USE_FUNCTION_BUILTIN macros are defined to one
in math-use-builtins-function.h.
Co-Authored-By: Xi Ruoyao <xry111@xry111.site>
GCC 13 has added more _FloatN and _FloatNx versions of existing
<math.h> and <complex.h> built-in functions, for use in libstdc++-v3.
This breaks the glibc build because of how those functions are defined
as aliases to functions with the same ABI but different types. Add
appropriate -fno-builtin-* options for compiling relevant files, as
already done for the case of long double functions aliasing double
ones and based on the list of files used there.
I fixed some mistakes in that list of double files that I noticed
while implementing this fix, but there may well be more such
(harmless) cases, in this list or the new one (files that don't
actually exist or don't define the named functions as aliases so don't
need the options). I did try to exclude cases where glibc doesn't
define certain functions for _FloatN or _FloatNx types at all from the
new uses of -fno-builtin-* options. As with the options for double
files (see the commit message for commit
49348beafe, "Fix build with GCC 10 when
long double = double."), it's deliberate that the options are used
even if GCC currently doesn't have a built-in version of a given
function, thus providing some level of future-proofing against more
such built-in functions being added in the future.
Tested with build-many-glibcs.py for aarch64-linux-gnu
powerpc-linux-gnu powerpc64le-linux-gnu x86_64-linux-gnu (compilers
and glibcs builds) with GCC mainline.
Detecting an overflow edge case depended on signed overflow of a long
long. Replace the additions and the overflow checks by
__builtin_add_overflow().
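Schematically, the replacement idiom is (a sketch, not the exact glibc
code):

  #include <stdbool.h>

  /* Accumulate without relying on signed overflow being defined; the
     built-in reports overflow through its boolean result.  */
  static bool
  add_checked (long long int *acc, long long int inc)
  {
    long long int sum;
    if (__builtin_add_overflow (*acc, inc, &sum))
      return false;             /* overflow edge case detected */
    *acc = sum;
    return true;
  }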
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Avoid moving code across SET_RESTORE_ROUNDL in order to fix
[BZ #29463].
Tested-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This works around a gcc issue where it const-folded inf/inf into nan,
preventing the invalid exception from being signalled.
(x-x)/(x-x) is more robust against optimizations and works for all
out of bounds values including x==nan.
The gcc issue https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95115
should be fixed on release branches starting from gcc-10, but it is
better to change the code in case glibc is built with older gcc.
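Schematically, the out-of-bounds path now returns something like this (a
simplified sketch of the idiom, not the exact function):

  /* For an out-of-domain or NaN x this evaluates 0/0 (or NaN/NaN) at run
     time, raising "invalid" and returning a NaN; unlike a literal inf/inf
     the compiler cannot constant-fold it away.  */
  static double
  invalid_result (double x)
  {
    return (x - x) / (x - x);
  }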
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
GCC 13 adds support for _FloatN and _FloatNx types in C++, thus breaking
the installed glibc headers that assume such support is not present.
GCC mostly works around this with fixincludes, but that doesn't help
for building glibc and its tests (glibc doesn't itself contain C++
code, but there's C++ code built for tests). Update glibc's
bits/floatn-common.h and bits/floatn.h headers to handle the GCC 13
support directly.
In general the changes match those made by fixincludes, though I think
the ones in sysdeps/powerpc/bits/floatn.h, where the header tests
__LDBL_MANT_DIG__ == 113 or uses #elif, wouldn't match the existing
fixincludes patterns.
Some places involving special C++ handling in relation to _FloatN
support are not changed. There's no need to change the
__HAVE_FLOATN_NOT_TYPEDEF definition (also in a form that wouldn't be
matched by the fixincludes fixes) because it's only used in relation
to macro definitions using features not supported for C++
(__builtin_types_compatible_p and _Generic). And there's no need to
change the inline function overloads for issignaling, iszero and
iscanonical in C++ because cases where types have the same format but
are no longer compatible types are handled automatically by the C++
overload resolution rules.
This patch also does not change the overload handling for iseqsig, and
there I think changes *are* needed, beyond those in this patch or made
by fixincludes. The way that overload is defined, via a template
parameter to a structure type, requires overloads whenever the types
are incompatible, even if they have the same format. So I think we
need to add overloads with GCC 13 for every supported _FloatN and
_FloatNx type, rather than just having one for _Float128 when it has a
different ABI to long double as at present (but for older GCC, such
overloads must not be defined for types that end up defined as
typedefs for another type).
Tested with build-many-glibcs.py: compilers build for
aarch64-linux-gnu ia64-linux-gnu mips64-linux-gnu powerpc-linux-gnu
powerpc64le-linux-gnu x86_64-linux-gnu; glibcs build for
aarch64-linux-gnu ia64-linux-gnu i686-linux-gnu mips-linux-gnu
mips64-linux-gnu-n32 powerpc-linux-gnu powerpc64le-linux-gnu
x86_64-linux-gnu.
math/test-float128-y1 fails on x86_64 and ppc64el with gcc 12 and -O3,
because code inside a block guarded by SET_RESTORE_ROUNDL is being moved
after the rounding mode has been restored. Use math_force_eval to
prevent this (and insert some math_opt_barrier calls to prevent code
from being moved before the rounding mode is set).
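The generic forms of these barriers are essentially empty asms tied to the
value, roughly (a sketch; the per-architecture definitions in glibc may
differ):

  /* Force x to be treated as live in memory, so dependent arithmetic can
     neither be hoisted above nor sunk below the barrier.  */
  #define math_opt_barrier(x) \
    ({ __typeof (x) __x = (x); __asm__ ("" : "+m" (__x)); __x; })
  #define math_force_eval(x) \
    ({ __typeof (x) __x = (x); __asm__ __volatile__ ("" : : "m" (__x)); })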
Fixes #29463
Reviewed-By: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
The compiler may substitute calls to sin or cos with calls to sincos, thus
we should have the same optimized implementations for sincos. The
optimized implementations may produce results that differ; this also makes
sure that the sincos call agrees with the sin and cos calls.
float, double, and _Float128 are assumed to be supported
(float and double already use only builtins). Only long double
is parametrized, due to GCC bug 29253 which prevents its use on
powerpc.
This allows removing the i686, ia64, x86_64, powerpc, and sparc
arch-specific implementations.
On ia64 it also fixes the sNaN handling:
math/test-float64x-fabs
math/test-ldouble-fabs
Checked on x86_64-linux-gnu, i686-linux-gnu, powerpc-linux-gnu,
powerpc64-linux-gnu, sparc64-linux-gnu, and ia64-linux-gnu.
Converting double precision constants to float is now affected by the
runtime dynamic rounding mode instead of being evaluated at compile
time with default rounding mode (except static object initializers).
This can change the computed result and cause performance regression.
The known correctness issues (increased ulp errors) are already fixed,
this patch fixes remaining cases of unnecessary runtime conversions.
Add float M_* macros to math.h as a new GNU extension API. To avoid
conversions the new M_* macros are used, and instead of casting double
literals to float, float literals are used (only required if the
conversion is inexact).
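For example (assuming a glibc that provides the new float macros such as
M_PIf, and _GNU_SOURCE defined):

  #define _GNU_SOURCE
  #include <math.h>

  /* The float constant avoids a run-time double-to-float conversion,
     which would otherwise depend on the dynamic rounding mode.  */
  float
  circumference (float r)
  {
    return 2.0f * M_PIf * r;   /* rather than (float) (2.0 * M_PI) * r */
  }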
The patch was tested on aarch64 where the following symbols had new
spurious conversion instructions that got fixed:
__clog10f
__gammaf_r_finite@GLIBC_2.17
__j0f_finite@GLIBC_2.17
__j1f_finite@GLIBC_2.17
__jnf_finite@GLIBC_2.17
__kernel_casinhf
__lgamma_negf
__log1pf
__y0f_finite@GLIBC_2.17
__y1f_finite@GLIBC_2.17
cacosf
cacoshf
casinhf
catanf
catanhf
clogf
gammaf_positive
Fixes bug 28713.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
s_cosf.c and s_sinf.c have
if (abstop12 (y) < abstop12 (pio4))
where abstop12 takes a float argument, but pio4 is static const double.
pio4 is used only in calls to abstop12 and never in arithmetic. Apply
-static const double pio4 = 0x1.921FB54442D18p-1;
+static const float pio4 = 0x1.921FB6p-1f;
to fix:
FAIL: math/test-float-cos
FAIL: math/test-float-sin
FAIL: math/test-float-sincos
FAIL: math/test-float32-cos
FAIL: math/test-float32-sin
FAIL: math/test-float32-sincos
when compiling with GCC 12.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
The macro TAYLOR_SIN adds the term `-0.5*da*a^2 + da` in hopes
of regaining some precision as a function of da. However the
comment says we add the term `-0.5*da*a^2 + 0.5*da` which is
different. This fix updates the comment to reflect the
code and also simplifies the calculation by replacing `a` with `x`
because they always have the same value.
Signed-off-by: Akila Welihinda <akilawelihinda@ucla.edu>
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
The error handling is moved to sysdeps/ieee754 version with no SVID
support. The compatibility symbol versions still use the wrapper with
SVID error handling around the new code. There is no new symbol version
nor compatibility code on !LIBM_SVID_COMPAT targets (e.g. riscv).
Only ia64 is unchanged, since it still uses the arch specific
__libm_error_region on its implementation.
Checked on x86_64-linux-gnu, i686-linux-gnu, and aarch64-linux-gnu.
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due to the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With 1e9
random input pairs in the range [LDBL_MIN, LDBL_MAX], glibc's
current implementation shows around 0.05% of results with an error of
1 ulp (453266 results) while the new implementation only shows
0.0001% of the total (1280).
Checked on aarch64-linux-gnu and x86_64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
This implementation is based on 'An Improved Algorithm for hypot(a,b)'
by Carlos F. Borges [1] using the MyHypot3 with the following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due to the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for subnormal results.
The main advantage of the new algorithm is its precision. With 1e8
random input pairs in the range [LDBL_MIN, LDBL_MAX], glibc's
current implementation shows around 0.02% of results with an error of
1 ulp (23158 results) while the new implementation only shows
0.0001% of the total (111).
[1] https://arxiv.org/pdf/1904.09481.pdf
Improve hypot performance significantly by using fma when available. The
fma version has twice the throughput of the previous version and 70% of
the latency. The non-fma version has 30% higher throughput and 10%
higher latency.
Max ULP error is 0.949 with fma and 0.792 without fma.
Passes GLIBC testsuite.
This implementation is based on the 'An Improved Algorithm for
hypot(a,b)' by Carlos F. Borges [1] using the MyHypot3 with the
following changes:
- Handle qNaN and sNaN.
- Tune the 'widely varying operands' to avoid spurious underflow
due to the multiplication and fix the return value for upwards
rounding mode.
- Handle required underflow exception for denormal results.
The main advantage of the new algorithm is its precision: with 1e9
random input pairs in the range [DBL_MIN, DBL_MAX], glibc's
current implementation shows around 0.34% of results with an error of
1 ulp (3424869 results) while the new implementation only shows
0.002% of the total (18851).
The performance results are also only slightly worse than the current
implementation. On x86_64 (Ryzen 5900X) with gcc 12:
Before:
"hypot": {
"workload-random": {
"duration": 3.73319e+09,
"iterations": 1.12e+08,
"reciprocal-throughput": 22.8737,
"latency": 43.7904,
"max-throughput": 4.37184e+07,
"min-throughput": 2.28361e+07
}
}
After:
"hypot": {
"workload-random": {
"duration": 3.7597e+09,
"iterations": 9.8e+07,
"reciprocal-throughput": 23.7547,
"latency": 52.9739,
"max-throughput": 4.2097e+07,
"min-throughput": 1.88772e+07
}
}
Co-Authored-By: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Checked on x86_64-linux-gnu and aarch64-linux-gnu.
[1] https://arxiv.org/pdf/1904.09481.pdf
Use a more optimized comparison to check for NaN and infinity, and
add an inlined issignaling implementation for float. With gcc this
results in 2 FP comparisons.
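An inlined issignaling for binary32 can be sketched from the bit pattern
as follows (illustrative only, assuming IEEE single precision with the
quiet bit as the top mantissa bit):

  #include <stdint.h>
  #include <string.h>

  /* A signaling NaN has all exponent bits set, a nonzero payload, and the
     quiet bit (bit 22) clear.  */
  static inline int
  issignaling_flt (float x)
  {
    uint32_t u;
    memcpy (&u, &x, sizeof u);
    return (u & 0x7fffffff) > 0x7f800000 && (u & 0x00400000) == 0;
  }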
The file copyright is also changed to use GPL; the implementation was
completely changed by 7c10fd3515 to use double precision instead of
scaling, and this change removes all the GET_FLOAT_WORD usage.
Checked on x86_64-linux-gnu.
After this patch, the largest errors over the full binary32 range
are (on x86_64):
RNDN: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDZ: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDU: libm wrong by up to 9.00e+00 ulp(s) [9] for x=0x1.04c39cp+6
RNDD: libm wrong by up to 8.98e+00 ulp(s) [9] for x=0x1.4b7066p+7
Inputs that were yielding huge errors have been added to "make check".
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
C2X adds new <math.h> functions for floating-point maximum and
minimum, corresponding to the new operations that were added in IEEE
754-2019 because of concerns about the old operations not being
associative in the presence of signaling NaNs. fmaximum and fminimum
handle NaNs like most <math.h> functions (any NaN argument means the
result is a quiet NaN). fmaximum_num and fminimum_num handle both
quiet and signaling NaNs the way fmax and fmin handle quiet NaNs (if
one argument is a number and the other is a NaN, return the number),
but still raise "invalid" for a signaling NaN argument, making them
exceptions to the normal rule that a function with a floating-point
result raising "invalid" also returns a quiet NaN. fmaximum_mag,
fminimum_mag, fmaximum_mag_num and fminimum_mag_num are corresponding
functions returning the argument with greatest or least absolute
value. All these functions also treat +0 as greater than -0. There
are also corresponding <tgmath.h> type-generic macros.
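The NaN handling difference can be seen in a small example like the
following (requires a glibc that already provides these C2x functions):

  #define _GNU_SOURCE
  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    printf ("%g\n", fmaximum (1.0, NAN));      /* nan: any NaN poisons it */
    printf ("%g\n", fmaximum_num (1.0, NAN));  /* 1: the number wins */
    printf ("%g\n", fmaximum (0.0, -0.0));     /* 0: +0 beats -0 */
    return 0;
  }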
Add these functions to glibc. The implementations use type-generic
templates based on those for fmax, fmin, fmaxmag and fminmag, and test
inputs are based on those for those functions with appropriate
adjustments to the expected results. The RISC-V maintainers might
wish to add optimized versions of fmaximum_num and fminimum_num (for
float and double), since RISC-V (F extension version 2.2 and later)
provides instructions corresponding to those functions - though it
might be at least as useful to add architecture-independent built-in
functions to GCC and teach the RISC-V back end to expand those
functions inline, which is what you generally want for functions that
can be implemented with a single instruction.
Tested for x86_64 and x86, and with build-many-glibcs.py.
Avoid defining f64xfmaf128 twice when building s_fmaf128.c.
This can be reproduced on powerpc64le whenever f128 functions do not
have IFUNC enabled, e.g. using "--with-cpu=power8 --disable-multi-arch", or
when using "--with-cpu=power9".
Fixes: b3f27d8150 ("Add narrowing fma functions")
This patch adds the narrowing fused multiply-add functions from TS
18661-1 / TS 18661-3 / C2X to glibc's libm: ffma, ffmal, dfmal,
f32fmaf64, f32fmaf32x, f32xfmaf64 for all configurations; f32fmaf64x,
f32fmaf128, f64fmaf64x, f64fmaf128, f32xfmaf64x, f32xfmaf128,
f64xfmaf128 for configurations with _Float64x and _Float128;
__f32fmaieee128 and __f64fmaieee128 aliases in the powerpc64le case
(for calls to ffmal and dfmal when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.
The changes are mostly similar to those for the other narrowing
functions previously added, especially that for sqrt, so the
description of those generally applies to this patch as well. As with
sqrt, I reused the same test inputs in auto-libm-test-in as for
non-narrowing fma rather than adding extra or separate inputs for
narrowing fma. The tests in libm-test-narrow-fma.inc also follow
those for non-narrowing fma.
The non-narrowing fma has a known bug (bug 6801) that it does not set
errno on errors (overflow, underflow, Inf * 0, Inf - Inf). Rather
than fixing this or having narrowing fma check for errors when
non-narrowing does not (complicating the cases when narrowing fma can
otherwise be an alias for a non-narrowing function), this patch does
not attempt to check for errors from narrowing fma and set errno; the
CHECK_NARROW_FMA macro is still present, but as a placeholder that
does nothing, and this missing errno setting is considered to be
covered by the existing bug rather than needing a separate open bug.
missing-errno annotations are duly added to many of the
auto-libm-test-in test inputs for fma.
This completes adding all the new functions from TS 18661-1 to glibc,
so will be followed by corresponding stdc-predef.h changes to define
__STDC_IEC_60559_BFP__ and __STDC_IEC_60559_COMPLEX__, as the support
for TS 18661-1 will be at a similar level to that for C standard
floating-point facilities up to C11 (pragmas not implemented, but
library functions done). (There are still further changes to be done
to implement changes to the types of fromfp functions from N2548.)
Tested as follows: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float). The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
As described in bug 28358, the round-to-odd computations used in the
libm functions that round their results to a narrower format can yield
spurious underflow exceptions in the following circumstances: the
narrowing only narrows the precision of the type and not the exponent
range (i.e., it's narrowing _Float128 to _Float64x on x86_64, x86 or
ia64), the architecture does after-rounding tininess detection (which
applies to all those architectures), the result is inexact, tiny
before rounding but not tiny after rounding (with the chosen rounding
mode) for _Float64x (which is possible for narrowing mul, div and fma,
not for narrowing add, sub or sqrt), so the underflow exception
resulting from the toward-zero computation in _Float128 is spurious
for _Float64x.
Fixed by making ROUND_TO_ODD call feclearexcept (FE_UNDERFLOW) in the
problem cases (as indicated by an extra argument to the macro); there
is never any need to preserve underflow exceptions from this part of
the computation, because the conversion of the round-to-odd value to
the narrower type will underflow in exactly the cases in which the
function should raise that exception, but it may be more efficient to
avoid the extra manipulation of the floating-point environment when
not needed.
Tested for x86_64 and x86, and with build-many-glibcs.py.
include/math.h has a mechanism to redirect internal calls to various
libm functions, that can often be inlined by the compiler, to call
non-exported __* names for those functions in the case when the calls
aren't inlined, with the redirection being disabled when
NO_MATH_REDIRECT. Add fma to the functions to which this mechanism is
applied.
At present, libm-internal fma calls (generally to __builtin_fma*
functions) are only done when it's known the call will be inlined,
with alternative code not relying on an fma operation being used in
the caller otherwise. This patch is in preparation for adding the TS
18661 / C2X narrowing fma functions to glibc; it will be natural for
the narrowing function implementations to call the underlying fma
functions unconditionally, with this either being inlined or resulting
in an __fma* call. (Using two levels of round-to-odd computation like
that, in the case where there isn't an fma hardware instruction, isn't
optimal but is certainly a lot simpler for the initial implementation
than writing different narrowing fma implementations for all the
various pairs of formats.)
Tested with build-many-glibcs.py that installed stripped shared
libraries are unchanged by the patch (using
<https://sourceware.org/pipermail/libc-alpha/2021-September/130991.html>
to fix installed library stripping in build-many-glibcs.py). Also
tested for x86_64.
This patch adds the narrowing square root functions from TS 18661-1 /
TS 18661-3 / C2X to glibc's libm: fsqrt, fsqrtl, dsqrtl, f32sqrtf64,
f32sqrtf32x, f32xsqrtf64 for all configurations; f32sqrtf64x,
f32sqrtf128, f64sqrtf64x, f64sqrtf128, f32xsqrtf64x, f32xsqrtf128,
f64xsqrtf128 for configurations with _Float64x and _Float128;
__f32sqrtieee128 and __f64sqrtieee128 aliases in the powerpc64le case
(for calls to fsqrtl and dsqrtl when long double is IEEE binary128).
Corresponding tgmath.h macro support is also added.
The changes are mostly similar to those for the other narrowing
functions previously added, so the description of those generally
applies to this patch as well. However, the not-actually-narrowing
cases (where the two types involved in the function have the same
floating-point format) are aliased to sqrt, sqrtl or sqrtf128 rather
than needing a separately built not-actually-narrowing function such
as was needed for add / sub / mul / div. Thus, there is no
__nldbl_dsqrtl name for ldbl-opt because no such name was needed
(whereas the other functions needed such a name since the only other
name for that entry point was e.g. f32xaddf64, not reserved by TS
18661-1); the headers are made to arrange for sqrt to be called in
that case instead.
The DIAG_* calls in sysdeps/ieee754/soft-fp/s_dsqrtl.c are because
they were observed to be needed in GCC 7 testing of
riscv32-linux-gnu-rv32imac-ilp32. The other sysdeps/ieee754/soft-fp/
files added didn't need such DIAG_* in any configuration I tested with
build-many-glibcs.py, but if they do turn out to be needed in more
files with some other configuration / GCC version, they can always be
added there.
I reused the same test inputs in auto-libm-test-in as for
non-narrowing sqrt rather than adding extra or separate inputs for
narrowing sqrt. The tests in libm-test-narrow-sqrt.inc also follow
those for non-narrowing sqrt.
Tested as follows: natively with the full glibc testsuite for x86_64
(GCC 11, 7, 6) and x86 (GCC 11); with build-many-glibcs.py with GCC
11, 7 and 6; cross testing of math/ tests for powerpc64le, powerpc32
hard float, mips64 (all three ABIs, both hard and soft float). The
different GCC versions are to cover the different cases in tgmath.h
and tgmath.h tests properly (GCC 6 has _Float* only as typedefs in
glibc headers, GCC 7 has proper _Float* support, GCC 8 adds
__builtin_tgmath).
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date. Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.
Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions. These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.
The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively. These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:
https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch is using the corresponding GCC builtin for roundevenf,
roundeven and roundevenl if the USE_FUNCTION_BUILTIN macros are defined
to one in math-use-builtins.h.
These builtin functions have been supported since GCC 10.
The code of the generic implementation is not changed.
Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This patch redirects the roundeven function in preparation for further changes.
Signed-off-by: Shen-Ta Hsieh <ibmibmibm.tw@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
This patch replaces the obsolete AC_TRY_COMPILE with AC_COMPILE_IFELSE or
AC_PREPROC_IFELSE.
It has been confirmed that with GNU 'autoconf' 2.69 the obsolete-macro
warnings are suppressed. The regeneration updated the following files:
- configure
- sysdeps/mach/configure
- sysdeps/mach/hurd/configure
- sysdeps/s390/configure
- sysdeps/unix/sysv/linux/configure
and didn't change the following files:
- sysdeps/ieee754/ldbl-opt/configure
- sysdeps/unix/sysv/linux/powerpc/configure
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The symbol has never been exported, so no compatibility symbol is
needed. Removing this file prevents ld from creating an exported
symbol in case GLIBC_2_0 expands to a symbol version which
does not have a local: *; directive in the symbol version map file.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
These workload traces cover the whole "long double" range.
This patch was prepared with the help of Adhemerval Zanella.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).
Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.
The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average. Two different algorithms are used:
* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)
* for large inputs, an asymptotic formula from [1] is used
[1] Fast and Accurate Bessel Function Computation,
John Harrison, Proceedings of Arith 19, 2009.
Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).
Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
All of the isnan functions are in libc.so due to printf_fp, so move
__isnanf128 there too for consistency.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Previous commit was missing deleted files in sysdeps/ieee754/dbl-64.
Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Finally remove all mpa related files, headers, declarations, probes, unused
tables and update makefiles.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Remove slow paths in tan. Add ULP annotations. Merge 'number' into 'mynumber'.
Remove unused entries from tan constants.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
This patch series removes all remaining slow paths and related code.
First asin/acos, tan, atan, atan2 implementations are updated, and the final
patch removes the unused mpa files, headers and probes. Passes build-many-glibcs.
Remove slow paths from asin/acos. Add ULP annotations based on previous slow
path checks (which are approximate). Update AArch64 and x86_64 libm-test-ulps.
Reviewed-By: Paul Zimmermann <Paul.Zimmermann@inria.fr>
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory. The second patch just moves all wordsize-64 files and removes a
few wordsize-64 uses in comments and Implies files.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory. The first patch adds special cases needed for 32-bit targets
(FIX_INT_FP_CONVERT_ZERO and FIX_DBL_LONG_CONVERT_OVERFLOW) to the
wordsize-64 versions. This has no effect on 64-bit targets since they don't
define these macros.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Make the tests use TEST_COND_intel96 to decide on whether to build the
unnormal tests instead of the macro in nan-pseudo-number.h and then
drop the header inclusion. This unbreaks test runs on all
architectures that do not have ldbl-96.
Also drop the HANDLE_PSEUDO_NUMBERS macro since it is not used
anywhere.
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Add support to treat pseudo-numbers specially and implement x86
version to consider all of them as signaling.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
asin and acos have slow paths for rounding the last bit that cause some
calls to be 500-1500x slower than average calls.
These slow paths are rare: a test of a trillion (1.000.000.000.000)
random inputs between -1 and 1 showed 32870 slow calls for acos and 4473
for asin, with most occurrences between -1.0 .. -0.9 and 0.9 .. 1.0.
The slow paths claim correct rounding and use __sin32() and __cos32()
(which compare two result candidates and return the closest one) as the
final step, with the second result candidate (res1) having a small offset
applied from res. This suggests that res and res1 are intended to be 1
ULP apart (which makes sense for rounding), barring bugs, allowing us to
pick either one and still remain within 1 ULP of the exact result.
Remove the slow paths as the accuracy is better than 1 ULP even without
them, which is enough for glibc.
Also remove code comments claiming correctly rounded results.
After slow path removal, checking the accuracy of 14.400.000.000 random
asin() and acos() inputs showed only three incorrectly rounded
(error > 0.5 ULP) results:
- asin(-0x1.ee2b43286db75p-1) (0.500002 ULP, same as before)
- asin(-0x1.f692ba202abcp-4) (0.500003 ULP, same as before)
- asin(-0x1.9915e876fc062p-1) (0.50000000001 ULP, previously exact)
The first two had the same error even before this commit, and they did
not use the slow path at all.
Checking 4934 known randomly found previously-slow-path asin inputs
shows 25 calls with incorrectly rounded results, with a maximum error of
0.500000002 ULP (for 0x1.fcd5742999ab8p-1). The previous slow-path code
rounded all these inputs correctly (error < 0.5 ULP).
The observed average speed increase was 130x.
Checking 36240 known randomly found previously-slow-path acos inputs
shows 42 calls with incorrectly rounded results, with a maximum error of
0.500000008 ULP (for 0x1.f63845056f35ep-1). The previous "exact"
slow-path code showed 34 calls with incorrectly rounded results, with the
same maximum error of 0.500000008 ULP (for 0x1.f63845056f35ep-1).
The observed average speed increase was 130x.
The functions could likely be trimmed more while keeping acceptable
accuracy, but this at least gets rid of the egregiously slow cases.
Tested on x86_64.
The tls.h inclusion is not really required and limits possible
definitions in more arch-specific headers.
This is a cleanup to allow inline functions in sysdep.h, more
specifically on i386 and ia64 which require access to some tls
definitions of their own.
No semantic changes expected, checked with a build against all
affected ABIs.
In TS 18661-1, getpayload had an unspecified return value for a
non-NaN argument, while C2x requires the return value -1 in that case.
This patch implements the return value of -1. I don't think this is
worth having a new symbol version that's an alias of the old one,
although occasionally we do that in such cases where the new function
semantics are a refinement of the old ones (to avoid programs relying
on the new semantics running on older glibc versions but not behaving
as intended).
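A small usage example of the new behaviour (assuming a glibc with the C2x
semantics and _GNU_SOURCE for the declaration):

  #define _GNU_SOURCE
  #include <math.h>
  #include <stdio.h>

  int
  main (void)
  {
    double d = 2.5;
    /* Non-NaN argument: the return value is now specified to be -1.  */
    printf ("%g\n", getpayload (&d));   /* -1 */
    return 0;
  }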
Tested for x86_64 and x86; also ran math/ tests for aarch64 and
powerpc.
This patch changes the exp10f error handling semantics to only set
errno according to POSIX rules. New symbol version is introduced at
GLIBC_2.32. The old wrappers are kept for compat symbols.
There are some outliers that need special handling:
- ia64 provides an optimized implementation of exp10f that uses ia64
specific routines to set SVID compatibility. The new symbol version
is aliased to the exp10f one.
- m68k also provides an optimized implementation, and the new version
uses it instead of the sysdeps/ieee754/flt-32 one.
- riscv and csky use the generic template implementation that
does not provide SVID support. For both cases a new exp10f
version is not added; rather, the symbol version of the
generic sysdeps/ieee754/flt-32 one is adjusted instead.
Checked on aarch64-linux-gnu, x86_64-linux-gnu, i686-linux-gnu,
powerpc64le-linux-gnu.
It is inspired by expf and reuses its tables and internal functions.
The error checks are inlined and errno setting is in separate tail
called functions, but the wrappers are kept in this patch to handle
the _LIB_VERSION==_SVID_ case.
Double precision arithmetic is used, which is expected to be faster on
most targets (including soft-float) than using single precision, and it
is easier to get a good precision result with it.
Results for x86_64 (i7-4790K CPU @ 4.00GHz) are:
Before new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.0414e+09,
"iterations": 1.00128e+08,
"reciprocal-throughput": 26.6818,
"latency": 54.043,
"max-throughput": 3.74787e+07,
"min-throughput": 1.85038e+07
}
With new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.11951e+09,
"iterations": 1.23968e+08,
"reciprocal-throughput": 21.0581,
"latency": 45.4028,
"max-throughput": 4.74876e+07,
"min-throughput": 2.20251e+07
}
Results for aarch64 (A72 @ 2GHz) are:
Before new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.62362e+09,
"iterations": 3.3376e+07,
"reciprocal-throughput": 127.698,
"latency": 149.365,
"max-throughput": 7.831e+06,
"min-throughput": 6.69501e+06
}
With new code:
"exp10f": {
"workload-spec2017.wrf (adapted)": {
"duration": 4.29108e+09,
"iterations": 6.6752e+07,
"reciprocal-throughput": 51.2111,
"latency": 77.3568,
"max-throughput": 1.9527e+07,
"min-throughput": 1.29271e+07
}
Checked on x86_64-linux-gnu, powerpc64le-linux-gnu, aarch64-linux-gnu,
and sparc64-linux-gnu.
This came to light when adding hard-float support to the ARC glibc port
without hardware sqrt support, causing the glibc build to fail:
| ../sysdeps/ieee754/dbl-64/e_sqrt.c: In function '__ieee754_sqrt':
| ../sysdeps/ieee754/dbl-64/e_sqrt.c:58:54: error: unused variable 'ty' [-Werror=unused-variable]
| double y, t, del, res, res1, hy, z, zz, p, hx, tx, ty, s;
The reason is that the EMULV() macro uses the hardware-provided
__builtin_fma() variant, leaving the temporary variables 'p, hx, tx, hy, ty'
unused, hence the compiler warning and ensuing error.
The intent of the patch was to fix that error, but EMULV is pervasive
and used a fair bit indirectly via other macros, hence this patch.
Functionally it should not result in code-gen changes, and if anything
those would be better since the scope of those temporaries is greatly
reduced now.
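The fma-based exact multiplication at the heart of EMULV is,
schematically (a sketch of the 2MultFMA idiom, not the glibc macro
itself):

  /* hi + lo == x * y exactly when an fma instruction (or builtin) is
     available; none of the splitting temporaries p, hx, tx, hy, ty are
     needed on this path.  */
  static inline void
  emulv_fma (double x, double y, double *hi, double *lo)
  {
    *hi = x * y;
    *lo = __builtin_fma (x, y, -*hi);
  }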
Build tested with aarch64-linux-gnu arm-linux-gnueabi arm-linux-gnueabihf hppa-linux-gnu x86_64-linux-gnu arm-linux-gnueabihf riscv64-linux-gnu-rv64imac-lp64 riscv64-linux-gnu-rv64imafdc-lp64 powerpc-linux-gnu microblaze-linux-gnu nios2-linux-gnu hppa-linux-gnu
Also as suggested by Joseph [1] used --strip and compared the libs with
and w/o patch and they are byte-for-byte unchanged (with gcc 9).
| for i in `find . -name libm-2.31.9000.so`;
| do
| echo $i; diff $i /SCRATCH/vgupta/gnu2/install/glibcs/$i ; echo $?;
| done
| ./aarch64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabi/lib/libm-2.31.9000.so
| 0
| ./x86_64-linux-gnu/lib64/libm-2.31.9000.so
| 0
| ./arm-linux-gnueabihf/lib/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imac-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./riscv64-linux-gnu-rv64imafdc-lp64/lib64/lp64/libm-2.31.9000.so
| 0
| ./powerpc-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./microblaze-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./nios2-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./hppa-linux-gnu/lib/libm-2.31.9000.so
| 0
| ./s390x-linux-gnu/lib64/libm-2.31.9000.so
[1] https://sourceware.org/pipermail/libc-alpha/2019-November/108267.html
The minimum GCC version has been raised to 6.2 for building
glibc. Therefore, follow the advice inside the implementation
and remove the GCC < 6 codepath.
Likewise, remove the hidden_proto as all internal usages should
inline now.
GCC 7.5.0 (PR94200) will refuse to compile if both -mabi=% and
-mlong-double-128 are passed on the command line. Surprisingly,
it will work happily if the latter is not. For the sake of
maintaining status quo, test for and blacklist such compilers.
Tested with a GCC 8.3.1 and GCC 7.5.0 compiler for ppc64le.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Improve the commentary to aid future developers who will stumble
upon this novel, yet not always perfect, mechanism to support
alternative formats for long double.
Likewise, rename __LONG_DOUBLE_USES_FLOAT128 to
__LDOUBLE_REDIRECTS_TO_FLOAT128_ABI now that development work
has settled down. The command used was
git grep -l __LONG_DOUBLE_USES_FLOAT128 ':!./ChangeLog*' | \
xargs sed -i 's/__LONG_DOUBLE_USES_FLOAT128/__LDOUBLE_REDIRECTS_TO_FLOAT128_ABI/g'
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
The test for enabling _Float128 or IEEE 128 long double can be
greatly simplified knowing that there is no ibm128, thus we require
no special cases, and everything is canonical.
This reverts the changes to ldbl-128ibm iscanonical.h from commit
8dbfea3a20 and extends the check
for __NO_LONG_DOUBLE_MATH to include a check for float128 redirects
to long double.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This better resembles the default linking process with the gnulibs,
and also resolves the increasingly difficult to maintain
f128-loader-link usage on powerpc64le as some libgcc symbols are
dependent on those found in the loader (ld).
Tweak the PLT bypass magic when building glibc with long double
redirects. This is made more difficult by the fact we only get
one chance to redirect functions. This happens via the public
headers.
There are roughly three classes of redirect we need to attend to
today:
1. Simple redirects, redirected via cdefs.h macro overrides and
   the new libc_hidden_ldbl_proto macro.
2. Internal usage of internal API, e.g. __snprintf, which has
   no direct analogue. This is bypassed directly on a
   case-by-case basis.
3. Double redirects, e.g. sscanf and related. These require
   a heavier-handed approach of macro renaming to existing
   symbols.
Most simple redirects are handled via 1. Ideally, the libc_*
macro would live in libc-symbols.h, but in practice the macros
needed for it to do anything useful live in cdefs.h, so they
are defined in the local override.
Notably, the internal name of the asprintf generated for ieee ldbl
redirects is renamed to work with internal prefixed usage.
This resolves the local plt usage introduced when building glibc
with ldbl == ieee128 on ppc64le.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
With mathinline removal there is no need to keep building and testing
inline math tests.
The gen-libm-tests.py support for generating ULP_I_* is removed and all
libm-test-ulps files are updated to no longer have the
i{float,double,ldouble} entries. The support for no-test-inline is
also removed from gen-auto-libm-tests, and the
auto-libm-test-out-* files were regenerated.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Soon, powerpc64le will need to provide extra compiler flags to the long
double files in order to continue to build using the IBM 128-bit
extended floating point type as long double.
This patch creates test-ibm128* tests from the long double function tests.
In order to explicitly test IBM long double functions -mabi=ibmlongdouble is
added to CFLAGS.
Likewise, update the test headers to correctly choose ULPs when redirects
are enabled.
Co-authored-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Co-authored-by: Paul E. Murphy <murphyp@linux.vnet.ibm.com>
For lack of a more comprehensive solution, tack on the ibm128 ABI
compiler options for the totalorder{,mag}l compat tests which exist
prior to enabling this feature.
The functions in the nexttoward family are special, in the sense that
they always have a long double argument, regardless of their suffix
(i.e.: nexttowardf and nexttoward have a long double argument, besides
the float and double arguments).
On top of that, they are also special because nexttoward functions are
not part of the _FloatN API, hence __nexttowardf128 does not exist.
This patch adds 4 new function implementations for the new long double
format:
__nexttoward_to_ieee128
__nexttowardf_to_ieee128
__nexttowardieee128 (as an alias to __nextafterieee128)
Likewise, rename "long double" to "_Float128" in shared ldbl-128
files to ensure the correct type is used irrespective of ABI
switches.
Thank you to those who helped out with this patch:
Co-Authored-By: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Modify the headers to redirect long double functions to global __*f128
symbols or to __*ieee128 otherwise.
Most of the functions in math.h benefit from the infrastructure already
available for __LDBL_COMPAT. The only exceptions are nexttowardf and
nexttoward, which need special treatment.
Both math/bits/mathcalls-helper-functions.h and math/bits/mathcalls.h
were modified in order to provide alternative redirection destinations
that are essential to support functions that should not be redirected to
the same name pattern of the rest of the functions, i.e.: __fpclassify,
__signbit, __iseqsig, __issignaling, isinf, finite and isnan, which will
be redirected to __*f128 instead of __*ieee128 used for the rest.
Instead of attempting something more creative, just copy
the small struct from ldbl-128 and enable it when IEEE
long double is present, and update the ibm long double
variant if supported.
Likewise, provide a shadow copy of math_ldbl.h to prevent
the ibm128 specific long double header from poisoning
unrelated files due to its usage in math_private.h.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
We want to ensure that if a second file is built to support
ieee128 long double, we build its companion implementation
with ibm128 long double. The shared object versions of these
files build correctly because the aliasing is sufficiently
complex to prevent the redirects from applying when defining
them.
However, this does not prevent the static object variants
from becoming quietly broken due to redirects. This is
intentionally avoided by marking such objects to be built
with -mabi=ibmlongdouble.
Shuffle the misplaced routines to build against the subdir
which defines the needed symbols.
A number of utility files and helper objects should also be
explicitly configured to build with the ibm128 ABI to prevent
gremlins when enabling IEEE long double.
Move the narrow math aliasing macros into a new sysdep header file
math-narrow-alias-float128.h. Then, provide an override header
to supply the necessary changes to supply the *ieee128 aliases of
these symbols.
This adds ieee128 aliases for faddl, fdivl, fmull, fsubl, daddl, ddivl,
dmull, dsubl.
After defining the long double redirections to double, __MATHDECL_1 has
to be redefined to its previous state in order to avoid redirecting all
subsequent types.