I found the i386 libm-test-ulps files needed updating (probably the
sqrt changes perturbed exactly when excess precision was used by the
compiler).
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
This patch series cleans up the many uses of __ieee754_sqrt(f/l) in GLIBC.
The goal is to enable GCC to do the inlining, and if this fails call the
__ieee754_sqrt function. This is done by internally declaring sqrt with asm
redirects. The compat symbols and sqrt wrappers need to disable the redirect.
The redirect is also disabled if there are already redirects defined when
using -ffinite-math-only.
All math functions (but not math tests, non-library code and libnldbl) are
built with -fno-math-errno which means GCC will typically inline sqrt as a
single instruction. This means targets are no longer forced to add a special
inline for sqrt.
* include/math.h (sqrt): Declare with asm redirect.
(sqrtf): Likewise.
(sqrtl): Likewise.
(sqrtf128): Likewise.
* Makeconfig: Add -fno-math-errno for libc/libm, but build testsuite,
nonlib and libnldbl with -fmath-errno.
* math/w_sqrt_compat.c: Define NO_MATH_REDIRECT.
* math/w_sqrt_template.c: Likewise.
* math/w_sqrtf_compat.c: Likewise.
* math/w_sqrtl_compat.c: Likewise.
* sysdeps/i386/fpu/w_sqrt.c: Likewise.
* sysdeps/i386/fpu/w_sqrt_compat.c: Likewise.
* sysdeps/generic/math-type-macros-float128.h: Remove math.h and
complex.h.
As discussed in bug 22902, the i386 fenv_private.h implementation has
problems for float128 for the case of 32-bit glibc built with libgcc
from GCC configured using --with-fpmath=sse.
The optimized floating-point state handling in fenv_private.h needs to
know which floating-point state - x87 or SSE - is used for each
floating-point type, so that only one state needs updating / testing
for libm code using that state internally. On 32-bit x86, the x87
rounding mode is always used for float128, but the x87 exception flags
are only used when libgcc is built using x87 floating-point
arithmetic; if libgcc is built for SSE arithmetic, the SSE exception
flags are used.
The choice of arithmetic with which libgcc is built is independent of
that with which glibc is built. Thus, since glibc cannot tell the
choice used in libgcc, the default implementations of
libc_feholdexcept_setroundf128 and libc_feupdateenv_testf128 (which
use the <fenv.h> functions, thus using both x87 and SSE state on
processors that have both) need to be used; this patch updates the
code accordingly.
Tested for 32-bit x86; HJ reports testing in the --with-fpmath=sse
case.
[BZ #22902]
* sysdeps/i386/fpu/fenv_private.h [!__x86_64__]
(libc_feholdexcept_setroundf128): New macro.
[!__x86_64__] (libc_feupdateenv_testf128): Likewise.
Remove the slow paths from pow. Like several other double precision math
functions, pow is exactly rounded. This is not required from math functions
and causes major overheads as it requires multiple fallbacks using higher
precision arithmetic if a result is close to 0.5ULP. Ridiculous slowdowns
of up to 100000x have been reported when the highest precision path triggers.
All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
The worst case error is ~0.506ULP. A simple test over a few hundred million
values shows pow is 10% faster on average. This fixes BZ #13932.
[BZ #13932]
* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
* benchtests/pow-inputs: Update comment for slow path cases.
* manual/probes.texi (slowpow_p10): Delete removed probe.
(slowpow_p10): Likewise.
* math/Makefile: Remove halfulp.c and slowpow.c.
* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/generic/math_private.h (__exp1): Remove error argument.
(__halfulp): Remove.
(__slowpow): Remove.
* sysdeps/i386/fpu/halfulp.c: Delete file.
* sysdeps/i386/fpu/slowpow.c: Likewise.
* sysdeps/ia64/fpu/halfulp.c: Likewise.
* sysdeps/ia64/fpu/slowpow.c: Likewise.
* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
improve comments and add error analysis.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
(power1): Remove function:
(log1): Remove error argument, add error analysis.
(my_log2): Remove function.
* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.
This patch adds the narrowing add functions from TS 18661-1 to glibc's
libm: fadd, faddl, daddl, f32addf64, f32addf32x, f32xaddf64 for all
configurations; f32addf64x, f32addf128, f64addf64x, f64addf128,
f32xaddf64x, f32xaddf128, f64xaddf128 for configurations with
_Float64x and _Float128; __nldbl_daddl for ldbl-opt. As discussed for
the build infrastructure patch, tgmath.h support is deliberately
deferred, and FP_FAST_* macros are not applicable without optimized
function implementations.
Function implementations are added for all relevant pairs of formats
(including certain cases of a format and itself where more than one
type has that format). The main implementations use round-to-odd, or
a trivial computation in the case where both formats are the same or
where the wider format is IBM long double (in which case we don't
attempt to be correctly rounding). The sysdeps/ieee754/soft-fp
implementations use soft-fp, and are used automatically for
configurations without exceptions and rounding modes by virtue of
existing Implies files. As previously discussed, optimized versions
for particular architectures are possible, but not included.
i386 gets a special version of f32xaddf64 to avoid problems with
double rounding (similar to the existing fdim version), since this
function must round just once without an intermediate rounding to long
double. (No such special version is needed for any other function,
because the nontrivial functions use round-to-odd, which does the
intermediate computation with the rounding mode set to round-to-zero,
and double rounding is OK except in round-to-nearest mode, so is OK
for that intermediate round-to-zero computation.) mul and div will
need slightly different special versions for i386 (using round-to-odd
on long double instead of precision control) because of the
possibility of inexact intermediate results in the subnormal range for
double.
To reduce duplication among the different function implementations,
math-narrow.h gets macros CHECK_NARROW_ADD, NARROW_ADD_ROUND_TO_ODD
and NARROW_ADD_TRIVIAL.
In the trivial cases and for any architecture-specific optimized
implementations, the overhead of the errno setting might be
significant, but I think that's best handled through compiler built-in
functions rather than providing separate no-errno versions in glibc
(and likewise there are no __*_finite entry points for these function
provided, __*_finite effectively being no-errno versions at present in
most cases).
Tested for x86_64 and x86, with both GCC 6 and GCC 7. Tested for
mips64 (all three ABIs, both hard and soft float) and powerpc with GCC
7. Tested with build-many-glibcs.py with both GCC 6 and GCC 7.
* math/Makefile (libm-narrow-fns): Add add.
(libm-test-funcs-narrow): Likewise.
* math/Versions (GLIBC_2.28): Add narrowing add functions.
* math/bits/mathcalls-narrow.h (add): Use __MATHCALL_NARROW .
* math/gen-auto-libm-tests.c (test_functions): Add add.
* math/math-narrow.h (CHECK_NARROW_ADD): New macro.
(NARROW_ADD_ROUND_TO_ODD): Likewise.
(NARROW_ADD_TRIVIAL): Likewise.
* sysdeps/ieee754/float128/float128_private.h (__faddl): New
macro.
(__daddl): Likewise.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Add fadd and
dadd.
(CFLAGS-nldbl-dadd.c): New variable.
(CFLAGS-nldbl-fadd.c): Likewise.
* sysdeps/ieee754/ldbl-opt/Versions (GLIBC_2.28): Add
__nldbl_daddl.
* sysdeps/ieee754/ldbl-opt/nldbl-compat.h (__nldbl_daddl): New
prototype.
* manual/arith.texi (Misc FP Arithmetic): Document fadd, faddl,
daddl, fMaddfN, fMaddfNx, fMxaddfN and fMxaddfNx.
* math/auto-libm-test-in: Add tests of add.
* math/auto-libm-test-out-narrow-add: New generated file.
* math/libm-test-narrow-add.inc: New file.
* sysdeps/i386/fpu/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_f32xaddf64.c: Likewise.
* sysdeps/ieee754/dbl-64/s_fadd.c: Likewise.
* sysdeps/ieee754/float128/s_f32addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64addf128.c: Likewise.
* sysdeps/ieee754/float128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_f64xaddf128.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_daddl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_faddl.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-dadd.c: Likewise.
* sysdeps/ieee754/ldbl-opt/nldbl-fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_daddl.c: Likewise.
* sysdeps/ieee754/soft-fp/s_fadd.c: Likewise.
* sysdeps/ieee754/soft-fp/s_faddl.c: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Update.
* sysdeps/mach/hurd/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/riscv/rv64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
TS 18661-1 defines libm functions that carry out an operation (+ - * /
sqrt fma) on their arguments and return a result rounded to a
(usually) narrower type, as if the original result were computed to
infinite precision and then rounded directly to the result type
without any intermediate rounding to the argument type. For example,
fadd, faddl and daddl for addition. These are the last remaining TS
18661-1 functions left to be added to glibc. TS 18661-3 extends this
to corresponding functions for _FloatN and _FloatNx types.
As functions parametrized by two rather than one varying
floating-point types, these functions require infrastructure in glibc
that was not required for previous libm functions. This patch
provides such infrastructure - excluding test support, and actual
function implementations, which will be in subsequent patches.
Declaring the functions uses a header bits/mathcalls-narrow.h, which
is included many times, for each relevant pair of types. This will
end up containing macro calls of the form
__MATHCALL_NARROW (__MATHCALL_NAME (add), __MATHCALL_REDIR_NAME (add), 2);
for each family of narrowing functions. (The structure of this macro
call, with the calls to __MATHCALL_NAME and __MATHCALL_REDIR_NAME
there rather than in the definition of __MATHCALL_NARROW, arises from
the names such as "add" *not* themselves being reserved identifiers -
meaning it's necessary to avoid any indirection that would result in a
user-defined "add" macro being expanded.) Whereas for existing
functions declaring long double functions is disabled if _LIBC in the
case where they alias double functions, to facilitate defining the
long double functions as aliases of the double ones, there is no such
logic for the narrowing functions in this patch. Rather, the files
defining such functions are expected to use #define to hide the
original declarations of the alias names, to avoid errors about
defining aliases with incompatible types.
math/Makefile support is added for building the functions (listed in
libm-narrow-fns, currently empty) for all relevant pairs of types. An
internal header math-narrow.h is added for macros shared between
multiple function implementations - currently a ROUND_TO_ODD macro to
facilitate writing functions using the round-to-odd implementation
approach, and alias macros to create all the required function
aliases. libc_feholdexcept_setroundf128 and libc_feupdateenv_testf128
are added for use when required (only for x86_64). float128_private.h
support is added for ldbl-128 narrowing functions to be used for
_Float128.
Certain things are specifically omitted from this patch and the
immediate followups. tgmath.h support is deferred; there remain
unresolved questions about how the type-generic macros for these
functions are supposed to work, especially in the case of arguments of
integer type. The math.h / bits/mathcalls-narrow.h logic, and the
logic for determining what functions / aliases to define, will need
some adjustments to support the sqrt and fma functions, where
e.g. f32xsqrtf64 can just be an alias for sqrt rather than a separate
function. TS 18661-1 defines FP_FAST_* macros but no support is
included for defining them (they won't in general be true without
architecture-specific optimized function versions).
For each of the function groups (add sub mul div sqrt fma) there are
always six functions present (e.g. fadd, faddl, daddl, f32addf64,
f32addf32x, f32xaddf64). When _Float64x and _Float128 are supported,
there are seven more (e.g. f32addf64x, f32addf128, f64addf64x,
f64addf128, f32xaddf64x, f32xaddf128, f64xaddf128). In addition, in
the ldbl-opt case there are function names such as __nldbl_daddl (an
alias for f32xaddf64, which is not a reserved name in TS 18661-1, only
in TS 18661-3), for calls to daddl to be mapped to in the
-mlong-double-64 case. (Calls to faddl just get mapped to fadd, and
for sqrt and fma there won't be __nldbl_* functions because dsqrtl and
dfmal can just be mapped to sqrt and fma with -mlong-double-64.)
While there are six or thirteen functions present in each group (plus
__nldbl_* names only as an ABI, not an API), not all are distinct;
they fall in various groups of aliases. There are two distinct
versions built if long double has the same format as double; four if
they have distinct formats but there is no _Float64x or _Float128
support; five if long double has binary128 format; seven when
_Float128 is distinct from long double.
Architecture-specific optimized versions are possible, but not
included in my patches. For example, IA64 generally supports
narrowing the result of most floating-point instructions; Power ISA
2.07 (POWER8) supports double values as arguments to float
instructions, with the results narrowed as expected; Power ISA 3
(POWER9) supports round-to-odd for float128 instructions, so meaning
that approach can be used without needing to set and restore the
rounding mode and test "inexact". I intend to leave any such
optimized versions to the architecture maintainers. Generally in such
cases it would also make sense for calls to these functions to be
expanded inline (given -fno-math-errno); I put a suggestion for TS
18661-1 built-in functions at <https://gcc.gnu.org/wiki/SummerOfCode>.
Tested for x86_64 (this patch in isolation, as well as testing for
various configurations in conjunction with further patches).
* math/bits/mathcalls-narrow.h: New file.
* include/bits/mathcalls-narrow.h: Likewise.
* math/math-narrow.h: Likewise.
* math/math.h (__MATHCALL_NARROW_ARGS_1): New macro.
(__MATHCALL_NARROW_ARGS_2): Likewise.
(__MATHCALL_NARROW_ARGS_3): Likewise.
(__MATHCALL_NARROW_NORMAL): Likewise.
(__MATHCALL_NARROW_REDIR): Likewise.
(__MATHCALL_NARROW): Likewise.
[__GLIBC_USE (IEC_60559_BFP_EXT)]: Repeatedly include
<bits/mathcalls-narrow.h> with _Mret_, _Marg_ and __MATHCALL_NAME
defined.
[__GLIBC_USE (IEC_60559_TYPES_EXT)]: Likewise.
* math/Makefile (headers): Add bits/mathcalls-narrow.h.
(libm-narrow-fns): New variable.
(libm-narrow-types-basic): Likewise.
(libm-narrow-types-ldouble-yes): Likewise.
(libm-narrow-types-float128-yes): Likewise.
(libm-narrow-types-float128-alias-yes): Likewise.
(libm-narrow-types): Likewise.
(libm-routines): Add narrowing functions.
* sysdeps/i386/fpu/fenv_private.h [__x86_64__]
(libc_feholdexcept_setroundf128): New macro.
[__x86_64__] (libc_feupdateenv_testf128): Likewise.
* sysdeps/ieee754/float128/float128_private.h: Include
<math/math-narrow.h>.
[libc_feholdexcept_setroundf128] (libc_feholdexcept_setroundl):
Undefine and redefine.
[libc_feupdateenv_testf128] (libc_feupdateenv_testl): Likewise.
(libm_alias_float_ldouble): Undefine and redefine.
(libm_alias_double_ldouble): Likewise.
In commit cba595c350 and commit
f81ddabffd, ABI compatibility with
applications was broken by increasing the size of the on-stack
allocated __pthread_unwind_buf_t beyond the oringal size.
Applications only have the origianl space available for
__pthread_unwind_register, and __pthread_unwind_next to use,
any increase in the size of __pthread_unwind_buf_t causes these
functions to write beyond the original structure into other
on-stack variables leading to segmentation faults in common
applications like vlc. The only workaround is to version those
functions which operate on the old sized objects, but this must
happen in glibc 2.28.
Thank you to Andrew Senkevich, H.J. Lu, and Aurelien Jarno, for
submitting reports and tracking the issue down.
The commit reverts the above mentioned commits and testing on
x86_64 shows that the ABI compatibility is restored. A tst-cleanup1
regression test linked with an older glibc now passes when run
with the newly built glibc. Previously a tst-cleanup1 linked with
an older glibc would segfault when run with an affected glibc build.
Tested on x86_64 with no regressions.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.
Typical performance gains is typically around 5x when measured on
Sparc s7 for common values between exp(1) and exp(40).
Using the glibc perf tests on sparc,
sparc (nsec) x86 (nsec)
old new old new
max 17629 395 5173 144
min 399 54 15 13
mean 5317 200 1349 23
The extreme max times for the old (ieee754) exp are due to the
multiprecision computation in the old algorithm when the true value is
very near 0.5 ulp away from an value representable in double
precision. The new algorithm does not take special measures for those
cases. The current glibc exp perf tests overrepresent those values.
Informal testing suggests approximately one in 200 cases might
invoke the high cost computation. The performance advantage of the new
algorithm for other values is still large but not as large as indicated
by the chart above.
Glibc correctness tests for exp() and expf() were run. Within the
test suite 3 input values were found to cause 1 bit differences (ulp)
when "FE_TONEAREST" rounding mode is set. No differences in exp() were
seen for the tested values for the other rounding modes.
Typical example:
exp(-0x1.760cd2p+0) (-1.46113312244415283203125)
new code: 2.31973271630014299393707e-01 0x1.db14cd799387ap-3
old code: 2.31973271630014271638132e-01 0x1.db14cd7993879p-3
exp = 2.31973271630014285508337 (high precision)
Old delta: off by 0.49 ulp
New delta: off by 0.51 ulp
In addition, because ieee754_exp() is used by other routines, cexp()
showed test results with very small imaginary input values where the
imaginary portion of the result was off by 3 ulp when in upward
rounding mode, but not in the other rounding modes. For x86, tgamma
showed a few values where the ulp increased to 6 (max ulp for tgamma
is 5). Sparc tgamma did not show these failures. I presume the tgamma
differences are due to compiler optimization differences within the
gamma function.The gamma function is known to be difficult to compute
accurately.
* sysdeps/ieee754/dbl-64/e_exp.c: Include <math-svid-compat.h> and
<errno.h>. Include "eexp.tbl".
(half): New constant.
(one): Likewise.
(__ieee754_exp): Rewrite.
(__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/eexp.tbl: New file.
* sysdeps/ieee754/dbl-64/slowexp.c: Remove file.
* sysdeps/i386/fpu/slowexp.c: Likewise.
* sysdeps/ia64/fpu/slowexp.c: Likewise.
* sysdeps/m68k/m680x0/fpu/slowexp.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-avx.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma.c: Likewise.
* sysdeps/x86_64/fpu/multiarch/slowexp-fma4.c: Likewise.
* sysdeps/generic/math_private.h (__slowexp): Remove prototype.
* sysdeps/ieee754/dbl-64/e_pow.c: Remove mention of slowexp.c in
comment.
* sysdeps/powerpc/power4/fpu/Makefile [$(subdir) = math]
(CPPFLAGS-slowexp.c): Remove variable.
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Remove slowexp-fma, slowexp-fma4 and slowexp-avx.
(CFLAGS-slowexp-fma.c): Remove variable.
(CFLAGS-slowexp-fma4.c): Likewise.
(CFLAGS-slowexp-avx.c): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-avx.c (__slowexp): Do not
define as macro.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma.c (__slowexp): Likewise.
* sysdeps/x86_64/fpu/multiarch/e_exp-fma4.c (__slowexp): Likewise.
* math/Makefile (type-double-routines): Remove slowexp.
* manual/probes.texi (slowexp_p6): Remove.
(slowexp_p32): Likewise.
On x86, padding in struct __jmp_buf_tag is used for shadow stack pointer
to support Shadow Stack in Intel Control-flow Enforcemen Technology.
cancel_jmp_buf has been updated to include saved_mask so that it is as
large as struct __jmp_buf_tag. We must suport the old cancel_jmp_buf
in existing binaries. Since symbol versioning doesn't work on
cancel_jmp_buf, feature_1 is added to tcbhead_t so that setjmp and
longjmp can check if shadow stack is enabled. NB: Shadow stack is
enabled only if all modules are shadow stack enabled.
[BZ #22563]
* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
* sysdeps/i386/nptl/tls.h (tcbhead_t): Add feature_1.
* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Rename __glibc_unused1
to feature_1.
Static PIE extends address space layout randomization to static
executables. It provides additional security hardening benefits at
the cost of some memory and performance.
Dynamic linker, ld.so, is a standalone program which can be loaded at
any address. This patch adds a configure option, --enable-static-pie,
to embed the part of ld.so in static executable to create static position
independent executable (static PIE). A static PIE is similar to static
executable, but can be loaded at any address without help from a dynamic
linker. When --enable-static-pie is used to configure glibc, libc.a is
built as PIE and all static executables, including tests, are built as
static PIE. The resulting libc.a can be used together with GCC 8 or
above to build static PIE with the compiler option, -static-pie. But
GCC 8 isn't required to build glibc with --enable-static-pie. Only GCC
with PIE support is needed. When an older GCC is used to build glibc
with --enable-static-pie, proper input files are passed to linker to
create static executables as static PIE, together with "-z text" to
prevent dynamic relocations in read-only segments, which are not allowed
in static PIE.
The following changes are made for static PIE:
1. Add a new function, _dl_relocate_static_pie, to:
a. Get the run-time load address.
b. Read the dynamic section.
c. Perform dynamic relocations.
Dynamic linker also performs these steps. But static PIE doesn't load
any shared objects.
2. Call _dl_relocate_static_pie at entrance of LIBC_START_MAIN in
libc.a. crt1.o, which is used to create dynamic and non-PIE static
executables, is updated to include a dummy _dl_relocate_static_pie.
rcrt1.o is added to create static PIE, which will link in the real
_dl_relocate_static_pie. grcrt1.o is also added to create static PIE
with -pg. GCC 8 has been updated to support rcrt1.o and grcrt1.o for
static PIE.
Static PIE can work on all architectures which support PIE, provided:
1. Target must support accessing of local functions without dynamic
relocations, which is needed in start.S to call __libc_start_main with
function addresses of __libc_csu_init, __libc_csu_fini and main. All
functions in static PIE are local functions. If PIE start.S can't reach
main () defined in a shared object, the code sequence:
pass address of local_main to __libc_start_main
...
local_main:
tail call to main via PLT
can be used.
2. start.S is updated to check PIC instead SHARED for PIC code path and
avoid dynamic relocation, when PIC is defined and SHARED isn't defined,
to support static PIE.
3. All assembly codes are updated check PIC instead SHARED for PIC code
path to avoid dynamic relocations in read-only sections.
4. All assembly codes are updated check SHARED instead PIC for static
symbol name.
5. elf_machine_load_address in dl-machine.h are updated to support static
PIE.
6. __brk works without TLS nor dynamic relocations in read-only section
so that it can be used by __libc_setup_tls to initializes TLS in static
PIE.
NB: When glibc is built with GCC defaulted to PIE, libc.a is compiled
with -fPIE, regardless if --enable-static-pie is used to configure glibc.
When glibc is configured with --enable-static-pie, libc.a is compiled
with -fPIE, regardless whether GCC defaults to PIE or not. The same
libc.a can be used to build both static executable and static PIE.
There is no need for separate PIE copy of libc.a.
On x86-64, the normal static sln:
text data bss dec hex filename
625425 8284 5456 639165 9c0bd elf/sln
the static PIE sln:
text data bss dec hex filename
657626 20636 5392 683654 a6e86 elf/sln
The code size is increased by 5% and the binary size is increased by 7%.
Linker requirements to build glibc with --enable-static-pie:
1. Linker supports --no-dynamic-linker to remove PT_INTERP segment from
static PIE.
2. Linker can create working static PIE. The x86-64 linker needs the
fix for
https://sourceware.org/bugzilla/show_bug.cgi?id=21782
The i386 linker needs to be able to convert "movl main@GOT(%ebx), %eax"
to "leal main@GOTOFF(%ebx), %eax" if main is defined locally.
Binutils 2.29 or above are OK for i686 and x86-64. But linker status for
other targets need to be verified.
3. Linker should resolve undefined weak symbols to 0 in static PIE:
https://sourceware.org/bugzilla/show_bug.cgi?id=22269
4. Many ELF backend linkers incorrectly check bfd_link_pic for TLS
relocations, which should check bfd_link_executable instead:
https://sourceware.org/bugzilla/show_bug.cgi?id=22263
Tested on aarch64, i686 and x86-64.
Using GCC 7 and binutils master branch, build-many-glibcs.py with
--enable-static-pie with all patches for static PIE applied have the
following build successes:
PASS: glibcs-aarch64_be-linux-gnu build
PASS: glibcs-aarch64-linux-gnu build
PASS: glibcs-armeb-linux-gnueabi-be8 build
PASS: glibcs-armeb-linux-gnueabi build
PASS: glibcs-armeb-linux-gnueabihf-be8 build
PASS: glibcs-armeb-linux-gnueabihf build
PASS: glibcs-arm-linux-gnueabi build
PASS: glibcs-arm-linux-gnueabihf build
PASS: glibcs-arm-linux-gnueabihf-v7a build
PASS: glibcs-arm-linux-gnueabihf-v7a-disable-multi-arch build
PASS: glibcs-m68k-linux-gnu build
PASS: glibcs-microblazeel-linux-gnu build
PASS: glibcs-microblaze-linux-gnu build
PASS: glibcs-mips64el-linux-gnu-n32 build
PASS: glibcs-mips64el-linux-gnu-n32-nan2008 build
PASS: glibcs-mips64el-linux-gnu-n32-nan2008-soft build
PASS: glibcs-mips64el-linux-gnu-n32-soft build
PASS: glibcs-mips64el-linux-gnu-n64 build
PASS: glibcs-mips64el-linux-gnu-n64-nan2008 build
PASS: glibcs-mips64el-linux-gnu-n64-nan2008-soft build
PASS: glibcs-mips64el-linux-gnu-n64-soft build
PASS: glibcs-mips64-linux-gnu-n32 build
PASS: glibcs-mips64-linux-gnu-n32-nan2008 build
PASS: glibcs-mips64-linux-gnu-n32-nan2008-soft build
PASS: glibcs-mips64-linux-gnu-n32-soft build
PASS: glibcs-mips64-linux-gnu-n64 build
PASS: glibcs-mips64-linux-gnu-n64-nan2008 build
PASS: glibcs-mips64-linux-gnu-n64-nan2008-soft build
PASS: glibcs-mips64-linux-gnu-n64-soft build
PASS: glibcs-mipsel-linux-gnu build
PASS: glibcs-mipsel-linux-gnu-nan2008 build
PASS: glibcs-mipsel-linux-gnu-nan2008-soft build
PASS: glibcs-mipsel-linux-gnu-soft build
PASS: glibcs-mips-linux-gnu build
PASS: glibcs-mips-linux-gnu-nan2008 build
PASS: glibcs-mips-linux-gnu-nan2008-soft build
PASS: glibcs-mips-linux-gnu-soft build
PASS: glibcs-nios2-linux-gnu build
PASS: glibcs-powerpc64le-linux-gnu build
PASS: glibcs-powerpc64-linux-gnu build
PASS: glibcs-tilegxbe-linux-gnu-32 build
PASS: glibcs-tilegxbe-linux-gnu build
PASS: glibcs-tilegx-linux-gnu-32 build
PASS: glibcs-tilegx-linux-gnu build
PASS: glibcs-tilepro-linux-gnu build
and the following build failures:
FAIL: glibcs-alpha-linux-gnu build
elf/sln is failed to link due to:
assertion fail bfd/elf64-alpha.c:4125
This is caused by linker bug and/or non-PIC code in PIE libc.a.
FAIL: glibcs-hppa-linux-gnu build
elf/sln is failed to link due to:
collect2: fatal error: ld terminated with signal 11 [Segmentation fault]
https://sourceware.org/bugzilla/show_bug.cgi?id=22537
FAIL: glibcs-ia64-linux-gnu build
elf/sln is failed to link due to:
collect2: fatal error: ld terminated with signal 11 [Segmentation fault]
FAIL: glibcs-powerpc-linux-gnu build
FAIL: glibcs-powerpc-linux-gnu-soft build
FAIL: glibcs-powerpc-linux-gnuspe build
FAIL: glibcs-powerpc-linux-gnuspe-e500v1 build
elf/sln is failed to link due to:
ld: read-only segment has dynamic relocations.
This is caused by linker bug and/or non-PIC code in PIE libc.a. See:
https://sourceware.org/bugzilla/show_bug.cgi?id=22264
FAIL: glibcs-powerpc-linux-gnu-power4 build
elf/sln is failed to link due to:
findlocale.c:96:(.text+0x22c): @local call to ifunc memchr
This is caused by linker bug and/or non-PIC code in PIE libc.a.
FAIL: glibcs-s390-linux-gnu build
elf/sln is failed to link due to:
collect2: fatal error: ld terminated with signal 11 [Segmentation fault], core dumped
assertion fail bfd/elflink.c:14299
This is caused by linker bug and/or non-PIC code in PIE libc.a.
FAIL: glibcs-sh3eb-linux-gnu build
FAIL: glibcs-sh3-linux-gnu build
FAIL: glibcs-sh4eb-linux-gnu build
FAIL: glibcs-sh4eb-linux-gnu-soft build
FAIL: glibcs-sh4-linux-gnu build
FAIL: glibcs-sh4-linux-gnu-soft build
elf/sln is failed to link due to:
ld: read-only segment has dynamic relocations.
This is caused by linker bug and/or non-PIC code in PIE libc.a. See:
https://sourceware.org/bugzilla/show_bug.cgi?id=22263
Also TLS code sequence in SH assembly syscalls in glibc doesn't match TLS
code sequence expected by ld:
https://sourceware.org/bugzilla/show_bug.cgi?id=22270
FAIL: glibcs-sparc64-linux-gnu build
FAIL: glibcs-sparcv9-linux-gnu build
FAIL: glibcs-tilegxbe-linux-gnu build
FAIL: glibcs-tilegxbe-linux-gnu-32 build
FAIL: glibcs-tilegx-linux-gnu build
FAIL: glibcs-tilegx-linux-gnu-32 build
FAIL: glibcs-tilepro-linux-gnu build
elf/sln is failed to link due to:
ld: read-only segment has dynamic relocations.
This is caused by linker bug and/or non-PIC code in PIE libc.a. See:
https://sourceware.org/bugzilla/show_bug.cgi?id=22263
[BZ #19574]
* INSTALL: Regenerated.
* Makeconfig (real-static-start-installed-name): New.
(pic-default): Updated for --enable-static-pie.
(pie-default): New for --enable-static-pie.
(default-pie-ldflag): Likewise.
(+link-static-before-libc): Replace $(DEFAULT-LDFLAGS-$(@F))
with $(if $($(@F)-no-pie),$(no-pie-ldflag),$(default-pie-ldflag)).
Replace $(static-start-installed-name) with
$(real-static-start-installed-name).
(+prectorT): Updated for --enable-static-pie.
(+postctorT): Likewise.
(CFLAGS-.o): Add $(pie-default).
(CFLAGS-.op): Likewise.
* NEWS: Mention --enable-static-pie.
* config.h.in (ENABLE_STATIC_PIE): New.
* configure.ac (--enable-static-pie): New configure option.
(have-no-dynamic-linker): New LIBC_CONFIG_VAR.
(have-static-pie): Likewise.
Enable static PIE if linker supports --no-dynamic-linker.
(ENABLE_STATIC_PIE): New AC_DEFINE.
(enable-static-pie): New LIBC_CONFIG_VAR.
* configure: Regenerated.
* csu/Makefile (omit-deps): Add r$(start-installed-name) and
gr$(start-installed-name) for --enable-static-pie.
(extra-objs): Likewise.
(install-lib): Likewise.
(extra-objs): Add static-reloc.o and static-reloc.os
($(objpfx)$(start-installed-name)): Also depend on
$(objpfx)static-reloc.o.
($(objpfx)r$(start-installed-name)): New.
($(objpfx)g$(start-installed-name)): Also depend on
$(objpfx)static-reloc.os.
($(objpfx)gr$(start-installed-name)): New.
* csu/libc-start.c (LIBC_START_MAIN): Call _dl_relocate_static_pie
in libc.a.
* csu/libc-tls.c (__libc_setup_tls): Add main_map->l_addr to
initimage.
* csu/static-reloc.c: New file.
* elf/Makefile (routines): Add dl-reloc-static-pie.
(elide-routines.os): Likewise.
(DEFAULT-LDFLAGS-tst-tls1-static-non-pie): Removed.
(tst-tls1-static-non-pie-no-pie): New.
* elf/dl-reloc-static-pie.c: New file.
* elf/dl-support.c (_dl_get_dl_main_map): New function.
* elf/dynamic-link.h (ELF_DURING_STARTUP): Also check
STATIC_PIE_BOOTSTRAP.
* elf/get-dynamic-info.h (elf_get_dynamic_info): Likewise.
* gmon/Makefile (tests): Add tst-gmon-static-pie.
(tests-static): Likewise.
(DEFAULT-LDFLAGS-tst-gmon-static): Removed.
(tst-gmon-static-no-pie): New.
(CFLAGS-tst-gmon-static-pie.c): Likewise.
(CRT-tst-gmon-static-pie): Likewise.
(tst-gmon-static-pie-ENV): Likewise.
(tests-special): Likewise.
($(objpfx)tst-gmon-static-pie.out): Likewise.
(clean-tst-gmon-static-pie-data): Likewise.
($(objpfx)tst-gmon-static-pie-gprof.out): Likewise.
* gmon/tst-gmon-static-pie.c: New file.
* manual/install.texi: Document --enable-static-pie.
* sysdeps/generic/ldsodefs.h (_dl_relocate_static_pie): New.
(_dl_get_dl_main_map): Likewise.
* sysdeps/i386/configure.ac: Check if linker supports static PIE.
* sysdeps/x86_64/configure.ac: Likewise.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure: Likewise.
* sysdeps/mips/Makefile (ASFLAGS-.o): Add $(pie-default).
(ASFLAGS-.op): Likewise.
This patch continues filling out TS 18661-3 support by adding *f64 and
*f32x function aliases, supporting _Float64 and _Float32x, as aliases
for double functions. These types are supported for all glibc
configurations. The API corresponds exactly to that for _Float128 and
_Float64x. _Float32 aliases to float functions remain to be added in
subsequent patches to complete this process (then there are a few
miscellaneous functions in TS 18661-3 to implement that aren't simply
versions of existing functions for new types).
The patch enables the feature in bits/floatn-common.h, adds symbol
versions and documentation with updates to ABI baselines, and arranges
for the libm functions for the new types to be tested. As with the
_Float64x changes there are some x86 ulps updates because of header
inlines not used for the new types (and one other change to the
non-multiarch libm-test-ulps, which I suppose comes from using a
different compiler version / configuration from when it was last
regenerated).
Tested for x86_64 and x86, and with build-many-glibcs.py, with both
GCC 6 and GCC 7.
* bits/floatn-common.h (__HAVE_FLOAT64): Define to 1.
(__HAVE_FLOAT32X): Likewise.
* manual/math.texi (Mathematics): Document support for _Float64
and _Float32x.
* math/Makefile (test-types): Add float64 and float32x.
* math/Versions (GLIBC_2.27): Add _Float64 and _Float32x
functions.
* stdlib/Versions (GLIBC_2.27): Likewise.
* wcsmbs/Versions (GLIBC_2.27): Likewise.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libc.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilepro/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes i386 libm function implementations use
libm_alias_float (or libm_alias_float_other in cases where the main
symbol name is defined with versioned_symbol) to define function
aliases.
Tested with build-many-glibcs.py for all its i386 configurations that
installed stripped shared libraries are unchanged by the patch, as
well as running the full glibc testsuite for i686.
* sysdeps/i386/fpu/s_asinhf.S: Include <libm-alias-float.h>.
(asinhf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_atanf.S: Include <libm-alias-float.h>.
(atanf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_cbrtf.S: Include <libm-alias-float.h>.
(cbrtf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_ceilf.S: Include <libm-alias-float.h>.
(ceilf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_copysignf.S: Include <libm-alias-float.h>.
(copysignf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_expm1f.S: Include <libm-alias-float.h>.
(expm1f): Define using libm_alias_float.
* sysdeps/i386/fpu/s_fabsf.S: Include <libm-alias-float.h>.
(fabsf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_floorf.S: Include <libm-alias-float.h>.
(floorf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_fmaxf.S: Include <libm-alias-float.h>.
(fmaxf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_fminf.S: Include <libm-alias-float.h>.
(fminf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_frexpf.S: Include <libm-alias-float.h>.
(frexpf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_llrintf.S: Include <libm-alias-float.h>.
(llrintf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_logbf.S: Include <libm-alias-float.h>.
(logbf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_lrintf.S: Include <libm-alias-float.h>.
(lrintf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_nearbyintf.S: Include <libm-alias-float.h>.
(nearbyintf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_remquof.S: Include <libm-alias-float.h>.
(remquof): Define using libm_alias_float.
* sysdeps/i386/fpu/s_rintf.S: Include <libm-alias-float.h>.
(rintf): Define using libm_alias_float.
* sysdeps/i386/fpu/s_truncf.S: Include <libm-alias-float.h>.
(truncf): Define using libm_alias_float.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Include
<libm-alias-float.h>.
(exp2f): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Include
<libm-alias-float.h>.
(expf): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/i386/i686/fpu/multiarch/e_log2f.c: Include
<libm-alias-float.h>.
(log2f): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/i386/i686/fpu/multiarch/e_logf.c: Include
<libm-alias-float.h>.
(logf): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/i386/i686/fpu/multiarch/e_powf.c: Include
<libm-alias-float.h>.
(powf): Define using libm_alias_float, or libm_alias_float_other
if [SHARED].
* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: Include
<libm-alias-float.h>.
(cosf): Define using libm_alias_float.
* sysdeps/i386/i686/fpu/multiarch/s_sincosf.c: Include
<libm-alias-float.h>.
(sincosf): Define using libm_alias_float.
* sysdeps/i386/i686/fpu/multiarch/s_sinf.c: Include
<libm-alias-float.h>.
(sinf): Define using libm_alias_float.
* sysdeps/i386/i686/fpu/s_fmaxf.S: Include <libm-alias-float.h>.
(fmaxf): Define using libm_alias_float.
* sysdeps/i386/i686/fpu/s_fminf.S: Include <libm-alias-float.h>.
(fminf): Define using libm_alias_float.
* sysdeps/i386/i686/multiarch/s_fmaf.c: Include
<libm-alias-float.h>.
(fmaf): Define using libm_alias_float.
Continuing the preparation for additional _FloatN / _FloatNx function
aliases, this patch makes i386 libm function implementations use
libm_alias_double to define function aliases.
Tested with build-many-glibcs.py for all its i386 configurations that
installed stripped shared libraries are unchanged by the patch, as
well as running the full glibc testsuite for i686.
* sysdeps/i386/fpu/s_asinh.S: Include <libm-alias-double.h>.
(asinh): Define using libm_alias_double.
* sysdeps/i386/fpu/s_atan.S: Include <libm-alias-double.h>.
(atan): Define using libm_alias_double.
* sysdeps/i386/fpu/s_cbrt.S: Include <libm-alias-double.h>.
(cbrt): Define using libm_alias_double.
* sysdeps/i386/fpu/s_ceil.S: Include <libm-alias-double.h>.
(ceil): Define using libm_alias_double.
* sysdeps/i386/fpu/s_copysign.S: Include <libm-alias-double.h>.
(copysign): Define using libm_alias_double.
* sysdeps/i386/fpu/s_expm1.S: Include <libm-alias-double.h>.
(expm1): Define using libm_alias_double.
* sysdeps/i386/fpu/s_fabs.S: Include <libm-alias-double.h>.
(fabs): Define using libm_alias_double.
* sysdeps/i386/fpu/s_fdim.c: Include <libm-alias-double.h>.
(fdim): Define using libm_alias_double.
* sysdeps/i386/fpu/s_floor.S: Include <libm-alias-double.h>.
(floor): Define using libm_alias_double.
* sysdeps/i386/fpu/s_fmax.S: Include <libm-alias-double.h>.
(fmax): Define using libm_alias_double.
* sysdeps/i386/fpu/s_fmin.S: Include <libm-alias-double.h>.
(fmin): Define using libm_alias_double.
* sysdeps/i386/fpu/s_frexp.S: Include <libm-alias-double.h>.
(frexp): Define using libm_alias_double.
* sysdeps/i386/fpu/s_llrint.S: Include <libm-alias-double.h>.
(llrint): Define using libm_alias_double.
* sysdeps/i386/fpu/s_logb.S: Include <libm-alias-double.h>.
(logb): Define using libm_alias_double.
* sysdeps/i386/fpu/s_lrint.S: Include <libm-alias-double.h>.
(lrint): Define using libm_alias_double.
* sysdeps/i386/fpu/s_nearbyint.S: Include <libm-alias-double.h>.
(nearbyint): Define using libm_alias_double.
* sysdeps/i386/fpu/s_remquo.S: Include <libm-alias-double.h>.
(remquo): Define using libm_alias_double.
* sysdeps/i386/fpu/s_rint.S: Include <libm-alias-double.h>.
(rint): Define using libm_alias_double.
* sysdeps/i386/fpu/s_trunc.S: Include <libm-alias-double.h>.
(trunc): Define using libm_alias_double.
* sysdeps/i386/i686/fpu/s_fmax.S: Include <libm-alias-double.h>.
(fmax): Define using libm_alias_double.
* sysdeps/i386/i686/fpu/s_fmin.S: Include <libm-alias-double.h>.
(fmin): Define using libm_alias_double.
* sysdeps/i386/i686/multiarch/s_fma.c: Include <libm-alias-double.h>.
(fma): Define using libm_alias_double.
This patch continues filling out TS 18661-3 support by adding *f64x
function aliases on platforms with _Float64x support. (It so happens
the set of such platforms is exactly the same as the set of platforms
with _Float128 support, although on x86_64, x86 and ia32 the _Float64x
format is Intel extended rather than binary128.) The API provided
corresponds exactly to that provided for _Float128, mostly coming from
TS 18661-3. As these functions always alias those for another type
(long double, _Float128 or both), __* function names are not provided,
as in other cases of alias types.
Given the preparation done in previous patches, this one just enables
the feature via Makeconfig and bits/floatn.h, adds symbol versions,
and updates documentation and ABI baselines. The symbol versions are
present unconditionally as GLIBC_2.27 in the relevant Versions files,
as it's OK for those to specify versions for functions that may not be
present in some configurations; no additional complexity is needed
unless in future some configuration gains support for this type that
didn't have such support in 2.27. The Makeconfig additions for ia64
and x86 aren't strictly needed, as those configurations also get
float64x-alias-fcts definitions from
sysdeps/ieee754/float128/Makeconfig, but still seem appropriate given
that _Float64x is not _Float128 for those configurations.
A libm-test-ulps update for x86 is included. This is because
bits/mathinline.h does not have _Float64x support added and for two
functions the use of out-of-line functions results in increased ulps
(ifloat64x shares ulps with ildouble / ifloat128 as appropriate).
Given that we'd like generally to eliminate bits/mathinline.h
optimizations, preferring to have such optimizations in GCC instead,
it seems reasonable not to add such support there for new types. GCC
support for _FloatN / _FloatNx built-in functions is limited, but has
been improved in GCC 8, and at some point I hope the full set of libm
built-in functions in GCC, and other optimizations with
per-floating-type aspects, will be enabled for all _FloatN / _FloatNx
types.
Tested for x86_64 and x86, and with build-many-glibcs.py, with both
GCC 6 and GCC 7.
* sysdeps/ia64/Makeconfig (float64x-alias-fcts): New variable.
* sysdeps/ieee754/float128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/ieee754/ldbl-128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/x86/Makeconfig: New file.
* bits/floatn-common.h (__HAVE_FLOAT64X): Remove macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* bits/floatn.h (__HAVE_FLOAT64X): New macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ia64/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h (__HAVE_FLOAT64X):
Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/mips/ieee754/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/powerpc/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/x86/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* manual/math.texi (Mathematics): Document support for _Float64x.
* math/Versions (GLIBC_2.27): Add _Float64x functions.
* stdlib/Versions (GLIBC_2.27): Likewise.
* wcsmbs/Versions (GLIBC_2.27): Likewise.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
This patch continues the preparation for additional _FloatN / _FloatNx
function aliases by using libm_alias_ldouble for sysdeps/i386/fpu long
double functions, so that they can have _Float64x aliases added in
future.
Tested for x86_64 (which includes some of these implementations) and
x86, including build-many-glibcs.py tests that installed stripped
shared libraries are unchanged by the patch.
* sysdeps/i386/fpu/e_expl.S: Include <libm-alias-ldouble.h>.
[USE_AS_EXPM1L] (expm1l): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_asinhl.S: Include <libm-alias-ldouble.h>.
(asinhl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_atanl.c: Include <libm-alias-ldouble.h>.
(atanl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_cbrtl.S: Include <libm-alias-ldouble.h>.
(cbrtl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_ceill.S: Include <libm-alias-ldouble.h>.
(ceill): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_copysignl.S: Include <libm-alias-ldouble.h>.
(copysignl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_fabsl.S: Include <libm-alias-ldouble.h>.
(fabsl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_floorl.S: Include <libm-alias-ldouble.h>.
(floorl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_fmaxl.S: Include <libm-alias-ldouble.h>.
(fmaxl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_fminl.S: Include <libm-alias-ldouble.h>.
(fminl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_frexpl.S: Include <libm-alias-ldouble.h>.
(frexpl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_llrintl.S: Include <libm-alias-ldouble.h>.
(llrintl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_logbl.c: Include <libm-alias-ldouble.h>.
(logbl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_lrintl.S: Include <libm-alias-ldouble.h>.
(lrintl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_nearbyintl.S: Include <libm-alias-ldouble.h>.
(nearbyintl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_nextafterl.c: Include <libm-alias-ldouble.h>.
(nextafterl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_remquol.S: Include <libm-alias-ldouble.h>.
(remquol): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_rintl.c: Include <libm-alias-ldouble.h>.
(rintl): Define using libm_alias_ldouble.
* sysdeps/i386/fpu/s_truncl.S: Include <libm-alias-ldouble.h>.
(truncl): Define using libm_alias_ldouble.
* sysdeps/i386/i686/fpu/s_fmaxl.S: Include <libm-alias-ldouble.h>.
(fmaxl): Define using libm_alias_ldouble.
* sysdeps/i386/i686/fpu/s_fminl.S: Include <libm-alias-ldouble.h>.
(fminl): Define using libm_alias_ldouble.
This patch adds a new build test to check for internal fields
offsets for user visible internal field. Although currently
the only field which is statically initialized to a non zero value
is pthread_mutex_t.__data.__kind value, the tests also check the
offset of __kind, __spins, __elision (if supported), and __list
internal member. A internal header (pthread-offset.h) is added
to each major ABI with the reference value.
Checked on x86_64-linux-gnu and with a build check for all affected
ABIs (aarch64-linux-gnu, alpha-linux-gnu, arm-linux-gnueabihf,
hppa-linux-gnu, i686-linux-gnu, ia64-linux-gnu, m68k-linux-gnu,
microblaze-linux-gnu, mips64-linux-gnu, mips64-n32-linux-gnu,
mips-linux-gnu, powerpc64le-linux-gnu, powerpc-linux-gnu,
s390-linux-gnu, s390x-linux-gnu, sh4-linux-gnu, sparc64-linux-gnu,
sparcv9-linux-gnu, tilegx-linux-gnu, tilegx-linux-gnu-x32,
tilepro-linux-gnu, x86_64-linux-gnu, and x86_64-linux-x32).
* nptl/pthreadP.h (ASSERT_PTHREAD_STRING,
ASSERT_PTHREAD_INTERNAL_OFFSET): New macro.
* nptl/pthread_mutex_init.c (__pthread_mutex_init): Add build time
checks for internal pthread_mutex_t offsets.
* sysdeps/aarch64/nptl/pthread-offsets.h
(__PTHREAD_MUTEX_NUSERS_OFFSET, __PTHREAD_MUTEX_KIND_OFFSET,
__PTHREAD_MUTEX_SPINS_OFFSET, __PTHREAD_MUTEX_ELISION_OFFSET,
__PTHREAD_MUTEX_LIST_OFFSET): New macro.
* sysdeps/alpha/nptl/pthread-offsets.h: Likewise.
* sysdeps/arm/nptl/pthread-offsets.h: Likewise.
* sysdeps/hppa/nptl/pthread-offsets.h: Likewise.
* sysdeps/i386/nptl/pthread-offsets.h: Likewise.
* sysdeps/ia64/nptl/pthread-offsets.h: Likewise.
* sysdeps/m68k/nptl/pthread-offsets.h: Likewise.
* sysdeps/microblaze/nptl/pthread-offsets.h: Likewise.
* sysdeps/mips/nptl/pthread-offsets.h: Likewise.
* sysdeps/nios2/nptl/pthread-offsets.h: Likewise.
* sysdeps/powerpc/nptl/pthread-offsets.h: Likewise.
* sysdeps/s390/nptl/pthread-offsets.h: Likewise.
* sysdeps/sh/nptl/pthread-offsets.h: Likewise.
* sysdeps/sparc/nptl/pthread-offsets.h: Likewise.
* sysdeps/tile/nptl/pthread-offsets.h: Likewise.
* sysdeps/x86_64/nptl/pthread-offsets.h: Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add a new header file, sysdeps/x86/sysdep.h, for common assembly code
macros between i386 and x86-64. Tested on i686 and x86-64. There are
no differences in outputs of "readelf -a" and "objdump -dw" on all glibc
shared objects before and after the patch.
* sysdeps/i386/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
* sysdeps/x86/sysdep.h: New file.
* sysdeps/x86_64/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
i586 strcpy.S used a clever trick with LEA to implement jump table:
/* ECX has the last 2 bits of the address of source - 1. */
andl $3, %ecx
call 2f
2: popl %edx
/* 0xb is the distance between 2: and 1:. */
leal 0xb(%edx,%ecx,8), %ecx
jmp *%ecx
.align 8
1: /* ECX == 0 */
orb (%esi), %al
jz L(end)
stosb
xorl %eax, %eax
incl %esi
/* ECX == 1 */
orb (%esi), %al
jz L(end)
stosb
xorl %eax, %eax
incl %esi
/* ECX == 2 */
orb (%esi), %al
jz L(end)
stosb
xorl %eax, %eax
incl %esi
/* ECX == 3 */
L(1): movl (%esi), %ecx
leal 4(%esi),%esi
This fails if there are instruction length changes before L(1):. This
patch replaces it with conditional branches:
cmpb $2, %cl
je L(Src2)
ja L(Src3)
cmpb $1, %cl
je L(Src1)
L(Src0):
which have similar performance and work with any instruction lengths.
Tested on i586 and i686 with and without --disable-multi-arch.
[BZ #22353]
* sysdeps/i386/i586/strcpy.S (STRCPY): Use conditional branches.
(1): Renamed to ...
(L(Src0)): This.
(L(Src1)): New.
(L(Src2)): Likewise.
(L(1)): Renamed to ...
(L(Src3)): This.
This patch replaces i386 assembly versions of e_exp2f with generic
e_exp2f.c. For workload-spec2017.wrf, on Nehalem, it improves
performance by:
Before After Improvement
reciprocal-throughput 112.996 40.0454 182%
latency 126.581 54.4479 132%
On Skylake, it improves performance by:
Before After Improvement
reciprocal-throughput 113.14 39.447 186%
latency 136.068 55.684 144%
On IvyBridge with --disable-multi-arch, it improves performance by:
Before After Improvement
reciprocal-throughput 132.521 40.3759 228%
latency 145.791 58.4587 149%
* sysdeps/i386/fpu/e_exp2f.S: Removed.
* sysdeps/i386/fpu/w_exp2f.c: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Updated for generic e_exp2f.c.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/Makefile (libm-sysdep_routines):
Add e_exp2f-sse2.
(CFLAGS-e_exp2f-sse2.c): New.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f-sse2.c: New file.
* sysdeps/i386/i686/fpu/multiarch/e_exp2f.c: Likewise.
On i386, when multi-arch is enabled, all external functions must be
called via PIC PLT in PIE, which requires setting up EBX register,
since they may be IFUNC functions.
* config.h.in (NO_HIDDEN_EXTERN_FUNC_IN_PIE): New.
* include/libc-symbols.h (__hidden_proto_hiddenattr): Add check
for PIC and NO_HIDDEN_EXTERN_FUNC_IN_PIE.
* sysdeps/i386/configure.ac (NO_HIDDEN_EXTERN_FUNC_IN_PIE): New
AC_DEFINE if multi-arch is enabled.
* sysdeps/i386/configure: Regenerated.
Don't use "leal main@GOTOFF(%ebx), %eax" since main may be in a
shared object. Linker will convert "movl main@GOT(%ebx), %eax"
to "leal main@GOTOFF(%ebx), %eax" if main is defined locally.
* sysdeps/i386/start.S: Replace "leal main@GOT(%ebx), %eax" with
"movl main@GOTOFF(%ebx), %eax".
This code is used in non-PIE static executable and static PIE. It checks
if _DYNAMIC is undefined before using it to compute load address. But
not all targets can convert access _DYNAMIC via GOT, which needs dynamic
relocation, to PC-relative at link-time.
* sysdeps/i386/dl-machine.h (elf_machine_load_address): Don't
allow undefined _DYNAMIC in PIE libc.a.
* sysdeps/x86_64/dl-machine.h (elf_machine_load_address):
Likewse.
The new generic logf, log2f and powf code don't need wrappers any more,
they set errno inline so only use the wrappers on targets that need it.
* sysdeps/ieee754/flt-32/e_log2f.c (__log2f): Define without wrapper.
* sysdeps/ieee754/flt-32/e_logf.c (__logf): Likewise
* sysdeps/ieee754/flt-32/e_powf.c (__powf): Likewise
* sysdeps/ieee754/flt-32/w_log2f.c: New file.
* sysdeps/ieee754/flt-32/w_logf.c: New file.
* sysdeps/ieee754/flt-32/w_powf.c: New file.
* sysdeps/i386/fpu/w_log2f.c: New file.
* sysdeps/i386/fpu/w_logf.c: New file.
* sysdeps/i386/fpu/w_powf.c: New file.
* sysdeps/m68k/m680x0/fpu/w_log2f.c: New file.
* sysdeps/m68k/m680x0/fpu/w_logf.c: New file.
* sysdeps/m68k/m680x0/fpu/w_powf.c: New file.
The new generic expf and exp2f code don't need wrappers any more, they
set errno inline, so only use the wrappers on targets that need it.
(If the wrapper is needed, then the top level wrapper code is included,
otherwise empty w_exp*f.c is used to suppress the wrapper.)
A powerpc64 expf implementation includes the expf c code directly which
needed some changes.
* sysdeps/ieee754/flt-32/e_exp2f.c (__exp2f): Define without wrapper.
* sysdeps/ieee754/flt-32/e_expf.c (__expf): Likewise
* sysdeps/ieee754/flt-32/w_exp2f.c: New file.
* sysdeps/ieee754/flt-32/w_expf.c: New file.
* sysdeps/powerpc/powerpc64/fpu/multiarch/e_expf-ppc64.c: Update for
the new expf code.
* sysdeps/powerpc/powerpc64/fpu/multiarch/w_expf.c: New file.
* sysdeps/powerpc/powerpc64/power8/fpu/w_expf.c: New file.
* sysdeps/m68k/m680x0/fpu/w_exp2f.c: New file.
* sysdeps/m68k/m680x0/fpu/w_expf.c: New file.
* sysdeps/i386/fpu/w_exp2f.c: New file.
* sysdeps/i386/fpu/w_expf.c: New file.
* sysdeps/i386/i686/fpu/multiarch/w_expf.c: New file.
* sysdeps/x86_64/fpu/w_expf.c: New file.
without wrapper on aarch64:
powf reciprocal-throughput: 4.2x faster
powf latency: 2.6x faster
old worst-case error: 1.11 ulp
new worst-case error: 0.82 ulp
aarch64 .text size: -780 bytes
aarch64 .rodata size: +144 bytes
powf(x,y) is implemented as exp2(y*log2(x)) with the same algorithms
that are used in exp2f and log2f, except that the log2f polynomial is
larger for extra precision and its output (and exp2f input) may be
scaled by a power of 2 (POWF_SCALE) to simplify the argument reduction
step of exp2 (possible when efficient round and convert toint operation
is available).
The special case handling tries to minimize the checks in the hot path.
When the input of exp2_inline is checked, int arithmetics is used as
that was faster on the tested aarch64 cores.
* math/Makefile (type-float-routines): Add e_powf_log2_data.
* sysdeps/ieee754/flt-32/e_powf.c: New implementation.
* sysdeps/ieee754/flt-32/e_powf_log2_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h (__powf_log2_data): Define.
(issignalingf_inline): Likewise.
(POWF_LOG2_TABLE_BITS): Likewise.
(POWF_LOG2_POLY_ORDER): Likewise.
(POWF_SCALE_BITS): Likewise.
(POWF_SCALE): Likewise.
* sysdeps/i386/fpu/e_powf_log2_data.c: New file.
* sysdeps/ia64/fpu/e_powf_log2_data.c: New file.
* sysdeps/m68k/m680x0/fpu/e_powf_log2_data.c: New file.
Similar to the new logf: double precision arithmetics and a small
lookup table is used. The argument reduction step is the same as in
the new logf.
without wrapper on aarch64:
log2f reciprocal-throughput: 2.3x faster
log2f latency: 2.1x faster
old worst case error: 1.72 ulp
new worst case error: 0.75 ulp
aarch64 .text size: -252 bytes
aarch64 .rodata size: +244 bytes
* math/Makefile (type-float-routines): Add e_log2f_data.
* sysdeps/ieee754/flt-32/e_log2f.c: New implementation.
* sysdeps/ieee754/flt-32/e_log2f_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h (__log2f_data): Define.
(LOG2F_TABLE_BITS, LOG2F_POLY_ORDER): Define.
* sysdeps/i386/fpu/e_log2f_data.c: New file.
* sysdeps/ia64/fpu/e_log2f_data.c: New file.
* sysdeps/m68k/m680x0/fpu/e_log2f_data.c: New file.
without wrapper on aarch64:
logf reciprocal-throughput: 2.2x faster
logf latency: 1.9x faster
old worst case error: 0.89 ulp
new worst case error: 0.82 ulp
aarch64 .text size: -356 bytes
aarch64 .rodata size: +240 bytes
Uses double precision arithmetics and a lookup table to allow smaller
polynomial and avoid the use of division.
Data is in a separate translation unit with fixed layout to prevent the
compiler generating suboptimal literal access.
Errors are handled inline according to POSIX rules, but this patch
keeps the wrapper with SVID compatible error handling.
Needs libm-test-ulps adjustment for clogf in non-nearest rounding mode.
* math/Makefile (type-float-routines): Add e_logf_data.
* sysdeps/ieee754/flt-32/e_logf.c: New implementation.
* sysdeps/ieee754/flt-32/e_logf_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h (__logf_data): Define.
(LOGF_TABLE_BITS, LOGF_POLY_ORDER): Define.
* sysdeps/i386/fpu/e_logf_data.c: New file.
* sysdeps/ia64/fpu/e_logf_data.c: New file.
* sysdeps/m68k/m680x0/fpu/e_logf_data.c: New file.
When --enable-static-pie is used to build static PIE, _DYNAMIC is used
to compute the load address of static PIE. But _DYNAMIC is undefined
when creating static executable. This patch makes _DYNAMIC weak in PIE
libc.a so that it can be undefined.
* sysdeps/i386/dl-machine.h (elf_machine_load_address): Allow
undefined _DYNAMIC in PIE libc.a.
* sysdeps/x86_64/dl-machine.h (elf_machine_load_address):
Likewse.
Based on new expf and exp2f code from
https://github.com/ARM-software/optimized-routines/
with wrapper on aarch64:
expf reciprocal-throughput: 2.3x faster
expf latency: 1.7x faster
without wrapper on aarch64:
expf reciprocal-throughput: 3.3x faster
expf latency: 1.7x faster
without wrapper on aarch64:
exp2f reciprocal-throughput: 2.8x faster
exp2f latency: 1.3x faster
libm.so size on aarch64:
.text size: -152 bytes
.rodata size: -1740 bytes
expf/exp2f worst case nearest rounding error: 0.502 ulp
worst case non-nearest rounding error: 1 ulp
Error checks are inline and errno setting is in separate tail called
functions, but the wrappers are kept in this patch to handle the
_LIB_VERSION==_SVID_ case. (So e.g. errno is set twice for expf calls
and once for __expf_finite calls on targets where the new code is used.)
Double precision arithmetics is used which is expected to be faster on
most targets (including soft-float) than using single precision and it
is easier to get good precision result with it.
Const data is kept in a separate translation unit which complicates
maintenance a bit, but is expected to give good code for literal loads
on most targets and allows sharing data across expf, exp2f and powf.
(This data is disabled on i386, m68k and ia64 which have their own
expf, exp2f and powf code.)
Some details may need target specific tweaks:
- best convert and round to int operation in the arg reduction may be
different across targets.
- code was optimized on fma target, optimal polynomial eval may be
different without fma.
- gcc does not always generate good code for fp bit representation
access via unions or it may be inherently slow on some targets.
The libm-test-ulps will need adjustment because..
- The argument reduction ideally uses nearest rounded rint, but that is
not efficient on most targets, so the polynomial can get evaluated on a
wider interval in non-nearest rounding mode making 1 ulp errors common
in that case.
- The polynomial is evaluated such that it may have 1 ulp error on
negative tiny inputs with upward rounding.
* math/Makefile (type-float-routines): Add math_errf and e_exp2f_data.
* sysdeps/aarch64/fpu/math_private.h (TOINT_INTRINSICS): Define.
(roundtoint, converttoint): Likewise.
* sysdeps/ieee754/flt-32/e_expf.c: New implementation.
* sysdeps/ieee754/flt-32/e_exp2f.c: New implementation.
* sysdeps/ieee754/flt-32/e_exp2f_data.c: New file.
* sysdeps/ieee754/flt-32/math_config.h: New file.
* sysdeps/ieee754/flt-32/math_errf.c: New file.
* sysdeps/ieee754/flt-32/t_exp2f.h: Remove.
* sysdeps/i386/fpu/e_exp2f_data.c: New file.
* sysdeps/i386/fpu/math_errf.c: New file.
* sysdeps/ia64/fpu/e_exp2f_data.c: New file.
* sysdeps/ia64/fpu/math_errf.c: New file.
* sysdeps/m68k/m680x0/fpu/e_exp2f_data.c: New file.
* sysdeps/m68k/m680x0/fpu/math_errf.c: New file.
Add unwind info to __libc_start_main so that unwinding continues one
extra level to _start. Similarly add unwind info to backtrace.
Given many targets require this, do this in a general way.
* csu/Makefile: Add -funwind-tables to libc-start.c.
* debug/Makefile: Add -funwind-tables to backtrace.c.
* sysdeps/aarch64/Makefile: Remove CFLAGS-backtrace.c.
* sysdeps/arm/Makefile: Likewise.
* sysdeps/i386/Makefile: Likewise.
* sysdeps/m68k/Makefile: Likewise.
* sysdeps/mips/Makefile: Likewise.
* sysdeps/nios2/Makefile: Likewise.
* sysdeps/sh/Makefile: Likewise.
* sysdeps/sparc/Makefile: Likewise.
The initial obsoletion of SVID libm error handling left the old
wrappers and __kernel_standard still being used for new ports and
static linking, just with macro definitions of _LIB_VERSION and
matherr that meant symbols with those names were never actually used
and the code for different error handling variants could be optimized
out.
This patch cleans things up further by eliminating the
__kernel_standard use for new ports and static linking. Now, the old
wrappers no longer generate any code in the !LIBM_SVID_COMPAT case,
while the new errno-only wrappers that were added for float128 support
are now also used for float, double and long double in that case.
The changes are generally straightforward. The w_scalb*_compat
wrappers continue to be used (scalb is obsolescent in the sense of not
being supported for float128, but is present in supported standards -
the 2001 edition of POSIX and earlier XSI versions - so remains
supported for static linking and new ports, as do the float and long
double variants that are existing GNU extensions). Those wrappers
would only call __kernel_standard in the _LIB_VERSION == _SVID_ case.
Since we would like to be able to compile most of glibc without
optimization, relying on a static function whose only use is under an
if (0) condition being optimized away to avoid an undefined
__kernel_standard reference may not be a good idea. Thus, the
relevant code in the scalb wrappers has LIBM_SVID_COMPAT conditionals
added to guarantee it's not built at all in the case where
__kernel_standard does not exist.
Just as i386 has its own w_sqrt_compat.c, so w_sqrt.c is also added.
ia64 gets dummy w_*.c to prevent those files being built where they
would conflict with the ia64 libm, as with its existing w_*_compat.c.
Conditions disabling code for !LIBM_SVID_COMPAT are needed in both the
math/ wrappers and in the long double wrappers in ldbl-opt (to avoid
them setting up aliases and symbol versions for undefined symbols). I
hope that future cleanups to how libm function aliases and symbol
versioning are done will eliminate the need for most of the ldbl-opt
wrappers.
Tested for x86_64 and x86, and with build-many-glibcs.py.
* sysdeps/generic/math-type-macros-double.h: Include
<math-svid-compat.h>.
(__USE_WRAPPER_TEMPLATE): Define to !LIBM_SVID_COMPAT.
* sysdeps/generic/math-type-macros-float.h: Include
<math-svid-compat.h>.
(__USE_WRAPPER_TEMPLATE): Define to !LIBM_SVID_COMPAT.
* sysdeps/generic/math-type-macros-ldouble.h: Include
<math-svid-compat.h>.
(__USE_WRAPPER_TEMPLATE): Define to !LIBM_SVID_COMPAT.
* math/lgamma-compat.h (BUILD_LGAMMA): Include LIBM_SVID_COMPAT
condition.
* math/w_acos_compat.c: Condition contents on [LIBM_SVID_COMPAT].
* math/w_acosf_compat.c: Likewise.
* math/w_acosh_compat.c: Likewise.
* math/w_acoshf_compat.c: Likewise.
* math/w_acoshl_compat.c: Likewise.
* math/w_acosl_compat.c: Likewise.
* math/w_asin_compat.c: Likewise.
* math/w_asinf_compat.c: Likewise.
* math/w_asinl_compat.c: Likewise.
* math/w_atan2_compat.c: Likewise.
* math/w_atan2f_compat.c: Likewise.
* math/w_atan2l_compat.c: Likewise.
* math/w_atanh_compat.c: Likewise.
* math/w_atanhf_compat.c: Likewise.
* math/w_atanhl_compat.c: Likewise.
* math/w_cosh_compat.c: Likewise.
* math/w_coshf_compat.c: Likewise.
* math/w_coshl_compat.c: Likewise.
* math/w_exp10_compat.c: Likewise.
* math/w_exp10f_compat.c: Likewise.
* math/w_exp10l_compat.c: Likewise.
* math/w_exp2_compat.c: Likewise.
* math/w_exp2f_compat.c: Likewise.
* math/w_exp2l_compat.c: Likewise.
* math/w_fmod_compat.c: Likewise.
* math/w_fmodf_compat.c: Likewise.
* math/w_fmodl_compat.c: Likewise.
* math/w_hypot_compat.c: Likewise.
* math/w_hypotf_compat.c: Likewise.
* math/w_hypotl_compat.c: Likewise.
* math/w_j0_compat.c: Likewise.
* math/w_j0f_compat.c: Likewise.
* math/w_j0l_compat.c: Likewise.
* math/w_j1_compat.c: Likewise.
* math/w_j1f_compat.c: Likewise.
* math/w_j1l_compat.c: Likewise.
* math/w_jn_compat.c: Likewise.
* math/w_jnf_compat.c: Likewise.
* math/w_jnl_compat.c: Likewise.
* math/w_lgamma_r_compat.c: Likewise.
* math/w_lgammaf_r_compat.c: Likewise.
* math/w_lgammal_r_compat.c: Likewise.
* math/w_log10_compat.c: Likewise.
* math/w_log10f_compat.c: Likewise.
* math/w_log10l_compat.c: Likewise.
* math/w_log2_compat.c: Likewise.
* math/w_log2f_compat.c: Likewise.
* math/w_log2l_compat.c: Likewise.
* math/w_log_compat.c: Likewise.
* math/w_logf_compat.c: Likewise.
* math/w_logl_compat.c: Likewise.
* math/w_pow_compat.c: Likewise.
* math/w_powf_compat.c: Likewise.
* math/w_powl_compat.c: Likewise.
* math/w_remainder_compat.c: Likewise.
* math/w_remainderf_compat.c: Likewise.
* math/w_remainderl_compat.c: Likewise.
* math/w_sinh_compat.c: Likewise.
* math/w_sinhf_compat.c: Likewise.
* math/w_sinhl_compat.c: Likewise.
* math/w_sqrt_compat.c: Likewise.
* math/w_sqrtf_compat.c: Likewise.
* math/w_sqrtl_compat.c: Likewise.
* math/w_tgamma_compat.c: Likewise.
* math/w_tgammaf_compat.c: Likewise.
* math/w_tgammal_compat.c: Likewise.
* math/w_scalb_compat.c (sysv_scalb): Condition definition on
[LIBM_SVID_COMPAT].
(__scalb): Condition call to sysv_scalb on [LIBM_SVID_COMPAT].
* math/w_scalbf_compat.c (sysv_scalbf): Condition definition on
[LIBM_SVID_COMPAT].
(__scalbf): Condition call to sysv_scalbf on [LIBM_SVID_COMPAT].
* math/w_scalbl_compat.c (sysv_scalbl): Condition definition on
[LIBM_SVID_COMPAT].
(__scalbl): Condition call to sysv_scalbl on [LIBM_SVID_COMPAT].
* sysdeps/i386/fpu/w_sqrt.c: New file.
* sysdeps/ia64/fpu/w_acos.c: Likewise.
* sysdeps/ia64/fpu/w_acosf.c: Likewise.
* sysdeps/ia64/fpu/w_acosh.c: Likewise.
* sysdeps/ia64/fpu/w_acoshf.c: Likewise.
* sysdeps/ia64/fpu/w_acoshl.c: Likewise.
* sysdeps/ia64/fpu/w_acosl.c: Likewise.
* sysdeps/ia64/fpu/w_asin.c: Likewise.
* sysdeps/ia64/fpu/w_asinf.c: Likewise.
* sysdeps/ia64/fpu/w_asinl.c: Likewise.
* sysdeps/ia64/fpu/w_atan2.c: Likewise.
* sysdeps/ia64/fpu/w_atan2f.c: Likewise.
* sysdeps/ia64/fpu/w_atan2l.c: Likewise.
* sysdeps/ia64/fpu/w_atanh.c: Likewise.
* sysdeps/ia64/fpu/w_atanhf.c: Likewise.
* sysdeps/ia64/fpu/w_atanhl.c: Likewise.
* sysdeps/ia64/fpu/w_cosh.c: Likewise.
* sysdeps/ia64/fpu/w_coshf.c: Likewise.
* sysdeps/ia64/fpu/w_coshl.c: Likewise.
* sysdeps/ia64/fpu/w_exp.c: Likewise.
* sysdeps/ia64/fpu/w_exp10.c: Likewise.
* sysdeps/ia64/fpu/w_exp10f.c: Likewise.
* sysdeps/ia64/fpu/w_exp10l.c: Likewise.
* sysdeps/ia64/fpu/w_exp2.c: Likewise.
* sysdeps/ia64/fpu/w_exp2f.c: Likewise.
* sysdeps/ia64/fpu/w_exp2l.c: Likewise.
* sysdeps/ia64/fpu/w_expf.c: Likewise.
* sysdeps/ia64/fpu/w_expl.c: Likewise.
* sysdeps/ia64/fpu/w_fmod.c: Likewise.
* sysdeps/ia64/fpu/w_fmodf.c: Likewise.
* sysdeps/ia64/fpu/w_fmodl.c: Likewise.
* sysdeps/ia64/fpu/w_hypot.c: Likewise.
* sysdeps/ia64/fpu/w_hypotf.c: Likewise.
* sysdeps/ia64/fpu/w_hypotl.c: Likewise.
* sysdeps/ia64/fpu/w_lgamma_r.c: Likewise.
* sysdeps/ia64/fpu/w_lgammaf_r.c: Likewise.
* sysdeps/ia64/fpu/w_lgammal_r.c: Likewise.
* sysdeps/ia64/fpu/w_log.c: Likewise.
* sysdeps/ia64/fpu/w_log10.c: Likewise.
* sysdeps/ia64/fpu/w_log10f.c: Likewise.
* sysdeps/ia64/fpu/w_log10l.c: Likewise.
* sysdeps/ia64/fpu/w_log2.c: Likewise.
* sysdeps/ia64/fpu/w_log2f.c: Likewise.
* sysdeps/ia64/fpu/w_log2l.c: Likewise.
* sysdeps/ia64/fpu/w_logf.c: Likewise.
* sysdeps/ia64/fpu/w_logl.c: Likewise.
* sysdeps/ia64/fpu/w_pow.c: Likewise.
* sysdeps/ia64/fpu/w_powf.c: Likewise.
* sysdeps/ia64/fpu/w_powl.c: Likewise.
* sysdeps/ia64/fpu/w_remainder.c: Likewise.
* sysdeps/ia64/fpu/w_remainderf.c: Likewise.
* sysdeps/ia64/fpu/w_remainderl.c: Likewise.
* sysdeps/ia64/fpu/w_sinh.c: Likewise.
* sysdeps/ia64/fpu/w_sinhf.c: Likewise.
* sysdeps/ia64/fpu/w_sinhl.c: Likewise.
* sysdeps/ia64/fpu/w_sqrt.c: Likewise.
* sysdeps/ia64/fpu/w_sqrtf.c: Likewise.
* sysdeps/ia64/fpu/w_sqrtl.c: Likewise.
* sysdeps/ia64/fpu/w_tgamma.c: Likewise.
* sysdeps/ia64/fpu/w_tgammaf.c: Likewise.
* sysdeps/ia64/fpu/w_tgammal.c: Likewise.
* sysdeps/ieee754/dbl-64/w_exp_compat.c: Condition contents on
[LIBM_SVID_COMPAT].
* sysdeps/ieee754/flt-32/w_expf_compat.c: Likewise.
* sysdeps/ieee754/k_standard.c: Likewise.
* sysdeps/ieee754/k_standardf.c: Likewise.
* sysdeps/ieee754/k_standardl.c: Likewise.
* sysdeps/ieee754/ldbl-128/w_expl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/w_expl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-96/w_expl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-64-128/w_expl_compat.c: Condition
long_double_symbol call on [LIBM_SVID_COMPAT].
* sysdeps/ieee754/ldbl-opt/w_acoshl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_acosl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_asinl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_atan2l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_atanhl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_coshl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_fmodl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_hypotl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_j0l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_j1l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_jnl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_lgammal_r_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_log10l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_log2l_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_logl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_powl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_remainderl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_sinhl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_sqrtl_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_tgammal_compat.c: Likewise.
* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c: Condition
long_double_symbol and compat_symbol calls on [LIBM_SVID_COMPAT].
This patch obsoletes the pow10, pow10f and pow10l functions (makes
them into compat symbols, not available for new ports or static
linking). The exp10 names for these functions are standardized (in TS
18661-4) and were added in the same glibc version (2.1) as pow10 so
source code can change to use them without any loss of portability.
Since pow10 is deliberately not provided for _Float128, only exp10,
this slightly simplifies moving to the new wrapper templates in the
!LIBM_SVID_COMPAT case, by avoiding needing to arrange for pow10,
pow10f and pow10l to be defined by those templates.
Tested for x86_64, and with build-many-glibcs.py.
* manual/math.texi (pow10): Do not document.
(pow10f): Likewise.
(pow10l): Likewise.
* math/bits/mathcalls.h [__USE_GNU] (pow10): Do not declare.
* math/bits/math-finite.h [__USE_GNU] (pow10): Likewise.
* math/libm-test-exp10.inc (pow10_test): Remove.
(do_test): Do not call pow10.
* math/w_exp10_compat.c (pow10): Make into compat symbol.
[NO_LONG_DOUBLE] (pow10l): Likewise.
* math/w_exp10f_compat.c (pow10f): Likewise.
* math/w_exp10l_compat.c (pow10l): Likewise.
* sysdeps/ia64/fpu/e_exp10.S: Include <shlib-compat.h>.
(pow10): Make into compat symbol.
* sysdeps/ia64/fpu/e_exp10f.S: Include <shlib-compat.h>.
(pow10f): Make into compat symbol.
* sysdeps/ia64/fpu/e_exp10l.S: Include <shlib-compat.h>.
(pow10l): Make into compat symbol.
* sysdeps/ieee754/ldbl-opt/Makefile (libnldbl-calls): Remove
pow10.
(CFLAGS-nldbl-pow10.c): Remove variable..
* sysdeps/ieee754/ldbl-opt/nldbl-pow10.c: Remove file.
* sysdeps/ieee754/ldbl-opt/w_exp10_compat.c (pow10l): Condition on
[SHLIB_COMPAT (libm, GLIBC_2_1, GLIBC_2_27)].
* sysdeps/ieee754/ldbl-opt/w_exp10l_compat.c (compat_symbol):
Undefine and redefine.
(pow10l): Make into compat symbol.
* sysdeps/aarch64/libm-test-ulps: Remove pow10 ulps.
* sysdeps/alpha/fpu/libm-test-ulps: Likewise.
* sysdeps/arm/libm-test-ulps: Likewise.
* sysdeps/hppa/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/microblaze/libm-test-ulps: Likewise.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/nios2/libm-test-ulps: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps: Likewise.
* sysdeps/s390/fpu/libm-test-ulps: Likewise.
* sysdeps/sh/libm-test-ulps: Likewise.
* sysdeps/sparc/fpu/libm-test-ulps: Likewise.
* sysdeps/tile/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch completes the ucontext.h namespace fixes by fixing issues
related to the use of struct sigcontext as mcontext_t, and inclusion
of <bits/sigcontext.h> even when struct sigcontext is not so used.
Inclusion of <bits/sigcontext.h> by <sys/ucontext.h> is removed; the
way to get the sigcontext structure is by including <signal.h> (in a
context where __USE_MISC is defined); the sysdeps/generic version of
sys/ucontext.h keeps the inclusion by necessity, with a comment about
how this is not namespace-clean, but the only configuration that used
it, MicroBlaze, gets its own version of the header in this patch.
Where mcontext_t was typedefed to struct sigcontext, the contents of
struct sigcontext are inserted (with appropriate namespace handling to
prefix fields with __ when __USE_MISC is not defined); review should
check that this has been done correctly in each case, whether the
definition of struct sigcontext comes from glibc headers or from the
Linux kernel. This changes C++ name mangling on affected
architectures (which do not include x86_64/x86).
Tested for x86_64, and with build-many-glibcs.py.
2017-08-14 Joseph Myers <joseph@codesourcery.com>
[BZ #21457]
* sysdeps/arm/sys/ucontext.h: Do not include <bits/sigcontext.h>.
* sysdeps/generic/sys/ucontext.h: Add comment about use of struct
sigcontext and namespace requirements.
* sysdeps/i386/sys/ucontext.h: Do not include <bits/sigcontext.h>.
* sysdeps/m68k/sys/ucontext.h: Likewise.
* sysdeps/mips/sys/ucontext.h: Likewise. Include <bits/types.h>.
* sysdeps/unix/sysv/linux/aarch64/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
(__ctx): Define earlier.
(mcontext_t): Define structure contents rather than using struct
sigcontext.
* sysdeps/unix/sysv/linux/aarch64/ucontext_i.sym (oEXTENSION): Use
__glibc_reserved1 instead of __reserved.
* sysdeps/unix/sysv/linux/alpha/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
(__ctx): Define earlier.
(mcontext_t): Define structure contents rather than using struct
sigcontext.
* sysdeps/unix/sysv/linux/alpha/ucontext-offsets.sym: Use
mcontext_t instead of struct sigcontext.
* sysdeps/unix/sysv/linux/arm/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
(__ctx): Define earlier.
(mcontext_t): Define structure contents rather than using struct
sigcontext.
* sysdeps/unix/sysv/linux/hppa/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
(__ctx): Define earlier.
(mcontext_t): Define structure contents rather than using struct
sigcontext.
* sysdeps/unix/sysv/linux/ia64/makecontext.c (__makecontext): Use
mcontext_t instead of struct sigcontext.
* sysdeps/unix/sysv/linux/ia64/sigcontext-offsets.sym: Use
mcontext_t instead of struct sigcontext.
* sysdeps/unix/sysv/linux/ia64/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
(__ctx): New macro.
(struct __ia64_fpreg_mcontext): New type.
(mcontext_t): Define structure contents rather than using struct
sigcontext.
(_SC_GR0_OFFSET): Use mcontext_t instead of struct sigcontext.
(uc_sigmask): Define using __ctx.
(uc_stack): Likewise.
* sysdeps/unix/sysv/linux/ia64/sys/procfs.h: Include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/ia64/sys/ptrace.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/microblaze/sys/ucontext.h: New file.
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/sparc/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
* sysdeps/unix/sysv/linux/tile/sys/ucontext.h: Do not include
<bits/sigcontext.h>.
(__ctx): Define earlier.
(mcontext_t): Define structure contents rather than using struct
sigcontext.
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h: Do not include
<bits/sigcontext.h>. Include <bits/types.h>.
* conform/Makefile (test-xfail-XPG42/signal.h/conform): Remove.
(test-xfail-XPG42/sys/wait.h/conform): Likewise.
(test-xfail-XPG42/ucontext.h/conform): Likewise.
(test-xfail-UNIX98/signal.h/conform): Likewise.
(test-xfail-UNIX98/sys/wait.h/conform): Likewise.
(test-xfail-UNIX98/ucontext.h/conform): Likewise.
(test-xfail-XOPEN2K/signal.h/conform): Likewise.
(test-xfail-XOPEN2K/sys/wait.h/conform): Likewise.
(test-xfail-XOPEN2K/ucontext.h/conform): Likewise.
(test-xfail-POSIX2008/signal.h/conform): Likewise.
(test-xfail-POSIX2008/sys/wait.h/conform): Likewise.
(test-xfail-XOPEN2K8/signal.h/conform): Likewise.
(test-xfail-XOPEN2K8/sys/wait.h/conform): Likewise.
__mcount_internal is called from assembler code. Use an explicit
regparm attribute to pass both arguments in registers, to match what
used to happen with internal_function before commit
fbdc1e3e8d (i386: Do not set
internal_function).
All calls to functions with the internal_function attribute
have been removed from assembler implementations, which means that
the definition of internal_function can be changed at the C level
without causing ABI issues with assembler code.
_dl_fixup still uses a regparm calling convention on i386, but this
is controlled through ARCH_FIXUP_ATTRIBUTE, not internal_function.
The standard members of ucontext_t, in all standard versions with that
type, are uc_link, uc_sigmask, uc_stack and uc_mcontext.
The uc_* namespace is mostly reserved for additions to the structure.
However, in XPG4.2, it's only reserved when <ucontext.h> is included,
not when <signal.h> is included, while <signal.h> is required to
define ucontext_t (but not allowed to make visible other symbols from
<ucontext.h>). Thus, nonstandard members should avoid uc_* names.
Some already do use __uc_*, but others don't; most architectures (all
except ia64, I think) have a member uc_flags and some have additional
members beyond that.
This patch makes nonstandard members have an __ prefix unless
__USE_MISC is defined. Members whose names indicate they are solely
padding / reserved for future use are renamed unconditionally to use
the __glibc_reserved1 naming convention.
This is part of the preparation for a revised version of the
mcontext_t / sigcontext patch to be able to eliminate all 13 of the
miscellaneous XFAILs in conform/Makefile, rather than only 11 of them
as at present (at least one further fix on top of this one will be
needed for that as well).
Tested for x86_64, and with build-many-glibcs.py.
[BZ #21457]
* sysdeps/arm/sys/ucontext.h (__ctx): Move undefine further down.
(ucontext_t): Use __ctx with uc_flags. Rename uc_filler to
__glibc_reserved1.
* sysdeps/generic/sys/ucontext.h (__ctx): New macro.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/i386/sys/ucontext.h (__ctx): Move undefine further down.
(__ctxt): Likewise.
(ucontext_t): Use __ctx with uc_flags. Rename uc_filler to
__glibc_reserved1.
* sysdeps/m68k/sys/ucontext.h (__ctx): Move undefine further down.
(ucontext_t): Use __ctx with uc_flags. Rename uc_filler to
__glibc_reserved1.
* sysdeps/mips/sys/ucontext.h (__ctx): Move undefine further down.
(ucontext_t): Use __ctx with uc_flags. Rename uc_filler to
__glibc_reserved1.
* sysdeps/unix/sysv/linux/aarch64/sys/ucontext.h (__ctx): New
macro.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/alpha/sys/ucontext.h (__ctx): New macro.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/arm/sys/ucontext.h (__ctx): New macro.
(ucontext_t): Use __ctx with uc_flags and uc_regspace.
* sysdeps/unix/sysv/linux/hppa/sys/ucontext.h (__ctx): New macro.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h (__ctx): Move
undefine further down.
(ucontext_t): Use __ctx with uc_flags. Rename uc_filler to
__glibc_reserved1.
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h (__ctx): Move
undefine further down.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h (__ctx): Move
undefine further down.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (ucontext_t): Use
__ctx with uc_flags, uc_regs_ptr, uc_regs and uc_reg_space.
Rename uc_pad to __glibc_reserved1.
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h (__ctx): Move
undefine further down.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h (__ctx): Move undefine
further down.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/sparc/sys/ucontext.h (ucontext_t): Use
__ctx with uc_flags.
* sysdeps/unix/sysv/linux/tile/sys/ucontext.h (__ctx): New macro.
(ucontext_t): Use __ctx with uc_flags.
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h (ucontext_t): Use
__ctx with uc_flags.
Since start.o may be compiled as PIC, we should check PIC instead of
SHARED. Also avoid dynamic relocation against main in static PIE since
_start is the entry point before the executable is relocated.
* sysdeps/i386/start.S (_start): Check Check PIC instead of
SHARED. Avoid dynamic relocation against main in static PIE.
__memset_zero_constant_len_parameter should be removed by
commit 61062f5630
Author: Ulrich Drepper <drepper@redhat.com>
Date: Tue Mar 1 00:35:23 2005 +0000
2005-02-24 Roland McGrath <roland@redhat.com>
* debug/Versions (libc: GLIBC_2.4): Remove
__memset_zero_constant_len_parameter.
* sysdeps/generic/memset_chk.c: Remove alias and warning.
* misc/sys/cdefs.h (__warndecl): New macro.
* debug/warning-nop.c: New file.
* string/bits/string3.h (memset): Call __warn_memset_zero_len with no
arguments, instead of calling __memset_zero_constant_len_parameter.
Use __warndecl for __warn_memset_zero_len.
* debug/Makefile (routines): Add $(static-only-routines).
(static-only-routines): New variable.
This patch removes the last emaining pieces of it. Tested it on i586,
i686 and x86-64.
[BZ #21790]
* sysdeps/i386/i586/memset.S
(__memset_zero_constant_len_parameter): Removed.
* sysdeps/i386/i686/memset.S
(__memset_zero_constant_len_parameter): Likewise.
* sysdeps/i386/i686/multiarch/memset_chk.S
(__memset_zero_constant_len_parameter): Likewise.
* sysdeps/x86_64/memset.S (__memset_zero_constant_len_parameter):
Likewise.
There is no need to define multiarch __memmove_chk in libc.a since they
aren't used at all.
[BZ #21791]
* sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S
(MEMCPY_CHK): Define only if SHARED is defined.
* sysdeps/i386/i686/multiarch/memcpy-ssse3-rep.S (MEMCPY_CHK):
Likewise.
* sysdeps/i386/i686/multiarch/memcpy-ssse3.S (MEMCPY_CHK):
Likewise.
Since there are no multiarch versions of memmove_chk and memset_chk,
test multiarch versions of memmove_chk and memset_chk only in libc.so.
[BZ #21741]
* sysdeps/i386/i686/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test memmove_chk and memset_chk only
in libc.so.
GCC 7 changed the definition of max_align_t on i386:
https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=9b5c49ef97e63cc63f1ffa13baf771368105ebe2
As a result, glibc malloc no longer returns memory blocks which are as
aligned as max_align_t requires.
This causes malloc/tst-malloc-thread-fail to fail with an error like this
one:
error: allocation function 0, size 144 not aligned to 16
This patch moves the MALLOC_ALIGNMENT definition to <malloc-alignment.h>
and increases the malloc alignment to 16 for i386.
[BZ #21120]
* malloc/malloc-internal.h (MALLOC_ALIGNMENT): Moved to ...
* sysdeps/generic/malloc-alignment.h: Here. New file.
* sysdeps/i386/malloc-alignment.h: Likewise.
* sysdeps/generic/malloc-machine.h: Include <malloc-alignment.h>.
The ucontext_t type has a tag struct ucontext. As with previous such
issues for siginfo_t and stack_t, this tag is not permitted by POSIX
(is not in a reserved namespace), and so namespace conformance means
breaking C++ name mangling for this type.
In this case, the type does need to have some tag rather than just a
typedef name, because it includes a pointer to itself. This patch
uses struct ucontext_t as the new tag, so the type is mangled as
ucontext_t (the POSIX *_t reservation applies in all namespaces, not
just the namespace of ordinary identifiers). Another reserved name
such as struct __ucontext could of course be used.
Because of other namespace issues, this patch does not by itself fix
bug 21457 or allow any XFAILs to be removed.
Tested for x86_64, and with build-many-glibcs.py.
[BZ #21457]
* sysdeps/arm/sys/ucontext.h (struct ucontext): Rename to struct
ucontext_t.
* sysdeps/generic/sys/ucontext.h (struct ucontext): Likewise.
* sysdeps/i386/sys/ucontext.h (struct ucontext): Likewise.
* sysdeps/m68k/sys/ucontext.h (struct ucontext): Likewise.
* sysdeps/mips/sys/ucontext.h (struct ucontext): Likewise.
* sysdeps/unix/sysv/linux/aarch64/sys/ucontext.h (struct
ucontext): Likewise.
* sysdeps/unix/sysv/linux/alpha/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/arm/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/hppa/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/ia64/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (struct
ucontext): Likewise.
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/sparc/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/tile/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h (struct ucontext):
Likewise.
* sysdeps/powerpc/powerpc32/backtrace.c (struct
rt_signal_frame_32): Likewise.
* sysdeps/powerpc/powerpc64/backtrace.c (struct signal_frame_64):
Likewise.
* sysdeps/unix/sysv/linux/aarch64/kernel_rt_sigframe.h (struct
kernel_rt_sigframe): Likewise.
* sysdeps/unix/sysv/linux/aarch64/sigcontextinfo.h (SIGCONTEXT):
Likewise.
* sysdeps/unix/sysv/linux/arm/register-dump.h (register_dump):
Likewise.
* sysdeps/unix/sysv/linux/arm/sigcontextinfo.h (SIGCONTEXT):
Likewise.
* sysdeps/unix/sysv/linux/hppa/profil-counter.h
(__profil_counter): Likewise.
* sysdeps/unix/sysv/linux/microblaze/sigcontextinfo.h
(SIGCONTEXT): Likewise.
* sysdeps/unix/sysv/linux/mips/kernel_rt_sigframe.h (struct
kernel_rt_sigframe): Likewise.
* sysdeps/unix/sysv/linux/nios2/kernel_rt_sigframe.h (struct
kernel_rt_sigframe): Likewise.
* sysdeps/unix/sysv/linux/nios2/sigcontextinfo.h (SIGCONTEXT):
Likewise.
* sysdeps/unix/sysv/linux/sh/makecontext.S (__makecontext):
Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/makecontext.c
(__start_context): Likewise.
* sysdeps/unix/sysv/linux/tile/sigcontextinfo.h (SIGCONTEXT):
Likewise.
* sysdeps/unix/sysv/linux/x86_64/register-dump.h (register_dump):
Likewise.
* sysdeps/unix/sysv/linux/x86_64/sigcontextinfo.h (SIGCONTEXT):
Likewise.
This patch enables float128 support for x86_64 and x86. All GCC
versions that can build glibc provide the required support, but since
GCC 6 and before don't provide __builtin_nanq / __builtin_nansq, sNaN
tests and some tests of NaN payloads need to be disabled with such
compilers (this does not affect the generated glibc binaries at all,
just the tests). bits/floatn.h declares float128 support to be
available for GCC versions that provide the required libgcc support
(4.3 for x86_64, 4.4 for i386 GNU/Linux, 4.5 for i386 GNU/Hurd);
compilation-only support was present some time before then, but not
really useful without the libgcc functions.
fenv_private.h needed updating to avoid trying to put _Float128 values
in registers. I make no assertion of optimality of the
math_opt_barrier / math_force_eval definitions for this case; they are
simply intended to be sufficient to work correctly.
Tested for x86_64 and x86, with GCC 7 and GCC 6. (Testing for x32 was
compilation tests only with build-many-glibcs.py to verify the ABI
baseline updates. I have not done any testing for Hurd, although the
float128 support is enabled there as for GNU/Linux.)
* sysdeps/i386/Implies: Add ieee754/float128.
* sysdeps/x86_64/Implies: Likewise.
* sysdeps/x86/bits/floatn.h: New file.
* sysdeps/x86/float128-abi.h: Likewise.
* manual/math.texi (Mathematics): Document support for _Float128
on x86_64 and x86.
* sysdeps/i386/fpu/fenv_private.h: Include <bits/floatn.h>.
(math_opt_barrier): Do not put _Float128 values in floating-point
registers.
(math_force_eval): Likewise.
[__x86_64__] (SET_RESTORE_ROUNDF128): New macro.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (CPPFLAGS): Append
to Makefile variable.
* sysdeps/x86/fpu/e_sqrtf128.c: New file.
* sysdeps/x86/fpu/sfp-machine.h: Likewise. Based on libgcc.
* sysdeps/x86/math-tests.h: New file.
* math/libm-test-support.h (XFAIL_FLOAT128_PAYLOAD): New macro.
* math/libm-test-getpayload.inc (getpayload_test_data): Use
XFAIL_FLOAT128_PAYLOAD.
* math/libm-test-setpayload.inc (setpayload_test_data): Likewise.
* math/libm-test-totalorder.inc (totalorder_test_data): Likewise.
* math/libm-test-totalordermag.inc (totalordermag_test_data):
Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Update.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Testing with GCC 7 for 32-bit x86 showed some ulps differences,
presumably from variation in when values with excess precision get
spilled to the stack and so lose that precision. This patch updates
the libm-test-ulps files accordingly.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
These machine-dependent inline string functions have never been on by
default, and even if they were a good idea at the time they were
introduced, they haven't really been touched in ten to fifteen years
and probably aren't a good idea on current-gen processors. Current
thinking is that this class of optimization is best left to the
compiler.
* bits/string.h, string/bits/string.h
* sysdeps/aarch64/bits/string.h
* sysdeps/m68k/m680x0/m68020/bits/string.h
* sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h
* sysdeps/x86/bits/string.h: Delete file.
* string/string.h: Don't include bits/string.h.
* string/bits/string3.h: Rename to bits/string_fortified.h.
No need to undef various symbols that the removed headers
might have defined as macros.
* string/Makefile (headers): Remove bits/string.h, change
bits/string3.h to bits/string_fortified.h.
* string/string-inlines.c: Update commentary. Remove definitions
of various macros that nothing looks at anymore. Don't directly
include bits/string.h. Set _STRING_INLINE_unaligned here, based on
compiler-predefined macros.
* string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY
_is_ defined, provide internal hidden alias __strncat.
* include/string.h: Declare internal hidden alias __strncat.
Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is
not defined.
* include/bits/string3.h: Rename to bits/string_fortified.h,
update to match above.
* sysdeps/i386/string-inlines.c: Define compat symbols for
everything formerly defined by sysdeps/x86/bits/string.h.
Make existing definitions into compat symbols as well.
Remove some no-longer-necessary messing around with macros.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
* sysdeps/s390/multiarch/mempcpy.c
No need to define _HAVE_STRING_ARCH_mempcpy.
Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT.
* sysdeps/i386/i686/multiarch/strncat-c.c
* sysdeps/s390/multiarch/strncat-c.c
* sysdeps/x86_64/multiarch/strncat-c.c
Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.
Since commit d957c4d3fa (i386: Compile
rtld-*.os with -mno-sse -mno-mmx -mfpmath=387), vector intrinsics can
no longer be used in ld.so, even if the compiled code never makes it
into the final ld.so link. This commit adds the missing IS_IN (libc)
guard to the SSE 4.2 strcspn implementation, so that it can be used from
ld.so in the future.
This is fairly complicated, not because the users of __need_Emath and
__need_error_t have complicated requirements, but because the core
changes had a lot of fallout.
__need_error_t exists for gnulib compatibility in argz.h and argp.h.
error_t itself is a Hurdism, an enum containing all the E-constants,
so you can do 'p (error_t) errno' in gdb and get a symbolic value.
argz.h and argp.h use it for function return values, and they want to
fall back to 'int' when that's not available. There is no reason why
these nonstandard headers cannot just go ahead and include all of
errno.h; so we do that.
__need_Emath is defined only by .S files; what they _really_ need is
for errno.h to avoid declaring anything other than the E-constants
(e.g. 'extern int __errno_location(void);' is a syntax error in
assembly language). This is replaced with a check for __ASSEMBLER__ in
errno.h, plus a carefully documented requirement for bits/errno.h not
to define anything other than macros. That in turn has the
consequence that bits/errno.h must not define errno - fortunately, all
live ports use the same definition of errno, so I've moved it to
errno.h. The Hurd bits/errno.h must also take care not to define
error_t when __ASSEMBLER__ is defined, which involves repeating all of
the definitions twice, but it's a generated file so that's okay.
* stdlib/errno.h: Remove __need_Emath and __need_error_t logic.
Reorganize file. Declare errno here. When __ASSEMBLER__ is
defined, don't declare anything other than the E-constants.
* include/errno.h: Change conditional for exposing internal
declarations to (not _ISOMAC and not __ASSEMBLER__).
* bits/errno.h: Remove logic for __need_Emath. Document
requirements for a port-specific bits/errno.h.
* sysdeps/unix/sysv/linux/bits/errno.h
* sysdeps/unix/sysv/linux/alpha/bits/errno.h
* sysdeps/unix/sysv/linux/hppa/bits/errno.h
* sysdeps/unix/sysv/linux/mips/bits/errno.h
* sysdeps/unix/sysv/linux/sparc/bits/errno.h:
Add multiple-include guard and check against improper inclusion.
Remove __need_Emath logic. Don't declare errno here. Ensure all
constants are defined as simple integer literals. Consistent
formatting.
* sysdeps/mach/hurd/errnos.awk: Likewise. Only define error_t and
enum __error_t_codes if __ASSEMBLER__ is not defined.
* sysdeps/mach/hurd/bits/errno.h: Regenerate.
* argp/argp.h, string/argz.h: Don't define __need_error_t before
including errno.h.
* sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S
* sysdeps/i386/i686/fpu/multiarch/s_sincosf-sse2.S
* sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S
* sysdeps/x86_64/fpu/s_cosf.S
* sysdeps/x86_64/fpu/s_sincosf.S
* sysdeps/x86_64/fpu/s_sinf.S:
Just include errno.h; don't define __need_Emath or include
bits/errno.h directly.
ELFv2 functions with localentry:0 are those with a single entry point,
ie. global entry == local entry, that have no requirement on r2 or
r12 and guarantee r2 is unchanged on return. Such an external
function can be called via the PLT without saving r2 or restoring it
on return, avoiding a common load-hit-store for small functions.
This patch implements the ld.so changes necessary for this
optimization. ld.so needs to check that an optimized plt call
sequence is in fact calling a function implemented with localentry:0,
end emit a fatal error otherwise.
The elf/testobj6.c change is to stop "error while loading shared
libraries: expected localentry:0 `preload'" when running
elf/preloadtest, which we'd get otherwise.
* elf/elf.h (PPC64_OPT_LOCALENTRY): Define.
* sysdeps/alpha/dl-machine.h (elf_machine_fixup_plt): Add
refsym and sym parameters. Adjust callers.
* sysdeps/aarch64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/arm/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/generic/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/hppa/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/i386/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/ia64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/m68k/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/microblaze/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/mips/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/nios2/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/powerpc/powerpc32/dl-machine.h (elf_machine_fixup_plt):
Likewise.
* sysdeps/s390/s390-32/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/s390/s390-64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/sh/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/sparc/sparc32/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/tile/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_fixup_plt): Likewise.
* sysdeps/powerpc/powerpc64/dl-machine.c (_dl_error_localentry): New.
(_dl_reloc_overflow): Increase buffser size. Formatting.
* sysdeps/powerpc/powerpc64/dl-machine.h (ppc64_local_entry_offset):
Delete reloc param, add refsym and sym. Check optimized plt
call stubs for localentry:0 functions. Adjust callers.
(elf_machine_fixup_plt, elf_machine_plt_conflict): Add refsym
and sym parameters. Adjust callers.
(_dl_reloc_overflow): Move attribute.
(_dl_error_localentry): Declare.
* elf/dl-runtime.c (_dl_fixup): Save original sym. Pass
refsym and sym to elf_machine_fixup_plt.
* elf/testobj6.c (preload): Call printf.
This patch optimizes the generic spinlock code.
The type pthread_spinlock_t is a typedef to volatile int on all archs.
Passing a volatile pointer to the atomic macros which are not mapped to the
C11 atomic builtins can lead to extra stores and loads to stack if such
a macro creates a temporary variable by using "__typeof (*(mem)) tmp;".
Thus, those macros which are used by spinlock code - atomic_exchange_acquire,
atomic_load_relaxed, atomic_compare_exchange_weak - have to be adjusted.
According to the comment from Szabolcs Nagy, the type of a cast expression is
unqualified (see http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_423.htm):
__typeof ((__typeof (*(mem)) *(mem)) tmp;
Thus from spinlock perspective the variable tmp is of type int instead of
type volatile int. This patch adjusts those macros in include/atomic.h.
With this construct GCC >= 5 omits the extra stores and loads.
The atomic macros are replaced by the C11 like atomic macros and thus
the code is aligned to it. The pthread_spin_unlock implementation is now
using release memory order instead of sequentially consistent memory order.
The issue with passed volatile int pointers applies to the C11 like atomic
macros as well as the ones used before.
I've added a glibc_likely hint to the first atomic exchange in
pthread_spin_lock in order to return immediately to the caller if the lock is
free. Without the hint, there is an additional jump if the lock is free.
I've added the atomic_spin_nop macro within the loop of plain reads.
The plain reads are also realized by C11 like atomic_load_relaxed macro.
The new define ATOMIC_EXCHANGE_USES_CAS determines if the first try to acquire
the spinlock in pthread_spin_lock or pthread_spin_trylock is an exchange
or a CAS. This is defined in atomic-machine.h for all architectures.
The define SPIN_LOCK_READS_BETWEEN_CMPXCHG is now removed.
There is no technical reason for throwing in a CAS every now and then,
and so far we have no evidence that it can improve performance.
If that would be the case, we have to adjust other spin-waiting loops
elsewhere, too! Using a CAS loop without plain reads is not a good idea
on many targets and wasn't used by one. Thus there is now no option to
do so.
Architectures are now using the generic spinlock automatically if they
do not provide an own implementation. Thus the pthread_spin_lock.c files
in sysdeps folder are deleted.
ChangeLog:
* NEWS: Mention new spinlock implementation.
* include/atomic.h:
(__atomic_val_bysize): Cast type to omit volatile qualifier.
(atomic_exchange_acq): Likewise.
(atomic_load_relaxed): Likewise.
(ATOMIC_EXCHANGE_USES_CAS): Check definition.
* nptl/pthread_spin_init.c (pthread_spin_init):
Use atomic_store_relaxed.
* nptl/pthread_spin_lock.c (pthread_spin_lock):
Use C11-like atomic macros.
* nptl/pthread_spin_trylock.c (pthread_spin_trylock):
Likewise.
* nptl/pthread_spin_unlock.c (pthread_spin_unlock):
Use atomic_store_release.
* sysdeps/aarch64/nptl/pthread_spin_lock.c: Delete File.
* sysdeps/arm/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/hppa/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/m68k/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/microblaze/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/mips/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/nios2/nptl/pthread_spin_lock.c: Likewise.
* sysdeps/aarch64/atomic-machine.h (ATOMIC_EXCHANGE_USES_CAS): Define.
* sysdeps/alpha/atomic-machine.h: Likewise.
* sysdeps/arm/atomic-machine.h: Likewise.
* sysdeps/i386/atomic-machine.h: Likewise.
* sysdeps/ia64/atomic-machine.h: Likewise.
* sysdeps/m68k/coldfire/atomic-machine.h: Likewise.
* sysdeps/m68k/m680x0/m68020/atomic-machine.h: Likewise.
* sysdeps/microblaze/atomic-machine.h: Likewise.
* sysdeps/mips/atomic-machine.h: Likewise.
* sysdeps/powerpc/powerpc32/atomic-machine.h: Likewise.
* sysdeps/powerpc/powerpc64/atomic-machine.h: Likewise.
* sysdeps/s390/atomic-machine.h: Likewise.
* sysdeps/sparc/sparc32/atomic-machine.h: Likewise.
* sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: Likewise.
* sysdeps/sparc/sparc64/atomic-machine.h: Likewise.
* sysdeps/tile/tilegx/atomic-machine.h: Likewise.
* sysdeps/tile/tilepro/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/nios2/atomic-machine.h: Likewise.
* sysdeps/unix/sysv/linux/sh/atomic-machine.h: Likewise.
* sysdeps/x86_64/atomic-machine.h: Likewise.
Continuing the fixes for namespace issues in sys/ucontext.h, this
patch moves various symbols into the implementation namespace in the
absence of __USE_MISC. As with previous changes, it is nonexhaustive,
just covering more straightforward cases.
Structure fields are generally changed to have a prefix __ in the
absence of __USE_MISC, via a macro __ctx (used without a space before
the open parenthesis, since the result is a single identifier).
Various macros such as NGREG also have leading __ added. No changes
are made to structure tags (and thus to C++ name mangling), except
that in the (unused) file sysdeps/i386/sys/ucontext.h, structures
defined inside other structures as the type for a field have their
tags removed in the non-__USE_MISC case (those structure tags would
not in any case have been visible in C++, because in C++ the scope of
such a tag is limited to the containing structure). No changes are
made to the contents of bits/sigcontext.h, or to whether it is
included. Because of remaining namespace issues, this patch does not
yet fix the bug or allow any XFAILs to be removed.
Tested for x86_64 and x86, and with build-many-glibcs.py.
[BZ #21457]
* sysdeps/arm/sys/ucontext.h (NGREG): Rename to __NGREG and define
NGREG to __NGREG if [__USE_MISC].
(gregset_t): Define using __NGREG.
(__ctx): New macro.
(mcontext_t): Use __ctx in defining fields.
* sysdeps/i386/sys/ucontext.h (NGREG): Rename to __NGREG and
define NGREG to __NGREG if [__USE_MISC].
(gregset_t): Define using __NGREG.
(__ctx): New macro.
(__ctxt): Likewise.
(fpregset_t): Use __ctx and __ctxt in defining fields.
(mcontext_t): Likewise.
* sysdeps/m68k/sys/ucontext.h (NGREG): Rename to __NGREG and
define NGREG to __NGREG if [__USE_MISC].
(gregset_t): Define using __NGREG.
(__ctx): New macro.
(mcontext_t): Use __ctx in defining fields.
* sysdeps/mips/sys/ucontext.h (NGREG): Rename to __NGREG and
define NGREG to __NGREG if [__USE_MISC].
(gregset_t): Define using __NGREG.
(__ctx): New macro.
(fpregset_t): Use __ctx in defining fields.
(mcontext_t): Likewise.
* sysdeps/unix/sysv/linux/alpha/sys/ucontext.h (NGREG): Rename to
__NGREG and define NGREG to __NGREG if [__USE_MISC].
(gregset_t): Define using __NGREG.
(NFPREG): Rename to __NFPREG and define NFPREG to __NFPREG if
[__USE_MISC].
(fpregset_t): Define using __NFPREG.
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h (NGREG): Rename to
__NGREG and define NGREG to __NGREG if [__USE_MISC].
(gregset_t): Define using __NGREG.
(__ctx): New macro.
(fpregset_t): Use __ctx in defining fields.
(mcontext_t): Likewise.
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h (NGREG): Rename to
__NGREG and define NGREG to __NGREG if [__USE_MISC].
(NFPREG): Rename to __NFPREG and define NFPREG to __NFPREG if
[__USE_MISC].
(gregset_t): Define using __NGREG.
(__ctx): New macro.
(fpregset_t): Use __ctx in defining fields.
(mcontext_t): Likewise.
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h (__ctx): New macro.
(mcontext_t): Use __ctx in defining fields.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h (__ctx): New
macro.
[__WORDSIZE == 32] (NGREG): Rename to __NGREG and define NGREG to
__NGREG if [__USE_MISC].
[__WORDSIZE == 32] (gregset_t): Define using __NGREG.
[__WORDSIZE == 32] (fpregset_t): Use __ctx in defining fields.
(mcontext_t): Likewise.
[__WORDSIZE != 32] (NGREG): Rename to __NGREG and define NGREG to
__NGREG if [__USE_MISC].
[__WORDSIZE != 32] (NFPREG): Rename to __NFPREG and define NFPREG
to __NFPREG if [__USE_MISC].
[__WORDSIZE != 32] (NVRREG): Rename to __NVRREG and define NVRREG
to __NVRREG if [__USE_MISC].
[__WORDSIZE != 32] (gregset_t): Define using __NGREG.
[__WORDSIZE != 32] (fpregset_t): Define using __NFPREG.
[__WORDSIZE != 32] (vscr_t): Use __ctx in defining fields.
[__WORDSIZE != 32] (vrregset_t): Likewise.
[__WORDSIZE != 32] (mcontext_t): Likewise.
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h (__ctx): New macro.
(__psw_t): Use __ctx in defining fields.
(NGREG): Rename to __NGREG and define NGREG to __NGREG if
[__USE_MISC].
(gregset_t): Define using __NGREG.
(fpreg_t): Use __ctx in defining fields.
(fpregset_t): Likewise.
(mcontext_t): Likewise.
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h (NGREG): Rename to
__NGREG and define NGREG to __NGREG if [__USE_MISC].
(gregset_t): Define using __NGREG.
(NFPREG): Rename to __NFPREG and define NFPREG to __NFPREG if
[__USE_MISC].
(fpregset_t): Define using __NFPREG.
(__ctx): New macro.
(mcontext_t): Use __ctx in defining fields.
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h (__ctx): New macro.
[__x86_64__] (NGREG): Rename to __NGREG and define NGREG to
__NGREG if [__USE_MISC].
[__x86_64__] (gregset_t): Define using __NGREG.
[__x86_64__] (struct _libc_fpxreg): Use __ctx in defining fields.
[__x86_64__] (struct _libc_fpstate): Likewise.
[__x86_64__] (mcontext_t): Likewise.
[!__x86_64__] (NGREG): Rename to __NGREG and define NGREG to
__NGREG if [__USE_MISC].
[!__x86_64__] (gregset_t): Define using __NGREG.
[!__x86_64__] (struct _libc_fpreg): Use __ctx in defining fields.
[!__x86_64__] (struct _libc_fpstate): Likewise.
[!__x86_64__] (mcontext_t): Likewise.
The various sys/ucontext.h headers include <signal.h> and all the
headers split out of <bits/sigstack.h>. (Except that the powerpc
version does not include <signal.h>.)
None of the standard versions defining ucontext.h require or permit
such inclusions; rather, they all say that the stack_t and sigset_t
types from signal.h are defined. This patch fixes the headers to
include just the bits/ headers for those types (and the existing
includes of bits/sigcontext.h). Since bits/types/sigset_t.h is now
being included instead of bits/types/__sigset_t.h, __sigset_t uses in
the headers are replaced by direct use of the public sigset_t type.
sysdeps/unix/sysv/linux/x86/bits/sigcontext.h was relying on the prior
inclusion of <signal.h> to define types such as __uint32_t, so gets a
bits/types.h include added to provide those types.
Although one could keep some or all of the includes under a __USE_MISC
conditional, that seems unnecessary to me, especially given the lack
of a <signal.h> include in the powerpc version meaning that portable
programs already cannot rely on such an include.
Tested for x86_64 and x86, and with build-many-glibcs.py. As with
other such fixes, more namespace issues remain so this does not permit
any XFAILs to be removed or bugs to be closed.
[BZ #21457]
* sysdeps/arm/sys/ucontext.h: Do not include <signal.h>,
<bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/generic/sys/ucontext.h: Do not include <signal.h>,
<bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/i386/sys/ucontext.h: Do not include <signal.h>,
<bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/m68k/sys/ucontext.h: Do not include <signal.h>,
<bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/mips/sys/ucontext.h: Do not include <signal.h>,
<bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/aarch64/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/alpha/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/arm/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/hppa/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/ia64/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h>.
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h: Do not include
<bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>.
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/sparc/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/tile/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
* sysdeps/unix/sysv/linux/x86/bits/sigcontext.h: Include
<bits/types.h>.
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h: Do not include
<signal.h>, <bits/sigstack.h>, <bits/types/struct_sigstack.h> or
<bits/ss_flags.h>. Include <bits/types/sigset_t.h> instead of
<bits/types/__sigset_t.h>.
(ucontext_t): Use sigset_t instead of __sigset_t.
The types affected are __sig_atomic_t, sig_atomic_t, __sigset_t,
sigset_t, sigval_t, sigevent_t, and siginfo_t. __sig_atomic_t is a
scalar, so it's now directly available from bits/types.h. The others
get bits/types/ headers.
Side effects include: There have been small changes to which
non-signal headers expose which subset of the signal-related types.
A couple of architectures' nested siginfo_t fields had to be renamed
to prevent undesired macro expansion. Internal code that wants to
manipulate signal masks must now include <sigsetops.h> (which is not
installed) and should be aware that __sigaddset, __sigandset,
__sigdelset, __sigemptyset, and __sigorset no longer return a value
(unlike the public API). Relatedly, the public signal.h no longer
declares any of those functions. The obsolete sigmask() macro no
longer has a system-specific definition -- in the cases where it
matters, it didn't work anyway.
New Linux architectures should create bits/siginfo-arch.h and/or
bits/siginfo-consts-arch.h to customize their siginfo_t, rather than
duplicating everything in bits/siginfo.h (which no longer exists).
Add new __SI_* macros if necessary. Ports to other operating systems
are strongly encouraged to generalize this scheme further.
* bits/sigevent-consts.h
* bits/siginfo-consts.h
* bits/types/__sigset_t.h
* bits/types/sigevent_t.h
* bits/types/siginfo_t.h
* sysdeps/unix/sysv/linux/bits/sigevent-consts.h
* sysdeps/unix/sysv/linux/bits/siginfo-consts.h
* sysdeps/unix/sysv/linux/bits/types/__sigset_t.h
* sysdeps/unix/sysv/linux/bits/types/sigevent_t.h
* sysdeps/unix/sysv/linux/bits/types/siginfo_t.h:
New system-dependent bits headers.
* sysdeps/unix/sysv/linux/bits/siginfo-arch.h
* sysdeps/unix/sysv/linux/bits/siginfo-consts-arch.h
* sysdeps/unix/sysv/linux/ia64/bits/siginfo-arch.h
* sysdeps/unix/sysv/linux/ia64/bits/siginfo-consts-arch.h
* sysdeps/unix/sysv/linux/mips/bits/siginfo-arch.h
* sysdeps/unix/sysv/linux/sparc/bits/siginfo-arch.h
* sysdeps/unix/sysv/linux/tile/bits/siginfo-arch.h
* sysdeps/unix/sysv/linux/tile/bits/siginfo-consts-arch.h
* sysdeps/unix/sysv/linux/x86/bits/siginfo-arch.h:
New Linux-only system-dependent bits headers.
* signal/bits/types/sig_atomic_t.h
* signal/bits/types/sigset_t.h
* signal/bits/types/sigval_t.h:
New non-system-dependent bits headers.
* sysdeps/generic/sigsetops.h
* sysdeps/unix/sysv/linux/sigsetops.h:
New internal headers.
* include/bits/types/sig_atomic_t.h
* include/bits/types/sigset_t.h
* include/bits/types/sigval_t.h:
New wrappers.
* signal/sigsetops.h
* bits/siginfo.h
* bits/sigset.h
* sysdeps/unix/sysv/linux/bits/siginfo.h
* sysdeps/unix/sysv/linux/bits/sigset.h
* sysdeps/unix/sysv/linux/ia64/bits/siginfo.h
* sysdeps/unix/sysv/linux/mips/bits/siginfo.h
* sysdeps/unix/sysv/linux/s390/bits/siginfo.h
* sysdeps/unix/sysv/linux/sparc/bits/siginfo.h
* sysdeps/unix/sysv/linux/tile/bits/siginfo.h
* sysdeps/unix/sysv/linux/x86/bits/siginfo.h:
Deleted.
* signal/Makefile, sysdeps/unix/sysv/linux/Makefile:
Update lists of installed headers.
* posix/bits/types.h: Define __sig_atomic_t here.
* signal/signal.h: Use the new bits headers; no need to handle
__need_sig_atomic_t nor __need_sigset_t. Don't use __sigmask
to define sigmask.
* include/signal.h: No need to handle __need_sig_atomic_t
nor __need_sigset_t. Don't define __sigemptyset.
* io/sys/poll.h, setjmp/setjmp.h
* sysdeps/arm/sys/ucontext.h, sysdeps/generic/sys/ucontext.h
* sysdeps/i386/sys/ucontext.h, sysdeps/m68k/sys/ucontext.h
* sysdeps/mach/hurd/i386/bits/sigcontext.h
* sysdeps/mips/sys/ucontext.h, sysdeps/powerpc/novmxsetjmp.h
* sysdeps/pthread/bits/sigthread.h
* sysdeps/unix/sysv/linux/hppa/sys/ucontext.h
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h
* sysdeps/unix/sysv/linux/sparc/sys/ucontext.h
* sysdeps/unix/sysv/linux/tile/sys/ucontext.h
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h:
Use bits/types/__sigset_t.h.
* misc/sys/select.h, posix/spawn.h
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h
* sysdeps/unix/sysv/linux/sys/epoll.h
* sysdeps/unix/sysv/linux/sys/signalfd.h:
Use bits/types/sigset_t.h.
* resolv/netdb.h, rt/mqueue.h: Use bits/types/sigevent_t.h.
* rt/aio.h: Use bits/types/sigevent_t.h and bits/sigevent-consts.h.
* socket/sys/socket.h: Don't include bits/sigset.h.
* login/utmp_file.c, shadow/lckpwdf.c, signal/sigandset.c
* signal/sigisempty.c, stdlib/abort.c, sysdeps/posix/profil.c
* sysdeps/posix/sigignore.c, sysdeps/posix/sigintr.c
* sysdeps/posix/signal.c, sysdeps/posix/sigset.c
* sysdeps/posix/sprofil.c, sysdeps/posix/sysv_signal.c
* sysdeps/unix/sysv/linux/nptl-signals.h:
Include sigsetops.h.
* signal/sigaddset.c, signal/sigandset.c, signal/sigdelset.c
* signal/sigorset.c, stdlib/abort.c, sysdeps/posix/sigignore.c
* sysdeps/posix/signal.c, sysdeps/posix/sigset.c:
__sigaddset, __sigandset, __sigdelset, __sigemptyset, __sigorset
now return no value.
* signal/sigaddset.c, signal/sigdelset.c, signal/sigismem.c
Include <errno.h>, <signal.h>, and <sigsetops.h> instead of
"sigsetops.h".
* signal/sigsetops.c: Explicitly define __sigismember,
__sigaddset, and __sigdelset as compatibility symbols.
* signal/Versions: Correct commentary on __sigpause,
__sigaddset, __sigdelset, __sigismember.
* inet/rcmd.c: Include sigsetops.h. Convert old code using
__sigblock/__sigsetmask to use __sigprocmask and friends.
bits/sigstack.h contains four things: the legacy struct sigstack type,
the preferred stack_t type, the SS_* enum values and macros for signal
stack sizes.
These vary in different ways between glibc configurations; in
particular, the stack sizes vary much more than any of the other
pieces. Furthermore, these pieces have different standard namespace
rules for when they should be visible (not currently visible in
conform/ results both because the relevant tests are XFAILed for
sys/ucontext.h namespace issues, and because some of the expectations
are incorrect in the same way as the headers, e.g. neither
expectations nor headers reflect that current POSIX no longer has
either the sigstack function or the sigstack structure).
To reduce duplication of identical definitions, and facilitate
namespace fixes without requiring the same feature test macro
conditions to be repeated in many versions of the same header, this
patch splits bits/sigstack.h up into four headers. It keeps the stack
size macros, while new bits/types/struct_sigstack.h,
bits/types/stack_t.h and bits/ss_flags.h are added for the other
pieces. bits/types/struct_sigstack.h is the same everywhere,
bits/types/stack_t.h has three variants different in the order of the
structure elements (generic = MIPS Linux, and other Linux), and
bits/ss_flags.h has generic and Linux variants.
This patch includes the new headers everywhere that included
<bits/sigstack.h>, so should cause no difference to what any public
header defines. Subsequent namespace fixes would then remove or
condition some of those includes.
There should be no conflicts with Zack's changes to signal.h types,
beyond the trivial conflict of both making additions to
signal/Makefile's headers list; the two patches affect disjoint sets
of types and other definitions.
Tested for x86_64 and x86, and with build-many-glibcs.py.
* bits/ss_flags.h: New file.
* bits/types/stack_t.h: Likewise.
* include/bits/types/struct_sigstack.h: Likewise.
* signal/bits/types/struct_sigstack.h: Likewise.
* sysdeps/unix/sysv/linux/bits/ss_flags.h: Likewise.
* sysdeps/unix/sysv/linux/bits/types/stack_t.h: Likewise.
* sysdeps/unix/sysv/linux/mips/bits/types/stack_t.h: Likewise.
* signal/Makefile (headers): Add bits/types/struct_sigstack.h,
bits/types/stack_t.h and bits/ss_flags.h.
* signal/signal.h [__USE_XOPEN_EXTENDED || __USE_XOPEN2K8]:
Include <bits/types/struct_sigstack.h>, <bits/types/stack_t.h> and
<bits/ss_flags.h>.
* bits/sigstack.h (struct sigstack): Remove.
(stack_t): Likewise.
(SS_ONSTACK): Likewise.
(SS_DISABLE): Likewise.
* sysdeps/unix/sysv/linux/aarch64/bits/sigstack.h
(struct sigstack): Likewise.
(stack_t): Likewise.
(SS_ONSTACK): Likewise.
(SS_DISABLE): Likewise.
* sysdeps/unix/sysv/linux/alpha/bits/sigstack.h (struct sigstack):
Likewise.
(stack_t): Likewise.
(SS_ONSTACK): Likewise.
(SS_DISABLE): Likewise.
* sysdeps/unix/sysv/linux/bits/sigstack.h (struct sigstack):
Likewise.
(stack_t): Likewise.
(SS_ONSTACK): Likewise.
(SS_DISABLE): Likewise.
* sysdeps/unix/sysv/linux/mips/bits/sigstack.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/bits/sigstack.h (struct sigstack):
Likewise.
(stack_t): Likewise.
(SS_ONSTACK): Likewise.
(SS_DISABLE): Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/sigstack.h
(struct sigstack): Likewise.
(stack_t): Likewise.
(SS_ONSTACK): Likewise.
(SS_DISABLE): Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/sigstack.h (struct sigstack):
Likewise.
(stack_t): Likewise.
(SS_ONSTACK): Likewise.
(SS_DISABLE): Likewise.
* sysdeps/arm/sys/ucontext.h: Include
<bits/types/struct_sigstack.h>, <bits/types/stack_t.h> and
<bits/ss_flags.h>.
* sysdeps/generic/sys/ucontext.h: Likewise.
* sysdeps/i386/sys/ucontext.h: Likewise.
* sysdeps/m68k/sys/ucontext.h: Likewise.
* sysdeps/mips/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/aarch64/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/alpha/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/arm/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/hppa/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/bits/sigcontext.h: Likewise.
* sysdeps/unix/sysv/linux/ia64/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/tile/sys/ucontext.h: Likewise.
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h: Likewise.
SSE2 memchr computes "edx + ecx - 16" where ecx is less than 16. Use
"edx - (16 - ecx)", instead of satured math, to avoid possible addition
overflow. This replaces
add %ecx, %edx
sbb %eax, %eax
or %eax, %edx
sub $16, %edx
with
neg %ecx
add $16, %ecx
sub %ecx, %edx
It is the same for x86_64, except for rcx/rdx, instead of ecx/edx.
* sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Use
"edx + ecx - 16" to avoid possible addition overflow.
* sysdeps/x86_64/memchr.S (memchr): Likewise.
dl_platform and dl_hwcap are set from AT_PLATFORM and AT_HWCAP very
early during startup. They are used by dynamic linker to determine
platform and build an array of hardware capability names, which are
added to search path when loading shared object. dl_platform and
dl_hwcap are unused on x86-64. On i386, i386, i486, i586 and i686
platforms were supported and only SSE2 capability was used.
On x86, usage of AT_PLATFORM and AT_HWCAP to determine platform and
processor capabilities is obsolete since all information is available
in dl_x86_cpu_features. This patch sets dl_platform and dl_hwcap from
dl_x86_cpu_features in dynamic linker. On i386, the available plaforms
are changed to i586 and i686 since i386 has been deprecated. On x86-64,
the available plaforms are haswell, which is for Haswell class processors
with BMI1, BMI2, LZCNT, MOVBE, POPCNT, AVX2 and FMA, and xeon_phi, which
is for Xeon Phi class processors with AVX512F, AVX512CD, AVX512ER and
AVX512PF. A capability, avx512_1, is also added to x86-64 for AVX512
ISAs: AVX512F, AVX512CD, AVX512BW, AVX512DQ and AVX512VL.
[BZ #21391]
* sysdeps/i386/dl-machine.h (dl_platform_init) [IS_IN (rtld)]:
Only call init_cpu_features.
[!IS_IN (rtld)]: Only set GLRO(dl_platform) to NULL if needed.
* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
* sysdeps/i386/dl-procinfo.h: Removed.
* sysdeps/unix/sysv/linux/i386/dl-procinfo.h: Don't include
<sysdeps/i386/dl-procinfo.h> nor <ldsodefs.h>. Include
<sysdeps/x86/dl-procinfo.h>.
(_dl_procinfo): Replace _DL_HWCAP_COUNT with 32.
* sysdeps/unix/sysv/linux/x86_64/dl-procinfo.h [!IS_IN (ldconfig)]:
Include <sysdeps/x86/dl-procinfo.h> instead of
<sysdeps/generic/dl-procinfo.h>.
* sysdeps/x86/cpu-features.c: Include <dl-hwcap.h>.
(init_cpu_features): Set dl_platform, dl_hwcap and dl_hwcap_mask.
* sysdeps/x86/cpu-features.h (bit_cpu_LZCNT): New.
(bit_cpu_MOVBE): Likewise.
(bit_cpu_BMI1): Likewise.
(bit_cpu_BMI2): Likewise.
(index_cpu_BMI1): Likewise.
(index_cpu_BMI2): Likewise.
(index_cpu_LZCNT): Likewise.
(index_cpu_MOVBE): Likewise.
(index_cpu_POPCNT): Likewise.
(reg_BMI1): Likewise.
(reg_BMI2): Likewise.
(reg_LZCNT): Likewise.
(reg_MOVBE): Likewise.
(reg_POPCNT): Likewise.
* sysdeps/x86/dl-hwcap.h: New file.
* sysdeps/x86/dl-procinfo.h: Likewise.
* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): New.
(_dl_x86_platforms): Likewise.
Add sysdeps/x86/dl-procinfo.c for x86 version of processor capability
information to reduce duplication between i386 and x86_64 dl-procinfo.c.
* sysdeps/i386/dl-procinfo.c: Include
<sysdeps/x86/dl-procinfo.c>.
* sysdeps/x86_64/dl-procinfo.c: Likewise.
* sysdeps/x86/dl-procinfo.c: New file.
As noted in [1], divdi3 object is only exported in a handful ABIs
(i386, m68k, powerpc32, s390-32, and ia64), however it is built
for all current architectures regardless.
This patch refact the make rules for this object to so only the
aforementioned architectures that actually require it builds it.
Also, to avoid internal PLT calls to the exported symbol from the
module, glibc uses an internal header (symbol-hacks.h) which is
unrequired (and in fact breaks the build for architectures that
intend to get symbol definitions from libgcc.a). The patch also
changes it to create its own header (divdi3-symbol-hacks.h) and
adjust the architectures that require it accordingly.
I checked the build/check (with run-built-tests=no) on the
following architectures (which I think must cover all supported
ABI/builds) using GCC 6.3:
aarch64-linux-gnu
alpha-linux-gnu
arm-linux-gnueabihf
hppa-linux-gnu
ia64-linux-gnu
m68k-linux-gnu
microblaze-linux-gnu
mips64-n32-linux-gnu
mips-linux-gnu
mips64-linux-gnu
nios2-linux-gnu
powerpc-linux-gnu
powerpc-linux-gnu-power4
powerpc64-linux-gnu
powerpc64le-linux-gnu
s390x-linux-gnu
s390-linux-gnu
sh4-linux-gnu
sh4-linux-gnu-soft
sparc64-linux-gnu
sparcv9-linux-gnu
tilegx-linux-gnu
tilegx-linux-gnu-32
tilepro-linux-gnu
x86_64-linux-gnu
x86_64-linux-gnu-x32
i686-linux-gnu
I only saw one regression on sparcv9-linux-gnu (extra PLT call to
.udiv) which I address in next patch in the set. It also correctly
build SH with GCC 7.0.1 (without any regression from c89721e25d).
[1] https://sourceware.org/ml/libc-alpha/2017-03/msg00243.html
* sysdeps/i386/symbol-hacks.h: New file.
* sysdeps/m68k/symbol-hacks.h: New file.
* sysdeps/powerpc/powerpc32/symbol-hacks.h: New file.
* sysdeps/s390/s390-32/symbol-hacks.h: New file.
* sysdeps/unix/sysv/linux/i386/Makefile
[$(subdir) = csu] (sysdep_routines): New rule: divdi3 object.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/unix/sysv/linux/m68k/Makefile
[$(subdir) = csu] (sysdep_routines): Likewise.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/Makefile
[$(subdir) = csu] (sysdep_routines): Likewise.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/Makefile
[$(subdir) = csu] (sysdep_routines): Likewise.
[$(subdir) = csu] (sysdep-only-routines): Likewise.
[$(subdir) = csu] (CFLAGS-divdi3.c): Likewise.
* sysdeps/wordsize-32/Makefile: Remove file.
* sysdeps/wordsize-32/symbol-hacks.h: Definitions move to ...
* sysdeps/wordsize-32/divdi3-symbol-hacks.h: ... here.
This patch removes CALL_THREAD_FCT macro usage and its defition for
x86. For 32 bits it usage is only for force 16 stack alignment,
however stack is already explicit aligned in clone syscall. For
64 bits and x32 it just a function call and there is no need to
code it with inline assembly.
Checked on i686-linux-gnu, x86_64-linux-gnu, and x86_64-linux-gnu-x32.
* nptl/pthread_create.c (START_THREAD_DEFN): Remove
CALL_THREAD_FCT macro usage.
* sysdeps/i386/nptl/tls.h (CALL_THREAD_FCT): Remove definition.
* sysdeps/x86_64/nptl/tls.h (CALL_THREAD_FCT): Likewise.
* sysdeps/x86_64/32/nptl/tls.h: Remove file.
This patch fixes the regression added by 23d2770 for final address
overflow calculation. The subtraction of the considered size (16)
at line 120 is at wrong place, for sizes less than 16 subsequent
overflow check will not take in consideration an invalid size (since
the subtraction will be negative). Also, the lea instruction also
does not raise the carry flag (CF) that is used in subsequent jbe
to check for overflow.
The fix is to follow x86_64 logic from 3daef2c where the overflow
is first check and a sub instruction is issued. In case of resulting
negative size, CF will be set by the sub instruction and a NULL
result will be returned. The patch also add similar tests reported
in bug report.
Checked on i686-linux-gnu and x86_64-linux-gnu.
* string/test-memchr.c (do_test): Add BZ#21182 checks for address
near end of a page.
* sysdeps/i386/i686/multiarch/memchr-sse2.S (__memchr): Fix
overflow calculation.
posix/wordexp-test.c used libc-internal.h for PTR_ALIGN_DOWN; similar
to what was done with libc-diag.h, I have split the definitions of
cast_to_integer, ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and PTR_ALIGN_DOWN
to a new header, libc-pointer-arith.h.
It then occurred to me that the remaining declarations in libc-internal.h
are mostly to do with early initialization, and probably most of the
files including it, even in the core code, don't need it anymore. Indeed,
only 19 files actually need what remains of libc-internal.h. 23 others
need libc-diag.h instead, and 12 need libc-pointer-arith.h instead.
No file needs more than one of them, and 16 don't need any of them!
So, with this patch, libc-internal.h stops including libc-diag.h as
well as losing the pointer arithmetic macros, and all including files
are adjusted.
* include/libc-pointer-arith.h: New file. Define
cast_to_integer, ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and
PTR_ALIGN_DOWN here.
* include/libc-internal.h: Definitions of above macros
moved from here. Don't include libc-diag.h anymore either.
* posix/wordexp-test.c: Include stdint.h and libc-pointer-arith.h.
Don't include libc-internal.h.
* debug/pcprofile.c, elf/dl-tunables.c, elf/soinit.c, io/openat.c
* io/openat64.c, misc/ptrace.c, nptl/pthread_clock_gettime.c
* nptl/pthread_clock_settime.c, nptl/pthread_cond_common.c
* string/strcoll_l.c, sysdeps/nacl/brk.c
* sysdeps/unix/clock_settime.c
* sysdeps/unix/sysv/linux/i386/get_clockfreq.c
* sysdeps/unix/sysv/linux/ia64/get_clockfreq.c
* sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c
* sysdeps/unix/sysv/linux/sparc/sparc64/get_clockfreq.c:
Don't include libc-internal.h.
* elf/get-dynamic-info.h, iconv/loop.c
* iconvdata/iso-2022-cn-ext.c, locale/weight.h, locale/weightwc.h
* misc/reboot.c, nis/nis_table.c, nptl_db/thread_dbP.h
* nscd/connections.c, resolv/res_send.c, soft-fp/fmadf4.c
* soft-fp/fmasf4.c, soft-fp/fmatf4.c, stdio-common/vfscanf.c
* sysdeps/ieee754/dbl-64/e_lgamma_r.c
* sysdeps/ieee754/dbl-64/k_rem_pio2.c
* sysdeps/ieee754/flt-32/e_lgammaf_r.c
* sysdeps/ieee754/flt-32/k_rem_pio2f.c
* sysdeps/ieee754/ldbl-128/k_tanl.c
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c
* sysdeps/ieee754/ldbl-96/k_tanl.c, sysdeps/nptl/futex-internal.h:
Include libc-diag.h instead of libc-internal.h.
* elf/dl-load.c, elf/dl-reloc.c, locale/programs/locarchive.c
* nptl/nptl-init.c, string/strcspn.c, string/strspn.c
* malloc/malloc.c, sysdeps/i386/nptl/tls.h
* sysdeps/nacl/dl-map-segments.h, sysdeps/x86_64/atomic-machine.h
* sysdeps/unix/sysv/linux/spawni.c
* sysdeps/x86_64/nptl/tls.h:
Include libc-pointer-arith.h instead of libc-internal.h.
* elf/get-dynamic-info.h, sysdeps/nacl/dl-map-segments.h
* sysdeps/x86_64/atomic-machine.h:
Add multiple include guard.
This patch moves tests of catan and catanh with finite inputs (other
than the divide-by-zero cases producing an exact infinity) to using
the auto-libm-test machinery. Each of auto-libm-test-out-catan and
auto-libm-test-out-catanh takes about three seconds to generate on my
system (so in fact it wasn't necessary after all to defer the move to
auto-libm-test-* until the output files were split up by function).
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of catan and catanh.
* math/auto-libm-test-out-catan: New generated file.
* math/auto-libm-test-out-catanh: Likewise.
* math/libm-test-catan.inc (catan_test_data): Use AUTO_TESTS_c_c.
Move tests with finite inputs, except divide-by-zero cases, to
auto-libm-test-in.
* math/libm-test-catanh.inc (catanh_test_data): Likewise.
* math/Makefile (libm-test-funcs-auto): Add catan and catanh.
(libm-test-funcs-noauto): Remove catan and catanh.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch moves tests of casin and casinh with finite inputs to using
the auto-libm-test machinery. Each of auto-libm-test-out-casin and
auto-libm-test-out-casinh takes about 38 minutes to generate on my
system because of MPC slowness on special cases that appear in the
tests (with MPC 1.0.3; I don't know to what extent current MPC master
might speed it up).
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of casin and casinh.
* math/auto-libm-test-out-casin: New generated file.
* math/auto-libm-test-out-casinh: Likewise.
* math/libm-test-casin.inc (casin_test_data): Use AUTO_TESTS_c_c.
Move tests with finite inputs to auto-libm-test-in.
* math/libm-test-casinh.inc (casinh_test_data): Likewise.
* math/Makefile (libm-test-funcs-auto): Add casin and casinh.
(libm-test-funcs-noauto): Remove casin and casinh.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch moves tests of cacos and cacosh with finite inputs to using
the auto-libm-test machinery. Each of auto-libm-test-out-cacos and
auto-libm-test-out-cacosh takes about 80 minutes to generate on my
system because of MPC slowness on special cases that appear in the
tests (with MPC 1.0.3; I don't know to what extent current MPC master
might speed it up).
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of cacos and cacosh.
* math/auto-libm-test-out-cacos: New generated file.
* math/auto-libm-test-out-cacosh: Likewise.
* math/libm-test-cacos.inc (cacos_test_data): Use AUTO_TESTS_c_c.
Move tests with finite inputs to auto-libm-test-in.
* math/libm-test-cacosh.inc (cacosh_test_data): Likewise.
* math/Makefile (libm-test-funcs-auto): Add cacos and cacosh.
(libm-test-funcs-noauto): Remove cacos and cacosh.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch removes the COLORING_INCREMENT define and usage on allocatestack.c.
It has not been used since 564cd8b67e (glibc-2.3.3) by any architecture.
The idea is to simplify the code by removing obsolete code.
* nptl/allocatestack.c [COLORING_INCREMENT] (nptl_ncreated): Remove.
(allocate_stack): Remove COLORING_INCREMENT usage.
* nptl/stack-aliasing.h (COLORING_INCREMENT). Likewise.
* sysdeps/i386/i686/stack-aliasing.h (COLORING_INCREMENT): Likewise.
Based on comments on previous attempt to address BZ#16640 [1],
the idea is not support invalid use of strtok (the original
bug report proposal). This leader to a new strtok optimized
strtok implementation [2].
The idea of this patch is to fix BZ#16640 to align all the
implementations to a same contract. However, with newer strtok
code it is better to get remove the old assembly ones instead of
fix them.
For x86 is a gain in all cases since the new implementation can
potentially use sse2/sse42 implementation for strspn and strcspn.
This shows a better performance on both i686 and x86_64 using
the string benchtests.
On powerpc64 the gains are mixed, where only for larger inputs
or keys some gains are showns (based on benchtest it seems that
it shows some gains for keys larger than 10 and inputs larger
than 32). I would prefer to remove the optimized implementation
based on first code simplicity and second because some more gain
could be optimized using a better optimized strcspn/strspn
code (as for x86). However if powerpc arch maintainers prefer I
can send a v2 with the assembly code adjusted instead.
Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu.
[BZ #16640]
* sysdeps/i386/i686/strtok.S: Remove file.
* sysdeps/i386/i686/strtok_r.S: Likewise.
* sysdeps/i386/strtok.S: Likewise.
* sysdeps/i386/strtok_r.S: Likewise.
* sysdeps/powerpc/powerpc64/strtok.S: Likewise.
* sysdeps/powerpc/powerpc64/strtok_r.S: Likewise.
* sysdeps/x86_64/strtok.S: Likewise.
* sysdeps/x86_64/strtok_r.S: Likewise.
[1] https://sourceware.org/ml/libc-alpha/2016-10/msg00411.html
[2] https://sourceware.org/ml/libc-alpha/2016-12/msg00461.html
IFUNC relocation against definition in unrelocated shared library
will lead to segfault when the IFUNC function is called. This
patch allows such IFUNC relocations with a warning. This isn't
a real fix for
https://sourceware.org/bugzilla/show_bug.cgi?id=21041
It simply allows the program to load. The program will segfault
when longjmp is called.
* sysdeps/i386/dl-machine.h (elf_machine_rel): Replace
_dl_fatal_printf with _dl_error_printf for IFUNC relocation
against unrelocated shared library.
* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
This commit moves one step towards the deprecation of wrappers that
use _LIB_VERSION / matherr / __kernel_standard functionality, by
adding the suffix '_compat' to their filenames and adjusting Makefiles
and #includes accordingly.
New template wrappers that do not use such functionality will be added
by future patches and will be first used by the float128 wrappers.
When testing changes to i386 libm functions (that are shadowed for
i686 builds by i686 versions) recently, I saw that the plain i386
libm-test-ulps (as opposed to the i686 multiarch version) needed
updating for tests that had been added since it was last updated.
This patch updates it accordingly.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Similar to BZ#19387, BZ#21014, and BZ#20971, both x86 sse2 strncat
optimized assembly implementations do not handle the size overflow
correctly.
The x86_64 one is in fact an issue with strcpy-sse2-unaligned, but
that is triggered also with strncat optimized implementation.
This patch uses a similar strategy used on 3daef2c8ee, where
saturared math is used for overflow case.
Checked on x86_64-linux-gnu and i686-linux-gnu. It fixes BZ #19390.
[BZ #19390]
* string/test-strncat.c (test_main): Add tests with SIZE_MAX as
maximum string size.
* sysdeps/i386/i686/multiarch/strcat-sse2.S (STRCAT): Avoid overflow
in pointer addition.
* sysdeps/x86_64/multiarch/strcpy-sse2-unaligned.S (STRCPY):
Likewise.
Similar to BZ#19387 and BZ#20971, both i686 memchr optimized assembly
implementations (memchr-sse2-bsf and memchr-sse2) do not handle the
size overflow correctly.
It is shown by the new tests added by commit 3daef2c8ee, where
both implementation fails with size as SIZE_MAX.
This patch uses a similar strategy used on 3daef2c8ee, where
saturared math is used for overflow case.
Checked on i686-linux-gnu.
[BZ #21014]
* sysdeps/i386/i686/multiarch/memchr-sse2-bsf.S (MEMCHR): Avoid overflow
in pointer addition.
* sysdeps/i386/i686/multiarch/memchr-sse2.S (MEMCHR): Likewise.
Various fmax and fmin function implementations mishandle sNaN
arguments:
(a) When both arguments are NaNs, the return value should be a qNaN,
but sometimes it is an sNaN if at least one argument is an sNaN.
(b) Under TS 18661-1 semantics, if either argument is an sNaN then the
result should be a qNaN (whereas if one argument is a qNaN and the
other is not a NaN, the result should be the non-NaN argument).
Various implementations treat sNaNs like qNaNs here.
This patch fixes the x86 and x86_64 versions (ignoring float and
double for 32-bit x86 given the inability to reliably avoid the sNaN
turning into a qNaN before it gets to the called function). Tests of
sNaN inputs to these functions are added.
Note on architecture versions I haven't changed for this issue:
AArch64 already gets this right (it uses a hardware instruction with
the correct semantics for both quiet and signaling NaNs) and does not
need changes. It's possible Alpha, IA64, SPARC might need changes
(this would be shown by the testsuite if so).
Tested for x86_64 and x86 (both i686 and i586 builds, to cover the
different x86 implementations).
[BZ #20947]
* sysdeps/i386/fpu/s_fmaxl.S (__fmaxl): Add the arguments when
either is a signaling NaN.
* sysdeps/i386/fpu/s_fminl.S (__fminl): Likewise. Make code
follow fmaxl more closely.
* sysdeps/i386/i686/fpu/s_fmaxl.S (__fmaxl): Add the arguments
when either is a signaling NaN.
* sysdeps/i386/i686/fpu/s_fminl.S (__fminl): Likewise.
* sysdeps/x86_64/fpu/s_fmax.S (__fmax): Likewise.
* sysdeps/x86_64/fpu/s_fmaxf.S (__fmaxf): Likewise.
* sysdeps/x86_64/fpu/s_fmaxl.S (__fmaxl): Likewise.
* sysdeps/x86_64/fpu/s_fmin.S (__fmin): Likewise.
* sysdeps/x86_64/fpu/s_fminf.S (__fminf): Likewise.
* sysdeps/x86_64/fpu/s_fminl.S (__fminl): Likewise.
* math/libm-test.inc (fmax_test_data): Add tests of sNaN inputs.
(fmin_test_data): Likewise.
The x86_64/x86 powl implementations mishandle sNaN arguments, both by
returning sNaN in some cases (instead of doing arithmetic on the
arguments to produce the result when NaN arguments result in NaN
results) and by treating sNaN the same as qNaN for arguments (1, sNaN)
and (sNaN, 0), contrary to TS 18661-1 which requires those cases to
return qNaN instead of 1.
This patch makes the x86_64/x86 powl implementations follow TS 18661-1
semantics for sNaN arguments; sNaN tests are also added for pow.
Given the problems with testing float and double sNaN arguments on
32-bit x86 (sNaN tests disabled because the compiler may convert
unnecessarily to a qNaN when passing arguments), no changes are made
to the powf and pow implementations there.
Tested for x86_64 and x86.
[BZ #20916]
* sysdeps/i386/fpu/e_powl.S (__ieee754_powl): Do not return 1 for
arguments (sNaN, 0) or (1, sNaN). Do arithmetic on NaN arguments
to compute result.
* sysdeps/x86_64/fpu/e_powl.S (__ieee754_powl): Likewise.
* math/libm-test.inc (pow_test_data): Add tests of sNaN arguments.
This patch remove the PID cache and usage in current GLIBC code. Current
usage is mainly used a performance optimization to avoid the syscall,
however it adds some issues:
- The exposed clone syscall will try to set pid/tid to make the new
thread somewhat compatible with current GLIBC assumptions. This cause
a set of issue with new workloads and usecases (such as BZ#17214 and
[1]) as well for new internal usage of clone to optimize other algorithms
(such as clone plus CLONE_VM for posix_spawn, BZ#19957).
- The caching complexity also added some bugs in the past [2] [3] and
requires more effort of each port to handle such requirements (for
both clone and vfork implementation).
- Caching performance gain in mainly on getpid and some specific
code paths. The getpid performance leverage is questionable [4],
either by the idea of getpid being a hotspot as for the getpid
implementation itself (if it is indeed a justifiable hotspot a
vDSO symbol could let to a much more simpler solution).
Other usage is mainly for non usual code paths, such as pthread
cancellation signal and handling.
For thread creation (on stack allocation) the code simplification in fact
adds some performance gain due the no need of transverse the stack cache
and invalidate each element pid.
Other thread usages will require a direct getpid syscall, such as
cancellation/setxid signal, thread cancellation, thread fail path (at
create_thread), and thread signal (pthread_kill and pthread_sigqueue).
However these are hardly usual hotspots and I think adding a syscall is
justifiable.
It also simplifies both the clone and vfork arch-specific implementation.
And by review each fork implementation there are some discrepancies that
this patch also solves:
- microblaze clone/vfork does not set/reset the pid/tid field
- hppa uses the default vfork implementation that fallback to fork.
Since vfork is deprecated I do not think we should bother with it.
The patch also removes the TID caching in clone. My understanding for
such semantic is try provide some pthread usage after a user program
issue clone directly (as done by thread creation with CLONE_PARENT_SETTID
and pthread tid member). However, as stated before in multiple discussions
threads, GLIBC provides clone syscalls without further supporting all this
semantics.
I ran a full make check on x86_64, x32, i686, armhf, aarch64, and powerpc64le.
For sparc32, sparc64, and mips I ran the basic fork and vfork tests from
posix/ folder (on a qemu system). So it would require further testing
on alpha, hppa, ia64, m68k, nios2, s390, sh, and tile (I excluded microblaze
because it is already implementing the patch semantic regarding clone/vfork).
[1] https://codereview.chromium.org/800183004/
[2] https://sourceware.org/ml/libc-alpha/2006-07/msg00123.html
[3] https://sourceware.org/bugzilla/show_bug.cgi?id=15368
[4] http://yarchive.net/comp/linux/getpid_caching.html
* sysdeps/nptl/fork.c (__libc_fork): Remove pid cache setting.
* nptl/allocatestack.c (allocate_stack): Likewise.
(__reclaim_stacks): Likewise.
(setxid_signal_thread): Obtain pid through syscall.
* nptl/nptl-init.c (sigcancel_handler): Likewise.
(sighandle_setxid): Likewise.
* nptl/pthread_cancel.c (pthread_cancel): Likewise.
* sysdeps/unix/sysv/linux/pthread_kill.c (__pthread_kill): Likewise.
* sysdeps/unix/sysv/linux/pthread_sigqueue.c (pthread_sigqueue):
Likewise.
* sysdeps/unix/sysv/linux/createthread.c (create_thread): Likewise.
* sysdeps/unix/sysv/linux/getpid.c: Remove file.
* nptl/descr.h (struct pthread): Change comment about pid value.
* nptl/pthread_getattr_np.c (pthread_getattr_np): Remove thread
pid assert.
* sysdeps/unix/sysv/linux/pthread-pids.h (__pthread_initialize_pids):
Do not set pid value.
* nptl_db/td_ta_thr_iter.c (iterate_thread_list): Remove thread
pid cache check.
* nptl_db/td_thr_validate.c (td_thr_validate): Likewise.
* sysdeps/aarch64/nptl/tcb-offsets.sym: Remove pid offset.
* sysdeps/alpha/nptl/tcb-offsets.sym: Likewise.
* sysdeps/arm/nptl/tcb-offsets.sym: Likewise.
* sysdeps/hppa/nptl/tcb-offsets.sym: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym: Likewise.
* sysdeps/ia64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/m68k/nptl/tcb-offsets.sym: Likewise.
* sysdeps/microblaze/nptl/tcb-offsets.sym: Likewise.
* sysdeps/mips/nptl/tcb-offsets.sym: Likewise.
* sysdeps/nios2/nptl/tcb-offsets.sym: Likewise.
* sysdeps/powerpc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/s390/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sh/nptl/tcb-offsets.sym: Likewise.
* sysdeps/sparc/nptl/tcb-offsets.sym: Likewise.
* sysdeps/tile/nptl/tcb-offsets.sym: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym: Likewise.
* sysdeps/unix/sysv/linux/aarch64/clone.S: Remove pid and tid caching.
* sysdeps/unix/sysv/linux/alpha/clone.S: Likewise.
* sysdeps/unix/sysv/linux/arm/clone.S: Likewise.
* sysdeps/unix/sysv/linux/hppa/clone.S: Likewise.
* sysdeps/unix/sysv/linux/i386/clone.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/clone2.S: Likewise.
* sysdeps/unix/sysv/linux/mips/clone.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sh/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/tile/clone.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/clone.S: Likewise.
* sysdeps/unix/sysv/linux/aarch64/vfork.S: Remove pid set and reset.
* sysdeps/unix/sysv/linux/alpha/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/arm/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/i386/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/ia64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/clone.S: Likewise.
* sysdeps/unix/sysv/linux/m68k/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/mips/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/nios2/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sh/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tile/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/x86_64/vfork.S: Likewise.
* sysdeps/unix/sysv/linux/tst-clone2.c (f): Remove direct pthread
struct access.
(clone_test): Remove function.
(do_test): Rewrite to take in consideration pid is not cached anymore.
manual/libm-err-tab.pl hardcodes a list of names for particular
platforms (mapping from sysdeps directory name to friendly name for
the manual). This goes against the principle of keeping information
about individual platforms in their corresponding sysdeps directory,
and the list is also very out-of-date regarding supported platforms
and their corresponding sysdeps directories.
This patch fixes this by adding a libm-test-ulps-name file alongside
each libm-test-ulps file. The script then gets the friendly name from
that file, which is required to exist, so it no longer needs to allow
for the mapping being missing.
Tested for x86_64.
[BZ #14139]
* manual/libm-err-tab.pl (%pplatforms): Initialize to empty.
(find_files): Obtain platform name from libm-test-ulps-name and
store in %pplatforms.
(canonicalize_platform): Remove.
(print_platforms): Use $pplatforms directly.
(by_platforms): Do not allow for platforms missing from
%pplatforms.
* sysdeps/aarch64/libm-test-ulps-name: New file.
* sysdeps/alpha/fpu/libm-test-ulps-name: Likewise.
* sysdeps/arm/libm-test-ulps-name: Likewise.
* sysdeps/generic/libm-test-ulps-name: Likewise.
* sysdeps/hppa/fpu/libm-test-ulps-name: Likewise.
* sysdeps/i386/fpu/libm-test-ulps-name: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps-name: Likewise.
* sysdeps/ia64/fpu/libm-test-ulps-name: Likewise.
* sysdeps/m68k/coldfire/fpu/libm-test-ulps-name: Likewise.
* sysdeps/m68k/m680x0/fpu/libm-test-ulps-name: Likewise.
* sysdeps/microblaze/libm-test-ulps-name: Likewise.
* sysdeps/mips/mips32/libm-test-ulps-name: Likewise.
* sysdeps/mips/mips64/libm-test-ulps-name: Likewise.
* sysdeps/nios2/libm-test-ulps-name: Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps-name: Likewise.
* sysdeps/powerpc/nofpu/libm-test-ulps-name: Likewise.
* sysdeps/s390/fpu/libm-test-ulps-name: Likewise.
* sysdeps/sh/libm-test-ulps-name: Likewise.
* sysdeps/sparc/fpu/libm-test-ulps-name: Likewise.
* sysdeps/tile/libm-test-ulps-name: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps-name: Likewise.
Calling an IFUNC function defined in unrelocated shared library may
lead to segfault. This patch issues an error message to request
relinking the shared library if it references IFUNC function defined
in the unrelocated shared library.
[BZ #20019]
* sysdeps/i386/dl-machine.h (elf_machine_rel): Check IFUNC
definition in unrelocated shared library.
* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
sys/ucontext.h unconditionally uses stack_t, and it does not make
sense to change that. But signal.h only declares stack_t under
__USE_XOPEN_EXTENDED || __USE_XOPEN2K8. The actual definition is
already in a bits header, bits/sigstack.h, but that header insists on
only being included by signal.h, so we have to change that as well as
all of the sys/ucontext.h variants. (Some but not all variants of
bits/sigcontext.h, which sys/ucontext.h may also need, had already
received this adjustment; for consistency, I made them all the same,
even if that's not strictly necessary in some configurations.)
bits/sigcontext.h and bits/sigstack.h also all need to receive
multiple inclusion guards.
* sysdeps/generic/sys/ucontext.h
* sysdeps/arm/sys/ucontext.h
* sysdeps/i386/sys/ucontext.h
* sysdeps/m68k/sys/ucontext.h
* sysdeps/mips/sys/ucontext.h
* sysdeps/unix/sysv/linux/aarch64/sys/ucontext.h
* sysdeps/unix/sysv/linux/alpha/sys/ucontext.h
* sysdeps/unix/sysv/linux/arm/sys/ucontext.h
* sysdeps/unix/sysv/linux/hppa/sys/ucontext.h
* sysdeps/unix/sysv/linux/ia64/sys/ucontext.h
* sysdeps/unix/sysv/linux/m68k/sys/ucontext.h
* sysdeps/unix/sysv/linux/mips/sys/ucontext.h
* sysdeps/unix/sysv/linux/nios2/sys/ucontext.h
* sysdeps/unix/sysv/linux/powerpc/sys/ucontext.h
* sysdeps/unix/sysv/linux/s390/sys/ucontext.h
* sysdeps/unix/sysv/linux/sh/sys/ucontext.h
* sysdeps/unix/sysv/linux/sparc/sys/ucontext.h
* sysdeps/unix/sysv/linux/tile/sys/ucontext.h
* sysdeps/unix/sysv/linux/x86/sys/ucontext.h:
Include both bits/sigcontext.h and bits/sigstack.h.
Fix grammar error in comment, if present.
* bits/sigstack.h
* sysdeps/unix/sysv/linux/aarch64/bits/sigstack.h
* sysdeps/unix/sysv/linux/alpha/bits/sigstack.h
* sysdeps/unix/sysv/linux/bits/sigstack.h
* sysdeps/unix/sysv/linux/ia64/bits/sigstack.h
* sysdeps/unix/sysv/linux/mips/bits/sigstack.h
* sysdeps/unix/sysv/linux/powerpc/bits/sigstack.h
* sysdeps/unix/sysv/linux/sparc/bits/sigstack.h
* bits/sigcontext.h
* sysdeps/mach/hurd/i386/bits/sigcontext.h
* sysdeps/unix/sysv/linux/bits/sigcontext.h
* sysdeps/unix/sysv/linux/ia64/bits/sigcontext.h
* sysdeps/unix/sysv/linux/sparc/bits/sigcontext.h:
Add multiple inclusion guard. Permit inclusion by sys/ucontext.h
as well as signal.h, if this was not already allowed. Request
definition of size_t if necessary. Minimize semantically-null
differences across files.
This was used by --enable-omitfp, and the bulk of it was removed in this
commit:
commit bdeba1354b
Author: Ulrich Drepper <drepper@gmail.com>
Date: Sat Jan 7 11:29:31 2012 -0500
Remove --enable-omitfp support
TS 18661-1 defines a type femode_t to represent the set of dynamic
floating-point control modes (such as the rounding mode and trap
enablement modes), and functions fegetmode and fesetmode to manipulate
those modes (without affecting other state such as the raised
exception flags) and a corresponding macro FE_DFL_MODE.
This patch series implements those interfaces for glibc. This first
patch adds the architecture-independent pieces, the x86 and x86_64
implementations, and the <bits/fenv.h> and ABI baseline updates for
all architectures so glibc keeps building and passing the ABI tests on
all architectures. Subsequent patches add the fegetmode and fesetmode
implementations for other architectures.
femode_t is generally an integer type - the same type as fenv_t, or as
the single element of fenv_t where fenv_t is a structure containing a
single integer (or the single relevant element, where it has elements
for both status and control registers) - except where architecture
properties or consistency with the fenv_t implementation indicate
otherwise. FE_DFL_MODE follows FE_DFL_ENV in whether it's a magic
pointer value (-1 cast to const femode_t *), a value that can be
distinguished from valid pointers by its high bits but otherwise
contains a representation of the desired register contents, or a
pointer to a constant variable (the powerpc case; __fe_dfl_mode is
added as an exported constant object, an alias to __fe_dfl_env).
Note that where architectures (that share a register between control
and status bits) gain definitions of new floating-point control or
status bits in future, the implementations of fesetmode for those
architectures may need updating (depending on whether the new bits are
control or status bits and what the implementation does with
previously unknown bits), just like existing implementations of
<fenv.h> functions that take care not to touch reserved bits may need
updating when the set of reserved bits changes. (As any new bits are
outside the scope of ISO C, that's just a quality-of-implementation
issue for supporting them, not a conformance issue.)
As with fenv_t, femode_t should properly include any software DFP
rounding mode (and for both fenv_t and femode_t I'd consider that
fragment of DFP support appropriate for inclusion in glibc even in the
absence of the rest of libdfp; hardware DFP rounding modes should
already be included if the definitions of which bits are status /
control bits are correct).
Tested for x86_64, x86, mips64 (hard float, and soft float to test the
fallback version), arm (hard float) and powerpc (hard float, soft
float and e500). Other architecture versions are untested.
* math/fegetmode.c: New file.
* math/fesetmode.c: Likewise.
* sysdeps/i386/fpu/fegetmode.c: Likewise.
* sysdeps/i386/fpu/fesetmode.c: Likewise.
* sysdeps/x86_64/fpu/fegetmode.c: Likewise.
* sysdeps/x86_64/fpu/fesetmode.c: Likewise.
* math/fenv.h: Update comment on inclusion of <bits/fenv.h>.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (fegetmode): New function
declaration.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (fesetmode): Likewise.
* bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)] (femode_t): New
typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/aarch64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/alpha/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/arm/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/hppa/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/ia64/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/m68k/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/microblaze/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/mips/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/nios2/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/powerpc/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (__fe_dfl_mode): New variable
declaration.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/s390/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/sh/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/sparc/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/tile/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* sysdeps/x86/fpu/bits/fenv.h [__GLIBC_USE (IEC_60559_BFP_EXT)]
(femode_t): New typedef.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (FE_DFL_MODE): New macro.
* manual/arith.texi (FE_DFL_MODE): Document macro.
(fegetmode): Document function.
(fesetmode): Likewise.
* math/Versions (fegetmode): New libm symbol at version
GLIBC_2.25.
(fesetmode): Likewise.
* math/Makefile (libm-support): Add fegetmode and fesetmode.
(tests): Add test-femode and test-femode-traps.
* math/test-femode-traps.c: New file.
* math/test-femode.c: Likewise.
* sysdeps/powerpc/fpu/fenv_const.c (__fe_dfl_mode): Declare as
alias for __fe_dfl_env.
* sysdeps/powerpc/nofpu/fenv_const.c (__fe_dfl_mode): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/fenv_const.c
(__fe_dfl_mode): Likewise.
* sysdeps/powerpc/Versions (__fe_dfl_mode): New libm symbol at
version GLIBC_2.25.
* sysdeps/nacl/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
This is only used for the float and double variants.
Instead, just add it to the type specific list of files,
and remove all stubs, and remove the declaration from
math_private.h.
I verified x86_64, i486, ia64, m68k, and ppc64 build.
TS 18661-1 defines an fesetexcept function for setting floating-point
exception flags without the side-effect of causing enabled traps to be
taken.
This patch series implements this function for glibc. The present
patch adds the fallback stub implementation, x86 and x86_64
implementations, documentation, tests and ABI baseline updates. The
remaining patches, some of them untested, add implementations for
other architectures. The implementations generally follow those of
the fesetexceptflag function.
As for fesetexceptflag, the approach taken for architectures where
setting flags causes enabled traps to be taken is to set the flags
(and potentially cause traps) rather than refusing to set the flags
and returning an error. Since ISO C and TS 18661 provide no way to
enable traps, this is formally in accordance with the standards.
The NEWS entry should be considered a placeholder, since this patch
series is intended to be followed by further such series adding other
TS 18661-1 features, so that the NEWS entry would end up looking more
like
* New <fenv.h> features from TS 18661-1:2014 are added to libm: the
fesetexcept, fetestexceptflag, fegetmode and fesetmode functions,
the femode_t type and the FE_DFL_MODE macro.
with hopefully more such entries for other features, rather than
having an entry for a single function in the end.
I believe we have consensus for adding TS 18661-1 interfaces as per
<https://sourceware.org/ml/libc-alpha/2016-06/msg00421.html>.
Tested for x86_64, x86, mips64 (hard float, and soft float to test the
fallback version), arm (hard float) and powerpc (hard float, soft
float and e500).
* math/fesetexcept.c: New file.
* sysdeps/i386/fpu/fesetexcept.c: Likewise.
* sysdeps/x86_64/fpu/fesetexcept.c: Likewise.
* math/fenv.h: Define
__GLIBC_INTERNAL_STARTING_HEADER_IMPLEMENTATION and include
<bits/libc-header-start.h> instead of including <features.h>.
[__GLIBC_USE (IEC_60559_BFP_EXT)] (fesetexcept): New function
declaration.
* manual/arith.texi (fesetexcept): Document function.
* math/Versions (fesetexcept): New libm symbol at version
GLIBC_2.25.
* math/Makefile (libm-support): Add fesetexcept.
(tests): Add test-fesetexcept and test-fesetexcept-traps.
* math/test-fesetexcept.c: New file.
* math/test-fesetexcept-traps.c: Likewise.
* sysdeps/nacl/libm.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/arm/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/hppa/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/coldfire/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/m68k/m680x0/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/microblaze/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/nios2/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sh/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx32/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilegx/tilegx64/libm.abilist:
Likewise.
* sysdeps/unix/sysv/linux/tile/tilepro/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
Compile i386 rtld-*.os with -mno-sse -mno-mmx -mfpmath=387 so that no
code in ld.so uses mm/xmm/ymm/zmm registers on i386 since the first 3
mm/xmm/ymm/zmm registers are used to pass vector parameters which must
be preserved.
* sysdeps/i386/Makefile (rtld-CFLAGS): New.
[subdir == elf] (CFLAGS-.os): Replace -mno-sse -mno-mmx
-mfpmath=387 with $(rtld-CFLAGS).
[subdir != elf] (CFLAGS-.os): Compile rtld-*.os with
$(rtld-CFLAGS).
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line trunc function implementations
to avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_trunc.S (__trunc): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_truncf.S (__truncf): Likewise.
* sysdeps/i386/fpu/s_truncl.S (__truncl): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_truncl.S (__truncl): Likewise.
* math/libm-test.inc (trunc_test_data): Do not allow spurious
"inexact" exceptions.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line floor function implementations
to avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_floor.S (__floor): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_floorf.S (__floorf): Likewise.
* sysdeps/i386/fpu/s_floorl.S (__floorl): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_floorl.S (__floorl): Likewise.
* math/libm-test.inc (floor_test_data): Do not allow spurious
"inexact" exceptions.
As discussed in
<https://sourceware.org/ml/libc-alpha/2016-05/msg00577.html>, TS
18661-1 disallows ceil, floor, round and trunc functions from raising
the "inexact" exception, in accordance with general IEEE 754 semantics
for when that exception is raised. Fixing this for x87 floating point
is more complicated than for the other versions of these functions,
because they use the frndint instruction that raises "inexact" and
this can only be avoided by saving and restoring the whole
floating-point environment.
As I noted in
<https://sourceware.org/ml/libc-alpha/2016-06/msg00128.html>, I have
now implemented a GCC option -fno-fp-int-builtin-inexact for GCC 7,
such that GCC will inline these functions on x86, without caring about
"inexact", when the default -ffp-int-builtin-inexact is in effect.
This allows users to get optimized code depending on the options they
pass to the compiler, while making the out-of-line functions follow TS
18661-1 semantics and avoid "inexact".
This patch duly fixes the out-of-line ceil function implementations to
avoid "inexact", in the same way as the nearbyint implementations.
I do not know how the performance of implementations such as these
based on saving the environment and changing the rounding mode
temporarily compares to that of the C versions or SSE 4.1 versions (of
course, for 32-bit x86 SSE implementations still need to get the
return value in an x87 register); it's entirely possible other
implementations could be faster in some cases.
Tested for x86_64 and x86.
[BZ #15479]
* sysdeps/i386/fpu/s_ceil.S (__ceil): Save and restore
floating-point environment rather than just control word.
* sysdeps/i386/fpu/s_ceilf.S (__ceilf): Likewise.
* sysdeps/i386/fpu/s_ceill.S (__ceill): Save and restore
floating-point environment, with "invalid" exceptions merged in,
rather than just control word.
* sysdeps/x86_64/fpu/s_ceill.S (__ceill): Likewise.
* math/libm-test.inc (ceil_test_data): Do not allow spurious
"inexact" exceptions.
The x86_64 and i386 versions of scalbl return sNaN for some cases of
sNaN input and are missing "invalid" exceptions for other cases. This
results from overly complicated code that either returns a NaN input,
or discards both inputs when one is NaN and loads a NaN from memory.
This patch fixes this by simplifying the code to add the arguments
when either one is NaN.
Tested for x86_64 and x86.
[BZ #20296]
* sysdeps/i386/fpu/e_scalbl.S (__ieee754_scalbl): Add arguments
when either argument is a NaN.
* sysdeps/x86_64/fpu/e_scalbl.S (__ieee754_scalbl): Likewise.
* math/libm-test.inc (scalb_test_data): Add sNaN tests.
The i386 implementations of nearbyint functions, and x86_64
nearbyintl, contain code to mask the "inexact" exception. However,
the fnstenv instruction has the effect of masking all exceptions, so
this masking code has been redundant since fnstenv was added to those
implementations (by commit 846d9a4a3acdb4939ca7bf6aed48f9f6f26911be;
commit 71d1b0166b added the test
math/test-nearbyint-except-2.c that verifies these functions do work
when called with "inexact" traps enabled); this patch removes the
redundant code.
Tested for x86_64 and x86.
* sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Do not mask
"inexact" exceptions after fnstenv.
* sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
* sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
fdim suffers from double rounding on i386 because subtracting two
double values can produce an inexact long double value exactly half
way between two double values. This patch fixes this by creating an
i386-specific version of fdim - C, based on the generic version,
unlike the previous .S version - which sets the x87 precision control
to double precision for the subtraction and then restores it
afterwards. As noted in the comment added, there are no issues of
double rounding for subnormals (a case that setting precision control
does not address) because subtraction cannot produce an inexact result
in the subnormal range.
Tested for x86_64 and x86.
[BZ #20255]
* sysdeps/i386/fpu/s_fdim.c: New file. Based on math/s_fdim.c.
* math/libm-test.inc (fdim_test_data): Add another test.
Some architectures have their own versions of fdim functions, which
are missing errno setting (bug 6796) and may also return sNaN instead
of qNaN for sNaN input, in the case of the x86 / x86_64 long double
versions (bug 20256).
These versions are not actually doing anything that a compiler
couldn't generate, just straightforward comparisons / arithmetic (and,
in the x86 / x86_64 case, testing for NaNs with fxam, which isn't
actually needed once you use an unordered comparison and let the NaNs
pass through the same subtraction as non-NaN inputs). This patch
removes the x86 / x86_64 / powerpc versions, so that those
architectures use the generic C versions, which correctly handle
setting errno and deal properly with sNaN inputs. This seems better
than dealing with setting errno in lots of .S versions.
The i386 versions also return results with excess range and precision,
which is not appropriate for a function exactly defined by reference
to IEEE operations. For errno setting to work correctly on overflow,
it's necessary to remove excess range with math_narrow_eval, which
this patch duly does in the float and double versions so that the
tests can reliably pass on x86. For float, this avoids any double
rounding issues as the long double precision is more than twice that
of float. For double, double rounding issues will need to be
addressed separately, so this patch does not fully fix bug 20255.
Tested for x86_64, x86 and powerpc.
[BZ #6796]
[BZ #20255]
[BZ #20256]
* math/s_fdim.c: Include <math_private.h>.
(__fdim): Use math_narrow_eval on result.
* math/s_fdimf.c: Include <math_private.h>.
(__fdimf): Use math_narrow_eval on result.
* sysdeps/i386/fpu/s_fdim.S: Remove file.
* sysdeps/i386/fpu/s_fdimf.S: Likewise.
* sysdeps/i386/fpu/s_fdiml.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdim.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdimf.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdiml.S: Likewise.
* sysdeps/powerpc/fpu/s_fdim.c: Likewise.
* sysdeps/powerpc/fpu/s_fdimf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_fdim.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_fdim.c: Likewise.
* sysdeps/x86_64/fpu/s_fdiml.S: Likewise.
* math/libm-test.inc (fdim_test_data): Expect errno setting on
overflow. Add sNaN tests.
Various implementations of frexp functions return sNaN for sNaN
input. This patch fixes them to add such arguments to themselves so
that qNaN is returned.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #20250]
* sysdeps/i386/fpu/s_frexpl.S (__frexpl): Add non-finite input to
itself.
* sysdeps/ieee754/dbl-64/s_frexp.c (__frexp): Add non-finite or
zero input to itself.
* sysdeps/ieee754/dbl-64/wordsize-64/s_frexp.c (__frexp):
Likewise.
* sysdeps/ieee754/flt-32/s_frexpf.c (__frexpf): Likewise.
* sysdeps/ieee754/ldbl-128/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-96/s_frexpl.c (__frexpl): Likewise.
* math/libm-test.inc (frexp_test_data): Add sNaN tests.
The i386/x86_64 versions of log2l return sNaN for sNaN input. This
patch fixes them to add NaN inputs to themselves so that qNaN is
returned in this case.
Tested for x86_64 and x86.
[BZ #20235]
* sysdeps/i386/fpu/e_log2l.S (__ieee754_log2l): Add NaN input to
itself.
* sysdeps/x86_64/fpu/e_log2l.S (__ieee754_log2l): Likewise.
* math/libm-test.inc (log2_test_data): Add sNaN tests.
The i386/x86_64 versions of log1pl return sNaN for sNaN input. This
patch fixes them to add a NaN input to itself so that qNaN is returned
in this case.
Tested for x86_64 and x86.
[BZ #20229]
* sysdeps/i386/fpu/s_log1pl.S (__log1pl): Add NaN input to itself.
* sysdeps/x86_64/fpu/s_log1pl.S (__log1pl): Likewise.
* math/libm-test.inc (log1p_test_data): Add sNaN tests.
The i386/x86_64 versions of log10l return sNaN for sNaN input. This
patch fixes them to add a NaN input to itself so that qNaN is returned
in this case.
Tested for x86_64 and x86.
[BZ #20228]
* sysdeps/i386/fpu/e_log10l.S (__ieee754_log10l): Add NaN input to
itself.
* sysdeps/x86_64/fpu/e_log10l.S (__ieee754_log10l): Likewise.
* math/libm-test.inc (log10_test_data): Add sNaN tests.
The i386/x86_64 versions of logl return sNaN for sNaN input. This
patch fixes them to add a NaN input to itself so that qNaN is returned
in this case.
Tested for x86_64 and x86 (including a build for i586 to cover the
non-i686 logl version).
[BZ #20227]
* sysdeps/i386/fpu/e_logl.S (__ieee754_logl): Add NaN input to
itself.
* sysdeps/i386/i686/fpu/e_logl.S (__ieee754_logl): Likewise.
* sysdeps/x86_64/fpu/e_logl.S (__ieee754_logl): Likewise.
* math/libm-test.inc (log_test_data): Add sNaN tests.
The i386 and x86_64 implementations of expl, exp10l and expm1l (code
shared between the functions) return sNaN for sNaN input. This patch
fixes them to add NaN inputs to themselves so that qNaN is returned in
this case.
Tested for x86_64 and x86.
[BZ #20226]
* sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL): Add NaN argument to
itself.
* sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL): Likewise.
* math/libm-test.inc (exp_test_data): Add sNaN tests.
(exp10_test_data): Likewise.
(expm1_test_data): Likewise.
The i386 version of cbrtl returns sNaN (without raising any
exceptions) for sNaN input. This patch fixes it to add non-finite
arguments to themselves (the code path in question is also reached for
zero arguments, for which adding them to themselves is also harmless),
so that "invalid" is raised and qNaN returned.
Tested for x86_64 and x86.
[BZ #20224]
* sysdeps/i386/fpu/s_cbrtl.S (__cbrtl): Add non-finite or zero
argument to itself.
* math/libm-test.inc (cbrt_test_data): Add sNaN tests.
The i386 version of atanhl returns sNaN for sNaN input. This patch
fixes it to add NaN arguments to themselves so it returns qNaN in this
case.
Tested for x86_64 and x86.
[BZ #20219]
* sysdeps/i386/fpu/e_atanhl.S (__ieee754_atanhl): Add NaN argument
to itself.
* math/libm-test.inc (atanh_test_data): Add sNaN tests.
The i386 version of asinhl returns sNaN (without raising any
exceptions) for sNaN input. This patch fixes it to add non-finite
arguments to themselves, so that "invalid" is raised and qNaN
returned.
Tested for x86_64 and x86.
[BZ #20218]
* sysdeps/i386/fpu/s_asinhl.S (__asinhl): Add non-finite argument
to itself.
* math/libm-test.inc (asinh_test_data): Add sNaN tests.
The x86 / x86_64 implementation of nextafterl (also used for
nexttowardl) produces incorrect results (NaNs) when negative
subnormals, the low 32 bits of whose mantissa are zero, are
incremented towards zero. This patch fixes this by disabling the
logic to decrement the exponent in that case.
Tested for x86_64 and x86.
[BZ #20205]
* sysdeps/i386/fpu/s_nextafterl.c (__nextafterl): Do not adjust
exponent when incrementing negative subnormal with low mantissa
word zero.
* math/libm-test.inc (nextafter_test_data) [TEST_COND_intel96]:
Add another test.
In static executable, since init_cpu_features is called early from
__libc_start_main, there is no need to call it again in dl_platform_init.
[BZ #20072]
* sysdeps/i386/dl-machine.h (dl_platform_init): Call
init_cpu_features only if SHARED is defined.
* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym. Remove
x86 ifunc-defines.sym and rtld-global-offsets.sym. No code changes on
i686 and x86-64.
* sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers):
Remove ifunc-defines.sym.
* sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers):
Likewise.
* sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed.
* sysdeps/x86/rtld-global-offsets.sym: Likewise.
* sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise.
* sysdeps/x86/Makefile (gen-as-const-headers): Remove
rtld-global-offsets.sym.
* sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ...
* sysdeps/x86/cpu-features-offsets.sym: This.
* sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h>
instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86. No code changes on x86
and x86_64.
* sysdeps/i386/cacheinfo.c: Include <sysdeps/x86/cacheinfo.c>
instead of <sysdeps/x86_64/cacheinfo.c>.
* sysdeps/x86_64/cacheinfo.c: Moved to ...
* sysdeps/x86/cacheinfo.c: Here.
This fixes errors when we inject sse options through CFLAGS and now
that we have -Werror turned on by default this warning turns into an
error on x86:
$ gcc -m32 -march=core2 -mtune=core2 -msse3 -mfpmath=sse -x c /dev/null -S -mno-sse -mno-mmx
/dev/null:1:0: warning: SSE instruction set disabled, using 387 arithmetics
Where as:
$ gcc -m32 -march=core2 -mtune=core2 -msse3 -mfpmath=sse -x c /dev/null -S -mno-sse -mno-mmx -mfpmath=387
Generates no warnings.
Improve strcspn performance using a much faster algorithm. It is kept simple
so it works well on most targets. It is generally at least 10 times faster
than the existing implementation on bench-strcspn on a few AArch64
implementations, and for some tests 100 times as fast (repeatedly calling
strchr on a small string is extremely slow...).
In fact the string/bits/string2.h inlines make no longer sense, as GCC
already uses strlen if reject is an empty string, strchrnul is 5 times as
fast as __strcspn_c1, while __strcspn_c2 and __strcspn_c3 are slower than
the strcspn main loop for large strings (though reject length 2-4 could be
special cased in the future to gain even more performance).
Tested on x86_64, i686, and aarch64.
* string/Version (libc): Add GLIBC_2.24.
* string/strcspn.c (strcspn): Rewrite function.
* string/bits/string2.h (strcspn): Use __builtin_strcspn.
(__strcspn_c1): Remove inline function.
(__strcspn_c2): Likewise.
(__strcspn_c3): Likewise.
* string/string-inline.c
[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c1): Add
compatibility symbol.
[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c2):
Likewise.
[SHLIB_COMPAT(libc, GLIBC_2_1_1, GLIBC_2_24)] (__strcspn_c3):
Likewise.
* sysdeps/i386/string-inlines.c: Include generic string-inlines.c.
Bug 19848 reports cases where powl on x86 / x86_64 has error
accumulation, for small integer exponents, larger than permitted by
glibc's accuracy goals, at least in some rounding modes. This patch
further restricts the exponent range for which the
small-integer-exponent logic is used to limit the possible error
accumulation.
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #19848]
* sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value
from 8 to 4.
(__ieee754_powl): Compare integer exponent against 4 not 8.
* sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value
from 8 to 4.
(__ieee754_powl): Compare integer exponent against 4 not 8.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
When building on i686, x86_64, and arm, and with NDEBUG, or --with-cpu
there are various variables and functions which are unused based on
these settings.
This patch marks all such variables with __attribute__((unused)) to
avoid the compiler warnings when building with the aformentioned
options.
The i386 ULPs are actually the i686/multiarch ones. The i686/multiarch
float ULPs are more precise as the SSE2 version (when available) uses
double for the cosf and sinf functions.
On the other hand the higher precision of the x86 FPU improves the
precision for a few other math functions.
* sysdeps/i386/fpu/libm-test-ulps: Move to ....
* sysdeps/i386/i686/multiarch/fpu/libm-test-ulps: ...here.
* sysdeps/i386/fpu/libm-test-ulps: Regenerate.
Various math_private.h headers are guarded by "#ifndef
_MATH_PRIVATE_H", but never define the macro. Nothing else defines
the macro either (the generic math_private.h that they include defines
a different macro, _MATH_PRIVATE_H_), so those guards are ineffective.
With the recent inclusion of s_sin.c in s_sincos.c, this breaks the
build for MIPS, since the build of s_sincos.c ends up including
<math_private.h> twice and the MIPS version defines inline functions
such as libc_feholdexcept_mips, without a separate fenv_private.h
header with its own guards such as some architectures have.
This patch fixes all the problem headers to use architecture-specific
guard macro names, and to define those macros in the headers they
guard, just as some architectures already do.
Tested for x86 (testsuite, and that installed shared libraries are
unchanged by the patch), and for mips64 (that it fixes the build).
* sysdeps/arm/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!ARM_MATH_PRIVATE_H].
[!ARM_MATH_PRIVATE_H] (ARM_MATH_PRIVATE_H): Define macro.
* sysdeps/hppa/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!HPPA_MATH_PRIVATE_H].
[!HPPA_MATH_PRIVATE_H] (HPPA_MATH_PRIVATE_H): Define macro.
* sysdeps/i386/fpu/math_private.h [!_MATH_PRIVATE_H]: Change guard
to [!I386_MATH_PRIVATE_H].
[!I386_MATH_PRIVATE_H] (I386_MATH_PRIVATE_H): Define macro.
* sysdeps/m68k/m680x0/fpu/math_private.h [!_MATH_PRIVATE_H]:
Change guard to [!M68K_MATH_PRIVATE_H].
[!M68K_MATH_PRIVATE_H] (M68K_MATH_PRIVATE_H): Define macro.
* sysdeps/microblaze/math_private.h [!_MATH_PRIVATE_H]: Change
guard to [!MICROBLAZE_MATH_PRIVATE_H].
[!MICROBLAZE_MATH_PRIVATE_H] (MICROBLAZE_MATH_PRIVATE_H): Define
macro.
* sysdeps/mips/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!MIPS_MATH_PRIVATE_H].
[!MIPS_MATH_PRIVATE_H] (MIPS_MATH_PRIVATE_H): Define macro.
* sysdeps/nios2/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!NIO2_MATH_PRIVATE_H].
[!NIO2_MATH_PRIVATE_H] (NIO2_MATH_PRIVATE_H): Define macro.
* sysdeps/tile/math_private.h [!_MATH_PRIVATE_H]: Change guard to
[!TILE_MATH_PRIVATE_H].
[!TILE_MATH_PRIVATE_H] (TILE_MATH_PRIVATE_H): Define macro.
For the -ffinite-math-only versions of various x86_64 and x86 log*
functions, a zero result from log* (1) is returned with incorrect sign
in round-downward mode. This patch fixes this in a similar way to the
previous fixes for the non-*_finite versions of the functions.
Tested for x86_64 and x86 (including an i586 build), together with a
patch that will be applied separately to enable the main libm-test.inc
tests for the finite-math-only functions.
[BZ #19213]
* sysdeps/i386/fpu/e_log.S (__log_finite): Ensure +0 is always
returned for argument 1.
* sysdeps/i386/fpu/e_logf.S (__logf_finite): Likewise.
* sysdeps/i386/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/i386/i686/fpu/e_logl.S (__logl_finite): Likewise.
* sysdeps/x86_64/fpu/e_log10l.S (__log10l_finite): Likewise.
* sysdeps/x86_64/fpu/e_log2l.S (__log2l_finite): Likewise.
* sysdeps/x86_64/fpu/e_logl.S (__logl_finite): Likewise.
nextafter and nexttoward fail to set errno on overflow and underflow.
This patch makes them do so in cases that should include all the cases
where such errno setting is required by glibc's goals for when to set
errno (but not all cases of underflow where the result is nonzero and
so glibc's goals do not require errno setting).
Tested for x86_64, x86, mips64 and powerpc.
[BZ #6799]
* math/s_nextafter.c: Include <errno.h>.
(__nextafter): Set errno on overflow and underflow.
* math/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/i386/fpu/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* sysdeps/i386/fpu/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/i386/fpu/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/flt-32/s_nextafterf.c: Include <errno.h>.
(__nextafterf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c: Include <errno.h>.
(__nexttoward): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c: Include <errno.h>.
(__nexttowardf): Set errno on overflow and underflow.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c: Include <errno.h>.
(__nldbl_nexttowardf): Set errno on overflow and underflow.
* sysdeps/m68k/m680x0/fpu/s_nextafterl.c: Include <errno.h>.
(__nextafterl): Set errno on overflow and underflow.
* math/libm-test.inc (nextafter_test_data): Do not allow errno
setting to be missing on overflow. Add more tests.
(nexttoward_test_data): Likewise.
There are configure tests for the cpuid.h header for x86 / x86_64.
GCC 4.3 and later install this header, so those tests are obsolete.
This patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed shared
libraries are unchanged by the patch).
* sysdeps/i386/configure.ac (cpuid.h): Do not test for header.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (cpuid.h): Do not test for header.
* sysdeps/x86_64/configure: Regenerated.
fenv_t should include architecture-specific floating-point modes and
status flags. i386 and x86_64 fesetenv limit which bits they use from
the x87 status and control words, when using saved state, and limit
which parts of the state they set to fixed values, when using
FE_DFL_ENV / FE_NOMASK_ENV. The following should be included but are
excluded in at least some cases: status and masking for the "denormal
operand" exception (which isn't part of FE_ALL_EXCEPT); precision
control (explicitly mentioned in Annex F as something that counts as
part of the floating-point environment); MXCSR FZ and DAZ bits (for
FE_DFL_ENV and FE_NOMASK_ENV). This patch arranges for this extra
state to be handled by fesetenv (and thereby by feupdateenv, which
calls fesetenv).
(Note that glibc functions using floating point are not generally
expected to work correctly with non-default values of this state,
especially precision control, but it is still logically part of the
floating-point environment and should be handled as such by fesetenv.
Changes to the state relating to subnormals ought generally to work
with libm functions when the arguments aren't subnormal and neither
are the expected results; that's a consequence of functions avoiding
spurious internal underflows.)
A question arising from this is whether FE_NOMASK_ENV should or should
not mask the "denormal operand" exception. I decided it should mask
that exception. This is the status quo - previously that exception
could only be unmasked by direct manipulation of control registers
(possibly via <fpu_control.h>). In addition, it means that use of
FE_NOMASK_ENV leaves a floating-point environment the same as could be
obtained by fesetenv (FE_DFL_ENV); feenableexcept (FE_ALL_EXCEPT);,
rather than an environment in which an exception is unmasked that
could only be masked again by using fesetenv with FE_DFL_ENV (or a
previously saved environment) - this exception not being usable with
other <fenv.h> functions because it's outside FE_ALL_EXCEPT.
Tested for x86_64 and x86.
[BZ #16068]
* sysdeps/i386/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86_64/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86/fpu/test-fenv-sse-2.c: New file.
* sysdeps/x86/fpu/test-fenv-x87.c: Likewise.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-x87 and test-fenv-sse-2.
[$(subdir) = math] (CFLAGS-test-fenv-sse-2.c): New variable.
The i386 and x86_64 versions of fesetenv, when called with FE_DFL_ENV
or FE_NOMASK_ENV as argument, do not clear SSE exceptions raised in
MXCSR. These arguments should, like other fenv_t values, represent
the whole of the floating-point state, so such exceptions should be
cleared; this patch adds the required clearing. (Discovered while
working on bug 16068.)
Tested for x86_64 and x86.
[BZ #19181]
* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Clear already-raised
SSE exceptions when argument is FE_DFL_ENV or FE_NOMASK_ENV.
* sysdeps/x86_64/fpu/fesetenv.c (__fesetenv): Likewise.
* math/test-fenv-clear-main.c: New file.
* math/test-fenv-clear.c: Likewise.
* math/Makefile (tests): Add test-fenv-clear.
* sysdeps/x86/fpu/test-fenv-clear-sse.c: New file.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-clear-sse.
[$(subdir) = math] (CFLAGS-test-fenv-clear-sse.c): New variable.
There are configure tests for the -mavx2 compiler option. AVX2
support was added in GCC 4.7, so these tests are now obsolete; this
patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).
* sysdeps/i386/configure.ac (libc_cv_cc_avx2): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_avx2): Remove configure
test.
* sysdeps/x86_64/configure: Regenerated.
* config.h.in (HAVE_AVX2_SUPPORT): Remove #undef.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memset-avx2 unconditionally instead of conditionally on
[$(config-cflags-avx2) = yes].
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list) [HAVE_AVX2_SUPPORT]: Make code
unconditional.
* sysdeps/x86_64/multiarch/memset.S [HAVE_AVX2_SUPPORT]: Likewise.
* sysdeps/x86_64/multiarch/memset_chk.S
[IS_IN (libc) && SHARED && HAVE_AVX2_SUPPORT]: Change conditional
to [IS_IN (libc) && SHARED].
The implementations of nearbyint functions using x87 floating point
(i386 all versions, x86_64 long double only) use the fclex
instruction, which clears any exceptions that were raised before the
function was called. These functions must not clear exceptions that
were raised before they were called.
This patch fixes these functions to save and restore the whole
floating-point environment (fnstenv / fldenv) as the way of avoiding
raising "inexact" (recall that there isn't an x87 instruction for
loading just the status word, so the whole environment has to be saved
and loaded instead - the code already saved and loaded the control
word, which is now obtained from the saved environment after this
patch, to disable traps on "inexact"). In the case of the long double
functions, any "invalid" exception from frndint (applied to a
signaling NaN) needs merging into the saved state; this issue doesn't
apply to the float and double functions because that exception would
have been raised when the argument is loaded, before the environment
is saved.
[BZ #15491]
* sysdeps/i386/fpu/s_nearbyint.S (__nearbyint): Save and restore
floating-point environment instead of clearing all exceptions.
* sysdeps/i386/fpu/s_nearbyintf.S (__nearbyintf): Likewise.
* sysdeps/i386/fpu/s_nearbyintl.S (__nearbyintl): Likewise,
merging in "invalid" exceptions from frndint.
* sysdeps/x86_64/fpu/s_nearbyintl.S (__nearbyintl): Likewise.
* math/test-nearbyint-except.c: New file.
* math/Makefile (tests): Add test-nearbyint-except.
Since x86 _dl_unmap and _dl_make_tlsdesc_dynamic are only used
internally in ld.so, they can be made hidden.
[BZ #19122]
* sysdeps/i386/dl-lookupcfg.h (_dl_unmap): Add attribute_hidden.
* sysdeps/i386/dl-tlsdesc.h (_dl_make_tlsdesc_dynamic):
Likewise.
* sysdeps/x86_64/dl-tlsdesc.h (_dl_make_tlsdesc_dynamic):
Likewise.
* sysdeps/x86_64/dl-lookupcfg.h (_dl_unmap): Likewise.
There is a configure test for assembler support for -mtune=i686. This
option was added in binutils 2.18 so the test is obsolete; this patch
removes it.
Tested for x86 (testsuite, and that installed shared libraries are
unchanged by the patch).
* sysdeps/i386/configure.ac (libc_cv_as_i686): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/i386/i686/Makefile [$(config-asflags-i686) = yes]: Make
code unconditional.
GCC added support for -mno-vzeroupper in version 4.6. Thus the
configure tests for this support are obsolete, and this patch removes
them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/x86_64/configure.ac (libc_cv_cc_novzeroupper): Remove
configure test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/Makefile [$(config-cflags-novzeroupper) = yes]:
Make code unconditional.
GCC added support for -msse4 in version 4.3. Thus the configure tests
for it are obsolete, and this patch removes them.
Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by this patch).
* sysdeps/i386/configure.ac (libc_cv_cc_sse4): Remove configure
test.
* sysdeps/i386/configure: Regenerated.
* sysdeps/i386/i686/multiarch/Makefile
[$(config-cflags-sse4) = yes]: Make code unconditional.
* sysdeps/i386/i686/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/i386/i686/multiarch/strspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/x86_64/configure.ac (libc_cv_cc_sse4): Remove configure
test.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/multiarch/Makefile [$(config-cflags-sse4) = yes]:
Make code unconditional.
* sysdeps/x86_64/multiarch/strcspn.S [HAVE_SSE4_SUPPORT]:
Likewise.
* sysdeps/x86_64/multiarch/strspn.S [HAVE_SSE4_SUPPORT]: Likewise.
* config.h.in (HAVE_SSE4_SUPPORT): Remove #undef.
ISO C requires overflowing results from nexttoward to be the
appropriate infinity independent of the rounding mode, but some
implementations use a rounding-mode-dependent result (this is the same
issue as was fixed for nextafter in bug 16677). This patch fixes the
problem by making the nexttoward implementations discard the result
from the floating-point computation that forced an overflow exception
and then return the infinity previously computed with integer
arithmetic.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #19059]
* math/s_nexttowardf.c (__nexttowardf): Do not return value from
overflowing computation.
* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
Likewise.
* math/libm-test.inc (nexttoward_test_data): Add more tests.
The i386 versions of acoshf and acosh raise a spurious "invalid"
exception for an argument that is a quiet NaN with the sign bit set.
The integer arithmetic to detect arguments < 1 also detects -NaN, and
then the computation 0 / 0 in that case raises the exception. This
patch fixes this by using (x - x) / (x - x) as the computation in that
case instead, which will always raise the exception for non-NaN
arguments reaching that code, but not for quiet NaN arguments.
Tested for x86_64 and x86.
[BZ #19032]
* sysdeps/i386/fpu/e_acosh.S (__ieee754_acosh): For arguments < 1,
compute result as (x - x) / (x - x) not as 0 / 0.
* sysdeps/i386/fpu/e_acoshf.S (__ieee754_acoshf): Likewise.
* math/libm-test.inc (acosh_test_data): Add another test of acosh.
For arguments with X^2 + Y^2 close to 1, clog and clog10 avoid large
errors from log(hypot) by computing X^2 + Y^2 - 1 in a way that avoids
cancellation error and then using log1p.
However, the thresholds for using that approach still result in log
being used on argument as large as sqrt(13/16) > 0.9, leading to
significant errors, in some cases above the 9ulp maximum allowed in
glibc libm. This patch arranges for the approach using log1p to be
used in any cases where |X|, |Y| < 1 and X^2 + Y^2 >= 0.5 (with the
existing allowance for cases where one of X and Y is very small),
adjusting the __x2y2m1 functions to work with the wider range of
inputs. This way, log only gets used on arguments below sqrt(1/2) (or
substantially above 1), where the error involved is much less.
Tested for x86_64, x86, mips64 and powerpc. For the ulps regeneration
I removed the existing clog and clog10 ulps before regenerating to
allow any reduced ulps to appear. Tests added include those found by
random test generation to produce large ulps either before or after
the patch, and some found by trying inputs close to the (0.75, 0.5)
threshold where the potential errors from using log are largest.
[BZ #19016]
* sysdeps/generic/math_private.h (__x2y2m1f): Update comment to
allow more cases with X^2 + Y^2 >= 0.5.
* sysdeps/ieee754/dbl-64/x2y2m1.c (__x2y2m1): Likewise. Add -1 as
normal element in sum instead of special-casing based on values of
arguments.
* sysdeps/ieee754/dbl-64/x2y2m1f.c (__x2y2m1f): Update comment.
* sysdeps/ieee754/ldbl-128/x2y2m1l.c (__x2y2m1l): Likewise. Add
-1 as normal element in sum instead of special-casing based on
values of arguments.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c (__x2y2m1l): Likewise.
* sysdeps/ieee754/ldbl-96/x2y2m1.c [FLT_EVAL_METHOD != 0]
(__x2y2m1): Update comment.
* sysdeps/ieee754/ldbl-96/x2y2m1l.c (__x2y2m1l): Likewise. Add -1
as normal element in sum instead of special-casing based on values
of arguments.
* math/s_clog.c (__clog): Handle more cases using log1p without
hypot.
* math/s_clog10.c (__clog10): Likewise.
* math/s_clog10f.c (__clog10f): Likewise.
* math/s_clog10l.c (__clog10l): Likewise.
* math/s_clogf.c (__clogf): Likewise.
* math/s_clogl.c (__clogl): Likewise.
* math/auto-libm-test-in: Add more tests of clog and clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Similar to various other bugs in this area, pow functions can fail to
raise the underflow exception when the result is tiny and inexact but
one or more low bits of the intermediate result that is scaled down
(or, in the i386 case, converted from a wider evaluation format) are
zero. This patch forces the exception in a similar way to previous
fixes, thereby concluding the fixes for known bugs with missing
underflow exceptions currently filed in Bugzilla.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18825]
* sysdeps/i386/fpu/i386-math-asm.h (FLT_NARROW_EVAL_UFLOW_NONNAN):
New macro.
(DBL_NARROW_EVAL_UFLOW_NONNAN): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
* sysdeps/i386/fpu/e_pow.S: Use DEFINE_DBL_MIN.
(__ieee754_pow): Use DBL_NARROW_EVAL_UFLOW_NONNAN instead of
DBL_NARROW_EVAL, reloading the PIC register as needed.
* sysdeps/i386/fpu/e_powf.S: Use DEFINE_FLT_MIN.
(__ieee754_powf): Use FLT_NARROW_EVAL_UFLOW_NONNAN instead of
FLT_NARROW_EVAL. Use separate return path for case when first
argument is NaN.
* sysdeps/i386/fpu/e_powl.S: Include <i386-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN, reloading the
PIC register.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Force
underflow for subnormal result.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Use
math_check_force_underflow_nonneg.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Use
math_check_force_underflow.
* sysdeps/x86_64/fpu/x86_64-math-asm.h
(LDBL_CHECK_FORCE_UFLOW_NONNAN): New macro.
* sysdeps/x86_64/fpu/e_powl.S: Include <x86_64-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, hypot functions can fail
to raise the underflow exception when the result is tiny and inexact
but one or more low bits of the intermediate result that is scaled
down (or, in the i386 case, converted from a wider evaluation format)
are zero. This patch forces the exception in a similar way to
previous fixes.
Note that this issue cannot arise for implementations of hypotf using
double (or wider) for intermediate evaluation (if hypotf should
underflow, that means the double square root is being computed of some
number of the form N*2^-298, for 0 < N < 2^46, which is exactly
represented as a double, and whatever the rounding mode such a square
root cannot have a mantissa with all zeroes after the initial 23
bits). Thus no changes are made to hypotf implementations in this
patch, only to hypot and hypotl.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18803]
* sysdeps/i386/fpu/e_hypot.S: Use DEFINE_DBL_MIN.
(MO): New macro.
(__ieee754_hypot) [PIC]: Load PIC register.
(__ieee754_hypot): Use DBL_NARROW_EVAL_UFLOW_NONNEG instead of
DBL_NARROW_EVAL.
* sysdeps/ieee754/dbl-64/e_hypot.c (__ieee754_hypot): Use
math_check_force_underflow_nonneg in case where result might be
tiny.
* sysdeps/ieee754/ldbl-128/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl):
Likewise.
* sysdeps/ieee754/ldbl-96/e_hypotl.c (__ieee754_hypotl): Likewise.
* sysdeps/powerpc/fpu/e_hypot.c (__ieee754_hypot): Likewise.
* math/auto-libm-test-in: Add more tests of hypot.
* math/auto-libm-test-out: Regenerated.
sysdeps/i386/fpu/e_atanh.S, unlike all other functions in that
directory, loads the PIC register with its own code using
_GLOBAL_OFFSET_TABLE_, rather than with the LOAD_PIC_REG macro. I see
no good reason for the difference; this patch makes it use the common
macro.
Tested for x86.
* sysdeps/i386/fpu/e_atanh.S (__ieee754_atanh) [PIC]: Use
LOAD_PIC_REG.
This patch refactors code in sysdeps/i386/fpu that forces underflow
exceptions to use common macros for that purpose as far as possible.
(Although some of the macros end up used in only one place, I think
it's cleanest to define all these macros together so that all the code
forcing underflow uses such macros. Some more uses of such macros
will also be introduced when fixing remaining bugs about missing
underflow exceptions, and it would be possible to do further
refactoring of the macros in i386-math-asm.h to share more code by
using other macros internally. Places that test for underflow by
examining the representation of the argument with integer operations,
rather that using floating-point comparisons on the argument or
result, are unchanged by this patch.)
Most of this code uses a macro MO to abstract away the differences
between PIC and non-PIC memory references to constants. log1p
functions, however, hardcoded PIC conditionals for this. Because the
common macros rely on the use of MO, I changed the log1p functions to
use the normal style here, and, for consistency, also made that change
to log1pl which is otherwise unaffected by this patch.
Tested for x86.
* sysdeps/i386/fpu/i386-math-asm.h (DEFINE_LDBL_MIN): New macro.
(FLT_CHECK_FORCE_UFLOW): Likewise.
(DBL_CHECK_FORCE_UFLOW): Likewise.
(FLT_CHECK_FORCE_UFLOW_NARROW): Likewise.
(DBL_CHECK_FORCE_UFLOW_NARROW): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN): Likewise.
(FLT_CHECK_FORCE_UFLOW_NONNAN): Likewise.
(DBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
(FLT_CHECK_FORCE_UFLOW_NONNEG): Likewise.
(DBL_CHECK_FORCE_UFLOW_NONNEG): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNEG): Likewise.
* sysdeps/i386/fpu/e_asin.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_asin): Use DBL_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/e_asinf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_asinf): Use FLT_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/e_atan2.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_atan2): Use DBL_CHECK_FORCE_UFLOW_NARROW.
* sysdeps/i386/fpu/e_atan2f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_atan2f): Use FLT_CHECK_FORCE_UFLOW_NARROW.
* sysdeps/i386/fpu/e_atanh.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_atanh): Use DBL_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_atanhf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_atanhf): Use FLT_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_exp2l.S: Include <i386-math-asm.h>.
(ldbl_min): Replace with use of DEFINE_LDBL_MIN.
(__ieee754_exp2l): Use LDBL_CHECK_FORCE_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_expl.S: Include <i386-math-asm.h>.
[!USE_AS_EXPM1L] (cmin): Replace with use of DEFINE_LDBL_MIN.
(IEEE754_EXPL): Use LDBL_CHECK_FORCE_UFLOW_NONNEG.
* sysdeps/i386/fpu/s_atan.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__atan): Use DBL_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/s_atanf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__atanf): Use FLT_CHECK_FORCE_UFLOW.
* sysdeps/i386/fpu/s_expm1.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__expm1): Use DBL_CHECK_FORCE_UFLOW. Move underflow check after
main computation.
* sysdeps/i386/fpu/s_expm1f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__expm1f): Use FLT_CHECK_FORCE_UFLOW. Move underflow check after
main computation.
* sysdeps/i386/fpu/s_log1p.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(MO): New macro.
(__log1p): Use MO. Use DBL_CHECK_FORCE_UFLOW_NONNAN.
* sysdeps/i386/fpu/s_log1pf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(MO): New macro.
(__log1pf): Use MO. Use FLT_CHECK_FORCE_UFLOW_NONNAN.
* sysdeps/i386/fpu/s_log1pl.S (MO): New macro.
(__log1pl): Use MO.
Where glibc code needs to avoid excess range and precision in
floating-point arithmetic, code variously uses either asms or volatile
to force the results of that arithmetic to memory; mostly this is
conditional on FLT_EVAL_METHOD, but in the case of lrint / llrint
functions some use of volatile is unconditional (and is present
unnecessarily in versions for long double). This patch make such code
use the recently-added math_narrow_eval macro consistently, removing
the unnecessary uses of volatile in long double lrint / llrint
implementations completely.
Tested for x86_64, x86, mips64 and powerpc.
* math/s_nexttowardf.c (__nexttowardf): Use math_narrow_eval.
* stdlib/strtod_l.c: Include <math_private.h>.
(overflow_value): Use math_narrow_eval.
(underflow_value): Likewise.
* sysdeps/i386/fpu/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/i386/fpu/s_nexttowardf.c (__nexttowardf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Likewise.
(__ieee754_gamma_r): Likewise.
* sysdeps/ieee754/dbl-64/gamma_productf.c (__gamma_productf):
Likewise.
* sysdeps/ieee754/dbl-64/k_rem_pio2.c (__kernel_rem_pio2):
Likewise.
* sysdeps/ieee754/dbl-64/lgamma_neg.c (__lgamma_neg): Likewise.
* sysdeps/ieee754/dbl-64/s_erf.c (__erfc): Likewise.
* sysdeps/ieee754/dbl-64/s_llrint.c (__llrint): Likewise.
* sysdeps/ieee754/dbl-64/s_lrint.c (__lrint): Likewise.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
(__ieee754_gammaf_r): Likewise.
* sysdeps/ieee754/flt-32/k_rem_pio2f.c (__kernel_rem_pio2f):
Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c (__lgamma_negf): Likewise.
* sysdeps/ieee754/flt-32/s_erff.c (__erfcf): Likewise.
* sysdeps/ieee754/flt-32/s_llrintf.c (__llrintf): Likewise.
* sysdeps/ieee754/flt-32/s_lrintf.c (__lrintf): Likewise.
* sysdeps/ieee754/ldbl-128/s_llrintl.c (__llrintl): Do not use
volatile.
* sysdeps/ieee754/ldbl-128/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-128/s_nexttoward.c (__nexttoward): Use
math_narrow_eval.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-96/gamma_product.c (__gamma_product):
Likewise.
* sysdeps/ieee754/ldbl-96/s_llrintl.c (__llrintl): Do not use
volatile.
* sysdeps/ieee754/ldbl-96/s_lrintl.c (__lrintl): Likewise.
* sysdeps/ieee754/ldbl-96/s_nexttoward.c (__nexttoward): Use
math_narrow_eval.
* sysdeps/ieee754/ldbl-96/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-opt/s_nexttowardfd.c (__nldbl_nexttowardf):
Likewise.
i386 exp, hypot and pow functions can return overflowing and
underflowing values with excess range and precision; ; Wilco
Dijkstra's patches to make isfinite etc. expand inline cause this
pre-existing issue to result in test failures.
This patch fixes those functions to avoid excess range and precision
in their return values. Appropriate macros are added for the repeated
code sequences; in future I'll add more such macros and refactor
existing code forcing underflow (with or without also eliminating
excess range and precision from the return value) to use such macros.
Tested for x86. If, after this patch, you still see x86 libm test
failures with excess range or precision, please file bugs in Bugzilla.
[BZ #18980]
* sysdeps/i386/fpu/i386-math-asm.h (DEFINE_FLT_MIN): New macro.
(DEFINE_DBL_MIN): Likewise.
(FLT_NARROW_EVAL_UFLOW_NONNEG_NAN): Likewise.
(DBL_NARROW_EVAL_UFLOW_NONNEG_NAN): Likewise.
(FLT_NARROW_EVAL_UFLOW_NONNEG): Likewise.
(DBL_NARROW_EVAL_UFLOW_NONNEG): Likewise.
* sysdeps/i386/fpu/e_exp.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
(__exp_finite): Use DBL_NARROW_EVAL_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_exp10.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp10): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp10f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_exp10f): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp2.S: Include <i386-math-asm.h>.
(dbl_min): Replace with use of DEFINE_DBL_MIN.
(__ieee754_exp2): Use DBL_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_exp2f.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_exp2f): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
* sysdeps/i386/fpu/e_expf.S: Include <i386-math-asm.h>.
(flt_min): Replace with use of DEFINE_FLT_MIN.
(__ieee754_expf): Use FLT_NARROW_EVAL_UFLOW_NONNEG_NAN.
(__expf_finite): Use FLT_NARROW_EVAL_UFLOW_NONNEG.
* sysdeps/i386/fpu/e_hypot.S: Include <i386-math-asm.h>.
(__ieee754_hypot): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_hypotf.S: Include <i386-math-asm.h>.
(__ieee754_hypotf): Use FLT_NARROW_EVAL.
* sysdeps/i386/fpu/e_pow.S: Include <i386-math-asm.h>.
(__ieee754_pow): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_powf.S: Include <i386-math-asm.h>.
(__ieee754_powf): Use FLT_NARROW_EVAL.
* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S
(__ieee754_expf_sse2): Convert double-precision result to single
precision.
* sysdeps/i386/fpu/libm-test-ulps: Update.
i386 scalb / scalbn / scalbln (and thus ldexp) functions for float and
double can return results with excess range (and consequently excess
precision for subnormal results). As the results of these functions
are fully determined by reference to IEEE 754 operations, this is
unambiguously a bug, apart from the testsuite failures it causes.
This patch makes those functions store their results on the stack and
load them back to eliminate the excess range. Double rounding is not
a problem, as the only cases where it could occur are when the result
overflows or underflows for extended precision, and then the
double-rounded results are the same as the single-rounded results.
The new macros will be used for more functions, more such macros
added, and existing code refactored to use such macros, in subsequent
patches.
Tested for x86. Committed.
[BZ #18981]
* sysdeps/i386/fpu/i386-math-asm.h: New file.
* sysdeps/i386/fpu/e_scalb.S: Include <i386-math-asm.h>.
(__ieee754_scalb): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/e_scalbf.S: Include <i386-math-asm.h>.
(__ieee754_scalbf): Use FLT_NARROW_EVAL.
* sysdeps/i386/fpu/s_scalbn.S: Include <i386-math-asm.h>.
(__scalbn): Use DBL_NARROW_EVAL.
* sysdeps/i386/fpu/s_scalbnf.S: Include <i386-math-asm.h>.
(__scalbnf): Use FLT_NARROW_EVAL.
As noted in bug 6803, scalbn fails to set errno on overflow and
underflow. This patch fixes this by making scalbn an alias of ldexp,
which has exactly the same semantics (for floating-point types with
radix 2) and already has wrappers that deal with setting errno,
instead of an alias of the internal __scalbn (which ldexp calls).
Notes:
* Where compat symbols were defined for scalbn functions, I didn't
change what they point to (to keep the patch minimal), so such
compat symbols continue to go directly to the non-errno-setting
functions.
* Mike, I didn't do anything with the IA64 versions of these
functions, where I think both the ldexp and scalbn functions already
deal with setting errno. As a cleanup (not needed to fix this bug)
however you might want to make those functions into aliases for
IA64; there is no need for them to be separate function
implementations at all.
* This concludes the fix for bug 6803 since the scalb and scalbln
cases of that bug were fixed some time ago.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #6803]
* math/s_ldexp.c (scalbn): Define as weak alias of __ldexp.
[NO_LONG_DOUBLE] (scalbnl): Define as weak alias of __ldexp.
* math/s_ldexpf.c (scalbnf): Define as weak alias of __ldexpf.
* math/s_ldexpl.c (scalbnl): Define as weak alias of __ldexpl.
* sysdeps/i386/fpu/s_scalbn.S (scalbn): Remove alias.
* sysdeps/i386/fpu/s_scalbnf.S (scalbnf): Likewise.
* sysdeps/i386/fpu/s_scalbnl.S (scalbnl): Likewise.
* sysdeps/ieee754/dbl-64/s_scalbn.c (scalbn): Likewise.
[NO_LONG_DOUBLE] (scalbnl): Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_scalbn.c (scalbn):
Likewise.
[NO_LONG_DOUBLE] (scalbnl): Likewise.
* sysdeps/ieee754/flt-32/s_scalbnf.c (scalbnf): Likewise.
* sysdeps/ieee754/ldbl-128/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (scalbnl): Remove
long_double_symbol calls.
* sysdeps/ieee754/ldbl-64-128/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/ieee754/ldbl-opt/s_ldexpl.c (__ldexpl_2): Define as
strong alias of __ldexpl.
(scalbnl): Define using long_double_symbol.
* sysdeps/m68k/m680x0/fpu/s_scalbn.c (__CONCATX(scalbn,suffix)):
Remove alias.
* sysdeps/sparc/sparc64/soft-fp/s_scalbnl.c (scalbnl): Likewise.
* sysdeps/x86_64/fpu/s_scalbnl.S (scalbnl): Likewise.
* math/libm-test.inc (scalbn_test_data): Add errno expectations.
(scalbln_test_data): Add more errno expectations.
On i386, the double version of exp10 can miss underflow exceptions if
the result is in the subnormal range for double but the last 11 bits
of the 64-bit extended-precision mantissa happen to be zero. This
patch forces the exception in a similar way to previous fixes.
As with the exp2 and exp fixes, the exp10f changes may in fact not be
needed to ensure underflow exceptions, but are included for
consistency and to fix the exp10 part of bug 18875 by ensuring that
excess range and precision is removed from underflowing return values.
Tested for x86_64 and x86.
[BZ #18875]
[BZ #18966]
* sysdeps/i386/fpu/e_exp10.S (dbl_min): New object.
(MO): New macro.
(__ieee754_exp10): For small results, force underflow exception
and remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp10f.S (flt_min): New object.
(MO): New macro.
(__ieee754_exp10f): For small results, force underflow exception
and remove excess range and precision from return value.
* math/auto-libm-test-in: Add more tests of exp10.
* math/auto-libm-test-out: Regenerated.
On i386, the double version of exp can miss underflow exceptions if
the result is in the subnormal range for double but the last 11 bits
of the 64-bit extended-precision mantissa happen to be zero. This
patch forces the exception in a similar way to previous fixes.
As with the exp2 fixes, the expf changes may in fact not be needed to
ensure underflow exceptions, but are included for consistency and to
fix the exp part of bug 18875 by ensuring that excess range and
precision is removed from underflowing return values.
Tested for x86_64 and x86.
[BZ #18875]
[BZ #18961]
* sysdeps/i386/fpu/e_exp.S (dbl_min): New object.
(MO): New macro.
(__ieee754_exp): For small results, force underflow exception and
remove excess range and precision from return value.
(__exp_finite): Likewise.
* sysdeps/i386/fpu/e_expf.S (flt_min): New object.
(MO): New macro.
(__ieee754_expf): For small results, force underflow exception and
remove excess range and precision from return value.
(__expf_finite): Likewise.
* math/auto-libm-test-in: Add more tests of exp.
* math/auto-libm-test-out: Regenerated.
Various exp2 implementations in glibc can miss underflow exceptions
when the scaling down part of the calculation is exact (or, in the x86
case, when the conversion from extended precision to the target
precision is exact). This patch forces the exception in a similar way
to previous fixes.
The x86 exp2f changes may in fact not be needed for this purpose -
it's likely to be the case that no argument of type float has an exp2
result so close to an exact subnormal float value that it equals that
value when rounded to 64 bits (even taking account of variation
between different x86 implementations). However, they are included
for consistency with the changes to exp2 and so as to fix the exp2f
part of bug 18875 by ensuring that excess range and precision is
removed from underflowing return values.
Tested for x86_64, x86 and mips64.
[BZ #16521]
[BZ #18875]
* math/e_exp2l.c (__ieee754_exp2l): Force underflow exception for
small results.
* sysdeps/i386/fpu/e_exp2.S (dbl_min): New object.
(MO): New macro.
(__ieee754_exp2): For small results, force underflow exception and
remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp2f.S (flt_min): New object.
(MO): New macro.
(__ieee754_exp2f): For small results, force underflow exception
and remove excess range and precision from return value.
* sysdeps/i386/fpu/e_exp2l.S (ldbl_min): New object.
(MO): New macro.
(__ieee754_exp2l): Force underflow exception for small results.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise.
* sysdeps/x86_64/fpu/e_exp2l.S (ldbl_min): New object.
(MO): New macro.
(__ieee754_exp2l): Force underflow exception for small results.
* math/auto-libm-test-in: Add more tests or exp2.
* math/auto-libm-test-out: Regenerated.
This patch adds more libm test inputs found through random test
generation to increase previously known ulps. This particular test
generation was run for mips64, so most of the increased ulps are for
ldbl-128 (float and double having been fairly well covered by such
testing for x86_64), but there's the odd ulps increase for other
formats.
Tested for x86_64, x86 and mips64.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
asinh, atan, atan2, atanh, cabs, carg, cos, csqrt, erfc, exp,
exp10, exp2, log, log1p, log2, pow, sin, sincos, sinh, tan and
tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/mips/mips32/libm-test-ulps: Likewise.
* sysdeps/mips/mips64/libm-test-ulps: Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
It was noted in
<https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the
bits/*.h naming scheme should only be used for installed headers.
This patch renames bits/atomic.h to atomic-machine.h to follow that
convention.
This is the only change in this series that needs to change the
filename rather than simply removing a directory level (because both
atomic.h and bits/atomic.h exist at present).
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch).
[BZ #14912]
* sysdeps/aarch64/bits/atomic.h: Move to ...
* sysdeps/aarch64/atomic-machine.h: ...here.
(_AARCH64_BITS_ATOMIC_H): Rename macro to
_AARCH64_ATOMIC_MACHINE_H.
* sysdeps/alpha/bits/atomic.h: Move to ...
* sysdeps/alpha/atomic-machine.h: ...here.
* sysdeps/arm/bits/atomic.h: Move to ...
* sysdeps/arm/atomic-machine.h: ...here. Update comments.
* bits/atomic.h: Move to ...
* sysdeps/generic/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/i386/bits/atomic.h: Move to ...
* sysdeps/i386/atomic-machine.h: ...here.
* sysdeps/ia64/bits/atomic.h: Move to ...
* sysdeps/ia64/atomic-machine.h: ...here.
* sysdeps/m68k/coldfire/bits/atomic.h: Move to ...
* sysdeps/m68k/coldfire/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/m68k/m680x0/m68020/bits/atomic.h: Move to ...
* sysdeps/m68k/m680x0/m68020/atomic-machine.h: ...here.
* sysdeps/microblaze/bits/atomic.h: Move to ...
* sysdeps/microblaze/atomic-machine.h: ...here.
* sysdeps/mips/bits/atomic.h: Move to ...
* sysdeps/mips/atomic-machine.h: ...here.
(_MIPS_BITS_ATOMIC_H): Rename macro to _MIPS_ATOMIC_MACHINE_H.
* sysdeps/powerpc/bits/atomic.h: Move to ...
* sysdeps/powerpc/atomic-machine.h: ...here. Update comments.
* sysdeps/powerpc/powerpc32/bits/atomic.h: Move to ...
* sysdeps/powerpc/powerpc32/atomic-machine.h: ...here. Update
comments. Include <atomic-machine.h> instead of <bits/atomic.h>.
* sysdeps/powerpc/powerpc64/bits/atomic.h: Move to ...
* sysdeps/powerpc/powerpc64/atomic-machine.h: ...here. Include
<atomic-machine.h> instead of <bits/atomic.h>.
* sysdeps/s390/bits/atomic.h: Move to ...
* sysdeps/s390/atomic-machine.h: ...here.
* sysdeps/sparc/sparc32/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc32/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/sparc/sparc32/sparcv9/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: ...here.
* sysdeps/sparc/sparc64/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc64/atomic-machine.h: ...here.
* sysdeps/tile/bits/atomic.h: Move to ...
* sysdeps/tile/atomic-machine.h: ...here.
* sysdeps/tile/tilegx/bits/atomic.h: Move to ...
* sysdeps/tile/tilegx/atomic-machine.h: ...here. Include
<sysdeps/tile/atomic-machine.h> instead of
<sysdeps/tile/bits/atomic.h>.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/tile/tilepro/bits/atomic.h: Move to ...
* sysdeps/tile/tilepro/atomic-machine.h: ...here. Include
<sysdeps/tile/atomic-machine.h> instead of
<sysdeps/tile/bits/atomic.h>.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/arm/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/arm/atomic-machine.h: ...here. Include
<sysdeps/arm/atomic-machine.h> instead of
<sysdeps/arm/bits/atomic.h>.
* sysdeps/unix/sysv/linux/hppa/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/hppa/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/m68k/coldfire/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/nios2/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/nios2/atomic-machine.h: ...here.
(_NIOS2_BITS_ATOMIC_H): Rename macro to _NIOS2_ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/sh/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/sh/atomic-machine.h: ...here.
* sysdeps/x86_64/bits/atomic.h: Move to ...
* sysdeps/x86_64/atomic-machine.h: ...here.
* include/atomic.h: Include <atomic-machine.h> instead of
<bits/atomic.h>.
This patch adds more libm test inputs found through random test
generation to increase observed ulps on x86_64.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acosh, atanh, cbrt,
cosh, csqrt, erfc, expm1 and lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The existing implementations of lgamma functions (except for the ia64
versions) use the reflection formula for negative arguments. This
suffers large inaccuracy from cancellation near zeros of lgamma (near
where the gamma function is +/- 1).
This patch fixes this inaccuracy. For arguments above -2, there are
no zeros and no large cancellation, while for sufficiently large
negative arguments the zeros are so close to integers that even for
integers +/- 1ulp the log(gamma(1-x)) term dominates and cancellation
is not significant. Thus, it is only necessary to take special care
about cancellation for arguments around a limited number of zeros.
Accordingly, this patch uses precomputed tables of relevant zeros,
expressed as the sum of two floating-point values. The log of the
ratio of two sines can be computed accurately using log1p in cases
where log would lose accuracy. The log of the ratio of two gamma(1-x)
values can be computed using Stirling's approximation (the difference
between two values of that approximation to lgamma being computable
without computing the two values and then subtracting), with
appropriate adjustments (which don't reduce accuracy too much) in
cases where 1-x is too small to use Stirling's approximation directly.
In the interval from -3 to -2, using the ratios of sines and of
gamma(1-x) can still produce too much cancellation between those two
parts of the computation (and that interval is also the worst interval
for computing the ratio between gamma(1-x) values, which computation
becomes more accurate, while being less critical for the final result,
for larger 1-x). Because this can result in errors slightly above
those accepted in glibc, this interval is instead dealt with by
polynomial approximations. Separate polynomial approximations to
(|gamma(x)|-1)(x-n)/(x-x0) are used for each interval of length 1/8
from -3 to -2, where n (-3 or -2) is the nearest integer to the
1/8-interval and x0 is the zero of lgamma in the relevant half-integer
interval (-3 to -2.5 or -2.5 to -2).
Together, the two approaches are intended to give sufficient accuracy
for all negative arguments in the problem range. Outside that range,
the previous implementation continues to be used.
Tested for x86_64, x86, mips64 and powerpc. The mips64 and powerpc
testing shows up pre-existing problems for ldbl-128 and ldbl-128ibm
with large negative arguments giving spurious "invalid" exceptions
(exposed by newly added tests for cases this patch doesn't affect the
logic for); I'll address those problems separately.
[BZ #2542]
[BZ #2543]
[BZ #2558]
* sysdeps/ieee754/dbl-64/e_lgamma_r.c (__ieee754_lgamma_r): Call
__lgamma_neg for arguments from -28.0 to -2.0.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (__ieee754_lgammaf_r): Call
__lgamma_negf for arguments from -15.0 to -2.0.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -48.0 or -50.0 to -2.0.
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c (__ieee754_lgammal_r):
Call __lgamma_negl for arguments from -33.0 to -2.0.
* sysdeps/ieee754/dbl-64/lgamma_neg.c: New file.
* sysdeps/ieee754/dbl-64/lgamma_product.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_negf.c: Likewise.
* sysdeps/ieee754/flt-32/lgamma_productf.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/lgamma_productl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_negl.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_product.c: Likewise.
* sysdeps/ieee754/ldbl-96/lgamma_productl.c: Likewise.
* sysdeps/generic/math_private.h (__lgamma_negf): New prototype.
(__lgamma_neg): Likewise.
(__lgamma_negl): Likewise.
(__lgamma_product): Likewise.
(__lgamma_productl): Likewise.
* math/Makefile (libm-calls): Add lgamma_neg and lgamma_product.
* math/auto-libm-test-in: Add more tests of lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
We detect i586 and i686 features at run-time by checking CX8 and CMOV
CPUID features bits. We can use these information to select the best
implementation in ix86 multiarch. HAS_I586/HAS_I686 is true if i586/i686
instructions are available on the processor.
Due to the reordering and the other nifty extensions in i686, it is not
really good to use heavily i586 optimized code on an i686. It's better
to use i486 code if it isn't an i586. USE_I586/USE_I686 is true if
i586/i686 implementation should be used for the processor. USE_I586
is true only if i686 instructions aren't available. If i686 instructions
are available, we always choose i686 or i486 implementation, in that order,
and we never choose i586 implementation for i686-class processors.
* sysdeps/i386/init-arch.h: New file.
* sysdeps/i386/i586/init-arch.h: Likewise.
* sysdeps/i386/i686/init-arch.h: Likewise.
* sysdeps/x86/cpu-features.c (init_cpu_features): Set bit_I586
bit if CX8 is available. Set bit_I686 bit if CMOV is available.
* sysdeps/x86/cpu-features.h (bit_I586): New.
(bit_I686): Likewise.
(bit_CX8): Likewise.
(bit_CMOV): Likewise.
(index_CX8): Likewise.
(index_CMOV): Likewise.
(index_I586): Likewise.
(index_I686): Likewise.
(reg_CX8): Likewise.
(reg_CMOV): Likewise.
(HAS_I586): Defined as HAS_ARCH_FEATURE (I586) if i586 isn't
available at compile-time.
(HAS_I686): Defined as HAS_ARCH_FEATURE (I686) if i686 isn't
available at compile-time.
* sysdeps/x86/init-arch.h (USE_I586): New macro.
(USE_I686): Likewise.
Since glibc doesn't support i386 any more, we can remove i486 subdirectory.
* sysdeps/i386/i586/Implies: Removed.
* sysdeps/i386/i686/Implies: Likewise.
Since glibc doesn't support i386 any more, we can move i486/strlen.S
to strlen.S.
* sysdeps/i386/i486/strlen.S: Moved to ...
* sysdeps/i386/strlen.S: Here.
Since glibc doesn't support i386 any more, we can move i486/strcat.S
to strcat.S.
* sysdeps/i386/i486/strcat.S: Moved to ...
* sysdeps/i386/strcat.S: Here.
* sysdeps/i386/i686/multiarch/strcat.S: Updated.
Since glibc doesn't support i386 any more, we can move
i486/pthread_spin_trylock.S to pthread_spin_trylock.S
* sysdeps/i386/i486/pthread_spin_trylock.S: Moved to ...
* sysdeps/i386/pthread_spin_trylock.S: Here.
* sysdeps/i386/i586/pthread_spin_trylock.S: Removed.
* sysdeps/i386/i686/pthread_spin_trylock.S: Updated.
Since glibc doesn't support i386 any more, we can move
i486/string-inlines.c to string-inlines.c.
* sysdeps/i386/i486/string-inlines.c: Moved to ...
* sysdeps/i386/string-inlines.c: Here.
Since glibc doesn't support i386 any more, we can move i486/bits/atomic.h
to bits/atomic.h.
* sysdeps/i386/i486/bits/atomic.h: Moved to ...
* sysdeps/i386/bits/atomic.h: Here.
Replace BZERO_P with USE_AS_BZERO in i586/i686 memset.S to support i386
multi-arch memset. Also we should check SHARED not PIC for libc.so
since libc.a may be compiled with PIC.
* sysdeps/i386/i586/bzero.S (USE_AS_BZERO): New.
* sysdeps/i386/i686/bzero.S (USE_AS_BZERO): Likewise.
* sysdeps/i386/i586/memset.S (BZERO_P): Removed.
Check USE_AS_BZERO/SHARED instead of BZERO_P/PIC.
(__memset_zero_constant_len_parameter): New.
* sysdeps/i386/i686/memset.S (BZERO_P): Removed.
Check USE_AS_BZERO/SHARED instead of BZERO_P/PIC.
(__memset_zero_constant_len_parameter): Don't define if
__memset_chk or USE_AS_BZERO are defined.
Replace MEMPCPY_P with USE_AS_MEMPCPY in i586 memcpy.S to support i386
multi-arch memcpy. Also we should check SHARED not PIC for libc.so
since libc.a may be compiled with PIC.
* sysdeps/i386/i586/memcpy.S (MEMPCPY_P): Removed.
Check USE_AS_MEMPCPY/SHARED instead of MEMPCPY_P/PIC.
* sysdeps/i386/i586/mempcpy.S (USE_AS_MEMPCPY): New.
Since x86-64 ld.so preserves vector registers now, we can use SSE in
x86-64 ld.so. We should run tst-ld-sse-use.sh only on i386.
* sysdeps/x86/Makefile [$(subdir) == elf] (CFLAGS-.os,
tests-special, $(objpfx)tst-ld-sse-use.out): Moved to ...
* sysdeps/i386/Makefile [$(subdir) == elf] (CFLAGS-.os,
tests-special, $(objpfx)tst-ld-sse-use.out): Here. Update
comments.
* sysdeps/x86_64/Makefile [$(subdir) == elf] (CFLAGS-.os): Add
-mno-mmx for $(all-rtld-routines).
* sysdeps/x86/tst-ld-sse-use.sh: Moved to ...
* sysdeps/i386/tst-ld-sse-use.sh: Here. Replace x86-64 with
i386.
sysdeps/i386/i686/multiarch/strcasestr-c.c became unused after
commit 1818483b15
Author: Andreas Schwab <schwab@suse.de>
Date: Wed Dec 18 11:53:27 2013 +1000
Remove use of SSE4.2 functions for strstr on i686
which contains
-sysdep_routines += strcspn-c strpbrk-c strspn-c strstr-c strcasestr-c
+sysdep_routines += strcspn-c strpbrk-c strspn-c
sysdeps/x86_64/multiarch/strcasestr.c became useless after
t 584b18eb4d
Author: Ondřej Bílka <neleai@seznam.cz>
Date: Sat Dec 14 19:33:56 2013 +0100
Add strstr with unaligned loads. Fixes bug 12100.
which changes sysdeps/x86_64/multiarch/strcasestr.c to
libc_ifunc (__strcasestr, __strcasestr_sse2);
This patch removes these file.
* i386/i686/multiarch/strcasestr-c.c: Removed.
* x86_64/multiarch/strcasestr.c: Likewise.
* x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list):
Remove strcasestr.
Move sysdeps/x86_64/multiarch/init-arch.h to sysdeps/x86/init-arch.h
which can be used for both i386 and x86_64.
* sysdeps/i386/i686/multiarch/init-arch.h: Removed.
* sysdeps/unix/sysv/linux/x86/init-arch.h: Likewise.
* sysdeps/x86_64/cacheinfo.c: Include <init-arch.h> instead
of "multiarch/init-arch.h".
* sysdeps/x86_64/multiarch/init-arch.h: Renamed to ...
* sysdeps/x86/init-arch.h: This.
Both files include sysdeps/x86_64/multiarch/init-arch.c which has been
removed.
* sysdeps/i386/i686/multiarch/init-arch.c: Removed.
* sysdeps/unix/sysv/linux/x86/init-arch.c: Likewise.
The csqrt implementations in glibc can miss underflow exceptions when
the real or imaginary part of the result becomes tiny in the course of
scaling down (in particular, multiplication by 0.5) and that scaling
is exact although the relevant part of the mathematical result isn't.
This patch forces the exception in a similar way to previous fixes.
Tested for x86_64 and x86.
[BZ #18370]
* math/s_csqrt.c (__csqrt): Force underflow exception for results
whose real or imaginary part has small absolute value.
* math/s_csqrtf.c (__csqrtf): Likewise.
* math/s_csqrtl.c (__csqrtl): Likewise.
* math/auto-libm-test-in: Add more tests of csqrt.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Since _dl_x86_cpu_features is always available, we can use x86-64
cacheinfo.c and sysconf.c for both i386 and x86-64.
* sysdeps/i386/i686/Makefile
[$(subdir) == string] (sysdep_routines): Moved to ...
* sysdeps/i386/Makefile: Here.
* sysdeps/i386/i686/cacheinfo.c: Moved to ...
* sysdeps/i386/cacheinfo.c: Here.
* sysdeps/unix/sysv/linux/i386/sysconf.c: Removed.
* sysdeps/unix/sysv/linux/i386/i686/sysconf.c: Likewise.
* sysdeps/unix/sysv/linux/x86_64/sysconf.c: Moved to ...
* sysdeps/unix/sysv/linux/x86/sysconf.c: Here.
This patch adds more test inputs to various libm functions found
through random generation to have larger ulps errors than previously
listed in libm-test-ulp, on at least one of x86_64 and x86.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
asinh, atan, atan2, atanh, cabs, cbrt, cosh, csqrt, erf, erfc,
exp, exp2, lgamma, log, log1p, log2, pow, sin, sincos, tan, tanh
and tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Similar to various other bugs in this area, some tanh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #16520]
* sysdeps/ieee754/dbl-64/s_tanh.c: Include <float.h>.
(__tanh): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/flt-32/s_tanhf.c: Include <float.h>.
(__tanhf): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_tanhl.c: Include <float.h>.
(__tanhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c: Include <float.h>.
(__tanhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-96/s_tanhl.c: Include <float.h>.
(__tanhl): Force underflow exception for arguments with small
absolute value.
* math/auto-libm-test-in: Add more tests of tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
This patch adds more tests of various libm functions found through
random test generation to give increased ulps on 32-bit x86.
Tested for x86_64 and x86.
* math/auto-libm-test-in: Add more tests of acosh, asin, asinh,
atanh, cabs, carg, cbrt, cosh, csqrt, erf, erfc, exp, exp10,
expm1, hypot, log, log10, log1p, log2, pow, sinh, tan and tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
ldbl-128ibm tanhl uses a too-small threshold to decide when to return
+/-1, resulting in large errors. This patch changes it to a more
appropriate threshold (the requirement is for 2*exp(-2|x|) to be small
in terms of ulps of 1).
Tested for x86_64, x86 and powerpc.
[BZ #18790]
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c (__tanhl): Increase
threshold for returning +/- 1.
* math/auto-libm-test-in: Add more tests of tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
ldbl-128ibm sinhl uses a too-big threshold to decide when to return
the argument, resulting in large errors. This patch fixes it to use a
more appropriate threshold.
Tested for x86_64, x86 and powerpc.
[BZ #18789]
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c (__ieee754_sinhl): Use
smaller threshold for returning the argument.
* math/auto-libm-test-in: Add more tests of sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Similar to various other bugs in this area, some sinh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #16519]
* sysdeps/ieee754/dbl-64/e_sinh.c: Include <float.h>.
(__ieee754_sinh): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/flt-32/e_sinhf.c: Include <float.h>.
(__ieee754_sinhf): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/ldbl-128/e_sinhl.c: Include <float.h>.
(__ieee754_sinhl): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/ldbl-128ibm/e_sinhl.c: Include <float.h>.
(__ieee754_sinhl): Force underflow exception for arguments with
small absolute value.
* sysdeps/ieee754/ldbl-96/e_sinhl.c: Include <float.h>.
(__ieee754_sinhl): Force underflow exception for arguments with
small absolute value.
* math/auto-libm-test-in: Add more tests of sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
The testcase stdlib/tst-makecontext fails on i686 because
_Unwind_Backtrace from libgcc produces a segmentation fault if it was
called within a context created by makecontext. See Bug 18635.
ChangeLog:
* sysdeps/i386/i686/Makefile (test-xfail-tst-makecontext):
New variable.
We need to save/restore bound registers and add a BND prefix before
branches in _dl_runtime_profile so that bound registers for pointer
pass and return are preserved when LD_AUDIT is used.
[BZ #18134]
* sysdeps/i386/configure.ac: Set HAVE_MPX_SUPPORT.
* sysdeps/i386/configure: Regenerated.
* sysdeps/i386/dl-trampoline.S (PRESERVE_BND_REGS_PREFIX): New.
(_dl_runtime_profile): Save and restore Intel MPX return bound
registers when calling _dl_call_pltexit. Add
PRESERVE_BND_REGS_PREFIX before return.
* sysdeps/i386/link-defines.sym (LRV_BND0_OFFSET): New.
(LRV_BND1_OFFSET): Likewise.
* sysdeps/x86/bits/link.h (La_i86_retval): Add lrv_bnd0 and
lrv_bnd1.
* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Fix
typo in bndmov encoding.
* sysdeps/x86_64/dl-trampoline.h: Properly save and restore
Intel MPX bound registers. Add PRESERVE_BND_REGS_PREFIX before
branch instructions to preserve bounds.
Define macros for fields in La_i86_regs and La_i86_retval and use them
in dl-trampoline.S, instead of hardcoded values.
* sysdeps/i386/Makefile (gen-as-const-headers)[elf]: Add
link-defines.sym.
* sysdeps/i386/dl-trampoline.S: Include <link-defines.h>.
(_dl_runtime_profile): Use LONG_DOUBLE_SIZE, LRV_SIZE,
LRV_EAX_OFFSET, LRV_EDX_OFFSET, LRV_ST0_OFFSET, LRV_ST1_OFFSET
and LR_SIZE.
* sysdeps/i386/link-defines.sym: New file.
This patch adds a testcase for i386 LD_AUDIT to check function return
and parameters passed in registers.
* sysdeps/i386/Makefile (tests)[elf]: Add tst-audit3.
(modules-names): Add tst-auditmod3a tst-auditmod3b.
($(objpfx)tst-audit3): New rule.
($(objpfx)tst-audit3.out): Likewise.
* sysdeps/i386/tst-audit3.c: New file.
* sysdeps/i386/tst-audit3.h: Likewise.
* sysdeps/i386/tst-auditmod3a.c: Likewise.
* sysdeps/i386/tst-auditmod3b.c: Likewise.
This patch combines BUSY_WAIT_NOP and atomic_delay into a new
atomic_spin_nop function and adjusts all clients. The new function is
put into atomic.h because what is best done in a spin loop is
architecture-specific, and atomics must be used for spinning. The
function name is meant to tell users that this has no effect on
synchronization semantics but is a performance aid for spinning.
In non-default rounding modes, tgamma can be slightly less accurate
than permitted by glibc's accuracy goals.
Part of the problem is error accumulation, addressed in this patch by
setting round-to-nearest for internal computations. However, there
was also a bug in the code dealing with computing pow (x + n, x + n)
where x + n is not exactly representable, providing another source of
error even in round-to-nearest mode; it was necessary to address both
bugs to get errors for all testcases within glibc's accuracy goals.
Given this second fix, accuracy in round-to-nearest mode is also
improved (hence regeneration of ulps for tgamma should be from scratch
- truncate libm-test-ulps or at least remove existing tgamma entries -
so that the expected ulps can be reduced).
Some additional complications also arose. Certain tgamma tests should
strictly, according to IEEE semantics, overflow or not depending on
the rounding mode; this is beyond the scope of glibc's accuracy goals
for any function without exactly-determined results, but
gen-auto-libm-tests doesn't handle being lax there as it does for
underflow. (libm-test.inc also doesn't handle being lax about whether
the result in cases very close to the overflow threshold is infinity
or a finite value close to overflow, but that doesn't cause problems
in this case though I've seen it cause problems with random test
generation for some functions.) Thus, spurious-overflow markings,
with a comment, are added to auto-libm-test-in (no bug in Bugzilla
because the issue is with the testsuite, not a user-visible bug in
glibc). And on x86, after the patch I saw ERANGE issues as previously
reported by Carlos (see my commentary in
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>), which
needed addressing by ensuring excess range and precision were
eliminated at various points if FLT_EVAL_METHOD != 0.
I also noticed and fixed a cosmetic issue where 1.0f was used in long
double functions and should have been 1.0L.
This completes the move of all functions to testing in all rounding
modes with ALL_RM_TEST, so gen-libm-have-vector-test.sh is updated to
remove the workaround for some functions not using ALL_RM_TEST.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18613]
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Take log of
X_ADJ not X when adjusting exponent.
(__ieee754_gamma_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Take log
of X_ADJ not X when adjusting exponent.
(__ieee754_gammaf_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive): Take
log of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive): Take
log of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive): Take log
of X_ADJ not X when adjusting exponent.
(__ieee754_gammal_r): Do intermediate computations in
round-to-nearest then adjust overflowing and underflowing results
as needed. Use 1.0L not 1.0f as numerator of division.
* math/libm-test.inc (tgamma_test_data): Remove one test. Moved
to auto-libm-test-in.
(tgamma_test): Use ALL_RM_TEST.
* math/auto-libm-test-in: Add one test of tgamma. Mark some other
tests of tgamma with spurious-overflow.
* math/auto-libm-test-out: Regenerated.
* math/gen-libm-have-vector-test.sh: Do not check for START.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Some existing jn tests, if run in non-default rounding modes, produce
errors above those accepted in glibc, which causes problems for moving
tests of jn to use ALL_RM_TEST. This patch makes jn set rounding
to-nearest internally, as was done for yn some time ago, then computes
the appropriate underflowing value for results that underflowed to
zero in to-nearest, and moves the tests to ALL_RM_TEST. It does
nothing about the general inaccuracy of Bessel function
implementations in glibc, though it should make jn more accurate on
average in non-default rounding modes through reduced error
accumulation. The recomputation of results that underflowed to zero
should as a side-effect fix some cases of bug 16559, where jn just
used an exact zero, but that is *not* the goal of this patch and other
cases of that bug remain unfixed.
(Most of the changes in the patch are reindentation to add new scopes
for SET_RESTORE_ROUND*.)
Tested for x86_64, x86, powerpc and mips64.
[BZ #16559]
[BZ #18602]
* sysdeps/ieee754/dbl-64/e_jn.c (__ieee754_jn): Set
round-to-nearest internally then recompute results that
underflowed to zero in the original rounding mode.
* sysdeps/ieee754/flt-32/e_jnf.c (__ieee754_jnf): Likewise.
* sysdeps/ieee754/ldbl-128/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
* sysdeps/ieee754/ldbl-96/e_jnl.c (__ieee754_jnl): Likewise
* math/libm-test.inc (jn_test): Use ALL_RM_TEST.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
cexp, ccos, ccosh, csin and csinh have spurious underflows in cases
where they compute sin of the smallest normal, that produces an
underflow exception (depending on which sin implementation is in use)
but the final result does not underflow. ctan and ctanh may also have
such underflows, or they may be latent (the issue there is that
e.g. ctan (DBL_MIN) should, rounded upwards, be the next double value
above DBL_MIN, which under glibc's accuracy goals may not have an
underflow exception, but the intermediate computation of sin (DBL_MIN)
would legitimately underflow on before-rounding architectures).
This patch fixes all those functions so they use plain comparisons (>
DBL_MIN etc.) instead of comparing the result of fpclassify with
FP_SUBNORMAL (in all these cases, we already know the number being
compared is finite). Note that in the case of csin / csinf / csinl,
there is no need for fabs calls in the comparison because the real
part has already been reduced to its absolute value.
As the patch fixes the failures that previously obstructed moving
tests of cexp to use ALL_RM_TEST, those tests are moved to ALL_RM_TEST
by the patch (two functions remain yet to be converted).
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #18594]
* math/s_ccosh.c (__ccosh): Compare with least normal value
instead of comparing class with FP_SUBNORMAL.
* math/s_ccoshf.c (__ccoshf): Likewise.
* math/s_ccoshl.c (__ccoshl): Likewise.
* math/s_cexp.c (__cexp): Likewise.
* math/s_cexpf.c (__cexpf): Likewise.
* math/s_cexpl.c (__cexpl): Likewise.
* math/s_csin.c (__csin): Likewise.
* math/s_csinf.c (__csinf): Likewise.
* math/s_csinh.c (__csinh): Likewise.
* math/s_csinhf.c (__csinhf): Likewise.
* math/s_csinhl.c (__csinhl): Likewise.
* math/s_csinl.c (__csinl): Likewise.
* math/s_ctan.c (__ctan): Likewise.
* math/s_ctanf.c (__ctanf): Likewise.
* math/s_ctanh.c (__ctanh): Likewise.
* math/s_ctanhf.c (__ctanhf): Likewise.
* math/s_ctanhl.c (__ctanhl): Likewise.
* math/s_ctanl.c (__ctanl): Likewise.
* math/auto-libm-test-in: Add more tests of ccos, ccosh, cexp,
csin, csinh, ctan and ctanh.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (cexp_test): Use ALL_RM_TEST.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This fixes BZ #17403 by defining atomic_full_barrier,
atomic_read_barrier, and atomic_write_barrier on x86 and x86_64. A full
barrier is implemented through an atomic idempotent modification to the
stack and not through using mfence because the latter can supposedly be
somewhat slower due to having to provide stronger guarantees wrt.
self-modifying code, for example.
The csqrt implementations in glibc can cause spurious underflows in
some cases as a side-effect of the scaling for large arguments (when
underflow is correct for the square root of the argument that was
scaled down to avoid overflow, but not for the original argument).
This patch arranges to avoid the underflowing intermediate computation
(eliminating a multiplication in 0.5 in the problem cases where a
subsequent scaling by 2 would follow).
Tested for x86_64 and x86 and ulps updated accordingly (only needed
for x86).
[BZ #18371]
* math/s_csqrt.c (__csqrt): Avoid multiplication by 0.5 where
intermediate but not final result might underflow.
* math/s_csqrtf.c (__csqrtf): Likewise.
* math/s_csqrtl.c (__csqrtl): Likewise.
* math/auto-libm-test-in: Add more tests of csqrt.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Similar to various other bugs in this area, some expm1 implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
(The issue does not apply to the ldbl-* implementations or to those
for x86 / x86_64 long double. The change to
sysdeps/ieee754/dbl-64/wordsize-64/e_cosh.c is one I missed when
previously fixing bug 16354; the bug in that implementation was
previously latent, but the expm1 fixes stopped it being latent and so
required it to be fixed to avoid spurious underflows from cosh.)
Tested for x86_64 and x86.
[BZ #16353]
* sysdeps/i386/fpu/s_expm1.S (dbl_min): New object.
(__expm1): Force underflow exception for arguments with small
absolute value.
* sysdeps/i386/fpu/s_expm1f.S (flt_min): New object.
(__expm1f): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/dbl-64/s_expm1.c: Include <float.h>.
(__expm1): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/flt-32/s_expm1f.c: Include <float.h>.
(__expm1f): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/dbl-64/wordsize-64/e_cosh.c (__ieee754_cosh):
Check for small arguments before calling __expm1.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 16353.
* math/auto-libm-test-out: Regenerated.
In the x86 / x86_64 implementations of expm1l, when expm1l's result
should underflow to 0 (argument minus the least subnormal, in some
rounding modes), it can be a zero of the wrong sign. This patch fixes
this by returning the argument with underflow forced in that case
(this is a 1ulp error relative to the correctly rounded result of -0,
which is OK in terms of the documented accuracy goals, whereas a
result with the wrong sign never is).
Tested for x86_64 and x86.
[BZ #18569]
* sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]: Force
underflow and return argument in case of subnormal argument.
* sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]:
Likewise.
* math/auto-libm-test-in: Add more tests of expm1.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, the x86 and x86_64
implementations of expl / exp10l can fail to produce underflow
exceptions when the unscaled result has trailing 0 bits so the scaling
down to subnormal precision is exact. This patch fixes this by
forcing the exception in the case of tiny results.
Tested for x86_64 and x86.
[BZ #16361]
* sysdeps/i386/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object.
[!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for
tiny results.
* sysdeps/x86_64/fpu/e_expl.S [!USE_AS_EXPM1L] (cmin): New object.
[!USE_AS_EXPM1L] (IEEE754_EXPL): Force underflow exception for
tiny results.
* math/auto-libm-test-in: Add more tests of exp and exp10. Do not
mark underflow exceptions as possibly missing for bug 16361.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, some asinh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes.
Tested for x86_64, x86 and mips64.
[BZ #16350]
* sysdeps/i386/fpu/s_asinh.S (__asinh): Force underflow exception
for arguments with small absolute value.
* sysdeps/i386/fpu/s_asinhf.S (__asinhf): Likewise.
* sysdeps/i386/fpu/s_asinhl.S (__asinhl): Likewise.
* sysdeps/ieee754/dbl-64/s_asinh.c: Include <float.h>.
(__asinh): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/flt-32/s_asinhf.c: Include <float.h>.
(__asinhf): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_asinhl.c: Include <float.h>.
(__asinhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c: Include <float.h>.
(__asinhl): Force underflow exception for arguments with small
absolute value.
* sysdeps/ieee754/ldbl-96/s_asinhl.c: Include <float.h>.
(__asinhl): Force underflow exception for arguments with small
absolute value.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 16350.
* math/auto-libm-test-out: Regenerated.
regcomp brings in references to wcscoll, which isn't in all the
standards that contain regcomp. In turn, wcscoll brings in references
to wcscmp, also not in all those standards. This patch fixes this by
making those functions into weak aliases of __wcscoll and __wcscmp and
calling those names instead as needed.
Tested for x86_64 and x86 (testsuite, and that disassembly of
installed shared libraries is unchanged by the patch).
[BZ #18497]
* wcsmbs/wcscmp.c [!WCSCMP] (WCSCMP): Define as __wcscmp instead
of wcscmp.
(wcscmp): Define as weak alias of WCSCMP.
* wcsmbs/wcscoll.c (STRCOLL): Define as __wcscoll instead of
wcscoll.
(USE_HIDDEN_DEF): Define.
[!USE_IN_EXTENDED_LOCALE_MODEL] (wcscoll): Define as weak alias of
__wcscoll. Don't use libc_hidden_weak.
* wcsmbs/wcscoll_l.c (STRCMP): Define as __wcscmp instead of
wcscmp.
* sysdeps/i386/i686/multiarch/wcscmp-c.c
[SHARED] (libc_hidden_def): Define __GI___wcscmp instead of
__GI_wcscmp.
(weak_alias): Undefine and redefine.
* sysdeps/i386/i686/multiarch/wcscmp.S (wcscmp): Rename to
__wcscmp and define as weak alias of __wcscmp.
* sysdeps/x86_64/wcscmp.S (wcscmp): Likewise.
* include/wchar.h (__wcscmp): Declare. Use libc_hidden_proto.
(__wcscoll): Likewise.
(wcscmp): Don't use libc_hidden_proto.
(wcscoll): Likewise.
* posix/regcomp.c (build_range_exp): Call __wcscoll instead of
wcscoll.
* posix/regexec.c (check_node_accept_bytes): Likewise.
* conform/Makefile (test-xfail-XPG3/regex.h/linknamespace): Remove
variable.
(test-xfail-XPG4/regex.h/linknamespace): Likewise.
(test-xfail-POSIX/regex.h/linknamespace): Likewise.
Various code in glibc uses __strnlen instead of strnlen for namespace
reasons. However, __strnlen does not use libc_hidden_proto /
libc_hidden_def (as is normally done for any function defined and
called within the same library, whether or not exported from the
library and whatever namespace it is in), so the compiler does not
know that those calls are to a function within libc.
This patch uses libc_hidden_proto / libc_hidden_def with __strnlen.
On x86_64, it makes no difference to the installed stripped shared
libraries. On 32-bit x86, it causes __strnlen calls to go to the same
place as strnlen calls (the fallback strnlen implementation), rather
than through a PLT entry for the strnlen IFUNC; I'm not sure of the
logic behind when calls from within libc should use IFUNCs versus when
they should go direct to a particular function implementation, but
clearly it doesn't make sense for strnlen and __strnlen to be handled
differently in this regard.
Tested for x86_64 and x86 (testsuite, and comparison of installed
shared libraries as described above).
* string/strnlen.c [!STRNLEN] (__strnlen): Use libc_hidden_def.
* include/string.h (__strnlen): Use libc_hidden_proto.
* sysdeps/aarch64/strnlen.S (__strnlen): Use libc_hidden_def.
* sysdeps/i386/i686/multiarch/strnlen-c.c [SHARED]
(libc_hidden_def): Define __GI___strnlen as well as __GI_strnlen.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-power7.S
(libc_hidden_def): Undefine and redefine.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen-ppc32.c
[SHARED] (libc_hidden_def): Define __GI___strnlen as well as
__GI_strnlen.
* sysdeps/powerpc/powerpc32/power7/strnlen.S (__strnlen): Use
libc_hidden_def.
* sysdeps/tile/tilegx/strnlen.c (__strnlen): Likewise.
The i386 implementation of atanhl, for small arguments, does a
calculation that involves computing twice the square of the argument,
resulting in spurious underflows for some arguments. This patch fixes
this by just returning the argument when its exponent is below -32,
with underflow being forced as needed for subnormal arguments.
Tested for x86 and x86_64.
[BZ #18049]
* sysdeps/i386/fpu/e_atanhl.S (__ieee754_atanhl): For exponents
below -32, return the argument, with underflow if subnormal.
* math/auto-libm-test-in: Add more tests of atanh.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, some atanh implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes. (No change in this regard is needed
for the i386 implementation; special handling to force underflows in
these cases will only be needed there when the spurious underflows,
bug 18049, get fixed.)
Tested for x86_64, x86, powerpc and mips64.
[BZ #16352]
* sysdeps/i386/fpu/e_atanh.S (dbl_min): New object.
(__ieee754_atanh): Force underflow exception for results with
small absolute value.
* sysdeps/i386/fpu/e_atanhf.S (flt_min): New object.
(__ieee754_atanhf): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/dbl-64/e_atanh.c: Include <float.h>.
(__ieee754_atanh): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/flt-32/e_atanhf.c: Include <float.h>.
(__ieee754_atanhf): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-128/e_atanhl.c: Include <float.h>.
(__ieee754_atanhl): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-128ibm/e_atanhl.c: Include <float.h>.
(__ieee754_atanhl): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/ldbl-96/e_atanhl.c: Include <float.h>.
(__ieee754_atanhl): Force underflow exception for results with
small absolute value.
* math/auto-libm-test-in: Do not allow missing underflow
exceptions from atanh.
* math/auto-libm-test-out: Regenerated.
Similar to various other bugs in this area, some log1p implementations
do not raise the underflow exception for subnormal arguments, when the
result is tiny and inexact. This patch forces the exception in a
similar way to previous fixes. (The ldbl-128ibm implementation
doesn't currently need any change as it already generates this
exception, albeit through code that would generate spurious exceptions
in other cases; special code for this issue will only be needed there
when fixing the spurious exceptions.)
Tested for x86_64, x86, powerpc and mips64.
[BZ #16339]
* sysdeps/i386/fpu/s_log1p.S (dbl_min): New object.
(__log1p): Force underflow exception for results with small
absolute value.
* sysdeps/i386/fpu/s_log1pf.S (flt_min): New object.
(__log1pf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/dbl-64/s_log1p.c: Include <float.h>.
(__log1p): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/flt-32/s_log1pf.c: Include <float.h>.
(__log1pf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_log1pl.c: Include <float.h>.
(__log1pl): Force underflow exception for results with small
absolute value.
* math/auto-libm-test-in: Do not allow missing underflow
exceptions from log1p.
* math/auto-libm-test-out: Regenerated.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of csqrt, lgamma, log10
and sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of acosh, atanh, cos,
csqrt, erfc, sin and sincos.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64. (This process
must eventually converge, when my random test generation stops finding
inputs that increase the listed ulps, except maybe for any cases
uncovered where the errors exceed the maximum allowed 9ulp error and
so indicate actual libm bugs needing fixing.)
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of acosh, atanh, clog,
clog10, csqrt, erfc, exp2, expm1, log10, log2 and sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds more randomly-generated tests of various libm
functions that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of atan, clog, clog10,
cos, csqrt, erf, erfc, exp2, lgamma, log1p, sin, sincos, tanh and
tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of tgamma that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of tgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of tanh that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of tanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of tan that are observed
to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of tan.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of cos, sin and sincos
that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of cos, sin and sincos.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of lgamma that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of lgamma.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of log, log10, log1p and
log2 that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of log, log10, log2 and
log1p.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of exp, exp10, exp2 and
expm1 that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of exp, exp10, exp2 and
expm1.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of erf and erfc that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of erf and erfc.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of csqrt that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of csqrt.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some further randomly-generated tests of cosh and sinh
that are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of cosh and sinh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch fix the static build for strftime, which uses __wcschr.
Current powerpc32 implementation defines the __wcschr be an alias to
__wcschr_ppc32 and current implementation misses the correct alias for
static build.
It also changes the default wcschr.c logic so a IFUNC implementation
should just define WCSCHR and undefine the required alias/internal
definitions.
According to bug 6792, errno is not set to ERANGE/EDOM
by calling log1p/log1pf/log1pl with x = -1 or x < -1.
This patch adds a wrapper which sets errno in those cases
and returns the value of the existing __log1p function.
The log1p is now an alias to the wrapper function
instead of __log1p.
The files in sysdeps are reflecting these changes.
The ia64 implementation sets errno by itself,
thus the wrapper-file is empty.
The libm-test is adjusted for log1p-tests to check errno.
[BZ #6792]
* math/w_log1p.c: New file.
* math/w_log1pf.c: Likewise.
* math/w_log1pl.c: Likewise.
* math/Makefile (libm-calls): Add w_log1p.
* math/s_log1pl.c (log1pl): Remove weak_alias.
* sysdeps/i386/fpu/s_log1p.S (log1p): Likewise.
* sysdeps/i386/fpu/s_log1pf.S (log1pf): Likewise.
* sysdeps/i386/fpu/s_log1pl.S (log1pl): Likewise.
* sysdeps/x86_64/fpu/s_log1pl.S (log1pl): Likewise.
* sysdeps/ieee754/dbl-64/s_log1p.c (log1p): Likewise.
[NO_LONG_DOUBLE] (log1pl): Likewise.
* sysdeps/ieee754/flt-32/s_log1pf.c (log1pf): Likewise.
* sysdeps/ieee754/ldbl-128/s_log1pl.c (log1pl): Likewise.
* sysdeps/ieee754/ldbl-64-128/s_log1pl.c
(log1p): Remove long_double_symbol.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (log1pl): Likewise.
* sysdeps/ieee754/ldbl-64-128/w_log1pl.c: New file.
* sysdeps/ieee754/ldbl-128ibm/w_log1pl.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_log1p.c: Define empty weak_alias to
remove weak_alias for corresponding log1p function.
* sysdeps/m68k/m680x0/fpu/s_log1pf.c: Likewise.
* sysdeps/m68k/m680x0/fpu/s_log1pl.c: Likewise.
* sysdeps/ia64/fpu/w_log1p.c: New file.
* sysdeps/ia64/fpu/w_log1pf.c: Likewise.
* sysdeps/ia64/fpu/w_log1pl.c: Likewise.
* math/libm-test.inc (log1p_test_data): Add errno expectations.
This patch adds some randomly-generated tests of clog and clog10 that
are observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of clog and clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of atanh that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of atanh.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of atan that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of atan.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch adds some randomly-generated tests of cabs that are
observed to increase ulps on x86_64.
Tested for x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add more tests of cabs.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
The dbl-64 implementation of atan2 does computations that expect to
run in round-to-nearest mode, and in other modes the errors can
accumulate to more than the maximum accepted 9ulp. This patch makes
it use FE_TONEAREST internally, similar to other functions with such
issues. Tests that previously produced large errors are added for
atan2 and the closely related carg, clog and clog10 functions.
Tested for x86_64 and x86 and ulps updated accordingly.
[BZ #18210]
[BZ #18211]
* sysdeps/ieee754/dbl-64/e_atan2.c: Include <fenv.h>.
(__ieee754_atan2): Set FE_TONEAREST mode for internal
computations.
* math/auto-libm-test-in: Add more tests of atan2, carg, clog and
clog10.
* math/auto-libm-test-out: Regenerated.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
With copy relocation, address of protected data defined in the shared
library may be external. When there is a relocation against the
protected data symbol within the shared library, we need to check if we
should skip the definition in the executable copied from the protected
data. This patch adds ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA and defines
it for x86. If ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA isn't 0, do_lookup_x
will skip the data definition in the executable from copy reloc.
[BZ #17711]
* elf/dl-lookup.c (do_lookup_x): When UNDEF_MAP is NULL, which
indicates it is called from do_lookup_x on relocation against
protected data, skip the data definion in the executable from
copy reloc.
(_dl_lookup_symbol_x): Pass ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA,
instead of ELF_RTYPE_CLASS_PLT, to do_lookup_x for
EXTERN_PROTECTED_DATA relocation against STT_OBJECT symbol.
* sysdeps/generic/ldsodefs.h * (ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA):
New. Defined to 4 if DL_EXTERN_PROTECTED_DATA is defined,
otherwise to 0.
* sysdeps/i386/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
* sysdeps/i386/dl-machine.h (elf_machine_type_class): Set class
to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_386_GLOB_DAT.
* sysdeps/x86_64/dl-lookupcfg.h (DL_EXTERN_PROTECTED_DATA): New.
* sysdeps/x86_64/dl-machine.h (elf_machine_type_class): Set class
to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA for R_X86_64_GLOB_DAT.