The NEWS entry for sinf improvements is listed for 2.28, while it was
committed in 2.29, so move it there and mention tanf.
Committed as obvious.
* NEWS: Move optimized sinf entry to 2.29.
Speedup tanf range reduction by using the new sincosf range
reduction algorithm. Overall code quality is improved due to
inlining, so there is a speedup even if no range reduction is
required.
tanf throughput gains on Cortex-A72:
* |x| < M_PI_4 : 1.1x
* |x| < M_PI_2 : 1.2x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0 : 1.6x
* |x| < Inf : 12.1x
* sysdeps/ieee754/flt-32/s_tanf.c (__tanf): Use fast range reduction.
This patch completes the move of ROUNDING_TESTS_* macros to typo-proof
conventions by stopping redefining them in test-*-vlen*.h. Instead,
libm-test-driver.c is made to check TEST_MATHVEC when setting
non-to-nearest rounding modes.
Tested for x86_64.
* math/test-double-vlen2.h: Don't include <math-tests-rounding.h>.
(ROUNDING_TESTS_double): Remove.
* math/test-double-vlen4.h: Don't include <math-tests-rounding.h>.
(ROUNDING_TESTS_double): Remove.
* math/test-double-vlen8.h: Don't include <math-tests-rounding.h>.
(ROUNDING_TESTS_double): Remove.
* math/test-float-vlen16.h: Don't include <math-tests-rounding.h>.
(ROUNDING_TESTS_float): Remove.
* math/test-float-vlen4.h: Don't include <math-tests-rounding.h>.
(ROUNDING_TESTS_float): Remove.
* math/test-float-vlen8.h: Don't include <math-tests-rounding.h>.
(ROUNDING_TESTS_float): Remove.
* math/libm-test-driver.c (IF_ROUND_INIT_FE_DOWNWARD): Check
!TEST_MATHVEC here.
(IF_ROUND_INIT_FE_TOWARDZERO): Likewise.
(IF_ROUND_INIT_FE_UPWARD): Likewise.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the ROUNDING_TESTS_* macros for individual types out to their
own sysdeps header.
In the soft-float case where FE_TONEAREST is the only rounding mode
macro defined, there is no need to define ROUNDING_TESTS_*; it is only
necessary when rounding modes macros are defined that may not be
supported at runtime. Thus, the ROUNDING_TESTS_* definitions for some
configurations are just removed, not moved to new
math-tests-rounding.h headers; the only architectures needing
math-tests-rounding.h are those where the macros are defined in
bits/fenv.h because of the possibility of a soft-float compilation
using a hard-float glibc with the same ABI (i.e., ARM and RISC-V).
The test-*-vlen*.h headers, by using #undef, do not yet follow
typo-proof conventions (but they no longer implicitly rely on being
included before math-tests.h, and this area can always be cleaned up
further in future).
Tested with build-many-glibcs.py.
* sysdeps/generic/math-tests-rounding.h: New file.
* sysdeps/generic/math-tests.h: Include <math-tests-rounding.h>.
(ROUNDING_TESTS_float): Do not define here.
(ROUNDING_TESTS_double): Likewise.
(ROUNDING_TESTS_long_double): Likewise.
(ROUNDING_TESTS_float128): Likewise.
* math/test-double-vlen2.h: Include <math-tests-rounding.h>.
(ROUNDING_TESTS_double): Undefine before defining.
* math/test-double-vlen4.h: Include <math-tests-rounding.h>.
(ROUNDING_TESTS_double): Undefine before defining.
* math/test-double-vlen8.h: Include <math-tests-rounding.h>.
(ROUNDING_TESTS_double): Undefine before defining.
* math/test-float-vlen16.h: Include <math-tests-rounding.h>.
(ROUNDING_TESTS_float): Undefine before defining.
* math/test-float-vlen4.h: Include <math-tests-rounding.h>.
(ROUNDING_TESTS_float): Undefine before defining.
* math/test-float-vlen8.h: Include <math-tests-rounding.h>.
(ROUNDING_TESTS_float): Undefine before defining.
* sysdeps/arm/nofpu/math-tests-rounding.h: New file.
* sysdeps/arm/math-tests.h [__SOFTFP__] (ROUNDING_TESTS_float): Do
not define here.
[__SOFTFP__] (ROUNDING_TESTS_double): Likewise.
[__SOFTFP__] (ROUNDING_TESTS_long_double): Likewise.
* sysdeps/riscv/nofpu/math-tests-rounding.h: New file.
* sysdeps/riscv/math-tests.h [!__riscv_flen]
(ROUNDING_TESTS_float): Do not define here.
[!__riscv_flen] (ROUNDING_TESTS_double): Likewise.
[!__risv_flen] (ROUNDING_TESTS_long_double): Likewise.
* sysdeps/m68k/coldfire/math-tests.h [!__mcffpu__]
(ROUNDING_TESTS_float): Likewise.
[!__mcffpu__] (ROUNDING_TESTS_double): Likewise.
[!__mcffpu__] (ROUNDING_TESTS_long_double): Likewise.
* sysdeps/mips/math-tests.h [__mips_soft_float]
(ROUNDING_TESTS_float): Likewise.
[__mips_soft_float] (ROUNDING_TESTS_double): Likewise.
[__mips_soft_float] (ROUNDING_TESTS_long_double): Likewise.
* sysdeps/nios2/math-tests.h (ROUNDING_TESTS_float): Likewise.
(ROUNDING_TESTS_double): Likewise.
(ROUNDING_TESTS_long_double): Likewise.
This patch adds the PF_XDP, AF_XDP and SOL_XDP macros from Linux 4.18 to
sysdeps/unix/sysv/linux/bits/socket.h.
* sysdeps/unix/sysv/linux/bits/socket.h (PF_MAX): Set to 45.
(PF_XDP): New macro.
(AF_XDP): New macro.
(SOL_XDP): New macro.
This patch adds constants from netinet/tcp.h in Linux 4.18, and an
associated struct tcp_zerocopy_receive, to sysdeps/gnu/netinet/tcp.h.
The new TCP_REPAIR_* constants seemed sufficiently related to those
already present to include them.
Note that this patch does not include additions to struct tcp_info;
there are many other elements in this structure in the Linux kernel
that are not included in the glibc version (which was last extended in
2007, it seems). Such additions to the end of the structure may be OK
with the expected way it is used (size passed explicitly to the kernel
with getsockopt), but in principle any change to the size of a type
provided by glibc is an ABI change for external applications /
libraries using that type in their ABIs, and has the associated risks
of such a change.
Tested for x86_64.
* sysdeps/gnu/netinet/tcp.h (TCP_ZEROCOPY_RECEIVE): New macro.
(TCP_INQ): Likewise.
(TCP_CM_INQ): Likewise.
(TCP_REPAIR_ON): Likewise.
(TCP_REPAIR_OFF): Likewise.
(TCP_REPAIR_OFF_NO_WP): Likewise.
(struct tcp_zerocopy_receive): New type.
This patch updates struct signalfd_siginfo in sys/signalfd.h with new
members from Linux 4.18 (plus ssi_addr_lsb, added to the kernel in
2.6.37 without being added to sys/signalfd.h at that time). The
__pad2 member name follows the kernel and the existing __pad name.
Tested for x86_64.
* sysdeps/unix/sysv/linux/sys/signalfd.h (struct
signalfd_siginfo): Add ssi_addr_lsb, ssi_syscall, ssi_call_addr
and ssi_arch members.
This patch adds two new constants from Linux 4.18 to elf.h,
NT_VMCOREDD and AT_MINSIGSTKSZ.
Tested for x86_64.
* elf/elf.c (NT_VMCOREDD): New macro.
(AT_MINSIGSTKSZ): Likewise.
New generic optimization of sinf and cosf introduced by commit
599cf39766 shows improvement
compared to powerpc specific assembly version. Hence removing
the powerpc assembly versions to make use of generic code.
The House of Force is a well-known technique to exploit heap
overflow. In essence, this exploit takes three steps:
1. Overwrite the size of top chunk with very large value (e.g. -1).
2. Request x bytes from top chunk. As the size of top chunk
is corrupted, x can be arbitrarily large and top chunk will
still be offset by x.
3. The next allocation from top chunk will thus be controllable.
If we verify the size of top chunk at step 2, we can stop such attack.
This patch moves little endian specific POWER9 optimization files to
sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions
only for little endian.
This variant of strlen uses vector loads and operations to reduce the
size of the code and also eliminate the non-ascii fallback. This
works very well for falkor because of its two vector units and
efficient vector ops. In the best case it reduces latency of cases in
bench-strlen by 48%, with gains throughout the benchmark.
strlen-walk also sees uniform gains in the 5%-15% range.
Overall the routine appears to work better than the stock one for falkor
regardless of the benchmark, length of string or cache state.
The same cannot be said of a53 and a72 though. a53 performance was
greatly reduced and for a72 it was a bit of a mixed bag, slightly on the
negative side but I reckon it might be fast in some situations.
* sysdeps/aarch64/strlen.S (__strlen): Rename to STRLEN.
[!STRLEN](STRLEN): Set to __strlen.
* sysdeps/aarch64/multiarch/strlen.c: New file.
* sysdeps/aarch64/multiarch/strlen_generic.S: Likewise.
* sysdeps/aarch64/multiarch/strlen_asimd.S: Likewise.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add strlen.
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
strlen_generic and strlen_asimd.
Reviewed-By: szabolcs.nagy@arm.com
CC: pinskia@gmail.com
The internal functions __kernel_sinf and __kernel_cosf are used only by
lgammaf_r. Removing the internal functions and using the generic sinf
and cosf is better overall. Benchmarking on Cortex-A72 shows the generic
sinf and cosf are 1.4x and 2.3x faster in the range |x| < PI/4, and 0.66x
and 1.1x for |x| < PI/2, so it should make lgammaf_r faster on average.
GLIBC regression tests pass on AArch64.
* sysdeps/ieee754/flt-32/e_lgammaf_r.c (sin_pif): Use __sinf/__cosf.
* sysdeps/ieee754/flt-32/k_cosf.c (__kernel_cosf): Remove all code.
* sysdeps/ieee754/flt-32/k_sinf.c (__kernel_sinf): Likewise.
Fix a few missing spaces, it's now identical to the regenerated version.
Passes GLIBC tests on x64.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate to fix spaces.
The second patch improves performance of sinf and cosf using the same
algorithms and polynomials. The returned values are identical to sincosf
for the same input. ULP definitions for AArch64 and x64 are updated.
sinf/cosf througput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.2x
* |x| < M_PI_4 : 1.8x
* |x| < 2 * M_PI: 1.7x
* |x| < 120.0 : 2.3x
* |x| < Inf : 3.0x
* NEWS: Mention sinf, cosf, sincosf.
* sysdeps/aarch64/libm-test-ulps: Update ULP for sinf, cosf, sincosf.
* sysdeps/x86_64/fpu/libm-test-ulps: Update ULP for sinf and cosf.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Add definitions of
constants rather than including generic sincosf.h.
* sysdeps/x86_64/fpu/s_sincosf_data.c: Remove.
* sysdeps/ieee754/flt-32/s_cosf.c (cosf): Rewrite.
* sysdeps/ieee754/flt-32/s_sincosf.h (reduced_sin): Remove.
(reduced_cos): Remove.
(sinf_poly): New function.
* sysdeps/ieee754/flt-32/s_sinf.c (sinf): Rewrite.
This patch updates sysdeps/unix/sysv/linux/syscall-names.list for
Linux 4.18. The io_pgetevents and rseq syscalls are added to the
kernel on various architectures, so need to be mentioned in this file.
Tested with build-many-glibcs.py.
* sysdeps/unix/sysv/linux/syscall-names.list: Update kernel
version to 4.18.
(io_pgetevents): New syscall.
(rseq): Likewise.
The install.texi documentation of uses of Perl and Python is
substantially out of date.
The description of Perl is "to test the installation" (which I
interpret as referring to test-installation.pl), but it's used for
more tests than that, and to build the manual, and to regenerate one
file in the source tree.
The description of Python is only for pretty-printer tests, but it's
used for other tests / benchmarks as well (and for other internal uses
such as updating Unicode data, for which we already require Python 3,
but I think install.texi only needs to describe uses from the main
glibc Makefiles).
This patch updates the descriptions of what those tools are used for.
The Python information (and information about other tools for testing
pretty printers) was awkwardly in the middle of the general
description of building and testing glibc, rather than with the rest
of information about tools used in glibc build and test; this patch
moves the information about those tools into the main list.
Tested with regeneration of INSTALL as well as "make info" and "make
pdf".
* manual/install.texi (Configuring and compiling): Do not list
tools used for testing pretty printers here.
(Tools for Compilation): List Python, PExpect and GDB here.
Update descriptions of uses of Perl and Python.
* INSTALL: Regenerate.
Add the workload test properties (max-throughput, latency, etc.) to
the schema to prevent benchmark output validation from failing.
* benchtests/scripts/benchout.schema.json (properties): Add
new properties.
Add the duration and iterations attributes to the workloads tests to
make the json schema parser happy
* benchtests/bench-skeleton.c (main): Add duration and
iterations attributes.
Adjust the non-glibc code to agree with what Gawk needs for
rational range interpretation (RRI) for regular expression ranges.
In unibyte locales, Gawk wants ranges to use the underlying byte
rather than the character code point. This change does not affect
glibc proper.
* posix/regcomp.c (parse_byte) [!LIBC && RE_ENABLE_I18N]:
In unibyte locales, use the byte value rather than
running it through btowc.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the SNAN_TESTS_* macros for individual types out to their own
sysdeps header (while the type-generic SNAN_TESTS wrapper for those
macros remains in math-tests.h).
Tested for x86_64 and x86, and with build-many-glibcs.py.
* sysdeps/generic/math-tests-snan.h: New file.
* sysdeps/generic/math-tests.h: Include <math-tests-snan.h>.
(SNAN_TESTS_float): Do not define here.
(SNAN_TESTS_double): Likewise.
(SNAN_TESTS_long_double): Likewise.
(SNAN_TESTS_float128): Likewise.
* sysdeps/i386/fpu/math-tests-snan.h: New file.
* sysdeps/i386/fpu/math-tests.h: Remove file.
* sysdeps/ia64/math-tests-snan.h: New file.
* sysdeps/ia64/math-tests.h: Remove file.
* sysdeps/x86/math-tests.h: Likewise.
* sysdeps/x86_64/fpu/math-tests-snan.h: New file.
This patch is a complete rewrite of sincosf. The new version is
significantly faster, as well as simple and accurate.
The worst-case ULP is 0.5607, maximum relative error is 0.5303 * 2^-23 over
all 4 billion inputs. In non-nearest rounding modes the error is 1ULP.
The algorithm uses 3 main cases: small inputs which don't need argument
reduction, small inputs which need a simple range reduction and large inputs
requiring complex range reduction. The code uses approximate integer
comparisons to quickly decide between these cases.
The small range reducer uses a single reduction step to handle values up to
120.0. It is fastest on targets which support inlined round instructions.
The large range reducer uses integer arithmetic for simplicity. It does a
32x96 bit multiply to compute a 64-bit modulo result. This is more than
accurate enough to handle the worst-case cancellation for values close to
an integer multiple of PI/4. It could be further optimized, however it is
already much faster than necessary.
sincosf throughput gains on Cortex-A72:
* |x| < 0x1p-12 : 1.6x
* |x| < M_PI_4 : 1.7x
* |x| < 2 * M_PI: 1.5x
* |x| < 120.0 : 1.8x
* |x| < Inf : 2.3x
* math/Makefile: Add s_sincosf_data.c.
* sysdeps/ia64/fpu/s_sincosf_data.c: New file.
* sysdeps/ieee754/flt-32/s_sincosf.h (abstop12): Add new function.
(sincosf_poly): Likewise.
(reduce_small): Likewise.
(reduce_large): Likewise.
* sysdeps/ieee754/flt-32/s_sincosf.c (sincosf): Rewrite.
* sysdeps/ieee754/flt-32/s_sincosf_data.c: New file with sincosf data.
* sysdeps/m68k/m680x0/fpu/s_sincosf_data.c: New file.
* sysdeps/x86_64/fpu/s_sincosf_data.c: New file.
This patch currently only affects aarch64.
The roundtoint and converttoint internal functions are only called with small
values, so 32 bit result is enough for converttoint and it is a signed int
conversion so the return type is changed to int32_t.
The original idea was to help the compiler keeping the result in uint64_t,
then it's clear that no sign extension is needed and there is no accidental
undefined or implementation defined signed int arithmetics.
But it turns out gcc does a good job with inlining so changing the type has
no overhead and the semantics of the conversion is less surprising this way.
Since we want to allow the asuint64 (x + 0x1.8p52) style conversion, the top
bits were never usable and the existing code ensures that only the bottom
32 bits of the conversion result are used.
On aarch64 the neon intrinsics (which round ties to even) are changed to
round and lround (which round ties away from zero) this does not affect the
results in a significant way, but more portable (relies on round and lround
being inlined which works with -fno-math-errno).
The TOINT_SHIFT and TOINT_RINT macros were removed, only keep separate code
paths for TOINT_INTRINSICS and !TOINT_INTRINSICS.
* sysdeps/aarch64/fpu/math_private.h (roundtoint): Use round.
(converttoint): Use lround.
* sysdeps/ieee754/flt-32/math_config.h (roundtoint): Declare and
document the semantics when TOINT_INTRINSICS is set.
(converttoint): Likewise.
(TOINT_RINT): Remove.
(TOINT_SHIFT): Remove.
* sysdeps/ieee754/flt-32/e_expf.c (__expf): Remove the TOINT_RINT code
path.
Commit 298d0e3129 ("Consolidate Linux
getdents{64} implementation") broke the implementation because it does
not take into account struct offset differences.
The new implementation is close to the old one, before the
consolidation, but has been cleaned up slightly.
* Since __fentry__ is almost the same as _mcount, reuse the code by
#including it twice with different #defines around.
* Remove LA usages - they are needed in 31-bit mode to clear the top
bit, but in 64-bit they appear to do nothing.
* Add CFI rule for the nonstandard return register. This rule applies
to the current function (binutils generates a new CIE - see
gas/dw2gencfi.c:select_cie_for_fde()), so it is not necessary to put
__fentry__ into a new file.
* Fix CFI offset for %r14.
* Add CFI rule for %r0.
* Fix unwound value of %r15 being off by 244 bytes.
* Unwinding in __fentry__@plt does not work, no plan to fix it - it
would require asking linker to generate CFI for return address in
%r0. From functional perspective keeping it broken is fine, since
the callee did not have a chance to do anything yet. From
convenience perspective it would be possible to enhance GDB in the
future to treat __fentry__@plt in a special way.
* Fix whitespace.
* Fix offsets in comments, which were copied from 32-bit code.
* 32-bit version will not be implemented, since it's not compatible
with the corresponding PLT stubs: they assume %r12 points to GOT,
which is not the case for gcc-emitted __fentry__ stub, which runs
before the prolog.
This patch adds the runtime support in glibc for the -mfentry
gcc feature introduced in [1] and [2].
[1] https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00784.html
[2] https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00912.html
ChangeLog:
* sysdeps/s390/s390-64/Versions (__fentry__): Add.
* sysdeps/s390/s390-64/s390x-mcount.S: Move the common
code to s390x-mcount.h and #include it.
* sysdeps/s390/s390-64/s390x-mcount.h: New file.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
(__fentry__): Add.
__fentry__ symbol is currently not defined for other architectures.
Attempts to introduce it cause abicheck to fail, because it will be
available since 2.29 earliest, and not 2.13, which is the case for
Intel. With the new code, abicheck passes for i686-linux-gnu,
x86_64-linux-gnu and x86_64-linux-gnu32 triples.
ChangeLog:
* stdlib/Versions: Remove __fentry__.
* sysdeps/i386/Versions: Add __fentry__.
* sysdeps/x86_64/Versions: Add __fentry__.
The following combinations need to be tested:
* 32- (g5, esa and zarch) and 64-bit
* linux32 glibc/configure CC='gcc -m31 -march=g5'
* linux32 glibc/configure CC='gcc -m31'
* linux32 glibc/configure CC='gcc -m31 -mzarch'
* With and without VX:
* glibc/configure libc_cv_asm_s390_vx=no
* With and without profiling (using LD_PROFILE)
* With and without pltexit (using LD_AUDIT)
ChangeLog:
* sysdeps/s390/Makefile: Register the new tests.
* sysdeps/s390/tst-dl-runtime-mod.S: New file.
* sysdeps/s390/tst-dl-runtime-profile-audit.c: New file.
* sysdeps/s390/tst-dl-runtime-profile-noaudit.c: New file.
* sysdeps/s390/tst-dl-runtime-resolve-audit.c: New file.
* sysdeps/s390/tst-dl-runtime-resolve-noaudit.c: New file.
* sysdeps/s390/tst-dl-runtime.c: New file.