Since the x86-64 assembly version of sincosf is higly optimized with
vector instructions, there isn't much room for improvement. However
s_sincosf.c written in C with vector math and intrinsics can be
optimized by GCC with FMA.
On Skylake, bench-sincosf reports performance improvement:
Assembly FMA improvement
max 104.042 101.008 3%
min 9.426 8.586 10%
mean 20.6209 18.2238 13%
* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
Add s_sincosf-sse2 and s_sincosf-fma.
(CFLAGS-s_sincosf-fma.c): New.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: New file.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-sse2.S: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sincosf.c: Likewise.
* sysdeps/x86_64/fpu/s_sincosf.S: Don't add alias if
__sincosf is defined.
GCC PR 83641 results in a miscompilation of libpthread, which
causes pthread_exit not to restore callee-saved registers before
running destructors for objects on the stack. This test detects
this situation:
info: unsigned int, direct pthread_exit call
tst-thread-exit-clobber.cc:80: numeric comparison failure
left: 4148288912 (0xf741dd90); from: value
right: 1600833940 (0x5f6ac994); from: magic_values.v2
info: double, direct pthread_exit call
info: unsigned int, indirect pthread_exit call
info: double, indirect pthread_exit call
error: 1 test failures
Commit 24731685 ("prlimit: Translate old_rlimit from RLIM64_INFINITY to
RLIM_INFINITY") broken the getrlimit64 for 32-bit configurations which
do no need the 2GiB limited compat getrlimit (default version >= 2.2).
This patch fixes that by restoring the weak alias in that case.
Changelog:
* sysdeps/unix/sysv/linux/getrlimit64 (getrlimit64)
[!__RLIM_T_MATCHES_RLIM64_T]
[!SHLIB_COMPAT (libc, GLIBC_2_1, GLIBC_2_2)]: Define as weak alias of
__getrlimit64. Add libc_hidden_weak.
This follows c45d78aac ('posix: Fix generic p{read,write}v buffer allocation
(BZ#22457)'), which made pwritev to use __mmap instead of __posix_memalign,
but didn't pass PROT_READ to it, while the pwrite() call does need to
read the data we have just copied over.
* sysdeps/posix/pwritev_common.c: Add PROT_READ to __mmap prot.
`make check' sometimes triggers a rebuild of librt.so using
nptl/Makefile, which ignores librt's dependence on libpthread. This
causes the build to blow up when we attempt to run the test suite on
RISC-V.
2018-01-06 Palmer Dabbelt <palmer@sifive.com>
* nptl/Makefile (/librt.so): Always depend on
"$(shared-thread-library)".
The RISC-V port will have libraries in subdirectories of lib, like
"lib64/lp64d". This adds support for stripping these installed
libraries.
2018-01-06 Palmer Dabbelt <palmer@sifive.com>
* scripts/build-many-glibcs.py (class Glibc): Strip shared objects
in subdirectories of lib.
The RISC-V Linux port defines VDSO symbols
2018-01-06 Palmer Dabbelt <palmer@sifive.com>
* sysdeps/unix/sysv/linux/dl-vdso.h (VDSO_NAME_LINUX_4_15): New
define.
(VDSO_HASH_LINUX_4_15): Likewise.
The RISC-V Linux ABI doesn't define any libraries that go directly in
lib, instead they go into lib32/ilp32 or lib64/lp64. This casuse
make-link-multidir to fail when attempting to make library directories
when building a static libc on multilib RISC-V systems.
This patch uses scripts/mkinstalldirs to make the base directory of the
target symlink of make-link-multidir.
2018-01-06 Palmer Dabbelt <palmer@sifive.com>
* Makerules (make-link-multidir): Make directories before linking into
them.
This follows ccf970c7a ('posix: Add compat glob symbol to not follow
dangling symbols') by adding to gnu/ the same compatibility as for Linux.
* sysdeps/gnu/glob64.c (__glob): Define macro instead of glob macro.
(__glob64): Define GLIBC_2_27 versioned symbol instead of glob64.
* sysdeps/gnu/glob-lstat-compat.c: New file.
* sysdeps/gnu/glob64-lstat-compat.c: New file.
The function _itoa_word() writes characters from the higher address to
the lower address, requiring the destination string to reserve that size
before calling it.
* sysdeps/powerpc/powerpc64/dl-machine.c (_dl_reloc_overflow):
Reserve 16 chars to reloc_addr before calling _itoa_word.
Signed-off-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add a test to check that the getrlimit, setrlimit and prlimit functions
and their 64-bit equivalent behave correctly with RLIM_INFINITY and
RLIM64_INFINITY. For that it assumes that the prlimit64 function calls
the syscall directly without translating the value and that the kernel
uses the -1 value to represent infinity.
It first finds a resource with the hard limit set to infinity so the
soft limit can be manipulated easily and check for the consistency
between the value set or get by the prlimit64 and the other functions.
It is Linux specific add it uses the prlimit and prlimit64 functions.
Changelog:
* sysdeps/unix/sysv/linux/tst-rlimit-infinity.c: New file.
* sysdeps/unix/sysv/linux/Makefile (tests): Add tst-rlimit-infinity.
prlimit called without a new value fails on 32-bit machines if any of
the soft or hard limits are infinity. This is because prlimit does not
translate old_rlimit from RLIM64_INFINITY to RLIM_INFINITY, but checks
that the value returned by the prlimit64 syscall fits into a 32-bit
value, like it is done for example in getrlimit. Note that on the
other hand new_rlimit is correctly translated from RLIM_INFINITY to
RLIM64_INFINITY before calling the syscall.
This patch fixes that.
Changelog:
[BZ #22678]
* sysdeps/unix/sysv/linux/prlimit.c (prlimit): Translate
old_rlimit from RLIM64_INFINITY to RLIM_INFINITY.
Fix the RLIM_INFINITY and RLIM64_INFINITY constants on alpha to match
the kernel one and all other architectures. Change the getrlimit,
getrlimit64, setrlimit, setrlimit64 into old compat symbols, and provide
the Linux generic functions as GLIBC_2_27 version.
Changelog:
* sysdeps/unix/sysv/linux/getrlimit64.c [USE_VERSIONED_RLIMIT]: Do not
define getrlimit and getrlimit64 as weak aliases of __getrlimit64.
Define __GI_getrlimit64 as weak alias of __getrlimit64.
[__RLIM_T_MATCHES_RLIM64_T]: Do not redefine SHLIB_COMPAT, use #elif
instead.
* sysdeps/unix/sysv/linux/setrlimit64.c [USE_VERSIONED_RLIMIT]: Do not
define setrlimit and setrlimit64 as weak aliases of __setrlimit64.
* sysdeps/unix/sysv/linux/alpha/bits/resource.h (RLIM_INFINITY,
RLIM64_INFINITY): Fix values to match the kernel ones.
* sysdeps/unix/sysv/linux/alpha/getrlimit64.c: Define
USE_VERSIONED_RLIMIT. Rename __getrlimit64 into __old_getrlimit64 and
provide it as getrlimit@@GLIBC_2_0 and getrlimit64@@GLIBC_2_1. Add a
__getrlimit64 function and provide it as getrlimit@@GLIBC_2_27 and
getrlimit64@@GLIBC_2_27.
* sysdeps/unix/sysv/linux/alpha/setrlimit64.c: Ditto with setrlimit
and setrlimit64.
* sysdeps/unix/sysv/linux/alpha/libc.abilist (GLIBC_2.27): Add
getrlimit, setrlimit, getrlimit64 and setrlimit64.
* sysdeps/unix/sysv/linux/alpha/Versions (libc): Add getrlimit,
setrlimit, getrlimit64 and setrlimit64.
RLIM64_INFINITY was supposed to be a glibc convention rather than
anything seen by the kernel, but it ended being passed to the kernel
through the prlimit64 syscall.
* On the kernel side, the value is defined for the prlimit64 syscall for
all architectures in include/uapi/linux/resource.h:
#define RLIM64_INFINITY (~0ULL)
* On the kernel side, the value is defined for getrlimit and setrlimit
in arch/alpha/include/uapi/asm/resource.h
#define RLIM_INFINITY 0x7ffffffffffffffful
* On the GNU libc side, the value is defined in
sysdeps/unix/sysv/linux/alpha/bits/resource.h:
# define RLIM64_INFINITY 0x7fffffffffffffffLL
This was not an issue until the getrlimit and setrlimit glibc functions
have been changed in commit 045c13d185 ("Consolidate Linux setrlimit and
getrlimit implementation") to use the prlimit64 syscall instead of the
getrlimit and setrlimit ones.
This patch fixes that by adding a wrapper to fix the value passed to or
received from the kernel, before or after calling the prlimit64 syscall.
Changelog:
[BZ #22648]
* sysdeps/unix/sysv/linux/alpha/getrlimit64.c: New file.
* sysdeps/unix/sysv/linux/alpha/setrlimit64.c: Ditto.
This patch increases timeouts on three tests I observed timing out on
slow systems.
* malloc/tst-malloc-tcache-leak.c (TIMEOUT): Define to 50.
* posix/tst-glob-tilde.c (TIMEOUT): Define to 200.
* resolv/tst-resolv-res_ninit.c (TIMEOUT): Define to 50.
As discussed in libc-alpha [1], alpha trunc{f} implementation uses
addt/suc and subt/suc and although the Alpha Architecture
Handbook version 3 states that that ADDx SUBx OUTPUT Exceptions
(B.3 Mapping to IEEE Standard) should not generate Inexact if INE
bit is set, the Alpha 21264 [2] chip manual (A.8 IEEE Floating-Point
Conformance) states that ADDx SUBx OUTPUT does generate inexact
exception for inexact result regardless.
As Joseph noted [3] to correctly fix it on alpha we need to either
avoid the instruction or avoid any inexact bit from it being set
on return from the function (while preserving the inexact bit that
might be set on the entry to the function). The later will result
mf_fpcr followed by a mt_fpcr to get and set the fpcr which will
defeat the optimization itself.
So the patch just remove the alpha optimized and rely on generic
implementation. It fixes the math/test-*-{trunc} on alpha.
[BZ #15479]
[BZ #22666]
* sysdeps/alpha/fpu/s_trunc.c: Remove file.
* sysdeps/alpha/fpu/s_truncf.c: Likewise.
[1] https://sourceware.org/ml/libc-alpha/2018-01/msg00114.html
[2] https://www.star.bnl.gov/public/daq/HARDWARE/21264_data_sheet.pdf
[3] https://sourceware.org/ml/libc-alpha/2018-01/msg00086.html
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
As discussed in libc-alpha [1], alpha ceil{f} and floor{f}
implementation uses cvttq/svm and although the Alpha Architecture
Handbook version 3 states that that CVTfi OUTPUT Exceptions
(B.3 Mapping to IEEE Standard) should not generate Inexact if INE
bit is set on fpcr, the Alpha 21264 [1] chip manual (A.8 IEEE
Floating-Point Conformance) states that CVTfi and CVTif OUTPUT
does generate inexact exception for inexact result regardless.
As Joseph noted [2] to correctly fix it on alpha we need to either
avoid the instruction or avoid any inexact bit from it being set
on return from the function (while preserving the inexact bit that
might be set on the entry to the function). The later will result
mf_fpcr followed by a mt_fpcr to get and set the fpcr which will
defeat the optimization itself.
So the patch just remove the alpha optimized and rely on generic
implementation. It fixes the math/test-*-{ceil,floor} on alpha.
[BZ #15479]
[BZ #22665]
* sysdeps/alpha/fpu/s_ceil.c: Remove file.
* sysdeps/alpha/fpu/s_ceilf.c: Likewise.
* sysdeps/alpha/fpu/s_floor.c: Likewise.
* sysdeps/alpha/fpu/s_floorf.c: Likewise.
[1] https://www.star.bnl.gov/public/daq/HARDWARE/21264_data_sheet.pdf
[2] https://sourceware.org/ml/libc-alpha/2018-01/msg00086.html
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Before this change, if glibc was compiled with SSE instructions and a
sufficiently recent GCC, an unaligned stack access in
__run_exit_handlers would cause stdlib/tst-makecontext to crash.
This commit adds a new _dl_open_hook entry for dlvsym and implements the
function using the existing dl_lookup_symbol_x function supplied by the
dynamic loader.
A new hook variable, _dl_open_hook2, is introduced, which should make
this change suitable for backporting: For old statically linked
binaries, __libc_dlvsym will always return NULL.
Currently math_errhandling is always set to MATH_ERRNO | MATH_ERREXCEPT
even if -fno-math-errno is used. It is not defined at all when fast-math
is used. Set it to 0 with fast-math - this is noncomforming but more
useful than not define math_errhandling at all. Also take __NO_MATH_ERRNO__
into account and update comment.
* math/math.h (math_errhandling): Set to 0 with __FAST_MATH__.
Add __NO_MATH_ERRNO__ check.
Changelog:
* sysdeps/unix/sysv/linux/alpha/getrlimit64.c (__old_getrlimit64):
Drop __RLIM_T_MATCHES_RLIM64_T conditional as __old_getrlimit64 is
never defined in that case.
Changelog:
* sysdeps/unix/sysv/linux/alpha/getrlimit64.c: Fix a typo in the
comment.
* sysdeps/unix/sysv/linux/alpha/setrlimit64.c: Fix a typo in the
comment.
(settrlimit): Rename into setrlimit.
(__sttrlimit): Rename into __setrlimit.
I found that "make regen-ulps" failed when building with unmodified
GNU make 4.1, and an objdir /some/where/math/ longer than about 37
characters, because the list of tests in the "for run in $^" loop
exceeded the Linux kernel's MAX_ARG_STRLEN limit (131072 bytes) on the
length of a single argument passed to a command.
Some GNU/Linux distributions have a patch to make to work around this
limit (see e.g. Debian bug 688601), but clearly this ought to work
without needing such a patch. This patch arranges for the shell loop
to be over the test names without a $(objdir) prefix, which reduces
the space used to less than half MAX_ARG_STRLEN.
(I think we ought to aim to get rid of bits/mathinline.h completely -
filing GCC bugs for any optimizations GCC can't currently do with
-ffast-math - which would mean we could halve the number of libm tests
run because separate inline function tests would no longer be needed.
However, with a long directory name even half the number of tests
could make this command exceed MAX_ARG_STRLEN without my patch.)
Tested regen-ulps on a system where it failed before this patch.
* math/Makefile (run-regen-ulps): Add $(objpfx) to test name here.
(regen-ulps): Use $(libm-tests) not $^ in shell loop.
Various fmax and fmin function implementations mishandle sNaN
arguments:
(a) When both arguments are NaNs, the return value should be a qNaN,
but sometimes it is an sNaN if at least one argument is an sNaN.
(b) Under TS 18661-1 semantics, if either argument is an sNaN then the
result should be a qNaN (whereas if one argument is a qNaN and the
other is not a NaN, the result should be the non-NaN argument).
Various implementations treat sNaNs like qNaNs here.
One way to fix that is to detect the sNaN and add a special case. That
said there is no FPU instruction to do that, so it requires transfering
the FP value to an integer register and testing bits. This becomes quite
complicated so it's probably better to just use the generic versions of
these functions which just do that through issignaling.
Changelog:
[BZ #22660]
* sysdeps/alpha/fpu/s_fmax.S: Remove file.
* sysdeps/alpha/fpu/s_fmaxf.S: Likewise.
* sysdeps/alpha/fpu/s_fmin.S: Likewise.
* sysdeps/alpha/fpu/s_fminf.S: Likewise.
This patch updates various files from their upstream sources. This
brings in copyright date updates for some of those files.
Tested for x86_64.
* manual/texinfo.tex: Update to version 2017-12-26.21 with
trailing whitespace removed.
* scripts/config.guess: Update to version 2018-01-01.
* scripts/config.sub: Update to version 2018-01-01.
* scripts/move-if-change: Update from gnulib.
The patch which moved libio.h proper into the bits directory also
changed the name of its guard macro, and I neglected to check whether
anything depended on that name. It turns out that there is a
conditionally-used bits header that looks at it; this broke the libgcc
build on at least sparc64-*-* and sparcv9-*-*.
* libio/bits/libio-ldbl.h: Correct check for improper
inclusion. Add own multiple include guard.
The fillin_rpath function in elf/dl-load.c loops over each RPATH or
RUNPATH tokens and interprets empty tokens as the current directory
("./"). In practice the check for empty token is done *after* the
dynamic string token expansion. The expansion process can return an
empty string for the $ORIGIN token if __libc_enable_secure is set
or if the path of the binary can not be determined (/proc not mounted).
Fix that by moving the check for empty tokens before the dynamic string
token expansion. In addition, check for NULL pointer or empty strings
return by expand_dynamic_string_token.
The above changes highlighted a bug in decompose_rpath, an empty array
is represented by the first element being NULL at the fillin_rpath
level, but by using a -1 pointer in decompose_rpath and other functions.
Changelog:
[BZ #22625]
* elf/dl-load.c (fillin_rpath): Check for empty tokens before dynamic
string token expansion. Check for NULL pointer or empty string possibly
returned by expand_dynamic_string_token.
(decompose_rpath): Check for empty path after dynamic string
token expansion.