GCC 4.8 enables -ftree-loop-distribute-patterns at -O3 by default and
this optimization may transform loops into memset/memmove calls. Without
proper handling this may generate unexpected PLT calls on GLIBC.
This patch fixes by create memset/memmove alias to internal GLIBC
__GI_memset/__GI_memmove symbols.
The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if the set/restore has as much overhead relative to the rest of the
code, as the x86 bits do.
Here's a summary of the performance improvement due to these
improvements; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Before:
cos(): ITERS:4.69335e+08: TOTAL:28884.6Mcy, MAX:4080.28cy, MIN:57.562cy, 16248.6 calls/Mcy
exp(): ITERS:4.47604e+08: TOTAL:28796.2Mcy, MAX:207.721cy, MIN:62.385cy, 15543.9 calls/Mcy
pow(): ITERS:1.63485e+08: TOTAL:28879.9Mcy, MAX:362.255cy, MIN:172.469cy, 5660.86 calls/Mcy
sin(): ITERS:3.89578e+08: TOTAL:28900Mcy, MAX:704.859cy, MIN:47.583cy, 13480.2 calls/Mcy
tan(): ITERS:7.0971e+07: TOTAL:28902.2Mcy, MAX:1357.79cy, MIN:388.58cy, 2455.55 calls/Mcy
After:
cos(): ITERS:6.0014e+08: TOTAL:28875.9Mcy, MAX:364.283cy, MIN:45.716cy, 20783.4 calls/Mcy
exp(): ITERS:5.48578e+08: TOTAL:28764.9Mcy, MAX:191.617cy, MIN:51.011cy, 19071.1 calls/Mcy
pow(): ITERS:1.70013e+08: TOTAL:28873.6Mcy, MAX:689.522cy, MIN:163.989cy, 5888.18 calls/Mcy
sin(): ITERS:4.64079e+08: TOTAL:28891.5Mcy, MAX:6959.3cy, MIN:36.189cy, 16062.8 calls/Mcy
tan(): ITERS:7.2354e+07: TOTAL:28898.9Mcy, MAX:1295.57cy, MIN:380.698cy, 2503.7 calls/Mcy
So the improvements are:
cos: 27.9089%
exp: 22.6919%
pow: 4.01564%
sin: 19.1585%
tan: 1.96086%
The downside of the change is that it will have an adverse performance
impact on non-default rounding modes, but I think the tradeoff is
justified.
__clock_gettime and other __clock_* functions could result in an extra
PLT reference within libc.so if it actually gets used. None of the
code currently uses them, which is why this probably went unnoticed.
Resolves: #15465
The program name may be unavailable if the user application tampers
with argc and argv[]. Some parts of the dynamic linker caters for
this while others don't, so this patch consolidates the check and
fallback into a single macro and updates all users.
Fixes BZ #15339.
NSS_STATUS_UNAVAIL may mean that a necessary input resource is not
available. This could occur in a number of cases including when the
network is down, system runs out of file descriptors, etc. The
correct differentiator in such a case is the h_errno, which gives the
nature of failure. In case of failures other than a simple 'not
found', we set h_errno as NETDB_INTERNAL and let errno be the
identifier for the exact error.
This implementation speed up memset in several ways. First is avoiding
expensive computed jump. Second is using fact that arguments of memset
are most of time aligned to 8 bytes.
Benchmark results on:
kam.mff.cuni.cz/~ondra/benchmark_string/memset_profile_result27_04_13.tar.bz2
We add new memcpy version that uses unaligned loads which are fast
on modern processors. This allows second improvement which is avoiding
computed jump which is relatively expensive operation.
Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
[BZ #15442] This adds support for the inverse interpretation of the
quiet bit of IEEE 754 floating-point NaN data that some processors
use. This includes in particular MIPS architecture processors; the
payload used for the canonical qNaN encoding is updated accordingly
so as not to interfere with the quiet bit.
The EXTRACT_WORDS64 and INSERT_WORDS64 macros use movd for a 64-bit
operation. Somehow gcc manages to turn this into movq, but LLVM won't.
2013-05-15 Peter Collingbourne <pcc@google.com>
* sysdeps/x86_64/fpu/math_private.h (MOVQ): New macro.
(EXTRACT_WORDS64) Use where appropriate.
(INSERT_WORDS64) Likewise.
While these instructions accept memory operands, only one operand
may be a memory operand. Giving two operands xm constraints gives
the compiler the option of using memory for both operands, which
would result in invalid assembly code. Using x for all operands is
more appropriate, as most x86_64 calling conventions will pass the
arguments in registers anyway.
2013-05-15 Peter Collingbourne <pcc@google.com>
* sysdeps/x86_64/fpu/multiarch/s_fma.c (__fma_fma4): Replace xm
constraints with x constraints.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c (__fmaf_fma4): Likewise.
PowerPC kernel now provides a vDSO implementation for time syscall
(commit fcb41a2030abe0eb716ef0798035ef9562097f42). This patch changes
time syscall wrapper to use the vDSO when available. It also changes
the default non vDSO time on PowerPC to use sysdeps/posix/time.c
(since gettimeofday is a vDSO call).
* sysdeps/gnu/netinet/tcp.h (TCP_TIMESTAMP): New value, from
Linux 3.9.
* sysdeps/unix/sysv/linux/bits/socket.h (PF_VSOCK, AF_VSOCK):
Add.
(PF_MAX): Adjust for VSOCK change.
This patch fix the 3c0265394d commits
by correctly setting minimum architecture for modf PPC optimization
to power5+ instead of power5 (since only on power5+ round/ceil will
be inline to inline assembly).
Kay Sievers reported that coreutils' stat tool has a problem with
s390's statfs[64] definition:
> The definition of struct statfs::f_type needs a fix. s390 is the only
> architecture in the kernel that uses an int and expects magic
> constants lager than INT_MAX to fit into.
>
> A fix is needed to make Fedora boot on s390, it currently fails to do
> so. Userspace does not want to add code to paper-over this issue.
[...]
> Even coreutils cannot handle it:
> #define RAMFS_MAGIC 0x858458f6
> # stat -f -c%t /
> ffffffff858458f6
>
> #define BTRFS_SUPER_MAGIC 0x9123683E
> # stat -f -c%t /mnt
> ffffffff9123683e
The bug is caused by an implicit sign extension within the stat tool:
out_uint_x (pformat, prefix_len, statfsbuf->f_type);
where the format finally will be "%lx".
A similar problem can be found in the 'tail' tool.
s390 is the only architecture which has an int type f_type member in
struct statfs[64]. Other architectures have either unsigned ints or
long values, so that the problem doesn't occur there.
Therefore change the type of the f_type member to unsigned int, so
that we get zero extension instead sign extension when assignment to
a long value happens.
Reported-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
We no longer support configuring for i386, nor do we
elide such a configuration to i686. Configuring with
i386-* is a failure, and we provide an example of
how to fix that.
---
2013-04-17 Carlos O'Donell <carlos@redhat.com>
* configure.in: Remove i386 configure warning. Remove i386 case.
* configure: Regenerate.
* sysdeps/i386/configure.in: Raise error if config_machine is i386.
Add example to error message.
* sysdeps/i386/configure: Regenerate.
The value of PI is never exactly PI in any floating point representation,
and the value of PI/2 is never PI/2. It is wrong to expect cos(M_PI_2l)
to return 0, instead it will return an answer that is non-zero because
M_PI_2l doesn't round to exactly PI/2 in the type used.
That is to say that the correct answer is to do the following:
* Take PI or PI/2.
* Round to the floating point representation.
* Take the rounded value and compute an infinite precision cos or sin.
* Use the rounded result of the infinite precision cos or sin as the
answer to the test.
I used printf to do the type rounding, and Wolfram's Alpha to do the
infinite precision cos calculations.
The following changes bring x86-64 and x86 to 1/2 ulp for two tests.
It shows that the x86 cos implementation is quite good, and that
our test are flawed.
Unfortunately given that the rounding errors are type dependent we
need to fix this for each type. No regressions on x86-64 or x86.
---
2013-04-11 Carlos O'Donell <carlos@redhat.com>
* math/libm-test.inc (cos_test): Fix PI/2 test.
(sincos_test): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate.
* sysdeps/i386/fpu/libm-test-ulps: Regenerate.
This change does two things:
* Treats a target i386-* as if it were i686.
* Fails configure if the user is generating code
for i386.
We no longer support i386 code-generation because the i386
lacks the atomic operations we need in glibc.
You can still configure for i386-*, but you get i686 code.
You can't build with --march=i386, --mtune=i386 or a compiler
that defaults to i386 code-generation.
I've added two i386 entries in the master todo list to discuss
merging and renaming:
http://sourceware.org/glibc/wiki/Development_Todo/Master#i386
The failure modes are fail-safe here. You compile for i386,
get i686, and try to run on i386 and it fails. The configure
log has a warning saying we elided to i686. There is no situation
that I can see where we run into any serious problems.
The patch makes the current state better in that we get less
confused users and we build successfully in more default
configurations.
The next enhancement would be to add --march=i?86
as suggested in #c20 of BZ#10062 for any i?86-* builds, which
would solve the problem of a 32-bit compiler that defaults to
i386 code-gen and glibc configured for i686-* target. Which
previously failed at build time, and now will fail at configure
time (requires adding --march=i686).
Updated NEWS with BZ #10060 and #10062.
No regressions.
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
[BZ #10060, #10062]
* aclocal.m4 (LIBC_COMPILER_BUILTIN_INLINED): New macro.
* sysdeps/i386/configure.in: Use LIBC_COMPILER_BUILTIN_INLINED and
fail configure if __sync_val_compare_and_swap is not inlined.
* sysdeps/i386/configure: Regenerate.
* configure.in: Build for i686 when configured for i386.
* configure: Regenerate.
* README: Remove i386 reference.
The s390 and s390x sysdep.h files include the more generic sysdep.h.
The more generic sysdep.h defines PSEUDO. This causes an annoying
CPP warning saying the PSEUDO was redefined. This patch removes the
warning by undefining PSEUDO before the redefinition. This is in line
with what all the other machines do.
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
* sysdeps/s390/s390-32/sysdep.h: Undefine PSEUDO before redefinition.
* sysdeps/s390/s390-64/sysdep.h: Likewise.
Fix BZ #15305.
On kernel versions earlier than 2.6.29, the Linux kernel exported a
sysctl called restrict_chown for xfs, which could be used to allow
chown to users other than the owner. 2.6.29 removed this support,
causing the open_not_cancel_2 to fail and thus modify errno. The fix
is to save and restore errno so that the caller sees it as unmodified.
Additionally, since the code to check the sysctl is not useful on
newer kernels, we add an ifdef so that in future the code block gets
rmeoved completely.
The branch prediction hints is actually hurts performance in this case.
The assembly implementation make two assumptions: 1. 'fabs (x) < 2^52'
is unlikely and 2. 'x > 0.0' is unlike (if 1. is true). Since it a
general floating point function, expected input is not bounded and then
it is better to let the hardware handle the branches.
The compiler is smart enough to convert those into double for powerpc,
but if we put them as doubles, it adds overhead by performing those
operations in floating point mode.
The mantissa of mp_no is intended to take only integral values. This
is a relatively good choice for powerpc due to its 4 fpus, but not for
other architectures, which suffer due to this choice. This change
makes the default mantissa a long integer and allows powerpc to
override it. Additionally, some operations have been optimized for
integer manipulation, resulting in a significant improvement in
performance.
The patch increase the high value to check if expl overflows. Current
high mark value is not really correct, the algorithm accepts high values.
It also adds a correct wrapper function to check for overflow and underflow.
Due to a typo repeated several times, this bug hasn't been fixed yet,
despite being marked as resolved in glibc 2.12.
* sysdeps/x86_64/strcmp.S: Replace all occurrences of NOT_IN_lib
with NOT_IN_libc.
Fixes BZ #12723
The variable pipe buffer size does nothing to the value of PIPE_BUF,
since the number of bytes that are atomically written is still
PIPE_BUF on Linux.
* sysdeps/unix/sysv/linux/bits/mman-linux.h (MAP_ANONYMOUS):
Allow definition via __MAP_ANONYMOUS.
* sysdeps/unix/sysv/linux/mips/bits/mman.h: Remove all defines
provided by bits/mman-linux.h and include <bits/mman-linux.h>.
(__MAP_ANONYMOUS): Define.
* sysdeps/unix/sysv/linux/s390/bits/mman.h: Include
<bits/mman-linux.h>.
(MCL_CURRENT, MCL_FUTURE): Do not define here, the generic value
is fine.
* sysdeps/unix/sysv/linux/sh/bits/mman.h: Move include of
<bits/mman-linux.h> to end of file.
(MCL_CURRENT, MCL_FUTURE): Do not define here, the generic value
is fine.
* sysdeps/unix/sysv/linux/x86/bits/mman.h: Move include of
<bits/mman-linux.h> to end of file.
(MCL_CURRENT, MCL_FUTURE): Do not define here, the generic value
is fine.
* sysdeps/unix/sysv/linux/sparc/bits/mman.h: Move include of
<bits/mman-linux.h> to end of file.
* sysdeps/unix/sysv/linux/bits/mman-linux.h [!MCL_CURRENT]
(MCL_CURRENT, MCL_FUTURE): Define here.
* sysdeps/unix/sysv/linux/bits/mman-linux.h: New file, with
Linux common definitions.
* sysdeps/unix/sysv/linux/sh/bits/mman.h: Remove all defines
provided by bits/mman-linux.h and include <bits/mman-linux.h>.
* sysdeps/unix/sysv/linux/x86/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/s390/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/sh/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/mman.h: Likewise.
Using long throughout like powerpc does is beneficial since it reduces
the need to switch to 32-bit instructions. It gives a very minor
performance improvement.
These prototypes are duplicated in many places. Add a dedicated
header for holding prototypes for program-specific functions to
avoid that.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This feature is specifically for the C++ compiler to offload calling
thread_local object destructors on thread program exit, to glibc.
This is to overcome the possible complication of destructors of
thread_local objects getting called after the DSO in which they're
defined is unloaded by the dynamic linker. The DSO is marked as
'unloadable' if it has a constructed thread_local object and marked as
'unloadable' again when all the constructed thread_local objects
defined in it are destroyed.
ARM now supports loading unmarked objects from
the dynamic loader cache. Unmarked objects can
be used with the hard-float or soft-float ABI.
We must support loading unmarked objects during
the transition period from a binutils that does
not mark objects to one that does mark them with
the correct ELF flags.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/sparc/sparc32/sparcv9/bits/atomic.h
(__arch_compare_and_exchange_val_32_acq): Use %g0 as second
argument of CAS if possible.
* sysdeps/sparc/sparc64/bits/atomic.h
(__arch_compare_and_exchange_val_32_acq): Likewise.
(__arch_compare_and_exchange_val_64_acq): Likewise.
The bsd implementation of ulimit produces wrong return values, so remove it
in favour of the posix implementation.
The only regression for non-Linux implementations using bsd sysdeps and not
providing an own ulimit is that the __UL_GETMAXBRK case (which is non-standard)
is left unimplemented (giving EINVAL).
In order for the __kernel_get_tbfreq vDSO call to work the
INTERNAL_VSYSCALL_NCS macro needed to be updated to prevent it from
assuming an integer return type (since the timebase frequency is a 64-bit
value) by specifying the type of the return type as a macro parameter. The
macro then specifically declares the return value as a 'register' (or
implied pair) of the denoted type. The compiler is then informed that this
register (or implied pair) is to be used for the return value.
Introduce (only on Linux) and use a HAVE_MREMAP symbol to advertize mremap
availability.
Move the malloc-sysdep.h include from arena.c to malloc.c, since what is
provided by malloc-sysdep.h is needed earlier in malloc.c, before the inclusion
of arena.c.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile: Add vis3
trunc{,f} to libm-sysdep_routes.
* sysdeps/sparc/sparc64/fpu/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_trunc.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_truncf.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_trunc.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_truncf.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc-vis3.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_trunc.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf-vis3.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_truncf.S: New file.
* sysdeps/sparc/sparc64/fpu/s_trunc.S: New file.
* sysdeps/sparc/sparc64/fpu/s_truncf.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile: Add vis3
nearbyint{,f} to libm-sysdep_routes.
* sysdeps/sparc/sparc64/fpu/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyint-vis3.S:
New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyint.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyintf-vis3.S:
New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_nearbyintf.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_nearbyint.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_nearbyintf.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyint-vis3.S: New
file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyint.S: New file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyintf-vis3.S: New
file.
* sysdeps/sparc/sparc64/fpu/multiarch/s_nearbyintf.S: New file.
* sysdeps/sparc/sparc64/fpu/s_nearbyint.S: New file.
* sysdeps/sparc/sparc64/fpu/s_nearbyintf.S: New file.
This header uses size_t but doesn't include stddef.h for it. So when
packages happen to include this before any header that defines size_t,
they get a build failure.
Reviewed-by: Carlos O'Donell <codonell@redhat.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/Makefile: Add vis3
fdim/fdimf to libm-sysdep_routines.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdim.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdimf-vis3.S: New
file.
* sysdeps/sparc/sparc32/sparcv9/fpu/multiarch/s_fdimf.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_fdim.S: New file.
* sysdeps/sparc/sparc32/sparcv9/fpu/s_fdimf.S: New file.
* sysdeps/sparc/sparc32/fpu/s_fdim.S: New file.
* sysdeps/sparc/sparc32/fpu/s_fdimf.S: New file.
* sysdeps/sparc/sparc64/fpu/s_fdim.S: New file.
* sysdeps/sparc/sparc64/fpu/s_fdimf.S: New file.
* math/Makefile: Recognize gmp-sysdep_routines.
* sysdeps/sparc/sparc64/multiarch/Makefile: Add VIS3 optimized GMP routines
to sysdeps.
* sysdeps/sparc/sparc64/multiarch/add_n-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/add_n.S: New file.
* sysdeps/sparc/sparc64/multiarch/addmul_1-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/addmul_1.S: New file.
* sysdeps/sparc/sparc64/multiarch/mul_1-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/mul_1.S: New file.
* sysdeps/sparc/sparc64/multiarch/sub_n-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/sub_n.S: New file.
* sysdeps/sparc/sparc64/multiarch/submul_1-vis3.S: New file.
* sysdeps/sparc/sparc64/multiarch/submul_1.S: New file.
* sysdeps/sparc/sparc32/sparcv9/mul_1.S: Properly optimize for 32-bit
sparc V9 rather than using V8 code.
* sysdeps/sparc/sparc32/sparcv9/addmul_1.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/submul_1.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/mul_1.S: Properly optimize for 32-bit
sparc V9 rather than using V8 code.
* sysdeps/sparc/sparc32/sparcv9/addmul_1.S: Likewise.
* sysdeps/sparc/sparc32/sparcv9/submul_1.S: Likewise.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_ceil.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_finite.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_floor.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_frexp.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_isinf.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_isnan.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_llround.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_logb.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_lround.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_modf.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_nearbyint.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_remquo.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_rint.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_round.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_scalbln.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_scalbn.c: New file.
* sysdeps/ieee754/ldbl-opt/wordsize-64/s_trunc.c: New file.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/Implies: Add
ieee754/ldbl-opt/wordsize-64.
* sysdeps/powerpc/powerpc64/Implies: Add
ieee754/dbl-64/wordsize-64.
The mpexp code has an access into m1np:
for (i=n-1; i>0; i--,n--) { if (m1np[i][p]+m2>0) break; }
which could break for p >= 18 or i >= 7. Fortunately this code is
never called due to the way the exp function is implemented since
values having exponent less than -55 return 1.0. Make sure that if it
gets called in future, it is trapped.
Some arches do not have a __NR_fadvise64 but do have __NR_fadvise64_64.
If the former is unavailable, fallback to the latter.
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
The __builtin_bswap* functions were introduced in gcc-4.3, not gcc-4.2.
Fix the __GNUC_PREREQ tests to reflect this.
Otherwise trying to compile code with gcc-4.2 falls down:
In file included from /usr/include/endian.h:60,
from /usr/include/ctype.h:40,
/usr/include/bits/byteswap.h: In function 'unsigned int __bswap_32(unsigned int)':
/usr/include/bits/byteswap.h:46: error: '__builtin_bswap32' was not declared in this scope
/usr/include/bits/byteswap.h: In function 'long long unsigned int __bswap_64(long long unsigned int)':
/usr/include/bits/byteswap.h:110: error: '__builtin_bswap64' was not declared in this scope
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
In commit 26889eacc2 (Remove
__ASSUME_POSIX_CPU_TIMERS), all users of HAS_CPUCLOCK were
dropped. Punt the fallback definition too.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
ptsname_r on failure returns the value that is also set as errno; furthermore,
add more checks to it:
- set errno and return it on __term_get_peername failure
- set errno to ERANGE other than returning it
- change the type of PEERNAME to string_t, and check its length with __strnlen
In ptsname:
- change the type of PEERNAME to string_t
- do not set errno manually, since ptsname_r has set it already
With help from Joseph Myers.
* sysdeps/ieee754/ldbl-128/s_atanl.c (__atanl): Handle tiny and
very large arguments properly.
* math/libm-test.inc (atan_test): New tests.
(atan2_test): New tests.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
* sysdeps/generic/ldconfig.h (FLAG_AARCH64_LIB64): New macro.
* elf/cache.c (print_entry): Print ",AArch64" for
FLAG_AARCH64_LIB64.
Signed-off-by: Steve McIntyre <steve.mcintyre@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
* sysdeps/generic/ldconfig.h (FLAG_ARM_LIBHF): New macro.
* elf/cache.c (print_entry): Print ",hard-float" for
FLAG_ARM_LIBHF.
Signed-off-by: Steve McIntyre <steve.mcintyre@linaro.org>
Reviewed-by: Carlos O'Donell <carlos@systemhalted.org>
With help from Joseph Myers.
* sysdeps/ieee754/flt-32/e_j0f.c (__ieee754_y0f): Adjust tinyness
cutoff to 2**-13.
* sysdeps/ieee754/flt-32/e_j1f.c (__ieee754_y1f): Adjust tinyness
cutoff to 2**-25.
* sysdeps/ieee754/ldbl-128/e_j0l.c (U0): New constant.
( __ieee754_y0l): Avoid arithmetic underflow when 'x' is very
small.
* sysdeps/ieee754/ldbl-128/e_j1l.c (__ieee754_y1l): Likewise.
* math/libm-test.inc (y0_test): New tests.
(y1_test): New tests.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
* sysdeps/sparc/fpu/libm-test-ulps: Update.
* crypt/Makefile: Move test targets after toplevel Rules
inclusion. Grab any necessary sysdep routines when linking.
* crypt/md5.c (md5_process_block): Remove define, we will always
name it __md5_process_block.
(md5_finish_ctx): Update md5_process_block call.
(md5_stream): Likewise.
(md5_process_bytes): Likewise.
(md5_process_block): Rename to __md5_process_block and move to ...
* crypt/md5-block.c: ... here.
* crypt/sha256.c (sha256_process_block): Move to ...
* crypt/sha256-block.c: ... here.
* crypt/sha512.c (sha512_process_block): Move to ...
* crypt/sha512-block.c: ... here.
* locale/Makefile (CFLAGS-md5.c): Define to add crypt/ to include
path.
* sysdeps/sparc/sparc-ifunc.c (sparc_libc_ifunc): Define.
* sysdeps/sparc/sparc64/multiarch/Makefile
(libcrypt-sysdep_routines): Add crypto assembler sysdeps when in
crypt subdir.
(localedef-aux): Add md5 crypto assembler when in locale subdir.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile: Mirror sparc64
multiarch changes.
* sysdeps/sparc/sparc64/multiarch/md5-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/md5-crop.S: New file.
* sysdeps/sparc/sparc64/multiarch/sha256-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/sha256-crop.S: New file.
* sysdeps/sparc/sparc64/multiarch/sha512-block.c: New file.
* sysdeps/sparc/sparc64/multiarch/sha512-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/md5-block.c: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/md5-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha256-block.c: New
file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha256-crop.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha512-block.c: New
file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/sha512-crop.S: New file.
* sysdeps/unix/sysv/linux/sparc/sparc64/get_clockfreq.c: Include
inttypes.h
(__get_clockfreq_via_proc_openprom): Use __open, __read, and
__close rather than their public counterparts.
* sysdeps/unix/sysv/linux/powerpc/bits/fcntl.h: Remove all
definitions and declarations that are provided by
<bits/fcntl-linux.h> and include <bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/sparc/sparc64/__start_context.S: New file.
* sysdeps/unix/sysv/linux/sparc/sparc64/makecontext.c
(__start_context): Declare.
(__makecontext_ret): Delete.
(__makecontext): Hook up __start_context instead of
__makecontext_ret.
* sysdeps/unix/sysv/linux/sparc/sparc64/Makefile
(sysdep_routines): Add __start_context when in stdlib.
[BZ #14809]
* sysdeps/unix/sysv/linux/sys/sysctl.h (_UAPI_LINUX_KERNEL_H)
(_UAPI_LINUX_TYPES_H): Starting with Linux 3.7, the include header
guards are changed. Only define if not yet defined, #undef back
after including linux/sysctl.h if defined here.
Adapts __ppc_get_timebase to the upcoming GCC 4.8 that provides
__builtin_ppc_get_timebase. Building applicationns with previous
versions of GCC will continue to use the internal implementation.
When glibc is built with --enable-static-nss, the warning that
using NSS symbols requires the nss shared objects to be present
is no longer true, as those symbols are built into libc. Suppress
the warning for those symbols by providing a new macro
(nss_interface_function) for the NSS functions that is defined as
static_link_warning in the normal case, and empty for static NSS.
* sysdeps/unix/sysv/linux/x86/bits/fcntl.h (__O_LARGEFILE)
[!__x86_64]: Do not define, take value from <bits/fcntl-linux.h>.
* sysdeps/unix/sysv/linux/s390/bits/fcntl.h (__O_LARGEFILE):
[__WORDSIZE != 64]: Likewise.
* sysdeps/unix/sysv/linux/generic/bits/fcntl.h: (__O_LARGEFILE)
[__WORDSIZE != 64]: Do not define, take value from
<bits/fcntl-linux.h>.
Create a new bits/fcntl-linux.h that contains Linux generic code and a
include it from the architecture specific bits/fcntl.h.
Architectures done: x86, SPARC, s390
(__crypt_r, __crypt): Disable MD5 and DES if FIPS is enabled.
* crypt/md5c-test.c (main): Tolerate disabled MD5.
* sysdeps/unix/sysv/linux/fips-private.h: New file.
* sysdeps/generic/fips-private.h: New file, dummy fallback.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: On 32-bit, clear
upper 32-bits of the length value in %o2 since we use branch-on-register
tests which consider the entire 64-bit register.
* sysdeps/sparc/sparc64/multiarch/memset-niagara4.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memset-niagara4.S: New
file.
* sysdeps/sparc/sparc64/multiarch/Makefile: Add to
sysdep_routines.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc64/multiarch/memset.S: Use Niagara-4 memset
and bzero when HWCAP_SPARC_CRYPTO is present.
* sysdeps/sparc/sparc64/multiarch/memcpy-niagara4.S: New file.
* sysdeps/sparc/sparc32/sparcv9/multiarch/memcpy-niagara4.S: New
file.
* sysdeps/sparc/sparc64/multiarch/Makefile: Add to
sysdep_routines.
* sysdeps/sparc/sparc32/sparcv9/multiarch/Makefile: Likewise.
* sysdeps/sparc/sparc64/multiarch/memcpy.S: Use Niagara-4 memcpy
and mempcpy when HWCAP_SPARC_CRYPTO is set.
* sysdeps/posix/getaddrinfo.c (default_scopes): Map RFC 1918
* addresses
to global scope.
* posix/tst-rfc3484.c: Verify 10/8, 172.16/12 and 196.128/16
addresses are in the same scope as 192.0.2/24.
* posix/gai.conf: Document new scope table defaults.
[BZ #14376]
* sysdeps/sparc/sparc64/dl-machine.h (elf_machine_rela): Do not
pass reloc->r_addend in as the 'high' argument to
sparc64_fixup_plt when handling R_SPARC_JMP_IREL relocations.
Using madvise with MADV_DONTNEED to release memory back to the kernel
is not sufficient to change the commit charge accounted against the
process on Linux. It is OK however, when overcommit is enabled or is
heuristic. However, when overcommit is restricted to a percentage of
memory setting the contents of /proc/sys/vm/overcommit_memory as 2, it
makes a difference since memory requests will fail. Hence, we do what
we do with secure exec binaries, which is to call mmap on the region
to be dropped with MAP_FIXED. This internally unmaps the pages in
question and reduces the amount of memory accounted against the
process.
* sysdeps/sparc/sparc32/sparcv9/addmul_1.S: New file.
* sysdeps/sparc/sparc32/sparcv9/submul_1.S: New file.
* sysdeps/sparc/sparc32/sparcv9/mul_1.S: New file.
* sysdeps/i386/i686/fpu/multiarch/Makefile (sysdep_routines):
Add s_sinf-sse2, s_conf-sse2.
* sysdeps/i386/i686/fpu/multiarch/s_sinf-sse2.S: New file.
* sysdeps/i386/i686/fpu/multiarch/s_cosf-sse2.S: New file.
* sysdeps/i386/i686/fpu/multiarch/s_sinf.c: New file.
* sysdeps/i386/i686/fpu/multiarch/s_cosf.c: New file.
* sysdeps/ieee754/flt-32/s_sinf.c (SINF, SINF_FUNC): Add macros
for using routine as __sinf_ia32.
Use macro for function declaration and weak_alias.
* sysdeps/ieee754/flt-32/s_cosf.c (COSF, COSF_FUNC): Add macros
for using routine as __cosf_ia32.
Use macro for function declaration and weak_alias.
* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S: Fix Copyright.
* sysdeps/i386/i686/fpu/multiarch/e_expf.c: Fix Copyright.
* sysdeps/x86_64/fpu/s_sinf.S: New file.
* sysdeps/x86_64/fpu/s_cosf.S: New file.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
* math/libm-test.inc (cos_test): Add more test cases.
(sin_test): Likewise.
(sincos_test): Likewise.
[BZ #14538]
* sysdeps/x86_64/dl-machine.h (elf_machine_dynamic): Use the
first element of the GOT.
(elf_machine_load_address): Return the difference between
the runtime address of _DYNAMIC and elf_machine_dynamic ().
The ttyname and ttyname_r functions on Linux now fall back to
searching for the tty file descriptor in /dev/pts or /dev if /proc is
not available. This allows creation of chroots without the procfs
mounted on /proc.
Fixes BZ #14516.
Initially based on the versions found in wcsmbs/* ; these files have
been changed by hand unrolling, and adding some additional variables
to allow some read-ahead to occur, which then relieves some of the
wait-for-increment/wait-for-load/wait-for-compare-results pressure
that was slowing down every iteration through the while-loop.
For 64-bit Power7, These changes give an approx 20% throughput boost
for the wcschr and wcsrchr functions; and approx 40% boost for the
wcscpy function. 32-bit improvements appear to be slightly better
with ~ %30 and ~ %45 respectively. Results for Power6 closely match
those for power7.
Assorted tweaking, twisting and tuning to squeeze a few additional cycles
out of the memchr code. Changes include bypassing the shift pairs
(sld,srd) when they are not required, and unrolling the small_loop that
handles short and trailing strings.
Per scrollpipe data measuring aligned strings for 64-bit, these changes
save between five and eight cycles (9-13% overall) for short strings (<32),
Longer aligned strings see slight improvement of 1-3% due to bypassing the
shifts and the instruction rearranging.