MIPS has its own version of dl-lookup.c to deal with differences
between undefined symbol semantics in the PIC and non-PIC ABIs. This
is often liable to get out of date with respect to the generic file
(for example, the recent __builtin_expect changes didn't cover ports,
and it's not obvious to anyone changing dl-lookup.c that there would
be architecture-specific versions).
This patch adds a macro that dl-machine.h can define that is used in
the appropriate place in dl-lookup.c, so that MIPS no longer needs its
own version of that file.
Tested for mips64 that the only changes to disassembly of installed
shared libraries appear to be ld.so changes attributable to different
line numbers and paths in assertions.
* elf/dl-lookup.c (ELF_MACHINE_SYM_NO_MATCH): Define if not
already defined.
(do_lookup_x): Use ELF_MACHINE_SYM_NO_MATCH.
* sysdeps/mips/dl-lookup.c: Remove.
* sysdeps/mips/dl-machine.h (ELF_MACHINE_SYM_NO_MATCH): New macro.
This patch moves the AArch64 port to the main sysdeps hierarchy. The
move is essentially:
git mv ports/sysdeps/aarch64 sysdeps/aarch64
git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64
The README is updated and I've updated ChangeLog.aarch64 along the
lines of the ARM move. The AArch64 build has been tested to confirm
that there were no changes in objdump -dr output or the shared
objects.
I've moved the MIPS port from ports to the main sysdeps hierarchy.
Beyond the README update, the move of the files was simply
git mv ports/sysdeps/mips sysdeps/mips
git mv ports/sysdeps/unix/mips sysdeps/unix/mips
git mv ports/sysdeps/unix/sysv/linux/mips sysdeps/unix/sysv/linux/mips
and in addition to the ChangeLog entries here, I put a note at the top
of ports/ChangeLog.mips similar to those in other files.
Tested that disassembly of installed shared libraries for mips is the
same before and after this patch (except for ld.so where paths in
assertions are involved, as for arm).
* sysdeps/mips: Move directory from ports/sysdeps/mips.
* sysdeps/unix/mips: Move directory from ports/sysdeps/unix/mips.
* sysdeps/unix/sysv/linux/mips: Move directory from
ports/sysdeps/unix/sysv/linux/mips.
* README: Update listing for mips-*-linux-gnu and
mips64-*-linux-gnu.
* sysdeps/mips: Move directory to ../sysdeps/mips.
* sysdeps/unix/mips: Move directory to ../sysdeps/unix/mips.
* sysdeps/unix/sysv/linux/mips: Move directory to
../sysdeps/unix/sysv/linux/mips.
I've moved the TILE-Gx and TILEPro ports to the main sysdeps hierarchy,
along with the linux-generic ports infrastructure. Beyond the README
update, the move was just
git mv ports/sysdeps/tile sysdeps/tile
git mv ports/sysdeps/unix/sysv/linux/tile \
sysdeps/unix/sysv/linux/tile
git mv ports/sysdeps/unix/sysv/linux/generic \
sysdeps/unix/sysv/linux/generic
I updated the relevant ChangeLogs along the lines of the ARM move
in commit c6bfe5c4d7 and tested the 64-bit tilegx build to confirm that
there were no changes in "objdump -dr" output in the shared objects.
This pulls in the latest defines for {g,s}etsockopt.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Import the current list of defines available in the kernel headers.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
I've moved the ARM port from ports to the main sysdeps hierarchy.
Beyond the README update, the move of the files was simply
git mv ports/sysdeps/arm sysdeps/arm
git mv ports/sysdeps/unix/arm sysdeps/unix/arm
git mv ports/sysdeps/unix/sysv/linux/arm sysdeps/unix/sysv/linux/arm
and in addition to the ChangeLog entries here, I put a note at the top
of ports/ChangeLog.arm similar to that at the top of
ChangeLog.powerpc. There is deliberately no NEWS change, as I think
it makes the most sense to put in a general note above all ports
having moved if we can achieve that for 2.20.
Tested that disassembly of installed shared libraries for arm is the
same before and after this patch, except for data (not instructions)
in ld.so (there are assertions in sysdeps/arm/dl-machine.h, and the
path by which that file is found, and so by which it appears in the
assertion message, changes as a result of the move).
* sysdeps/arm: Move directory from ports/sysdeps/arm.
* sysdeps/unix/arm: Move directory from ports/sysdeps/unix/arm.
* sysdeps/unix/sysv/linux/arm: Move directory from
ports/sysdeps/unix/sysv/linux/arm.
* README: Update listing for arm-*-linux-gnueabi.
ports/ChangeLog.arm:
* sysdeps/arm: Move directory to ../sysdeps/arm.
* sysdeps/unix/arm: Move directory to ../sysdeps.arm.
* sysdeps/unix/sysv/linux/arm: Move directory to
../sysdeps/unix/sysv/linux/arm.
This reverts commit 1f33d36a8a.
Conflicts:
elf/dl-misc.c
Also reverts the follow commits that were bug fixes to new code introduced
in the above commit:
063b2acbceb627fdd585e81c64bba1
Support for /proc/self/task/$tid/comm as added in Linux 2.6.33,
therefore since the test tst-setgetname relies on this functionality
to operate we must skip the test in kernels < 2.6.33. We wrap the
checks with __ASSUME_PROC_PID_TASK_COMM such that in the future when
we move arch_minimum_kernel to 2.6.33 we can remove this code.
This patch creates implicit rules to match the abifiles if
abilist-pattern is defined in the architecture Makefile. This allows
machine specific Makefiles to define different abifiles names
(for instance *-le.abilist for powerpc64le).
When i386 and x86-64 mathinline.h was merged into a single mathinline.h,
"gcc -m32" enables x87 inline functions on x86-64 even when -mfpmath=sse
and SSE2 is enabled. It is a regression on x86-64. We should check
__SSE2_MATH__ instead of __x86_64__ when disabling x87 inline functions.
In BZ #15605 fix with addding memset/memmove alias in symbol-hacks.h,
x32 symbol-hacks.h change was missing. Fixed by including
<sysdeps/generic/symbol-hacks.h> in x32 symbol-hacks.h.
The IFUNC selector for gettimeofday runs before _libc_vdso_platform_setup where
__vdso_gettimeofday is set. The selector then sets __gettimeofday (the internal
version used within GLIBC) to use the system call version instead of the vDSO one.
This patch changes the check if vDSO is available to get its value directly
instead of rely on __vdso_gettimeofday.
This patch changes it by getting the vDSO value directly.
It fixes BZ#16431.
See commit 41b1792698 for testcase.
Note: while this works on s390x, the s390 code hangs when using -e.
But it hangs regardless of this code (the hang seems to occur before
the exit func is even called). I didn't look too closely at it as
it seems to be an issue external to this file, so this code shouldn't
make the situation any worse.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patches fixes BZ#16430 by setting a different symbol for internal
GLIBC calls that points to ifunc resolvers. For PPC32, if the symbol
is defined as hidden (which is the case for gettimeofday and time) the
compiler will create local branches (symbol@local) and linker will not
create PLT calls (required for IFUNC). This will leads to internal symbol
calling the IFUNC resolver instead of the resolved symbol.
For PPC64 this behavior does not occur because a call to a function in
another translation unit might use a different toc pointer thus requiring
a PLT call.
We needlessly enabled thread cancellation before it was necessary. As
only call that needs to be guarded is waitpid which is cancellation
point we could remove cancellation altogether.
The truncl assembly implementation (sysdeps/powerpc/powerpc64/fpu/s_truncl.S)
returns wrong results for some inputs where first double is a exact integer
and the precision is determined by second long double.
Checking on implementation comments and history, I am very confident the
assembly implementation was based on a version before commit
5c68d40169 that fixes BZ#2423 (Errors in
long double (ldbl-128ibm) rounding functions in glibc-2.4).
By just removing the implementation and make the build select
sysdeps/ieee754/ldbl-128ibm/s_truncl.c instead it fixes tgammal
issues regarding wrong result sign.
This patch fixes bug 16408, ldbl-128ibm expm1l returning NaN for some
large arguments.
The basic problem is that the approach of converting the exponent to
the form n * log(2) + y, where -0.5 <= y <= 0.5, then computing 2^n *
expm1(y) + (2^n - 1) falls over when 2^n overflows (starting slightly
before the point where expm1 overflows, when y is negative and n is
the least integer for which 2^n overflows). The ldbl-128 code, and
the x86/x86_64 code, make expm1l fall back to expl for large positive
arguments to avoid this issue. This patch makes the ldbl-128ibm code
do the same. (The problem appears for the particular argument in the
testsuite because the ldbl-128ibm code also uses an overflow threshold
that's for ldbl-128 and is too big for ldbl-128ibm, but the problem
described applies for large non-overflowing cases as well, although
during the freeze is not a suitable time for making the expm1 tests
cover cases close to overflow more thoroughly.)
This leaves some code for large positive arguments in expm1l that is
now dead. To keep the code for ldbl-128 and ldbl-128ibm similar, and
to avoid unnecessary changes during the freeze, the patch doesn't
remove it; instead I propose to file a bug in Bugzilla as a reminder
that this code (for overflow, including errno setting, and for
arguments of +Inf) is no longer needed and should be removed from both
those expm1l implementations.
Tested powerpc32.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Use __expl
for large positive arguments.
This patch fixes bug 16407, spurious overflows from ldbl-128ibm coshl.
The implementation assumed that a high part (reinterpreted as an
integer) of the absolute value of the argument of 0x408633ce8fb9f87dLL
or more meant overflow, but the actual threshold has high part
0x408633ce8fb9f87eLL (and a negative low part). The patch adjusts the
threshold accordingly.
sinhl probably has the same issue, but I didn't get that far in adding
tests of special cases (such as just below and above overflow) before
the freeze and during the freeze is not a suitable time to add them
(as they'd require ulps to be regenerated again), so I'm not changing
that function for now; when I add more tests of special cases, we'll
discover whether sinhl indeed has this problem.
Tested powerpc32.
* sysdeps/ieee754/ldbl-128ibm/e_coshl.c (__ieee754_coshl):
Increase overflow threshold.
This patch fixes bug 16400, spurious underflow exceptions for ldbl-128
/ ldbl-128ibm lgammal with small positive arguments, by just using
-__logl (x) as the result in the problem cases (similar to the
previous fix for problems with small negative arguments).
Tested powerpc32, and also tested on mips64 that this does not require
ulps regeneration for the ldbl-128 case.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Return -__logl (x) for small positive arguments without evaluating
a polynomial.
All the other ptrace structures in this file have a __ prefix except this
new one. This in turn causes build problems for most packages that try to
use ptrace such as strace:
gcc -DHAVE_CONFIG_H -I. -I../.. -I../../linux/x86_64 -I../../linux \
-I./linux -Wall -Wwrite-strings -g -O2 -MT process.o -MD -MP \
-MF .deps/process.Tpo -c -o process.o ../../process.c
In file included from ../../process.c:63:0:
/usr/include/linux/ptrace.h:58:8: error: redefinition of 'struct ptrace_peeksiginfo_args'
struct ptrace_peeksiginfo_args {
^
In file included from ../../defs.h:159:0,
from ../../process.c:37:
/usr/include/sys/ptrace.h:191:8: note: originally defined here
struct ptrace_peeksiginfo_args
^
Since this struct was introduced in glibc-2.18, there shouldn't be any
real regressions with adding the __ prefix.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patch fixes bug 16390, incorrect signs of zero results from
ldbl-128ibm atan2l, soft-float only. The problem is a longstanding
GCC bug with fabsl not being correct for signed zero for soft float,
and the fix is using -fno-builtin-fabsl as a workaround, as already
done for various other source files. Tested powerpc-nofpu.
* sysdeps/powerpc/nofpu/Makefile [$(subdir) = math]
(CFLAGS-e_atan2l.c): Use -fno-builtin-fabsl.
This patch fixes bug 16386, ldbl-128ibm logl inaccuracy (with
consequent inaccuracy for lgammal) for arguments where the high double
is subnormal, which showed up while attempting to regenerate ulps for
powerpc-nofpu for 2.19. The problem here is logic failing to allow
for subnormals when calculating the exponent of the argument. Tested
for powerpc-nofpu.
* sysdeps/ieee754/ldbl-128ibm/e_logl.c (__ieee754_logl): Adjust
numbers with subnormal high part when calculating exponent.
This patch fixes bug 16385, ldbl-128ibm asinhl inaccuracy, which
showed up while attempting to regenerate ulps for powerpc-nofpu for
2.19. The problem here was use of fabs instead of fabsl meaning large
arguments were reduced to the precision of double. Tested for
powerpc-nofpu.
* sysdeps/ieee754/ldbl-128ibm/s_asinhl.c (__asinhl): Use fabsl not
fabs.
This patch fixes bug 16384, ldbl-128ibm acoshl inaccuracy, which
showed up while attempting to regenerate ulps for powerpc-nofpu for
2.19. There were two separate problems, use of __log1p instead of
__log1pl and an insufficiently accurate constant value for log 2
(which this patch replaces by use of M_LN2l), each of which could
cause substantial inaccuracy in affected cases.
Tested for powerpc-nofpu.
* sysdeps/ieee754/ldbl-128ibm/e_acoshl.c (ln2): Initialize with
M_LN2l.
(__ieee754_acoshl): Use __log1pl not __log1p.
We support older kernels that lack this header, so check for it
before we try to use it.
Reported-by: Adhemerval Zanella <azanella@linux.vnet.ibm.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patch fixes bug 16337, ldbl-128 lgammal spurious overflows for
small negative arguments (the arguments in question are already in the
testsuite). The implementation uses the reflection formula to compute
lgamma of negative x from lgamma of -x, effectively resulting in a
calculation -log(x^2) + log(-x); cancellation isn't problematic in
this case (bugs for problematic cancellation in lgamma are 2542, 2543,
2558), but the x^2 calculation can underflow (in which case there is
spurious logic to return an overflowing value - lgamma can only ever
correctly overflow for large positive arguments, though tgamma can
overflow for small arguments of either sign as well as large positive
arguments). The fix is simply to calculate the result directly with
logl when the argument is a small enough negative number.
Tested mips64.
* sysdeps/ieee754/ldbl-128/e_lgammal_r.c (__ieee754_lgammal_r):
Calculate results for small negative arguments directly rather
than using reflection formula with special underflow handling.
As discussed in
<https://sourceware.org/ml/libc-alpha/2012-04/msg00840.html> and
<https://sourceware.org/ml/libc-alpha/2012-04/msg00989.html>, it seems
appropriate to flatten sysdeps/unix/bsd/bsd4.4 into sysdeps/unix/bsd.
The bulk of the patch is just moving files. The only other changes
are: update paths in sysdeps/mach/hurd/Implies and
sysdeps/unix/sysv/linux/wait3.c; merge the two syscalls.list files,
with the removal of syscalls that were in
sysdeps/unix/bsd/syscalls.list but overridden in the bsd4.4 directory
by .c files there.
Tested x86_64. The installed shared libraries are identical before
and after the patch except for libc.so where the move of wait3.c
(included by sysdeps/unix/sysv/linux/wait3.c) affects debug info, but
the disassembly is unchanged.
* sysdeps/mach/hurd/Implies: Change unix/bsd/bsd4.4 to unix/bsd.
* sysdeps/unix/bsd/syscalls.list (chflags): Add entry from
sysdeps/unix/bsd/bsd4.4/syscalls.list.
(fchflags): Likewise.
(revoke): Likewise.
(setlogin): Likewise.
(sigaltstack): Likewise.
(wait4): Likewise.
(sigblock): Remove.
(sigsetmask): Likewise.
(wait3): Likewise.
(waitpid): Likewise.
* sysdeps/unix/bsd/bsd4.4/syscalls.list: Remove file.
* sysdeps/unix/sysv/linux/wait3.c: Update directory of included
file.
* sysdeps/unix/bsd/bsd4.4/Makefile: Move to ...
* sysdeps/unix/bsd/Makefile: ... here.
* sysdeps/unix/bsd/bsd4.4/Versions: Move to ...
* sysdeps/unix/bsd/Versions: ... here.
* sysdeps/unix/bsd/bsd4.4/bits/sockaddr.h: Move to ...
* sysdeps/unix/bsd/bits/sockaddr.h: ... here.
* sysdeps/unix/bsd/bsd4.4/cmsg_nxthdr.c: Move to ...
* sysdeps/unix/bsd/cmsg_nxthdr.c: ... here.
* sysdeps/unix/bsd/bsd4.4/sigblock.c: Move to ...
* sysdeps/unix/bsd/sigblock.c: ... here.
* sysdeps/unix/bsd/bsd4.4/sigsetmask.c: Move to ...
* sysdeps/unix/bsd/sigsetmask.c: ... here.
* sysdeps/unix/bsd/bsd4.4/sigvec.c: Move to ...
* sysdeps/unix/bsd/sigvec.c: ... here.
* sysdeps/unix/bsd/bsd4.4/tcdrain.c: Move to ...
* sysdeps/unix/bsd/tcdrain.c: ... here.
* sysdeps/unix/bsd/bsd4.4/tcgetattr.c: Move to ...
* sysdeps/unix/bsd/tcgetattr.c: ... here.
* sysdeps/unix/bsd/bsd4.4/tcsetattr.c: Move to ...
* sysdeps/unix/bsd/tcsetattr.c: ... here.
* sysdeps/unix/bsd/bsd4.4/wait.c: Move to ...
* sysdeps/unix/bsd/wait.c: ... here.
* sysdeps/unix/bsd/bsd4.4/wait3.c: Move to ...
* sysdeps/unix/bsd/wait3.c: ... here.
* sysdeps/unix/bsd/bsd4.4/waitpid.c: Move to ...
* sysdeps/unix/bsd/waitpid.c: ... here.
This patch fixes bug 16356, bad results from x86 / x86_64 expl /
exp10l in directed rounding modes, the most serious of the bugs shown
up by my patch expanding libm test coverage. When I fixed bug 16293,
I thought it was only necessary to set round-to-nearest when using
frndint in expm1 functions, because in other cases the cancellation
error from having the resulting fractional part close to 1 or -1 would
not be significant. However, in expl and exp10l, the way the final
fractional part gets computed (something more complicated than a
simple subtraction, because more precision is needed than you'd get
that way) can result in a value outside the range [-1, 1] when the
argument to frndint was very close to an integer and was rounded the
"wrong" way because of the rounding mode - and the f2xm1 instruction
has undefined results if its argument is outside [-1, 1], so resulting
in the large errors seen. So this patch removes the USE_AS_EXPM1L
conditionals on the round-to-nearest settings, so all of expl, expm1l
and exp10l now get round-to-nearest used for frndint (meaning the
final fractional part can at most be slightly above 0.5 in
magnitude). Associated tests of exp and exp10 are added and testing
of exp10 in directed rounding modes enabled.
Tested x86_64 and x86 and ulps updated accordingly.
* sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL): Also set
round-to-nearest for [!USE_AS_EXPM1L].
* sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL): Likewise.
* math/auto-libm-test-in: Do not expect cosh tests to fail. Add
more tests of exp and exp10. Expect some exp10 tests to miss
exceptions or fail in directed rounding modes.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (exp10_tonearest_test_data): New array.
(exp10_test_tonearest): New function.
(exp10_towardzero_test_data): New array.
(exp10_test_towardzero): New function.
(exp10_downward_test_data): New array.
(exp10_test_downward): New function.
(exp10_upward_test_data): New array.
(exp10_test_upward): New function.
(main): Call the new functions.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Various libm functions have inadequate test coverage in libm-test.inc
/ auto-libm-test-in - failing to cover all the usual special cases
(infinities, NaNs, zero, large and small finite values, subnormals) as
well as a reasonable range of ordinary inputs and, where appropriate,
inputs close to the thresholds for underflow and overflow.
This patch improves test coverage for real functions [a-c]* (with the
expectation of adding more coverage for other functions later).
Tested x86_64 and x86 and ulps updated accordingly (and eight glibc
bugs and one C11 DR filed for issues found in the process).
* math/auto-libm-test-in: Add more tests of acos, acosh, asin,
asinh, atan, atan2, atanh, cbrt, cos and cosh.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (acosh_test_data): Add more tests.
(atanh_test_data): Likewise.
(ceil_test_data): Likewise.
(copysign_test_data): Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch moves tests of cpow to auto-libm-test-in, adding the
required support to gen-auto-libm-tests.
Tested x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of cpow.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (cpow_test_data): Use AUTO_TESTS_cc_c.
* * math/gen-auto-libm-tests.c (func_calc_method): Add value
mpc_cc_c.
(func_calc_desc): Add mpc_cc_c union field.
(test_functions): Add cpow.
(special_fill_2pi): New function.
(special_real_inputs): Add 2pi.
(calc_generic_results): Handle mpc_cc_c.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch moves tests of ccos, ccosh, cexp, clog, csqrt, ctan and
ctanh to auto-libm-test-in, adding the required support to
gen-auto-libm-tests. Other TEST_c_c functions aren't moved for now
(although the relevant table entries are put in gen-auto-libm-tests
for it to know how to handle them): clog10 because of a known MPC bug
causing it to hang for at least some pure imaginary inputs (fixed in
SVN, but I'd rather not rely on unreleased versions of MPFR or MPC
even if relying on very recent releases); the inverse trig and
hyperbolic functions because of known slowness in special cases; and
csin / csinh because of observed slowness that I need to investigate
and report to the MPC maintainers. Slowness can be bypassed by moving
to incremental generation (only for new / changed tests) rather than
regenerating the whole of auto-libm-test-out every time, but that
needs implementing. (This patch takes the time for running
gen-auto-libm-tests from about one second to seven, on my system,
which I think is reasonable. The slow functions would make it take
several minutes at least, which seems unreasonable.)
Tested x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of ccos, ccosh, cexp, clog,
csqrt, ctan and ctanh.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (TEST_COND_x86_64): New macro.
(TEST_COND_x86): Likewise.
(ccos_test_data): Use AUTO_TESTS_c_c.
(ccosh_test_data): Likewise.
(cexp_test_data): Likewise.
(clog_test_data): Likewise.
(csqrt_test_data): Likewise.
(ctan_test_data): Likewise.
(ctan_tonearest_test_data): Likewise.
(ctan_towardzero_test_data): Likewise.
(ctan_downward_test_data): Likewise.
(ctan_upward_test_data): Likewise.
(ctanh_test_data): Likewise.
(ctanh_tonearest_test_data): Likewise.
(ctanh_towardzero_test_data): Likewise.
(ctanh_downward_test_data): Likewise.
(ctanh_upward_test_data): Likewise.
* math/gen-auto-libm-tests.c (func_calc_method): Add value
mpc_c_c.
(func_calc_desc): Add mpc_c_c union field.
(FUNC_mpc_c_c): New macro.
(test_functions): Add cacos, cacosh, casin, casinh, catan, catanh,
ccos, ccosh, cexp, clog, clog10, csin, csinh, csqrt, ctan and
ctanh.
(special_fill_min_subnorm_p120): New function.
(special_real_inputs): Add min_subnorm_p120.
(calc_generic_results): Handle mpc_c_c.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch consolidates the multiple copies of code that looks up sin
and cos of a number from the lookup table and computes the final
value, into static functions. This does not have a noticeable
performance impact since the functions are inlined by gcc.
There is further scope for consolidation in the functions but they
cause a more noticable impact on performance (>5%) due to which I have
held back on them.
Removed more redundant computations in the slow paths of the sin and
cos functions. The notable change is the passing of the most
significant bits of X to the slow functions to check if X is positive
so that just the absolute value of x can be passed and the repeated
ABS() operation is avoided.
There are multiple points in the code where the absolute value of a
number is computed multiple times or is computed even though the value
can only be positive. This change removes those redundant
computations. Tested on x86_64 to verify that there were no
regressions in the testsuite.
sysdeps/powerpc/powerpc32/libgcc-compat.S makes certain symbols that
glibc once accidentally reexported from libgcc into compat symbols.
Where the exports were purely accidental, this is the right thing to
do. However, for powerpc-nofpu the soft-fp symbols are deliberately
exported from libc, given public versions in
sysdeps/powerpc/nofpu/Versions and used by libm in preference to the
libgcc versions that do not support the software exceptions and
rounding modes. The libc versions should also be usable by user
programs, though normally libgcc gets linked in first (meaning,
effectively, that the <fenv.h> functions are broken as regards their
expected effects on user arithmetic).
A longstanding todo item is to remove the functions in question from
libgcc (when built with recent enough glibc) - that is, remove them
from static libgcc and make them compat symbols in shared libgcc - so
that this works properly (this is one of the items mentioned at
<http://gcc.gnu.org/wiki/Software_floating_point> - parts of that page
are obviously out of date, but this item still applies). Doing this
requires first that the functions are actually available from libc for
new links, not just as compat symbols.
This patch stops the symbols in question being compat symbols for
powerpc-nofpu. The nofpu Versions entries for them are removed (the
symbols never were exported at GLIBC_2.3.2, only GLIBC_2.0, because
the compat symbols took precedence).
Tested powerpc-nofpu. The symbols are no longer compat symbols and
libm.so now properly gets undefined references to them (resolved to
libc.so) instead of the libgcc copies getting linked into libm as
before.
* sysdeps/powerpc/powerpc32/libgcc-compat.S
[_SOFT_FLOAT || __NO_FPRS__] (__fixdfdi_v_glibc20): Do not define
as a macro and a compat symbol.
[_SOFT_FLOAT || __NO_FPRS__] (__fixsfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__fixunsdfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__fixunssfdi_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__floatdidf_v_glibc20): Likewise.
[_SOFT_FLOAT || __NO_FPRS__] (__floaddisf_v_glibc20): Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixdfdi): Do
not use .hidden.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixsfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixunsdfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__fixunssfdi):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__floaddidf):
Likewise.
[HAVE_DOT_HIDDEN && (_SOFT_FLOAT || __NO_FPRS__)] (__floaddisf):
Likewise.
* sysdeps/powerpc/nofpu/Versions (libc): Remove __fixdfdi,
__fixsfdi, __fixunsdfdi, __fixunssfdi, __floatdidf and __floatdisf
from GLIBC_2.3.2.
This patch moves tests of sincos to auto-libm-test-in, adding the
required support to gen-auto-libm-tests.
Tested x86_64 and x86 and ulps updated accordingly.
(auto-libm-test-out diffs omitted below.)
* math/auto-libm-test-in: Add tests of sincos.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (sincos_test_data): Use AUTO_TESTS_fFF_11.
* math/gen-auto-libm-tests.c (func_calc_method): Add value
mpfr_f_11.
(func_calc_desc): Add mpfr_f_11 union field.
(test_functions): Add sincos.
(calc_generic_results): Handle mpfr_f_11.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
math/gen-libm-test.pl has code to beautify names of various constants,
transforming the source form in libm-test.inc into the version
appearing in test names in libm-test-ulps files.
This has become decreasingly relevant over time for the M_* constants,
first as I changed the test names so only the arguments and not the
expected results appeared in them, then as tests have moved to
auto-libm-test-* so that automatically generated hex float constants
get used instead of M_* in test inputs.
This patch removes the beautification for all M_* constants. Tested
x86_64 and x86 and ulps updated accordingly. Even the one case where
this affected the name in the ulps files will disappear once complex
function tests are moved to auto-libm-test-*.
* math/gen-libm-test.pl (%beautify): Remove M_* constants.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Bug 16293 is inaccuracy of x86/x86_64 versions of expm1, near 0 in
directed rounding modes, that arises from frndint rounding the
exponent to 1 or -1 instead of 0, resulting in large cancellation
error. This inaccuracy in turn affects other functions such as sinh
that use expm1. This patch fixes the problem by setting
round-to-nearest mode temporarily around the affected calls to
frndint. I don't think this is needed for other uses of frndint, such
as in exp itself, as only for expm1 is the cancellation error
significant.
Tested x86_64 and x86 and ulps updated accordingly.
* sysdeps/i386/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]: Set
round-to-nearest mode when using frndint.
* sysdeps/i386/fpu/s_expm1.S (__expm1): Likewise.
* sysdeps/i386/fpu/s_expm1f.S (__expm1f): Likewise.
* sysdeps/x86_64/fpu/e_expl.S (IEEE754_EXPL) [USE_AS_EXPM1L]:
Likewise.
* math/auto-libm-test-in: Add more tests of expm1. Do not expect
sinh test to fail.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (TEST_COND_x86_64): Remove macro.
(TEST_COND_x86): Likewise.
(expm1_tonearest_test_data): New array.
(expm1_test_tonearest): New function.
(expm1_towardzero_test_data): New array.
(expm1_test_towardzero): New function.
(expm1_downward_test_data): New array.
(expm1_test_downward): New function.
(expm1_upward_test_data): New array.
(expm1_test_upward): New function.
(main): Run the new test functions.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
This patch moves tests of jn and yn to auto-libm-test-in, adding the
required support for gen-auto-libm-tests (and adding a missing
assertion there and fixing logic that was broken for functions with
integer arguments).
Tested x86_64 and x86 and ulps updated accordingly.
* math/auto-libm-test-in: Add tests of jn and yn.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (jn_test_data): Use AUTO_TESTS_if_f.
(yn_test_data): Likewise.
* math/gen-auto-libm-tests.c (func_calc_method): Add value
mpfr_if_f.
(func_calc_desc): Add mpfr_if_f union field.
(FUNC_mpfr_if_f): New macro.
(test_functions): Add jn and yn.
(calc_generic_results): Assert type of second input for
mpfr_ff_f. Handle mpfr_if_f.
(output_for_one_input_case): Disable all checking for arguments
fitting floating-point types in case of an integer argument.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
My recent changes that added libm_hidden_proto / libm_hidden_def for
fegetround had the side effect of removing the need for a
localplt.data entry for fegetround for powerpc-nofpu. This patch
removes that entry. Tested powerpc-nofpu.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/nptl/localplt.data:
Don't expect fegetround reference in libm.so.
This patch fixes bug 16338, ldbl-128 logl not handling subnormals
(with consequent inaccuracy for lgammal as well). The fix is simply
to use __frexpl when determining the exponent, as done already in
log2l and log10l. Given the lack of testing of small arguments to any
of the log* functions, appropriate tests are added for all of them.
Tested x86_64 and x86 and ulps updated accordingly, and spot tests
also run for mips64 to confirm the ldbl-128 fix.
Note that while this fixes lgammal inaccuracy for small positive
arguments, I suspect that there will still be problems with spurious
underflows in that case.
* sysdeps/ieee754/ldbl-128/e_logl.c (__ieee754_logl): Use __frexpl
to determine exponent and adjust argument to have exponent of -1.
* math/auto-libm-test-in: Add more tests of log, log10, log1p and
log2.
* math/auto-libm-test-out: Regenerated.
* sysdeps/x86_64/fpu/libm-test-ulps: Update.
- Remove redundant mynumber union definitions
- Clean up a clumsy ternary operator
- Rename TAYLOR_SINCOS to TAYLOR_SIN since we're only expanding the
sin Taylor series in it.
A sse42 version of strstr used pcmpistr instruction which is quite
ineffective. A faster way is look for pairs of characters which is uses
sse2, is faster than pcmpistr and for real strings a pairs we look for
are relatively rare.
For linear time complexity we use buy or rent technique which switches
to two-way algorithm when superlinear behaviour is detected.
For PPC64, all the wrappers at sysdeps are superfluous: they are
basically the same implementation from math/w_sqrt.c with the
'#ifdef _IEEE_LIBM'. And the power4 version just force the 'fsqrt'
instruction utilization with an inline assembly, which is already
handled by math_private.h __ieee754_sqrt implementation.
This patch add optimized __mpn_addmul, __mpn_addsub, __mpn_lshift, and
__mpn_mul_1 implementations for PowerPC64. They are originally from GMP
with adjustments for GLIBC.
This patch add static probes for setjmp/longjmp in the way gdb expects,fixing
the gdb.base/longjmp.exp gdb testcases.
It changes the symbol_name and use macros to to avoid change the probe names
and ending up adding more logic on GDB (since with the expected name
GDB work seamlessly).
To avoid having a ELFv2 binary accidentally picking up an old ABI ld.so,
this patch bumps the soname to ld64.so.2.
In theory (or for testing purposes) this will also allow co-installing
ld.so versions for both ABIs on the same system. Note that the kernel
will already be able to load executables of both ABIs. However, there
is currently no plan to use that theoretical possibility in a any
supported distribution environment ...
Note that in order to check which ABI to use, we need to invoke the
compiler to check the _CALL_ELF macro; this is done in a new configure
check in sysdeps/unix/sysv/linux/powerpc/powerpc64/configure.ac,
replacing the hard-coded value of default-abi in the Makefile.
The ELFv2 ABI changes the calling convention by passing and returning
structures in registers in more cases than the old ABI:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01145.htmlhttp://gcc.gnu.org/ml/gcc-patches/2013-11/msg01147.html
For the most part, this does not affect glibc, since glibc assembler
files do not use structure parameters / return values. However, one
place is affected: the LD_AUDIT interface provides a structure to
the audit routine that contains all registers holding function
argument and return values for the intercepted PLT call.
Since the new ABI now sometimes uses registers to return values
that were never used for this purpose in the old ABI, this structure
has to be extended. To force audit routines to be modified for the
new ABI if necessary, the patch defines v2 variants of the la_ppc64
types and routines.
In addition, the patch contains two unrelated changes to the
PLT trampoline routines: it fixes a bug where FPR return values
were stored in the wrong place, and it removes the unnecessary
save/restore of CR.
This updates glibc for the changes in the ELFv2 relating to the
stack frame layout. These are described in more detail here:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01149.htmlhttp://gcc.gnu.org/ml/gcc-patches/2013-11/msg01146.html
Specifically, the "compiler and linker doublewords" were removed,
which has the effect that the save slot for the TOC register is
now at offset 24 rather than 40 to the stack pointer.
In addition, a function may now no longer necessarily assume that
its caller has set up a 64-byte register save area its use.
To address the first change, the patch goes through all assembler
files and replaces immediate offsets in instructions accessing the
ABI-defined stack slots by symbolic offsets. Those already were
defined in ucontext_i.sym and used in some of the context routines,
but that doesn't really seem like the right place for those defines.
The patch instead defines those symbolic offsets in sysdeps.h,
in two variants for the old and new ABI, and uses them systematically
in all assembler files, not just the context routines.
The second change only affected a few assembler files that used
the save area to temporarily store some registers. In those
cases where this happens within a leaf function, this patch
changes the code to store those registers to the "red zone"
below the stack pointer. Otherwise, the functions already allocate
a stack frame, and the patch changes them to add extra space in
these frames as temporary space for the ELFv2 ABI.
This is a follow-on to the previous patch to support the ELFv2 ABI in the
dynamic loader, split off into its own patch since it is just an optional
optimization.
In the ELFv2 ABI, most functions define both a global and a local entry
point; the local entry requires r2 to be already set up by the caller
to point to the callee's TOC; while the global entry does not require
the caller to know about the callee's TOC, but it needs to set up r12
to the callee's entry point address.
Now, when setting up a PLT slot, the dynamic linker will usually need
to enter the target function's global entry point. However, if the
linker can prove that the target function is in the same DSO as the
PLT slot itself, and the whole DSO only uses a single TOC (which the
linker will let ld.so know via a DT_PPC64_OPT entry), then it is
possible to actually enter the local entry point address into the
PLT slot, for a slight improvement in performance.
Note that this uncovered a problem on the first call via _dl_runtime_resolve,
because that routine neglected to restore the caller's TOC before calling
the target function for the first time, since it assumed that function
would always reload its own TOC anyway ...
This patch adds support for the ELFv2 ABI feature to remove function
descriptors. See this GCC patch for in-depth discussion:
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg01141.html
This mostly involves two types of changes: updating assembler source
files to the new logic, and updating the dynamic loader.
After the refactoring in the previous patch, most of the assembler source
changes can be handled simply by providing ELFv2 versions of the
macros in sysdep.h. One somewhat non-obvious change is in __GI__setjmp:
this used to "fall through" to the immediately following __setjmp ENTRY
point. This is no longer safe in the ELFv2 since ENTRY defines both
a global and a local entry point, and you cannot simply fall through
to a global entry point as it requires r12 to be set up.
Also, makecontext needs to be updated to set up registers according to
the new ABI for calling into the context's start routine.
The dynamic linker changes mostly consist of removing special code
to handle function descriptors. We also need to support the new PLT
and glink format used by the the ELFv2 linker, see:
https://sourceware.org/ml/binutils/2013-10/msg00376.html
In addition, the dynamic linker now verifies that the dynamic libraries
it loads match its own ABI.
The hack in VDSO_IFUNC_RET to "synthesize" a function descriptor
for vDSO routines is also no longer necessary for ELFv2.
This is the first patch to support the new ELFv2 ABI in glibc.
As preparation, this patch simply refactors some of the powerpc64 assembler
code to move all code related to creating function descriptors (.opd section)
or using function descriptors (function pointer call) into a central place
in sysdep.h.
Note that most locations creating .opd entries were already using macros
in sysdep.h, this patch simply extends this to the remaining places.
No relevant change in generated code expected.
This patch updates glibc in accordance with the binutils patch checked in here:
https://sourceware.org/ml/binutils/2013-10/msg00372.html
This changes the various R_PPC64_..._HI and _HA relocations to report
32-bit overflows. The motivation is that existing uses of @h / @ha
are to build up 32-bit offsets (for the "medium model" TOC access
that GCC now defaults to), and we'd really like to see failures at
link / load time rather than silent truncations.
For those rare cases where a modifier is needed to build up a 64-bit
constant, new relocations _HIGH / _HIGHA are supported.
The patch also fixes a bug in overflow checking for the R_PPC64_ADDR30
and R_PPC64_ADDR32 relocations.
The context established by "makecontext" has a link register pointing
back to an error path within the makecontext routine. This is currently
covered by the CFI FDE for makecontext itself, which is simply wrong
for the stack frame *inside* the context. When trying to unwind (e.g.
doing a backtrace) in a routine inside a context created by makecontext,
this can lead to uninitialized stack slots being accessed, causing the
unwinder to crash in the worst case.
Similarly, during parts of the "setcontext" routine, when the stack
pointer has already been switched to point to the new context, the
address range is still covered by the CFI FDE for setcontext. When
trying to unwind in that situation (e.g. backtrace from an async
signal handler for profiling), it is again possible that the unwinder
crashes.
Theses are all problems in existing code, but the changes in stack
frame layout appear to make the "worst case" much more likely in
the ELFv2 ABI context. This causes regressions e.g. in the libgo
testsuite on ELFv2.
This patch fixes this by ending the makecontext/setcontext FDEs
before those problematic parts of the assembler, similar to what
is already done on other platforms. This fixes the libgo
regression on ELFv2.
Only gaih_inet() and gaih_inet_serv() use a special bit flag denoted
by the GAIH_OKIFUNSPEC macro. Only the return value of
gaih_inet_serv() is actively checked for the bit flag which is
redundant because it just copies the nonzero property of the value
otherwise returned. The return value of gaih_inet() is only checked
for being zero and then the bit flag is filtered out. As the bit flag
is set only for otherwise nonzero return values, it doesn't affect the
zero comparison. GAIH_EAI just an alias to ~GAIH_OKIFUNSPEC.
The event code is PTRACE_EVENT_SECCOMP, not PTRAVE_EVENT_SECCOMP.
This patch fixes the V->C typo. There are no ABI issues since the
number remains the same for the code. Code using the old wrong
name will need to be updated.
This patch helps some math functions performance by adding the libc_fexxx
variant of inline functions to handle both FPU round and exception set/restore
and by using them on the libc_fexxx_ctx functions. It is based on already coded
fexxx family functions for PPC with fpu.
Here is the summary of performance improvements due this patch (measured on a
POWER7 machine):
Before:
cos(): ITERS:9.5895e+07: TOTAL:5116.03Mcy, MAX:77.6cy, MIN:49.792cy, 18744 calls/Mcy
exp(): ITERS:2.827e+07: TOTAL:5187.15Mcy, MAX:494.018cy, MIN:38.422cy, 5450.01 calls/Mcy
pow(): ITERS:6.1705e+07: TOTAL:5144.26Mcy, MAX:171.95cy, MIN:29.935cy, 11994.9 calls/Mcy
sin(): ITERS:8.6898e+07: TOTAL:5117.06Mcy, MAX:83.841cy, MIN:46.582cy, 16982 calls/Mcy
tan(): ITERS:2.9473e+07: TOTAL:5115.39Mcy, MAX:191.017cy, MIN:172.352cy, 5761.63 calls/Mcy
After:
cos(): ITERS:2.05265e+08: TOTAL:5111.37Mcy, MAX:78.754cy, MIN:24.196cy, 40158.5 calls/Mcy
exp(): ITERS:3.341e+07: TOTAL:5170.84Mcy, MAX:476.317cy, MIN:15.574cy, 6461.23 calls/Mcy
pow(): ITERS:7.6153e+07: TOTAL:5129.1Mcy, MAX:147.5cy, MIN:30.916cy, 14847.2 calls/Mcy
sin(): ITERS:1.58816e+08: TOTAL:5115.11Mcy, MAX:1490.39cy, MIN:22.341cy, 31048.4 calls/Mcy
tan(): ITERS:3.4964e+07: TOTAL:5114.18Mcy, MAX:177.422cy, MIN:146.115cy, 6836.68 calls/Mcy
On hppa and ia64, the macro DL_AUTO_FUNCTION_ADDRESS() uses the
variable fptr[2] in it's own scope.
The content of fptr[] is thus undefined right after the macro exits.
Newer gcc's (>= 4.7) reuse the stack space of this variable triggering
a segmentation fault in dl-init.c:69.
To fix this we rewrite the macros to make the call directly to init
and fini without needing to pass back a constructed function pointer.
[BZ #16150]
* sysdeps/sparc/sparc64/multiarch/add_n.S: Resolve to the correct generic
symbol in the non-vis3 case in static builds.
* sysdeps/sparc/sparc64/multiarch/addmul_1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/mul_1.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/sub_n.S: Likewise.
* sysdeps/sparc/sparc64/multiarch/submul_1.S: Likewise.
This patch fixes the vDSO symbol used directed in IFUNC resolver where
they do not have an associated ODP entry leading to undefined behavior
in some cases. It adds an artificial OPD static entry to such cases
and set its TOC to non 0 to avoid triggering lazy resolutions.
We cannot use fnegd in this code, as fnegd was added in v9.
Only fnegs exists in v8 and earlier.
[BZ #15985]
* sysdeps/sparc/sparc32/fpu/s_fdim.S (__fdim): Do not use fnegd
on pre-v9 cpus, use a fnegs+fmovs sequence instead.
Autoconf has been deprecating configure.in for quite a long time.
Rename all our configure.in and preconfigure.in files to .ac.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patch intends to unify both strcpy and stpcpy implementationsi
for PPC64 and PPC64/POWER7. The idead default powerpc64 implementation
is to provide both doubleword and word aligned memory access.
For PPC64/POWER7 is also provide doubleword and word memory access,
remove the branch hints, use the cmpb instruction for compare
doubleword/words, and add an optimization for inputs of same alignment.
Resolves#16072 (CVE-2013-4458).
This patch fixes another stack overflow in getaddrinfo when it is
called with AF_INET6. The AF_UNSPEC case was fixed as CVE-2013-1914,
but the AF_INET6 case went undetected back then.
[BZ #9954]
With the following /etc/hosts:
127.0.0.1 www.my-domain.es
127.0.1.1 www.my-domain.es
192.168.0.1 www.my-domain.es
Using getaddrinfo() on www.my-domain.es, trigger the following assertion:
../sysdeps/posix/getaddrinfo.c:1473: rfc3484_sort: Assertion
`src->results[i].native == -1 || src->results[i].native == a1_native' failed.
This is due to two different bugs:
- In rfc3484_sort() rule 7, src->results[i].native is assigned even if
src->results[i].index is -1, meaning that no interface is associated.
- In getaddrinfo() the source IP address used with the lo interface needs a
special case, as it can be any IP within 127.X.Y.Z.
Add systemtap probes to various slow paths in libm so that application
developers may use systemtap to find out if their applications are
hitting these slow paths. We have added probes for pow, exp, log,
tan, atan and atan2.
* sysdeps/powerpc/powerpc32/dl-machine.c (__process_machine_rela):
Use stdint types in rather than __attribute__((mode())).
* sysdeps/powerpc/powerpc64/dl-machine.h (elf_machine_rela): Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00105.html
Like strnlen, memchr and memrchr had a number of defects fixed by this
patch as well as adding little-endian support. The first one I
noticed was that the entry to the main loop needlessly checked for
"are we done yet?" when we know the size is large enough that we can't
be done. The second defect I noticed was that the main loop count was
wrong, which in turn meant that the small loop needed to handle an
extra word. Thirdly, there is nothing to say that the string can't
wrap around zero, except of course that we'd normally hit a segfault
on trying to read from address zero. Fixing that simplified a number
of places:
- /* Are we done already? */
- addi r9,r8,8
- cmpld r9,r7
- bge L(null)
becomes
+ cmpld r8,r7
+ beqlr
However, the exit gets an extra test because I test for being on the
last word then if so whether the byte offset is less than the end.
Overall, the change is a win.
Lastly, memrchr used the wrong cache hint.
* sysdeps/powerpc/powerpc64/power7/memchr.S: Replace rlwimi with
insrdi. Make better use of reg selection to speed exit slightly.
Schedule entry path a little better. Remove useless "are we done"
checks on entry to main loop. Handle wrapping around zero address.
Correct main loop count. Handle single left-over word from main
loop inline rather than by using loop_small. Remove extra word
case in loop_small caused by wrong loop count. Add little-endian
support.
* sysdeps/powerpc/powerpc32/power7/memchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise. Use proper
cache hint.
* sysdeps/powerpc/powerpc32/power7/memrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Add little-endian
support. Avoid rlwimi.
* sysdeps/powerpc/powerpc32/power7/rawmemchr.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00104.html
One of the things I noticed when looking at power7 timing is that rlwimi
is cracked and the two resulting insns have a register dependency.
That makes it a little slower than the equivalent rldimi.
* sysdeps/powerpc/powerpc64/memset.S: Replace rlwimi with
insrdi. Formatting.
* sysdeps/powerpc/powerpc64/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memset.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00103.html
LIttle-endian support for memcpy. I spent some time cleaning up the
64-bit power7 memcpy, in order to avoid the extra alignment traps
power7 takes for little-endian. It probably would have been better
to copy the linux kernel version of memcpy.
* sysdeps/powerpc/powerpc32/power4/memcpy.S: Add little endian support.
* sysdeps/powerpc/powerpc32/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/mempcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise. Make better
use of regs. Use power7 mtocrf. Tidy function tails.
http://sourceware.org/ml/libc-alpha/2013-08/msg00102.html
This is a rather large patch due to formatting and renaming. The
formatting changes were to make it possible to compare power7 and
power4 versions of memcmp. Using different register defines came
about while I was wrestling with the code, trying to find spare
registers at one stage. I found it much simpler if we refer to a reg
by the same name throughout a function, so it's better if short-term
multiple use regs like rTMP are referred to using their register
number. I made the cr field usage changes when attempting to reload
rWORDn regs in the exit path to byte swap before comparing when
little-endian. That proved a bad idea due to the pipelining involved
in the main loop; Offsets to reload the regs were different first
time around the loop.. Anyway, I left the cr field usage changes in
place for consistency.
Aside from these more-or-less cosmetic changes, I fixed a number of
places where an early exit path restores regs unnecessarily, removed
some dead code, and optimised one or two exits.
* sysdeps/powerpc/powerpc64/power7/memcmp.S: Add little-endian support.
Formatting. Consistently use rXXX register defines or rN defines.
Use early exit labels that avoid restoring unused non-volatile regs.
Make cr field use more consistent with rWORDn compares. Rename
regs used as shift registers for unaligned loop, using rN defines
for short lifetime/multiple use regs.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/memcmp.S: Likewise. Exit with
addi 1,1,64 to pop stack frame. Simplify return value code.
* sysdeps/powerpc/powerpc32/power4/memcmp.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00101.html
Adds little-endian support to optimised strchr assembly. I've also
tweaked the big-endian code a little. In power7/strchr.S there's a
check in the tail of the function that we didn't match 0 before
finding a c match, done by comparing leading zero counts. It's just
as valid, and quicker, to compare the raw output from cmpb.
Another little tweak is to use rldimi/insrdi in place of rlwimi for
the power7 strchr functions. Since rlwimi is cracked, it is a few
cycles slower. rldimi can be used on the 32-bit power7 functions
too.
* sysdeps/powerpc/powerpc64/power7/strchr.S (strchr): Add little-endian
support. Correct typos, formatting. Optimize tail. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc32/power7/strchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strchrnul.S (__strchrnul): Add
little-endian support. Correct typos.
* sysdeps/powerpc/powerpc32/power7/strchrnul.S: Likewise. Use insrdi
rather than rlwimi.
* sysdeps/powerpc/powerpc64/strchr.S (rTMP4, rTMP5): Define. Use
in loop and entry code to keep "and." results.
(strchr): Add little-endian support. Comment. Move cntlzd
earlier in tail.
* sysdeps/powerpc/powerpc32/strchr.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00100.html
The strcpy changes for little-endian are quite straight-forward, just
a matter of rotating the last word differently.
I'll note that the powerpc64 version of stpcpy is just begging to be
converted to use 64-bit loads and stores..
* sysdeps/powerpc/powerpc64/strcpy.S: Add little-endian support:
* sysdeps/powerpc/powerpc32/strcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/stpcpy.S: Likewise.
* sysdeps/powerpc/powerpc32/stpcpy.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00099.html
More little-endian support. I leave the main strcmp loops unchanged,
(well, except for renumbering rTMP to something other than r0 since
it's needed in an addi insn) and modify the tail for little-endian.
I noticed some of the big-endian tail code was a little untidy so have
cleaned that up too.
* sysdeps/powerpc/powerpc64/strcmp.S (rTMP2): Define as r0.
(rTMP): Define as r11.
(strcmp): Add little-endian support. Optimise tail.
* sysdeps/powerpc/powerpc32/strcmp.S: Similarly.
* sysdeps/powerpc/powerpc64/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power4/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/strncmp.S: Likewise.
* sysdeps/powerpc/powerpc32/power7/strncmp.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00098.html
The existing strnlen code has a number of defects, so this patch is more
than just adding little-endian support. The changes here are similar to
those for memchr.
* sysdeps/powerpc/powerpc64/power7/strnlen.S (strnlen): Add
little-endian support. Remove unnecessary "are we done" tests.
Handle "s" wrapping around zero and extremely large "size".
Correct main loop count. Handle single left-over word from main
loop inline rather than by using small_loop. Correct comments.
Delete "zero" tail, use "end_max" instead.
* sysdeps/powerpc/powerpc32/power7/strnlen.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00097.html
This is the first of nine patches adding little-endian support to the
existing optimised string and memory functions. I did spend some
time with a power7 simulator looking at cycle by cycle behaviour for
memchr, but most of these patches have not been run on cpu simulators
to check that we are going as fast as possible. I'm sure PowerPC can
do better. However, the little-endian support mostly leaves main
loops unchanged, so I'm banking on previous authors having done a
good job on big-endian.. As with most code you stare at long enough,
I found some improvements for big-endian too.
Little-endian support for strlen. Like most of the string functions,
I leave the main word or multiple-word loops substantially unchanged,
just needing to modify the tail.
Removing the branch in the power7 functions is just a tidy. .align
produces a branch anyway. Modifying regs in the non-power7 functions
is to suit the new little-endian tail.
* sysdeps/powerpc/powerpc64/power7/strlen.S (strlen): Add little-endian
support. Don't branch over align.
* sysdeps/powerpc/powerpc32/power7/strlen.S: Likewise.
* sysdeps/powerpc/powerpc64/strlen.S (strlen): Add little-endian support.
Rearrange tmp reg use to suit. Comment.
* sysdeps/powerpc/powerpc32/strlen.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00093.html
This copies the sparc version of sigstack.h, which gives powerpc
#define MINSIGSTKSZ 4096
#define SIGSTKSZ 16384
Before the VSX changes, struct rt_sigframe size was 1920 plus 128 for
__SIGNAL_FRAMESIZE giving ppc64 exactly the default MINSIGSTKSZ of
2048.
After VSX, ucontext increased by 256 bytes. Oops, we're over
MINSIGSTKSZ, so powerpc has been using the wrong value for quite a
while. Add another ucontext for TM and rt_sigframe is now at 3872,
giving actual MINSIGSTKSZ of 4000.
The glibc testcase that I was looking at was tst-cancel21, which
allocates 2*SIGSTKSZ (not because the test is trying to be
conservative, but because the test actually has nested signal stack
frames). We blew the allocation by 48 bytes when using current
mainline gcc to compile glibc (le ppc64).
The required stack depth in _dl_lookup_symbol_x from the top of the
next signal frame was 10944 bytes. I guess you'd want to add 288 to
that, implying an actual SIGSTKSZ of 11232.
* sysdeps/unix/sysv/linux/powerpc/bits/sigstack.h: New file.
http://sourceware.org/ml/libc-alpha/2013-08/msg00092.html
Use conditional form of branch and link to avoid destroying the cpu
link stack used to predict blr return addresses.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/makecontext.S: Use
conditional form of branch and link when obtaining pc.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/makecontext.S: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00091.html
More LE support, correcting word accesses to _dl_hwcap.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/getcontext-common.S: Use
HIWORD/LOWORD.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/setcontext-common.S: Ditto.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/swapcontext-common.S: Ditto.
http://sourceware.org/ml/libc-alpha/2013-08/msg00090.html
This patch fixes symbol versioning in setjmp/longjmp. The existing
code uses raw versions, which results in wrong symbol versioning when
you want to build glibc with a base version of 2.19 for LE.
Note that the merging the 64-bit and 32-bit versions in novmx-lonjmp.c
and pt-longjmp.c doesn't result in GLIBC_2.0 versions for 64-bit, due
to the base in shlib_versions.
* sysdeps/powerpc/longjmp.c: Use proper symbol versioning macros.
* sysdeps/powerpc/novmx-longjmp.c: Likewise.
* sysdeps/powerpc/powerpc32/bsd-_setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/bsd-setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/__longjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/fpu/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc32/mcount.c: Likewise.
* sysdeps/powerpc/powerpc32/setjmp.S: Likewise.
* sysdeps/powerpc/powerpc64/setjmp.S: Likewise.
* nptl/sysdeps/unix/sysv/linux/powerpc/pt-longjmp.c: Likewise.
http://sourceware.org/ml/libc-alpha/2013-08/msg00089.html
Little-endian fixes for setjmp/longjmp. When writing these I noticed
the setjmp code corrupts the non volatile VMX registers when using an
unaligned buffer. Anton fixed this, and also simplified it quite a
bit.
The current code uses boilerplate for the case where we want to store
16 bytes to an unaligned address. For that we have to do a
read/modify/write of two aligned 16 byte quantities. In our case we
are storing a bunch of back to back data (consective VMX registers),
and only the start and end of the region need the read/modify/write.
[BZ #15723]
* sysdeps/powerpc/jmpbuf-offsets.h: Comment fix.
* sysdeps/powerpc/powerpc32/fpu/__longjmp-common.S: Correct
_dl_hwcap access for little-endian.
* sysdeps/powerpc/powerpc32/fpu/setjmp-common.S: Likewise. Don't
destroy vmx regs when saving unaligned.
* sysdeps/powerpc/powerpc64/__longjmp-common.S: Correct CR load.
* sysdeps/powerpc/powerpc64/setjmp-common.S: Likewise CR save. Don't
destroy vmx regs when saving unaligned.
http://sourceware.org/ml/libc-alpha/2013-08/msg00088.html
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S: Increase alignment of
constants to usual value for .cst8 section, and remove redundant
high address load.
* sysdeps/powerpc/powerpc32/power4/fpu/s_llround.S: Use float
constant for 0x1p52. Load little-endian words of double from
correct stack offsets.
http://sourceware.org/ml/libc-alpha/2013-07/msg00201.html
These two functions oddly test x+1>0 when a double x is >= 0.0, and
similarly when x is negative. I don't see the point of that since the
test should always be true. I also don't see any need to convert x+1
to integer rather than simply using xr+1. Note that the standard
allows these functions to return any value when the input is outside
the range of long long, but it's not too hard to prevent xr+1
overflowing so that's what I've done.
(With rounding mode FE_UPWARD, x+1 can be a lot more than what you
might naively expect, but perhaps that situation was covered by the
x - xrf < 1.0 test.)
* sysdeps/powerpc/fpu/s_llround.c (__llround): Rewrite.
* sysdeps/powerpc/fpu/s_llroundf.c (__llroundf): Rewrite.
http://sourceware.org/ml/libc-alpha/2013-07/msg00200.html
This works around the fact that vsx is disabled in current
little-endian gcc. Also, float constants take 4 bytes in memory
vs. 16 bytes for vector constants, and we don't need to write one lot
of masks for double (register format) and another for float (mem
format).
* sysdeps/powerpc/fpu/s_float_bitwise.h (__float_and_test28): Don't
use vector int constants.
(__float_and_test24, __float_and8, __float_get_exp): Likewise.
http://sourceware.org/ml/libc-alpha/2013-07/msg00197.html
A rewrite to make this code correct for little-endian.
* sysdeps/ieee754/ldbl-128ibm/e_sqrtl.c (mynumber): Replace
union 32-bit int array member with 64-bit int array.
(t515, tm256): Double rather than long double.
(__ieee754_sqrtl): Rewrite using 64-bit arithmetic.
http://sourceware.org/ml/libc-alpha/2013-08/msg00085.html
Rid ourselves of ieee854.
* sysdeps/ieee754/ldbl-128ibm/ieee754.h (union ieee854_long_double):
Delete.
(IEEE854_LONG_DOUBLE_BIAS): Delete.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h: Don't include ieee854
version of math_ldbl.h.
http://sourceware.org/ml/libc-alpha/2013-08/msg00084.html
Another batch of ieee854 macros and union replacement. These four
files also have bugs fixed with this patch. The fact that the two
doubles in an IBM long double may have different signs means that
negation and absolute value operations can't just twiddle one sign bit
as you can with ieee864 style extended double. fmodl, remainderl,
erfl and erfcl all had errors of this type. erfl also returned +1 for
large magnitude negative input where it should return -1. The hypotl
error is innocuous since the value adjusted twice is only used as a
flag. The e_hypotl.c tests for large "a" and small "b" are mutually
exclusive because we've already exited when x/y > 2**120. That allows
some further small simplifications.
[BZ #15734], [BZ #15735]
* sysdeps/ieee754/ldbl-128ibm/e_fmodl.c (__ieee754_fmodl): Rewrite
all uses of ieee875 long double macros and unions. Simplify test
for 0.0L. Correct |x|<|y| and |x|=|y| test. Use
ldbl_extract_mantissa value for ix,iy exponents. Properly
normalize after ldbl_extract_mantissa, and don't add hidden bit
already handled. Don't treat low word of ieee854 mantissa like
low word of IBM long double and mask off bit when testing for
zero.
* sysdeps/ieee754/ldbl-128ibm/e_hypotl.c (__ieee754_hypotl): Rewrite
all uses of ieee875 long double macros and unions. Simplify tests
for 0.0L and inf. Correct double adjustment of k. Delete dead code
adjusting ha,hb. Simplify code setting kld. Delete two600 and
two1022, instead use their values. Recognise that tests for large
"a" and small "b" are mutually exclusive. Rename vars. Comment.
* sysdeps/ieee754/ldbl-128ibm/e_remainderl.c (__ieee754_remainderl):
Rewrite all uses of ieee875 long double macros and unions. Simplify
test for 0.0L and nan. Correct negation.
* sysdeps/ieee754/ldbl-128ibm/s_erfl.c (__erfl): Rewrite all uses of
ieee875 long double macros and unions. Correct output for large
magnitude x. Correct absolute value calculation.
(__erfcl): Likewise.
* math/libm-test.inc: Add tests for errors discovered in IBM long
double versions of fmodl, remainderl, erfl and erfcl.
http://sourceware.org/ml/libc-alpha/2013-08/msg00083.html
Further replacement of ieee854 macros and unions. These files also
have some optimisations for comparison against 0.0L, infinity and nan.
Since the ABI specifies that the high double of an IBM long double
pair is the value rounded to double, a high double of 0.0 means the
low double must also be 0.0. The ABI also says that infinity and
nan are encoded in the high double, with the low double unspecified.
This means that tests for 0.0L, +/-Infinity and +/-NaN need only check
the high double.
* sysdeps/ieee754/ldbl-128ibm/e_atan2l.c (__ieee754_atan2l): Rewrite
all uses of ieee854 long double macros and unions. Simplify tests
for long doubles that are fully specified by the high double.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (__ieee754_gammal_r):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_ilogbl.c (__ieee754_ilogbl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/e_jnl.c (__ieee754_jnl): Likewise.
(__ieee754_ynl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_log10l.c (__ieee754_log10l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_logl.c (__ieee754_logl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Likewise.
Remove dead code too.
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c (__kernel_tanl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_expm1l.c (__expm1l): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_frexpl.c (__frexpl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_isinf_nsl.c (__isinf_nsl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_isinfl.c (___isinfl): Likewise.
Simplify.
* sysdeps/ieee754/ldbl-128ibm/s_log1pl.c (__log1pl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_modfl.c (__modfl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nextafterl.c (__nextafterl): Likewise.
Comment on variable precision.
* sysdeps/ieee754/ldbl-128ibm/s_nexttoward.c (__nexttoward): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nexttowardf.c (__nexttowardf):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_remquol.c (__remquol): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalblnl.c (__scalblnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_scalbnl.c (__scalbnl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_tanhl.c (__tanhl): Likewise.
* sysdeps/powerpc/fpu/libm-test-ulps: Adjust tan_towardzero ulps.
http://sourceware.org/ml/libc-alpha/2013-08/msg00081.html
This is the first of a series of patches to ban ieee854_long_double
and the ieee854_long_double macros when using IBM long double. union
ieee854_long_double just isn't correct for IBM long double, especially
when little-endian, and pretending it is OK has allowed a number of
bugs to remain undetected in sysdeps/ieee754/ldbl-128ibm/.
This changes the few places in generic code that use it.
* stdio-common/printf_size.c (__printf_size): Don't use
union ieee854_long_double in fpnum union.
* stdio-common/printf_fphex.c (__printf_fphex): Likewise. Use
signbit macro to retrieve sign from long double.
* stdio-common/printf_fp.c (___printf_fp): Use signbit macro to
retrieve sign from long double.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c: Adjust for fpnum change.
* sysdeps/ieee754/ldbl-128/printf_fphex.c: Likewise.
* sysdeps/ieee754/ldbl-96/printf_fphex.c: Likewise.
* sysdeps/x86_64/fpu/printf_fphex.c: Likewise.
* math/test-misc.c (main): Don't use union ieee854_long_double.
ports/
* sysdeps/ia64/fpu/printf_fphex.c: Adjust for fpnum change.
http://sourceware.org/ml/libc-alpha/2013-06/msg00919.html
I discovered a number of places where denormals and other corner cases
were being handled wrongly.
- printf_fphex.c: Testing for the low double exponent being zero is
unnecessary. If the difference in exponents is less than 53 then the
high double exponent must be nearing the low end of its range, and the
low double exponent hit rock bottom.
- ldbl2mpn.c: A denormal (ie. exponent of zero) value is treated as
if the exponent was one, so shift mantissa left by one. Code handling
normalisation of the low double mantissa lacked a test for shift count
greater than bits in type being shifted, and lacked anything to handle
the case where the difference in exponents is less than 53 as in
printf_fphex.c.
- math_ldbl.h (ldbl_extract_mantissa): Same as above, but worse, with
code testing for exponent > 1 for some reason, probably a typo for >= 1.
- math_ldbl.h (ldbl_insert_mantissa): Round the high double as per
mpn2ldbl.c (hi is odd or explicit mantissas non-zero) so that the
number we return won't change when applying ldbl_canonicalize().
Add missing overflow checks and normalisation of high mantissa.
Correct misleading comment: "The hidden bit of the lo mantissa is
zero" is not always true as can be seen from the code rounding the hi
mantissa. Also by inspection, lzcount can never be less than zero so
remove that test. Lastly, masking bitfields to their widths can be
left to the compiler.
- mpn2ldbl.c: The overflow checks here on rounding of high double were
just plain wrong. Incrementing the exponent must be accompanied by a
shift right of the mantissa to keep the value unchanged. Above notes
for ldbl_insert_mantissa are also relevant.
[BZ #15680]
* sysdeps/ieee754/ldbl-128ibm/e_rem_pio2l.c: Comment fix.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c
(PRINT_FPHEX_LONG_DOUBLE): Tidy code by moving -53 into ediff
calculation. Remove unnecessary test for denormal exponent.
* sysdeps/ieee754/ldbl-128ibm/ldbl2mpn.c (__mpn_extract_long_double):
Correct handling of denormals. Avoid undefined shift behaviour.
Correct normalisation of low mantissa when low double is denormal.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h
(ldbl_extract_mantissa): Likewise. Comment. Use uint64_t* for hi64.
(ldbl_insert_mantissa): Make both hi64 and lo64 parms uint64_t.
Correct normalisation of low mantissa. Test for overflow of high
mantissa and normalise.
(ldbl_nearbyint): Use more readable constant for two52.
* sysdeps/ieee754/ldbl-128ibm/mpn2ldbl.c
(__mpn_construct_long_double): Fix test for overflow of high
mantissa and correct normalisation. Avoid undefined shift.
http://sourceware.org/ml/libc-alpha/2013-07/msg00001.html
This patch starts the process of supporting powerpc64 little-endian
long double in glibc. IBM long double is an array of two ieee
doubles, so making union ibm_extended_long_double reflect this fact is
the correct way to access fields of the doubles.
* sysdeps/ieee754/ldbl-128ibm/ieee754.h
(union ibm_extended_long_double): Define as an array of ieee754_double.
(IBM_EXTENDED_LONG_DOUBLE_BIAS): Delete.
* sysdeps/ieee754/ldbl-128ibm/printf_fphex.c: Update all references
to ibm_extended_long_double and IBM_EXTENDED_LONG_DOUBLE_BIAS.
* sysdeps/ieee754/ldbl-128ibm/e_exp10l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/ldbl2mpn.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/math_ldbl.h: Likewise.
* sysdeps/ieee754/ldbl-128ibm/mpn2ldbl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_nearbyintl.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/strtold_l.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/x2y2m1l.c: Likewise.
sysdeps/unix/make-syscalls.sh and sysdeps/unix/Makefile use GNU Bash's
${parameter/pattern/string} parameter expansion. Non-Bash shells (e.g.
dash or BusyBox ash when built with CONFIG_ASH_BASH_COMPAT disabled)
don't support this expansion syntax. So glibc will fail to build when
$(SHELL) expands to a path that isn't provided by Bash.
An example build failure:
for dir in [...]; do \
test -f $dir/syscalls.list && \
{ sysdirs='[...]' \
asm_CPP='gcc -c -I[...] -D_LIBC_REENTRANT -include include/libc-symbols.h -DASSEMBLER -g -Wa,--noexecstack -E -x assembler-with-cpp' \
/bin/sh sysdeps/unix/make-syscalls.sh $dir || exit 1; }; \
test $dir = sysdeps/unix && break; \
done > [build-dir]/sysd-syscallsT
sysdeps/unix/make-syscalls.sh: line 273: syntax error: bad substitution
This patch simply replaces the three instances of the Bash-only syntax
in these files with an echo and sed command substitution.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This define was removed from the rest of the tree eight years ago.
ChangeLog:
2013-09-24 Will Newton <will.newton@linaro.org>
* sysdeps/mach/hurd/i386/tls.h (TLS_INIT_TP_EXPENSIVE): Remove
macro.
Statically built binaries use __pointer_chk_guard_local,
while dynamically built binaries use __pointer_chk_guard.
Provide the right definition depending on the test case
we are building.
The pointer guard used for pointer mangling was not initialized for
static applications resulting in the security feature being disabled.
The pointer guard is now correctly initialized to a random value for
static applications. Existing static applications need to be
recompiled to take advantage of the fix.
The test tst-ptrguard1-static and tst-ptrguard1 add regression
coverage to ensure the pointer guards are sufficiently random
and initialized to a default value.
It has been a long practice for software using IEEE 754 floating-point
arithmetic run on MIPS processors to use an encoding of Not-a-Number
(NaN) data different to one used by software run on other processors.
And as of IEEE 754-2008 revision [1] this encoding does not follow one
recommended in the standard, as specified in section 6.2.1, where it
is stated that quiet NaNs should have the first bit (d1) of their
significand set to 1 while signalling NaNs should have that bit set to
0, but MIPS software interprets the two bits in the opposite manner.
As from revision 3.50 [2][3] the MIPS Architecture provides for
processors that support the IEEE 754-2008 preferred NaN encoding format.
As the two formats (further referred to as "legacy NaN" and "2008 NaN")
are incompatible to each other, tools have to provide support for the
two formats to help people avoid using incompatible binary modules.
The change is comprised of two functional groups of features, both of
which are required for correct support.
1. Dynamic linker support.
To enforce the NaN encoding requirement in dynamic linking a new ELF
file header flag has been defined. This flag is set for 2008-NaN
shared modules and executables and clear for legacy-NaN ones. The
dynamic linker silently ignores any incompatible modules it
encounters in dependency processing.
To avoid unnecessary processing of incompatible modules in the
presence of a shared module cache, a set of new cache flags has been
defined to mark 2008-NaN modules for the three ABIs supported.
Changes to sysdeps/unix/sysv/linux/mips/readelflib.c have been made
following an earlier code quality suggestion made here:
http://sourceware.org/ml/libc-ports/2009-03/msg00036.html
and are therefore a little bit more extensive than the minimum
required.
Finally a new name has been defined for the dynamic linker so that
2008-NaN and legacy-NaN binaries can coexist on a single system that
supports dual-mode operation and that a legacy dynamic linker that
does not support verifying the 2008-NaN ELF file header flag is not
chosen to interpret a 2008-NaN binary by accident.
2. Floating environment support.
IEEE 754-2008 features are controlled in the Floating-Point Control
and Status (FCSR) register and updates are needed to floating
environment support so that the 2008-NaN flag is set correctly and
the kernel default, inferred from the 2008-NaN ELF file header flag
at the time an executable is loaded, respected.
As the NaN encoding format is a property of GCC code generation that is
both a user-selected GCC configuration default and can be overridden
with GCC options, code that needs to know what NaN encoding standard it
has been configured for checks for the __mips_nan2008 macro that is
defined internally by GCC whenever the 2008-NaN mode has been selected.
This mode is determined at the glibc configuration time and therefore a
few consistency checks have been added to catch cases where compilation
flags have been overridden by the user.
The 2008 NaN set of features relies on kernel support as the in-kernel
floating-point emulator needs to be aware of the NaN encoding used even
on hard-float processors and configure the FPU context according to the
value of the 2008 NaN ELF file header flag of the executable being
started. As at this time work on kernel support is still in progress
and the relevant changes have not made their way yet to linux.org master
repository.
Therefore the minimum version supported has been artificially set to
10.0.0 so that 2008-NaN code is not accidentally run on a Linux kernel
that does not suppport it. It is anticipated that the version is
adjusted later on to the actual initial linux.org kernel version to
support this feature. Legacy NaN encoding support is unaffected, older
kernel versions remain supported.
[1] "IEEE Standard for Floating-Point Arithmetic", IEEE Computer
Society, IEEE Std 754-2008, 29 August 2008
[2] "MIPS Architecture For Programmers, Volume I-A: Introduction to the
MIPS32 Architecture", MIPS Technologies, Inc., Document Number:
MD00082, Revision 3.50, September 20, 2012
[3] "MIPS Architecture For Programmers, Volume I-A: Introduction to the
MIPS64 Architecture", MIPS Technologies, Inc., Document Number:
MD00083, Revision 3.50, September 20, 2012
This change synchronizes the glibc headers with the Linux kernel
headers and arranges to coordinate the definition of structures
already defined the Linux kernel UAPI headers.
It is now safe to include glibc's netinet/in.h or Linux's linux/in6.h
in any order in a userspace application and you will get the same
ABI. The ABI is guaranteed by UAPI and glibc.
Since fanotify_init requires CAP_SYS_ADMIN in order to work (which usually
means running as root), we need to handle that error case too.
Reported-by: Andreas Jaeger <aj@suse.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patch fixes backtrace for PPC32 and PPC64 to correctly handle
signal trampolines. The 'debug/tst-backtrace6.c' also check for
SA_SIGINFO handling, where is triggers another vDSO symbols for PPC32.
This patch fixes dlfcn/tststatic5 for PowerPC where pagesize
variable was not properly initialized in certain cases. This patch
is based on other architecture code.
The helper binary pt_chown tricked into granting access to another
user's pseudo-terminal.
Pre-conditions for the attack:
* Attacker with local user account
* Kernel with FUSE support
* "user_allow_other" in /etc/fuse.conf
* Victim with allocated slave in /dev/pts
Using the setuid installed pt_chown and a weak check on whether a file
descriptor is a tty, an attacker could fake a pty check using FUSE and
trick pt_chown to grant ownership of a pty descriptor that the current
user does not own. It cannot access /dev/pts/ptmx however.
In most modern distributions pt_chown is not needed because devpts
is enabled by default. The fix for this CVE is to disable building
and using pt_chown by default. We still provide a configure option
to enable hte use of pt_chown but distributions do so at their own
risk.
The generated header is compiled with `-ffreestanding' to avoid any
circular dependencies against the installed implementation headers.
Such a dependency would require the implementation header to be
installed before the generated header could be built (See bug 15711).
In current practice the generated header dependencies do not include
any of the implementation headers removed by the use of `-ffreestanding'.
---
2013-07-15 Carlos O'Donell <carlos@redhat.com>
[BZ #15711]
* sysdeps/unix/sysv/linux/Makefile ($(objpfx)bits/syscall%h):
Avoid system header dependency with -ffreestanding.
($(objpfx)bits/syscall%d): Likewise.
Many Linux arches require fixed mmaps to be aligned higher than pagesize,
so use the SHMLBA define as it represents this quantity exactly.
This fixes spurious errors seen on those arches like:
cannot map archive header: Invalid argument
URL: http://sourceware.org/bugzilla/show_bug.cgi?id=10283
Reported-by: CHIKAMA Masaki <masaki.chikama@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
This patch introduces two new convenience functions to set the default
thread attributes used for creating threads. This allows a programmer
to set the default thread attributes just once in a process and then
run pthread_create without additional attributes.
GCC 4.8 enables -ftree-loop-distribute-patterns at -O3 by default and
this optimization may transform loops into memset/memmove calls. Without
proper handling this may generate unexpected PLT calls on GLIBC.
This patch fixes by create memset/memmove alias to internal GLIBC
__GI_memset/__GI_memmove symbols.
The most common use case of math functions is with default rounding
mode, i.e. rounding to nearest. Setting and restoring rounding mode
is an unnecessary overhead for this, so I've added support for a
context, which does the set/restore only if the FP status needs a
change. The code is written such that only x86 uses these. Other
architectures should be unaffected by it, but would definitely benefit
if the set/restore has as much overhead relative to the rest of the
code, as the x86 bits do.
Here's a summary of the performance improvement due to these
improvements; I've only mentioned functions that use the set/restore
and have benchmark inputs for x86_64:
Before:
cos(): ITERS:4.69335e+08: TOTAL:28884.6Mcy, MAX:4080.28cy, MIN:57.562cy, 16248.6 calls/Mcy
exp(): ITERS:4.47604e+08: TOTAL:28796.2Mcy, MAX:207.721cy, MIN:62.385cy, 15543.9 calls/Mcy
pow(): ITERS:1.63485e+08: TOTAL:28879.9Mcy, MAX:362.255cy, MIN:172.469cy, 5660.86 calls/Mcy
sin(): ITERS:3.89578e+08: TOTAL:28900Mcy, MAX:704.859cy, MIN:47.583cy, 13480.2 calls/Mcy
tan(): ITERS:7.0971e+07: TOTAL:28902.2Mcy, MAX:1357.79cy, MIN:388.58cy, 2455.55 calls/Mcy
After:
cos(): ITERS:6.0014e+08: TOTAL:28875.9Mcy, MAX:364.283cy, MIN:45.716cy, 20783.4 calls/Mcy
exp(): ITERS:5.48578e+08: TOTAL:28764.9Mcy, MAX:191.617cy, MIN:51.011cy, 19071.1 calls/Mcy
pow(): ITERS:1.70013e+08: TOTAL:28873.6Mcy, MAX:689.522cy, MIN:163.989cy, 5888.18 calls/Mcy
sin(): ITERS:4.64079e+08: TOTAL:28891.5Mcy, MAX:6959.3cy, MIN:36.189cy, 16062.8 calls/Mcy
tan(): ITERS:7.2354e+07: TOTAL:28898.9Mcy, MAX:1295.57cy, MIN:380.698cy, 2503.7 calls/Mcy
So the improvements are:
cos: 27.9089%
exp: 22.6919%
pow: 4.01564%
sin: 19.1585%
tan: 1.96086%
The downside of the change is that it will have an adverse performance
impact on non-default rounding modes, but I think the tradeoff is
justified.
__clock_gettime and other __clock_* functions could result in an extra
PLT reference within libc.so if it actually gets used. None of the
code currently uses them, which is why this probably went unnoticed.
Resolves: #15465
The program name may be unavailable if the user application tampers
with argc and argv[]. Some parts of the dynamic linker caters for
this while others don't, so this patch consolidates the check and
fallback into a single macro and updates all users.
Fixes BZ #15339.
NSS_STATUS_UNAVAIL may mean that a necessary input resource is not
available. This could occur in a number of cases including when the
network is down, system runs out of file descriptors, etc. The
correct differentiator in such a case is the h_errno, which gives the
nature of failure. In case of failures other than a simple 'not
found', we set h_errno as NETDB_INTERNAL and let errno be the
identifier for the exact error.
This implementation speed up memset in several ways. First is avoiding
expensive computed jump. Second is using fact that arguments of memset
are most of time aligned to 8 bytes.
Benchmark results on:
kam.mff.cuni.cz/~ondra/benchmark_string/memset_profile_result27_04_13.tar.bz2
We add new memcpy version that uses unaligned loads which are fast
on modern processors. This allows second improvement which is avoiding
computed jump which is relatively expensive operation.
Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
[BZ #15442] This adds support for the inverse interpretation of the
quiet bit of IEEE 754 floating-point NaN data that some processors
use. This includes in particular MIPS architecture processors; the
payload used for the canonical qNaN encoding is updated accordingly
so as not to interfere with the quiet bit.
The EXTRACT_WORDS64 and INSERT_WORDS64 macros use movd for a 64-bit
operation. Somehow gcc manages to turn this into movq, but LLVM won't.
2013-05-15 Peter Collingbourne <pcc@google.com>
* sysdeps/x86_64/fpu/math_private.h (MOVQ): New macro.
(EXTRACT_WORDS64) Use where appropriate.
(INSERT_WORDS64) Likewise.
While these instructions accept memory operands, only one operand
may be a memory operand. Giving two operands xm constraints gives
the compiler the option of using memory for both operands, which
would result in invalid assembly code. Using x for all operands is
more appropriate, as most x86_64 calling conventions will pass the
arguments in registers anyway.
2013-05-15 Peter Collingbourne <pcc@google.com>
* sysdeps/x86_64/fpu/multiarch/s_fma.c (__fma_fma4): Replace xm
constraints with x constraints.
* sysdeps/x86_64/fpu/multiarch/s_fmaf.c (__fmaf_fma4): Likewise.
PowerPC kernel now provides a vDSO implementation for time syscall
(commit fcb41a2030abe0eb716ef0798035ef9562097f42). This patch changes
time syscall wrapper to use the vDSO when available. It also changes
the default non vDSO time on PowerPC to use sysdeps/posix/time.c
(since gettimeofday is a vDSO call).
* sysdeps/gnu/netinet/tcp.h (TCP_TIMESTAMP): New value, from
Linux 3.9.
* sysdeps/unix/sysv/linux/bits/socket.h (PF_VSOCK, AF_VSOCK):
Add.
(PF_MAX): Adjust for VSOCK change.
This patch fix the 3c0265394d commits
by correctly setting minimum architecture for modf PPC optimization
to power5+ instead of power5 (since only on power5+ round/ceil will
be inline to inline assembly).
Kay Sievers reported that coreutils' stat tool has a problem with
s390's statfs[64] definition:
> The definition of struct statfs::f_type needs a fix. s390 is the only
> architecture in the kernel that uses an int and expects magic
> constants lager than INT_MAX to fit into.
>
> A fix is needed to make Fedora boot on s390, it currently fails to do
> so. Userspace does not want to add code to paper-over this issue.
[...]
> Even coreutils cannot handle it:
> #define RAMFS_MAGIC 0x858458f6
> # stat -f -c%t /
> ffffffff858458f6
>
> #define BTRFS_SUPER_MAGIC 0x9123683E
> # stat -f -c%t /mnt
> ffffffff9123683e
The bug is caused by an implicit sign extension within the stat tool:
out_uint_x (pformat, prefix_len, statfsbuf->f_type);
where the format finally will be "%lx".
A similar problem can be found in the 'tail' tool.
s390 is the only architecture which has an int type f_type member in
struct statfs[64]. Other architectures have either unsigned ints or
long values, so that the problem doesn't occur there.
Therefore change the type of the f_type member to unsigned int, so
that we get zero extension instead sign extension when assignment to
a long value happens.
Reported-by: Kay Sievers <kay@vrfy.org>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
We no longer support configuring for i386, nor do we
elide such a configuration to i686. Configuring with
i386-* is a failure, and we provide an example of
how to fix that.
---
2013-04-17 Carlos O'Donell <carlos@redhat.com>
* configure.in: Remove i386 configure warning. Remove i386 case.
* configure: Regenerate.
* sysdeps/i386/configure.in: Raise error if config_machine is i386.
Add example to error message.
* sysdeps/i386/configure: Regenerate.
The value of PI is never exactly PI in any floating point representation,
and the value of PI/2 is never PI/2. It is wrong to expect cos(M_PI_2l)
to return 0, instead it will return an answer that is non-zero because
M_PI_2l doesn't round to exactly PI/2 in the type used.
That is to say that the correct answer is to do the following:
* Take PI or PI/2.
* Round to the floating point representation.
* Take the rounded value and compute an infinite precision cos or sin.
* Use the rounded result of the infinite precision cos or sin as the
answer to the test.
I used printf to do the type rounding, and Wolfram's Alpha to do the
infinite precision cos calculations.
The following changes bring x86-64 and x86 to 1/2 ulp for two tests.
It shows that the x86 cos implementation is quite good, and that
our test are flawed.
Unfortunately given that the rounding errors are type dependent we
need to fix this for each type. No regressions on x86-64 or x86.
---
2013-04-11 Carlos O'Donell <carlos@redhat.com>
* math/libm-test.inc (cos_test): Fix PI/2 test.
(sincos_test): Likewise.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerate.
* sysdeps/i386/fpu/libm-test-ulps: Regenerate.
This change does two things:
* Treats a target i386-* as if it were i686.
* Fails configure if the user is generating code
for i386.
We no longer support i386 code-generation because the i386
lacks the atomic operations we need in glibc.
You can still configure for i386-*, but you get i686 code.
You can't build with --march=i386, --mtune=i386 or a compiler
that defaults to i386 code-generation.
I've added two i386 entries in the master todo list to discuss
merging and renaming:
http://sourceware.org/glibc/wiki/Development_Todo/Master#i386
The failure modes are fail-safe here. You compile for i386,
get i686, and try to run on i386 and it fails. The configure
log has a warning saying we elided to i686. There is no situation
that I can see where we run into any serious problems.
The patch makes the current state better in that we get less
confused users and we build successfully in more default
configurations.
The next enhancement would be to add --march=i?86
as suggested in #c20 of BZ#10062 for any i?86-* builds, which
would solve the problem of a 32-bit compiler that defaults to
i386 code-gen and glibc configured for i686-* target. Which
previously failed at build time, and now will fail at configure
time (requires adding --march=i686).
Updated NEWS with BZ #10060 and #10062.
No regressions.
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
[BZ #10060, #10062]
* aclocal.m4 (LIBC_COMPILER_BUILTIN_INLINED): New macro.
* sysdeps/i386/configure.in: Use LIBC_COMPILER_BUILTIN_INLINED and
fail configure if __sync_val_compare_and_swap is not inlined.
* sysdeps/i386/configure: Regenerate.
* configure.in: Build for i686 when configured for i386.
* configure: Regenerate.
* README: Remove i386 reference.
The s390 and s390x sysdep.h files include the more generic sysdep.h.
The more generic sysdep.h defines PSEUDO. This causes an annoying
CPP warning saying the PSEUDO was redefined. This patch removes the
warning by undefining PSEUDO before the redefinition. This is in line
with what all the other machines do.
---
2013-04-06 Carlos O'Donell <carlos@redhat.com>
* sysdeps/s390/s390-32/sysdep.h: Undefine PSEUDO before redefinition.
* sysdeps/s390/s390-64/sysdep.h: Likewise.
Fix BZ #15305.
On kernel versions earlier than 2.6.29, the Linux kernel exported a
sysctl called restrict_chown for xfs, which could be used to allow
chown to users other than the owner. 2.6.29 removed this support,
causing the open_not_cancel_2 to fail and thus modify errno. The fix
is to save and restore errno so that the caller sees it as unmodified.
Additionally, since the code to check the sysctl is not useful on
newer kernels, we add an ifdef so that in future the code block gets
rmeoved completely.
The branch prediction hints is actually hurts performance in this case.
The assembly implementation make two assumptions: 1. 'fabs (x) < 2^52'
is unlikely and 2. 'x > 0.0' is unlike (if 1. is true). Since it a
general floating point function, expected input is not bounded and then
it is better to let the hardware handle the branches.
The compiler is smart enough to convert those into double for powerpc,
but if we put them as doubles, it adds overhead by performing those
operations in floating point mode.
The mantissa of mp_no is intended to take only integral values. This
is a relatively good choice for powerpc due to its 4 fpus, but not for
other architectures, which suffer due to this choice. This change
makes the default mantissa a long integer and allows powerpc to
override it. Additionally, some operations have been optimized for
integer manipulation, resulting in a significant improvement in
performance.
The patch increase the high value to check if expl overflows. Current
high mark value is not really correct, the algorithm accepts high values.
It also adds a correct wrapper function to check for overflow and underflow.
Due to a typo repeated several times, this bug hasn't been fixed yet,
despite being marked as resolved in glibc 2.12.
* sysdeps/x86_64/strcmp.S: Replace all occurrences of NOT_IN_lib
with NOT_IN_libc.
Fixes BZ #12723
The variable pipe buffer size does nothing to the value of PIPE_BUF,
since the number of bytes that are atomically written is still
PIPE_BUF on Linux.
* sysdeps/unix/sysv/linux/bits/mman-linux.h (MAP_ANONYMOUS):
Allow definition via __MAP_ANONYMOUS.
* sysdeps/unix/sysv/linux/mips/bits/mman.h: Remove all defines
provided by bits/mman-linux.h and include <bits/mman-linux.h>.
(__MAP_ANONYMOUS): Define.
* sysdeps/unix/sysv/linux/s390/bits/mman.h: Include
<bits/mman-linux.h>.
(MCL_CURRENT, MCL_FUTURE): Do not define here, the generic value
is fine.
* sysdeps/unix/sysv/linux/sh/bits/mman.h: Move include of
<bits/mman-linux.h> to end of file.
(MCL_CURRENT, MCL_FUTURE): Do not define here, the generic value
is fine.
* sysdeps/unix/sysv/linux/x86/bits/mman.h: Move include of
<bits/mman-linux.h> to end of file.
(MCL_CURRENT, MCL_FUTURE): Do not define here, the generic value
is fine.
* sysdeps/unix/sysv/linux/sparc/bits/mman.h: Move include of
<bits/mman-linux.h> to end of file.
* sysdeps/unix/sysv/linux/bits/mman-linux.h [!MCL_CURRENT]
(MCL_CURRENT, MCL_FUTURE): Define here.
* sysdeps/unix/sysv/linux/bits/mman-linux.h: New file, with
Linux common definitions.
* sysdeps/unix/sysv/linux/sh/bits/mman.h: Remove all defines
provided by bits/mman-linux.h and include <bits/mman-linux.h>.
* sysdeps/unix/sysv/linux/x86/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/s390/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/powerpc/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/sh/bits/mman.h: Likewise.
* sysdeps/unix/sysv/linux/sparc/bits/mman.h: Likewise.