As Adhemerval noted in
<https://sourceware.org/ml/libc-alpha/2015-01/msg00451.html>, the
powerpc sqrt implementation for when _ARCH_PPCSQ is not defined is
inaccurate in some cases.
The problem is that this code relies on fused multiply-add, and relies
on the compiler contracting a * b + c to get a fused operation. But
sysdeps/ieee754/dbl-64/Makefile disables contraction for e_sqrt.c,
because the implementation in that directory relies on *not* having
contracted operations.
While it would be possible to arrange makefiles so that an earlier
sysdeps directory can disable the setting in
sysdeps/ieee754/dbl-64/Makefile, it seems a lot cleaner to make the
dependence on fused operations explicit in the .c file. GCC 4.6
introduced support for __builtin_fma on powerpc and other
architectures with such instructions, so we can rely on that; this
patch duly makes the code use __builtin_fma for all such fused
operations.
Tested for powerpc32 (hard float).
2015-02-12 Joseph Myers <joseph@codesourcery.com>
[BZ #17964]
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrt): Use
__builtin_fma instead of relying on contraction of a * b + c.
The tv_sec is of type time_t in both struct timeval and struct timespec.
This matches the implementation and also the relevant standard (checked
C11 for timespec and opengroup for timeval).
This patch fixes the remaining part of bug 16560, spurious underflows
from exp2 of arguments close to 0 (when the result is close to 1, so
should not underflow), by just using 1+x instead of a more complicated
calculation when the argument is sufficiently small.
Tested for x86_64, x86 and mips64.
[BZ #16560]
* math/e_exp2l.c [LDBL_MANT_DIG == 106] (LDBL_EPSILON): Undefine
and redefine.
(__ieee754_exp2l): Do not multiply small fractional parts by
M_LN2l.
* sysdeps/i386/fpu/e_exp2l.S (__ieee754_exp2l): Just add 1 to
small argument.
* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Likewise.
* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise.
* sysdeps/x86_64/fpu/e_exp2l.S (__ieee754_exp2l): Likewise.
* math/auto-libm-test-in: Add more tests of exp2.
* math/auto-libm-test-out: Regenerated.
This patch optimizes strncpy for power7 for unaligned source or
destination address. The source or destination address is aligned
to doubleword and data is shifted based on the alignment and
added with the previous loaded data to be written as a doubleword.
For each load, cmpb instruction is used for faster null check.
The new optimization shows 10 to 70% of performance improvement
for longer string though it does not show big difference on string
size less than 16 due to additional checks.Hence this new algorithm
is restricted to string greater than 16.
pthread_mutexattr_settype adds PTHREAD_MUTEX_NO_ELISION_NP to kind,
which is an internal flag that pthread_mutexattr_gettype shouldn't
expose, since pthread_mutexattr_settype wouldn't accept it.
and revert the corresponding part of ba90e05 which was making the fix
necessary.
* abi-tags: Revert ae20c9a: rename back gnu into gnu-gnu.
* configure.ac, configure: Revert ba90e05: modify gnu-* host_os back
into gnu-gnu, and update comment to refer to abi-tags.
This patch makes sincos set errno to EDOM when passed an infinity,
similarly to sin and cos.
Tested for x86_64, x86, powerpc and mips64. I don't know if the
architecture-specific implementations for ia64 and m68k might need
corresponding fixes.
2015-02-11 Joseph Myers <joseph@codesourcery.com>
[BZ #15467]
* sysdeps/ieee754/dbl-64/s_sincos.c: Include <errno.h>.
(__sincos): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/flt-32/s_sincosf.c: Include <errno.h>.
(SINCOSF_FUNC): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-128/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-128ibm/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* sysdeps/ieee754/ldbl-96/s_sincosl.c: Include <errno.h>.
(__sincosl): Set errno to EDOM for infinite argument.
* math/libm-test.inc (sincos_test_data): Test errno setting.
As noted in
<https://sourceware.org/ml/libc-alpha/2014-10/msg00369.html>, soft-fp
sysdeps subdirectories (and more generally, subdirectories where
sysdeps/foo/Implies contains foo/bar) are unnecessary and should be
eliminated. This patch does so for MIPS.
Tested for MIPS64 (all three ABIs, soft-float) that installed stripped
shared libraries are unchanged by this patch.
* sysdeps/mips/soft-fp/sfp-machine.h: Move to ....
* sysdeps/mips/mips32/sfp-machine.h: ... here.
* sysdeps/mips/mips64/soft-fp/Makefile: Move to ....
* sysdeps/mips/mips64/Makefile: ... here.
* sysdeps/mips/mips64/soft-fp/e_sqrtl.c: Move to ....
* sysdeps/mips/mips64/e_sqrtl.c: ... here.
* sysdeps/mips/mips64/soft-fp/sfp-machine.h: Move to ....
* sysdeps/mips/mips64/sfp-machine.h: ... here.
* sysdeps/mips/mips32/Implies: Remove mips/soft-fp.
* sysdeps/mips/mips64/n32/Implies: Remove mips/mips64/soft-fp.
* sysdeps/mips/mips64/n64/Implies: Likewise.
Current minimum binutils supported (2.22) has ".machine altivec" support
as default, so there is no need to add a configure check for such
functionality. This patches removes the configure checks for it.
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation. The patch removes the internal IFUNC for wordcopy
symbols and uses local branches in the memmmove optimization instead.
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation.
This change by removing then and used the optimized memmove instead
for supported chips.
This patch simplify the default bcopy symbol for powerpc64 by just using
memmove instead of implementing using the default bcopy. Since the
symbol is deprecated, it trades speed by code size.
This removes code which actually never happens, and is already taken
care of in the function.
This is in the second part of select, when the __mach_msg() function
over the portset has returned something else than MACH_MSG_SUCCESS. I
guess in the past the value returned by __mach_msg() was stored in err,
so this code was necessary to set back err to 0, but now it is stored in
msgerr, so err is already still 0 by default. It can thus never contain
MACH_RCV_TIMED_OUT, i.e. the code is dead. The first case mentioned in
the comment is already handled: on time out with no message, err is
already still the default 0. On time out due to poll, err would still be
0, unless some of the io_select RPCs has returned EINTR, in which case
it contains EINTR. If any other io_select RPCs had returned a proper
answer, got!=0, and thus err is set to 0 just below. The code is thus
indeed not useful any more.
It looks like _hurd_thread_sigstate used to return with the sigstate
lock held long ago, but since that's no longer the case, don't unlock
something that isn't locked.
Note that it's unlikely this change fixes anything in practice since
its current implementation (on i386) makes this call a nop.
soft-fp's _FP_FMA fails to set the result's exponent for cases where
the result of the multiplication is 0, yielding incorrect (arbitrary,
depending on uninitialized values) results for those cases. This
affects libm for architectures using soft-fp to implement fma. This
patch adds the exponent setting and tests for this case.
Tested for ARM soft-float (which uses soft-fp fma), x86_64 and x86 (to
verify not introducing new libm test failures there).
(This bug showed up in testing my patch to move the Linux kernel to
current soft-fp. math/Makefile has "override CFLAGS +=
-Wno-uninitialized" which would have stopped compiler warnings from
showing up this problem, although I wouldn't be surprised if removing
that shows spurious warnings from this code, if the compiler fails to
follow that various cases where the exponent is uninitialized don't
need it initialized because the class is set to a value meaning the
uninitialized exponent isn't used.)
[BZ #17932]
* soft-fp/op-common.h (_FP_FMA): Set exponent of result in case
where multiplication results in zero and third argument is finite
and nonzero.
* math/auto-libm-test-in: Add more tests of fma.
* math/auto-libm-test-out: Regenerated.
In <https://sourceware.org/ml/libc-alpha/2014-09/msg00488.html>, I
noted that comparisons in soft-fp did not set FP_EX_DENORM unless
denormal operands were flushed to zero.
This patch fixes soft-fp to check for denormal operands for
comparisons and set that exception whenever FP_EX_DENORM is not zero.
In particular, for the one architecture for which the Linux kernel
defines FP_EX_DENORM (alpha), this corresponds to the existing logic
for comparisons and so allows that logic to be replaced by a simple
call to FP_CMP_D when soft-fp is updated in the kernel.
Tested for powerpc (e500) that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/op-common.h (_FP_CMP_CHECK_DENORM): New macro.
(_FP_CMP_CHECK_FLUSH_ZERO): Likewise.
(_FP_CMP): Use_FP_CMP_CHECK_DENORM and _FP_CMP_CHECK_FLUSH_ZERO.
(_FP_CMP_EQ): Likewise.
(_FP_CMP_UNORD): Use _FP_CMP_CHECK_DENORM.
One special case needed in soft-fp to replace the old version in the
Linux kernel is extending from a narrower floating-point format to a
wider one without quieting signaling NaNs. (This is for
arch/powerpc/math-emu/lfs.c, where previously it used the old FP_CONV
which didn't do anything special for NaNs, then handled packing
specially for NaNs to avoid quieting at packing time, and discarded
the exceptions from unpacking.)
This patch accordingly refactors FP_EXTEND, creating a separate
_FP_EXTEND_CNAN that offers a choice of how NaNs are handled, with
FP_EXTEND reimplemented as a wrapper that provides the common case of
the IEEE operation that does quiet signaling NaNs and raise exceptions
for them.
Tested for powerpc (e500) that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/op-common.h (FP_EXTEND): Rename to _FP_EXTEND_CNAN with
extra argument CHECK_NAN. Redefine as wrapper around
_FP_EXTEND_CNAN.
This reverts part of the previous commit to refactor pthread.h.
The refactoring must be done by having pthread.h include arch
bits headers, not the other way around. Then hppa provides the
arch bits header. For now we synchronzie again with pthread.h
and include the entire contents in the hppa copy.
BZ #16618
Under certain conditions wscanf can allocate too little memory for the
to-be-scanned arguments and overflow the allocated buffer. The
implementation now correctly computes the required buffer size when
using malloc.
A regression test was added to tst-sscanf.
Update all translations.
Update contributions in the manual.
Update installation notes with information about newest working tools.
Reconfigure using exactly autoconf 2.69.
Regenerate INSTALL.
(1) Fix warnings.
This is a bulk update to fix all the warnings that were causing
build failures with -Werror on hppa.
The most egregious problems are in dl-fptr.c which needs to be
entirely rewritten, thus I've used -Wno-error for that.
(2) Fix conformance errors.
The sysdep.c file had __syscall_error and syscall in one file
which caused conformance issues by including syscall when
__syscall_error was linked to. The fix is obviously to split
the file and use syscall.c to implement syscall.
* sysdeps/sparc/sparc32/bits/atomic.h
(__sparc32_atomic_do_unlock24): Put the memory barrier before the
unlock not after it.
(__v9_compare_and_exchange_val_32_acq): Use unions to avoid getting
volatile register usage warnings from the compiler.
memcpy with unaligned 256-bit AVX register loads/stores are slow on older
processorsl like Sandy Bridge. This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available.
[BZ #17801]
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
New.
(index_AVX_Fast_Unaligned_Load): Likewise.
(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
The padding bytes in the statsdata struct are not initialized, due to
which valgrind throws a warning:
==11384== Memcheck, a memory error detector
==11384== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==11384== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==11384== Command: nscd -d
==11384==
Fri 25 Apr 2014 10:34:53 AM CEST - 11384: handle_request: request received (Version = 2) from PID 11396
Fri 25 Apr 2014 10:34:53 AM CEST - 11384: GETSTAT
==11384== Thread 6:
==11384== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==11384== at 0x4E4ACDC: send (in /lib64/libpthread-2.12.so)
==11384== by 0x11AF6B: send_stats (in /usr/sbin/nscd)
==11384== by 0x112F75: nscd_run_worker (in /usr/sbin/nscd)
==11384== by 0x4E439D0: start_thread (in /lib64/libpthread-2.12.so)
==11384== by 0x599AB6C: clone (in /lib64/libc-2.12.so)
==11384== Address 0x15708395 is on thread 6's stack
Fix the warning by initializing the structure.
This is because of alignment issues in the sem_t support.
tilegx32 does in fact support 64-bit atomics and we will need
to revisit this after the 2.21 freeze.
This patch disables use of 64-bit atomics for MIPS n32 to fix the
problems with unaligned semaphores.
Before 64-bit atomics are used for anything for which such alignment
issues do not arise, and before the addition of any new ILP32 ports
with 64-bit semaphores for which the ABI can be set to have the
greater alignment (AARCH64?), a better approach will need to be
established that allows architectures to declare their 64-bit atomics
availability accurately, without doing so causing inappropriate use of
such atomics on unaligned semaphores.
Tested for MIPS n32 that this fixes the nptl/tst-sem3 failure.
* sysdeps/mips/bits/atomic.h [_MIPS_SIM == _ABIN32]
(__HAVE_64B_ATOMICS): Define to 0.
This patch fixes a bug introduced by 18f2945ae9, where it optimizes
the FPSCR set by just issuing a mtfs instruction if new flag is different
from older one. The issue is a typo, where the new flag should the the
new value, instead of the old one.
It fixes BZ#17885.
Some powerpc64 processors (e5500 core for instance) does not provide the
fsqrt instruction, however current check to use in math_private.h is
__WORDSIZE and _ARCH_PWR4 (ISA 2.02). This is patch change it to use
the compiler flag _ARCH_PPCSQ (which is the same condition GCC uses to
decide whether to generate fsqrt instruction).
It fixes BZ#16576.
GLIBC memset optimization for POWER8 uses the '.machine power8'
directive, which is only supported officially on binutils 2.24+. This
causes a build failure on older binutils.
Since the requirement of .machine power8 is to correctly assembly the
'mtvsrd' instruction and it is already handled by the MTVSRD_V1_R4
macro, there is no really needed of using it.
The patch replaces the power8 with power7 for .machine directive.
It fixes BZ#17869.
This patch fix the elf/ifuncmain6pie failure when building with GCC
4.9+. For some reason, the compiler removes the branch taken code at
resolve_ifunc (sysdeps/powerpc/powerpc64/dl-machine.h) as dead-code
and thus the testcase fails because the ifunc resolves branches to an
invalid memory location. It fixes by explicit adding a dependency of
value based on odp variable to avoid compiler optimization.
It fixes BZ#17868.
This patch replaces unsigned long int and 1UL with uint64_t and
(uint64_t) 1 to support ILP32 targets like x32.
[BZ #17870]
* nptl/sem_post.c (__new_sem_post): Replace unsigned long int
with uint64_t.
* nptl/sem_waitcommon.c (__sem_wait_cleanup): Replace 1UL with
(uint64_t) 1.
(__new_sem_wait_slow): Replace unsigned long int with uint64_t.
Replace 1UL with (uint64_t) 1.
* sysdeps/nptl/internaltypes.h (new_sem): Replace unsigned long
int with uint64_t.
soft-fp has various macros containing labels and goto statements.
Because label names are function-scoped, this is problematic for using
the same macro more than once within a function, which some
architectures do in the Linux kernel (the soft-fp version there
predates the addition of any of these labels and gotos). This patch
fixes this by using __label__ to make the labels local to the block
with the __label__ declaration.
Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by this patch.
* soft-fp/op-common.h (_FP_ADD_INTERNAL): Declare labels with
__label__.
(_FP_FMA): Likewise.
(_FP_TO_INT_ROUND): Likewise.
(_FP_FROM_INT): Likewise.
This patch fix powerpc __get_clockfreq racy and cancel-safe issues by
dropping internal static cache and by using nocancel file operations.
The vDSO failure check is also removed, since kernel code does not
return an error (it cleans cr0.so bit on function return) and the static
code (to read value /proc) now uses non-cancellable calls.
The test is rewritten to look for the testable conditions and
exit once they are all detected. This prevents the test from
iterating over 2000 UIDs and looking up each one. It speeds up
the test and prevents it from failing if the system under test
has an NSS-based passwd that is slower than the test timeout.
See:
https://sourceware.org/ml/libc-alpha/2015-01/msg00394.html