Commit Graph

28422 Commits

Author SHA1 Message Date
H.J. Lu
5f3d0b78e0 Use AVX unaligned memcpy only if AVX2 is available
memcpy with unaligned 256-bit AVX register loads/stores is slow on older
processors like Sandy Bridge.  This patch adds bit_AVX_Fast_Unaligned_Load
and sets it only when AVX2 is available.

	[BZ #17801]
	* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
	Set the bit_AVX_Fast_Unaligned_Load bit for AVX2.
	* sysdeps/x86_64/multiarch/init-arch.h (bit_AVX_Fast_Unaligned_Load):
	New.
	(index_AVX_Fast_Unaligned_Load): Likewise.
	(HAS_AVX_FAST_UNALIGNED_LOAD): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check the
	bit_AVX_Fast_Unaligned_Load bit instead of the bit_AVX_Usable bit.
	* sysdeps/x86_64/multiarch/memcpy_chk.S (__memcpy_chk): Likewise.
	* sysdeps/x86_64/multiarch/mempcpy.S (__mempcpy): Likewise.
	* sysdeps/x86_64/multiarch/mempcpy_chk.S (__mempcpy_chk): Likewise.
	* sysdeps/x86_64/multiarch/memmove.c (__libc_memmove): Replace
	HAS_AVX with HAS_AVX_FAST_UNALIGNED_LOAD.
	* sysdeps/x86_64/multiarch/memmove_chk.c (__memmove_chk): Likewise.
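
As a hedged illustration of the underlying check (not the glibc init-arch
code; the helper name below is made up), AVX2 support is reported in CPUID
leaf 7, subleaf 0, EBX bit 5, so a runtime selector can gate the
unaligned-AVX copy path on that bit:

    /* Illustrative only: gate "fast unaligned AVX copy" on AVX2.  */
    #include <cpuid.h>
    #include <stdbool.h>

    static bool
    avx_unaligned_copy_is_fast (void)
    {
      unsigned int eax, ebx, ecx, edx;

      if (__get_cpuid_max (0, NULL) < 7)
        return false;
      /* CPUID leaf 7, subleaf 0: EBX bit 5 is AVX2.  Sandy Bridge has AVX
         but not AVX2, so it keeps the SSE2 unaligned copy path.  */
      __cpuid_count (7, 0, eax, ebx, ecx, edx);
      return (ebx & (1u << 5)) != 0;
    }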
2015-01-30 15:37:58 -08:00
Andreas Schwab
b658fdd82b Include <signal.h> in sysdeps/nptl/allocrtsig.c
Architectures which don't use hp-timing-common.h don't include <signal.h>
via <sys/param.h>.
2015-01-29 10:00:25 +01:00
Siddhesh Poyarekar
00b8b9baf4 Fix up ChangeLog formatting 2015-01-29 10:31:10 +05:30
Siddhesh Poyarekar
3cb26316b4 Initialize nscd stats data [BZ #17892]
The padding bytes in the statsdata struct are not initialized, due to
which valgrind throws a warning:

==11384== Memcheck, a memory error detector
==11384== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==11384== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==11384== Command: nscd -d
==11384==
Fri 25 Apr 2014 10:34:53 AM CEST - 11384: handle_request: request received (Version = 2) from PID 11396
Fri 25 Apr 2014 10:34:53 AM CEST - 11384:       GETSTAT
==11384== Thread 6:
==11384== Syscall param socketcall.sendto(msg) points to uninitialised byte(s)
==11384==    at 0x4E4ACDC: send (in /lib64/libpthread-2.12.so)
==11384==    by 0x11AF6B: send_stats (in /usr/sbin/nscd)
==11384==    by 0x112F75: nscd_run_worker (in /usr/sbin/nscd)
==11384==    by 0x4E439D0: start_thread (in /lib64/libpthread-2.12.so)
==11384==    by 0x599AB6C: clone (in /lib64/libc-2.12.so)
==11384==  Address 0x15708395 is on thread 6's stack

Fix the warning by initializing the structure.
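
The fix follows a common pattern, sketched below with a made-up struct (not
nscd's real statsdata layout): zero the whole object before filling its
fields, so compiler-inserted padding never reaches send() uninitialized.

    #include <string.h>
    #include <sys/socket.h>

    /* Hypothetical reply layout; on LP64 the int is followed by padding.  */
    struct stats_reply
    {
      int version;
      long long entries;
    };

    static ssize_t
    send_stats_sketch (int fd, long long entries)
    {
      struct stats_reply r;

      memset (&r, 0, sizeof r);   /* also clears the padding bytes */
      r.version = 2;
      r.entries = entries;
      return send (fd, &r, sizeof r, MSG_NOSIGNAL);
    }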
2015-01-29 10:30:09 +05:30
Martin Sebor
527de9e4e3 Clarify math/README.libm-test. Add "How to read the test output." 2015-01-28 21:07:01 -07:00
Chris Metcalf
06991eb816 tilegx32: set __HAVE_64B_ATOMICS to 0
This is because of alignment issues in the sem_t support.
tilegx32 does in fact support 64-bit atomics and we will need
to revisit this after the 2.21 freeze.
2015-01-28 14:51:21 -05:00
Joseph Myers
df34134284 Disable 64-bit atomics for MIPS n32.
This patch disables use of 64-bit atomics for MIPS n32 to fix the
problems with unaligned semaphores.

Before 64-bit atomics are used for anything for which such alignment
issues do not arise, and before the addition of any new ILP32 ports
with 64-bit semaphores for which the ABI can be set to have the
greater alignment (AARCH64?), a better approach will need to be
established that allows architectures to declare their 64-bit atomics
availability accurately, without doing so causing inappropriate use of
such atomics on unaligned semaphores.

Tested for MIPS n32 that this fixes the nptl/tst-sem3 failure.

	* sysdeps/mips/bits/atomic.h [_MIPS_SIM == _ABIN32]
	(__HAVE_64B_ATOMICS): Define to 0.
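
The change is essentially a one-line preprocessor override; a rough sketch
of its shape (not the verbatim sysdeps/mips/bits/atomic.h hunk):

    /* Rough shape of the override, not verbatim glibc code.  */
    #if _MIPS_SIM == _ABIN32
    /* n32 is an ILP32 ABI, so sem_t only guarantees 32-bit alignment;
       claiming 64-bit atomics here would touch unaligned semaphores.  */
    # define __HAVE_64B_ATOMICS 0
    #endif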
2015-01-28 18:40:35 +00:00
Adhemerval Zanella
d4d0ecb244 powerpc: Fix fesetexceptflag [BZ#17885]
This patch fixes a bug introduced by 18f2945ae9, which optimizes setting
the FPSCR by issuing a mtfs instruction only if the new flag differs from
the old one.  The issue is a typo: the new flag should be set to the new
value, not the old one.

It fixes BZ#17885.
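
In other words, the optimization writes the FPSCR only when the new flags
differ from the old ones, and the typo passed the old value to that write.
A simplified, hypothetical illustration (not the real powerpc code):

    /* Illustration only; write_fpscr stands in for the mtfs instruction.  */
    typedef unsigned long long fpscr_t;

    static fpscr_t current_fpscr;

    static void write_fpscr (fpscr_t v) { current_fpscr = v; }

    static void
    set_exception_flags (fpscr_t old_value, fpscr_t new_value)
    {
      if (new_value != old_value)
        write_fpscr (new_value);   /* the bug passed old_value here */
    }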
2015-01-28 05:59:21 -05:00
Adhemerval Zanella
08cee2a464 powerpc: Fix fsqrt build in libm [BZ#16576]
Some powerpc64 processors (the e5500 core, for instance) do not provide the
fsqrt instruction; however, the current check in math_private.h uses
__WORDSIZE and _ARCH_PWR4 (ISA 2.02).  This patch changes it to use
the compiler flag _ARCH_PPCSQ (which is the same condition GCC uses to
decide whether to generate the fsqrt instruction).

It fixes BZ#16576.
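
The shape of the change is roughly the preprocessor switch below (a sketch,
not the verbatim math_private.h hunk; HAVE_FSQRT is a hypothetical helper
name):

    /* Before: infer fsqrt from word size plus ISA 2.02.  */
    #if defined _ARCH_PWR4 && __WORDSIZE == 64
    # define HAVE_FSQRT 1
    #endif

    /* After: follow the compiler, which predefines _ARCH_PPCSQ exactly when
       it would itself emit fsqrt for the selected CPU (not for e5500).  */
    #ifdef _ARCH_PPCSQ
    # define HAVE_FSQRT 1
    #endif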
2015-01-28 05:59:16 -05:00
Andreas Krebbel
5fe8e35975 iconv: Suppress array out of bounds warning. 2015-01-27 09:37:04 +01:00
Andreas Schwab
fa20da31c8 ia64: avoid set-but-not-used warning 2015-01-25 23:38:04 +01:00
Andreas Schwab
45819cbca1 m68k/coldfire: avoid warning about volatile register variables 2015-01-25 23:36:02 +01:00
Andreas Schwab
403cc231e6 m68k: fix missing definition of __feraiseexcept 2015-01-25 23:36:02 +01:00
Andreas Schwab
24bb7432a7 m68k: force inlining bswap functions 2015-01-25 23:35:51 +01:00
Bram
9317ea653a Fix segmentation fault when LD_LIBRARY_PATH contains only non-existing paths 2015-01-25 15:12:10 +10:00
Adhemerval Zanella
bea5801360 powerpc: Fix powerpc64 build failure with binutils 2.22
GLIBC memset optimization for POWER8 uses the '.machine power8'
directive, which is only supported officially on binutils 2.24+.  This
causes a build failure on older binutils.

Since .machine power8 is only required to correctly assemble the
'mtvsrd' instruction, and that is already handled by the MTVSRD_V1_R4
macro, the directive is not really needed.

The patch replaces power8 with power7 in the .machine directive.

It fixes BZ#17869.
2015-01-24 08:40:04 -05:00
Adhemerval Zanella
0e87343e20 powerpc: Fix ifuncmain6pie failure with GCC 4.9
This patch fixes the elf/ifuncmain6pie failure when building with GCC
4.9+.  For some reason, the compiler removes the branch-taken code in
resolve_ifunc (sysdeps/powerpc/powerpc64/dl-machine.h) as dead code,
and thus the testcase fails because the ifunc resolver branches to an
invalid memory location.  The fix explicitly adds a dependency of the
value on the odp variable to prevent the compiler optimization.

It fixes BZ#17868.
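
One common way to create such a dependency is an empty inline asm that ties
the returned value to the variable; a hedged sketch with illustrative names,
not the actual dl-machine.h code:

    /* Illustration only: make `value' appear to depend on `odp' so GCC
       cannot prove the branch dead and delete it.  */
    static inline unsigned long
    keep_dependency (unsigned long value, const void *odp)
    {
      __asm__ ("" : "+r" (value) : "r" (odp));
      return value;
    }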
2015-01-24 08:38:39 -05:00
H.J. Lu
972af9e8dd Also treat model numbers 0x5a/0x5d as Silvermont 2015-01-23 18:52:45 -08:00
H.J. Lu
ede0236c86 Treat model numbers 0x4a/0x4d as Silvermont
* sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features):
	Treat model numbers 0x4a/0x4d as Intel Silvermont architecture.
2015-01-23 18:08:10 -08:00
H.J. Lu
e0da28a1b2 Also use uint64_t in __new_sem_wait_fast 2015-01-23 16:21:07 -08:00
H.J. Lu
22971c35e2 Use uint64_t and (uint64_t) 1 for 64-bit int
This patch replaces unsigned long int and 1UL with uint64_t and
(uint64_t) 1 to support ILP32 targets like x32.

	[BZ #17870]
	* nptl/sem_post.c (__new_sem_post): Replace unsigned long int
	with uint64_t.
	* nptl/sem_waitcommon.c (__sem_wait_cleanup): Replace 1UL with
	(uint64_t) 1.
	(__new_sem_wait_slow): Replace unsigned long int with uint64_t.
	Replace 1UL with (uint64_t) 1.
	* sysdeps/nptl/internaltypes.h (new_sem): Replace unsigned long
	int with uint64_t.
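
The pattern being fixed, as a small hedged example (simplified;
SEM_NWAITERS_SHIFT here is illustrative): on ILP32 targets such as x32,
unsigned long int is 32 bits, so shifting 1UL into the high word is
undefined, while uint64_t and (uint64_t) 1 keep the field 64 bits wide on
every ABI.

    #include <stdint.h>

    #define SEM_NWAITERS_SHIFT 32   /* illustrative shift into the high word */

    /* Broken on x32: unsigned long int is 32 bits, so the shift overflows.
       unsigned long int waiter = 1UL << SEM_NWAITERS_SHIFT;  */

    /* Portable: explicitly 64 bits wide everywhere.  */
    static const uint64_t waiter = (uint64_t) 1 << SEM_NWAITERS_SHIFT;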
2015-01-23 14:48:40 -08:00
Roland McGrath
2ec2d7032f Add missing libc_hidden_weak to stub if_nameindex, if_freenameindex. 2015-01-23 11:29:02 -08:00
Roland McGrath
da5bf2459c Add missing libc_hidden_def to stub getrlimit64. 2015-01-23 10:41:37 -08:00
Joseph Myers
d7423856b5 soft-fp: Use __label__ for all labels within macros.
soft-fp has various macros containing labels and goto statements.
Because label names are function-scoped, this is problematic for using
the same macro more than once within a function, which some
architectures do in the Linux kernel (the soft-fp version there
predates the addition of any of these labels and gotos).  This patch
fixes this by using __label__ to make the labels local to the block
with the __label__ declaration.

Tested for powerpc-nofpu that installed stripped shared libraries are
unchanged by this patch.

	* soft-fp/op-common.h (_FP_ADD_INTERNAL): Declare labels with
	__label__.
	(_FP_FMA): Likewise.
	(_FP_TO_INT_ROUND): Likewise.
	(_FP_FROM_INT): Likewise.
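
A reduced example of the technique (not the actual soft-fp macro bodies):
declaring the label with __label__ at the top of the block makes it
block-local, so the macro can be expanded twice in one function without a
duplicate-label error.

    /* Reduced illustration of the __label__ usage in soft-fp/op-common.h.  */
    #define CLAMP_NONNEG(x)            \
      do                               \
        {                              \
          __label__ done;              \
          if ((x) >= 0)                \
            goto done;                 \
          (x) = 0;                     \
        done:;                         \
        }                              \
      while (0)

    static int
    clamp_two (int a, int b)
    {
      CLAMP_NONNEG (a);   /* each expansion gets its own `done' label */
      CLAMP_NONNEG (b);
      return a + b;
    }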
2015-01-22 22:39:26 +00:00
Adhemerval Zanella
6b2ba95b6b BZ #16418: Fix powerpc get_clockfreq raciness
This patch fixes powerpc __get_clockfreq raciness and cancellation-safety
issues by dropping the internal static cache and by using nocancel file
operations.  The vDSO failure check is also removed, since the kernel code
does not return an error (it clears the cr0.so bit on function return) and
the static code (which reads the value from /proc) now uses non-cancellable
calls.
2015-01-21 10:46:49 -05:00
Carlos O'Donell
191220b306 Update copyright year to 2015 for new files. 2015-01-21 10:35:31 -05:00
Carlos O'Donell
0897c551c0 tst-getpw: Rewrite.
The test is rewritten to look for the testable conditions and
exit once they are all detected. This prevents the test from
iterating over 2000 UIDs and looking up each one. It speeds up
the test and prevents it from failing if the system under test
has an NSS-based passwd that is slower than the test timeout.

See:
https://sourceware.org/ml/libc-alpha/2015-01/msg00394.html
2015-01-21 10:08:18 -05:00
Marek Polacek
86bba162b5 Fix tst_wcscpy.c test. 2015-01-21 12:30:42 +01:00
Carlos O'Donell
ccdb048df4 Fix recursive dlopen.
The ability to recursively call dlopen is useful for malloc
implementations that wish to load other dynamic modules that
implement reentrant/AS-safe functions to use in their own
implementation.

Given that a user malloc implementation may be called by an
ongoing dlopen to allocate memory, the user malloc
implementation interrupts dlopen, and if it then calls dlopen
again, that is a reentrant call.

This patch fixes the issues with the ld.so.cache mapping
and the _r_debug assertion which prevent this from working
as expected.

See:
https://sourceware.org/ml/libc-alpha/2014-12/msg00446.html
2015-01-21 01:51:10 -05:00
Carlos O'Donell
042e1521c7 Fix semaphore destruction (bug 12674).
This commit fixes semaphore destruction by either using 64b atomic
operations (where available), or by using two separate fields when only
32b atomic operations are available.  In the latter case, we keep a
conservative estimate of whether there are any waiting threads in one
bit of the field that counts the number of available tokens, thus
allowing sem_post to atomically both add a token and determine whether
it needs to call futex_wake.

See:
https://sourceware.org/ml/libc-alpha/2014-12/msg00155.html
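
A hedged sketch of the 32-bit strategy (simplified; the real layout and
futex handling live in nptl/sem_waitcommon.c): one bit of the value word
conservatively records "there may be waiters", so a single atomic add in
sem_post both publishes a token and tells the poster whether a futex wake
might be needed.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Illustrative layout: bit 0 = "there may be waiters",
       bits 1..31 = count of available tokens.  */
    #define SEM_HAS_WAITERS  1u
    #define SEM_VALUE_UNIT   2u

    struct toy_sem { _Atomic unsigned int value; };

    /* Returns true if the caller should issue a futex wake.  */
    static bool
    toy_sem_post (struct toy_sem *s)
    {
      unsigned int old = atomic_fetch_add (&s->value, SEM_VALUE_UNIT);
      return (old & SEM_HAS_WAITERS) != 0;
    }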
2015-01-21 00:46:16 -05:00
Carlos O'Donell
a8db092ec0 Regenerate INSTALL. 2015-01-20 22:25:38 -05:00
Carlos O'Donell
fe0e85afcb Update libc.pot:
In preparation for providing a tarball to the translation project.

        * po/libc.pot: Regenerated.
2015-01-20 22:18:11 -05:00
Chung-Lin Tang
522e6ee3b4 Commit nios2 port to master. 2015-01-17 22:29:12 -08:00
Stefan Liebler
026eb207ed S390: Get rid of linknamespace failures for utmp functions. 2015-01-16 09:18:58 +01:00
Stefan Liebler
1d53248326 S390: Get rid of linknamespace failures for string functions. 2015-01-16 09:17:32 +01:00
Joseph Myers
53fbd16918 Fix powerpc-nofpu fesetenv namespace (bug 17748).
When fixing namespace issues for <fenv.h> functions I missed one call
to fesetenv for powerpc-nofpu.  This patch changes this to a call to
__fesetenv.

Tested for powerpc-nofpu; it fixes the previously observed math.h
linknamespace test failures.

	[BZ #17748]
	* sysdeps/powerpc/nofpu/feholdexcpt.c (__feholdexcept): Call
	__fesetenv instead of fesetenv.
2015-01-14 21:35:40 +00:00
Siddhesh Poyarekar
d639a36345 [s390] Define a __tls_get_addr macro to avoid declaring it again
commit 050f7298e1 added an extern
declaration for __tls_get_addr that conflicts with the one in s390
dl-tls.h, based on whether __tls_get_addr is defined as a macro.  The
rationale seems to be based on the assumption that __tls_get_addr is
exported for every architecture and hence an internal non-plt alias is
needed.  This is not true for s390 though, since it exports
__tls_get_offset and not __tls_get_addr.  This results in tst-audit9
being stuck in an infinite loop.

This patch fixes that by defining __tls_get_addr as a macro expanding
to itself, so that the conflicting declaration is not used.
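
The fix boils down to a one-line preprocessor idiom, roughly (a sketch, not
the exact s390 dl-tls.h hunk): defining the identifier as a macro that
expands to itself leaves every use unchanged while making the generic
code's "declare only if not a macro" guard skip the conflicting extern
declaration.

    /* Rough shape of the s390 dl-tls.h change (not verbatim).  */
    #define __tls_get_addr __tls_get_addr

    /* The generic declaration is assumed to be guarded along these lines:
       #ifndef __tls_get_addr
       extern void *__tls_get_addr (tls_index *ti);
       #endif
       so with the self-define in place it is no longer emitted.  */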
2015-01-14 21:26:50 +05:30
Adhemerval Zanella
ce6615c9c6 powerpc: Fix POWER7/PPC64 performance regression on LE
This patch fixes a performance regression in the POWER7/PPC64 memcmp
port for Little Endian.  The LE code uses the 'ldbrx' instruction to read
memory in byte-reversed form; however, ISA 2.06 provides only the indexed
form, which uses a register value as an additional index instead of a fixed
value encoded in the instruction.

The LE port strategy uses r0 as the index value and updates the address
on each compare loop iteration.  For large compare sizes, this adds 8
more instructions, plus some more depending on the trailing size.  This
patch fixes it by adding pre-calculated indexes, removing the address
update in the loops and trailing-size handling.

For large sizes it shows a considerable gain, doubling performance and
matching BE.
2015-01-13 14:35:40 -05:00
Adhemerval Zanella
d3b00f468b powerpc: Optimized strncmp for POWER8/PPC64
This patch adds an optimized POWER8 strncmp.  The implementation focuses
on speeding up unaligned cases, following the ideas of the POWER8 strcmp.

The algorithm first checks the initial 16 bytes, then aligns the first
source argument and uses unaligned loads on the second argument only.
Additional checks for page boundaries are done for unaligned cases
(where the source alignments differ).
2015-01-13 14:35:40 -05:00
Rajalakshmi Srinivasaraghavan
72607db038 powerpc: Optimize POWER7 strcmp trailing checks
This patch optimizes the POWER7 trailing check by avoiding byte read
operations and instead using bitwise operations on the doubleword
already read.
2015-01-13 14:35:40 -05:00
David S. Miller
54dc546139 Fix scanf15.c testsuite build on sparc.
* include/signal.h (__sigreturn): Guard with __USE_MISC.
2015-01-13 11:28:17 -08:00
Roland McGrath
1c6e6f2315 Remove some references to bcopy/bcmp/bzero. 2015-01-13 11:12:55 -08:00
Adhemerval Zanella
8bedcb5f03 powerpc: Optimized strcmp for POWER8/PPC64
This patch adds an optimized POWER8 strcmp using unaligned accesses.
The algorithm first checks the initial 16 bytes, then aligns the first
source argument and uses unaligned loads on the second argument only.
Additional checks for page boundaries are done for unaligned cases.
2015-01-13 11:28:58 -05:00
Adhemerval Zanella
f06a4faf8a powerpc: Optimized st{r,p}ncpy for POWER8/PPC64
This patch adds an optimized POWER8 st{r,p}ncpy using unaligned accesses.
It shows 10%-80% improvement over the optimized POWER7 one that uses
only aligned accesses, especially on unaligned inputs.

The algorithm first reads and checks 16 bytes (if the inputs do not cross a
4K page boundary).  Then it realigns the source to 16 bytes and runs a
16-byte read-and-compare loop to speed up null byte checks for large
strings.  Also, unlike the POWER7 optimization, the null padding is done
inline in the implementation using possibly unaligned accesses, instead of
relying on a memset call.  A special case is added for page-crossing reads.
2015-01-13 11:28:44 -05:00
Adhemerval Zanella
9f2f36e5a9 powerpc: Optimized strncat for POWER7/PPC64
With 3eb38795db (Simplify strncat) the generic algorithm uses
strlen, strnlen, and memcpy.  This is faster than the current POWER7
implementation, especially for unaligned strings (where the POWER7 code
uses byte-by-byte operations).

This patch removes the assembly implementation and uses a multiarch
specialization based on default algorithm calling optimized POWER7
symbols.
2015-01-13 11:28:40 -05:00
Adhemerval Zanella
94c9680945 powerpc: Optimized strcat for POWER8/PPC64
With the new optimized strcpy for POWER8, this patch adds an optimized
strcat which uses it along with the default implementation at strings/.
2015-01-13 11:28:36 -05:00
Adhemerval Zanella
96d6fd6c40 powerpc: Optimized st{r,p}cpy for POWER8/PPC64
This patch adds an optimized POWER8 strcpy using unaligned accesses.
For strings up to 16 bytes the implementation first calculates the
string size, like strlen, and issues a memcpy.  For larger strings, the
source is first aligned to 16 bytes and then tested in a loop that
reads 16 bytes and combines the cmpb results for speedup.  A special
case is added for page-crossing reads.

It shows 30%-60% improvement over the optimized POWER7 one that uses
only aligned accesses.
2015-01-13 11:28:30 -05:00
Leonhard Holz
0f9e585480 Fix memory handling in strxfrm_l [BZ #16009]
[Modified from the original email by Siddhesh Poyarekar]

This patch solves bug #16009 by implementing an additional path in
strxfrm that does not depend on caching the weight and rule indices.

In detail the following changed:

* The old main loop was factored out of strxfrm_l into the function
do_xfrm_cached to be able to alternatively use the non-caching version
do_xfrm.

* strxfrm_l allocates a fixed-size array on the stack. If this is not
sufficient to store the weight and rule indices, the non-caching path is
taken (see the sketch after this list). As the cache size does not depend
on the input, there can be no problems with integer overflows or stack
allocations greater than __MAX_ALLOCA_CUTOFF. Note that malloc-ing is not
possible because the definition of strxfrm does not allow OOM error
handling.

* The uncached path determines the weight and rule index for every char
and for every pass again.

* Passing all the locale data array by array resulted in very long
parameter lists, so I introduced a structure that holds them.

* Checking for a zero src string has been moved a bit upwards; it now
happens before the locale data initialization.

* To verify that the non-caching path works correctly, I added a test run
to localedata/sort-test.sh & localedata/xfrm-test.c where all strings
are padded with spaces so that they are too large for the caching path.
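
A hedged outline of the dispatch described above (names and sizes are
illustrative; the real functions are do_xfrm_cached and do_xfrm inside
strxfrm_l): the caches are fixed-size stack buffers whose size does not
depend on the input, and anything that does not fit falls back to the
slower non-caching path instead of growing the allocation.

    #include <stddef.h>

    enum { IDX_CACHE_SIZE = 100 };   /* illustrative, input-independent size */

    /* Hypothetical stand-ins for do_xfrm_cached / do_xfrm.  */
    size_t do_xfrm_cached (char *dst, const char *src, size_t n,
                           int *rule_cache, int *weight_cache);
    size_t do_xfrm (char *dst, const char *src, size_t n);

    static size_t
    xfrm_dispatch (char *dst, const char *src, size_t n, size_t needed)
    {
      int rule_cache[IDX_CACHE_SIZE];
      int weight_cache[IDX_CACHE_SIZE];

      if (needed <= IDX_CACHE_SIZE)
        /* Fast path: weight/rule indices fit in the fixed stack caches.  */
        return do_xfrm_cached (dst, src, n, rule_cache, weight_cache);

      /* Slow path: recompute indices per character and per pass.  */
      return do_xfrm (dst, src, n);
    }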
2015-01-13 11:33:56 +05:30
Torvald Riegel
c60ec0e016 Fix wake-up in sysdeps/nptl/fork.c. 2015-01-13 01:09:29 +01:00
Joseph Myers
5a9e4c09a2 Fix ldbl-96 scalblnl underflowing results (bug 17803).
The ldbl-96 implementation of scalblnl (used for x86_64 and ia64) uses
a condition k <= -63 to determine when a standard underflowing result
tiny*__copysignl(tiny,x) should be returned.  However, that condition
corresponds to values with exponent -16446 or less, and in the case of
-16446, the correct result for round-to-nearest depends on whether the
value is exactly 0x1p-16446 (half the least subnormal) or more than
that.  This patch fixes the bug by changing the condition to k <= -64
and accordingly adjusting the exponent by 64 not 63 when converting to
a normal value.

Tested for x86_64.

	[BZ #17803]
	* sysdeps/ieee754/ldbl-96/s_scalblnl.c (twom63): Rename to
	twom64.  Adjust value to 0x1p-64L.
	(__scalblnl): Only return standard underflowing result for K <=
	-64 not K <= -63; adjust exponent for underflowing result by 64
	not 63.
	* math/libm-test.inc (scalbn_test_data): Add more tests.
	(scalbln_test_data): Likewise.
2015-01-12 23:02:14 +00:00