For strings >16B and <32B existing algorithm takes more time than default
implementation when strings are placed closed to end of page. This is due
to byte by byte access for handling page cross. This is improved by
following >32B code path where the address is adjusted to aligned memory
before doing load doubleword operation instead of loading bytes.
Tested on powerpc64 and powerpc64le.
Based on comments on previous attempt to address BZ#16640 [1],
the idea is not support invalid use of strtok (the original
bug report proposal). This leader to a new strtok optimized
strtok implementation [2].
The idea of this patch is to fix BZ#16640 to align all the
implementations to a same contract. However, with newer strtok
code it is better to get remove the old assembly ones instead of
fix them.
For x86 is a gain in all cases since the new implementation can
potentially use sse2/sse42 implementation for strspn and strcspn.
This shows a better performance on both i686 and x86_64 using
the string benchtests.
On powerpc64 the gains are mixed, where only for larger inputs
or keys some gains are showns (based on benchtest it seems that
it shows some gains for keys larger than 10 and inputs larger
than 32). I would prefer to remove the optimized implementation
based on first code simplicity and second because some more gain
could be optimized using a better optimized strcspn/strspn
code (as for x86). However if powerpc arch maintainers prefer I
can send a v2 with the assembly code adjusted instead.
Checked on x86_64-linux-gnu, i686-linux-gnu, and powerpc64le-linux-gnu.
[BZ #16640]
* sysdeps/i386/i686/strtok.S: Remove file.
* sysdeps/i386/i686/strtok_r.S: Likewise.
* sysdeps/i386/strtok.S: Likewise.
* sysdeps/i386/strtok_r.S: Likewise.
* sysdeps/powerpc/powerpc64/strtok.S: Likewise.
* sysdeps/powerpc/powerpc64/strtok_r.S: Likewise.
* sysdeps/x86_64/strtok.S: Likewise.
* sysdeps/x86_64/strtok_r.S: Likewise.
[1] https://sourceware.org/ml/libc-alpha/2016-10/msg00411.html
[2] https://sourceware.org/ml/libc-alpha/2016-12/msg00461.html
Since commit 6e46de42fe default strcat implementation is essentially
the same for specialized ia64 and powerpc ones. This patch removes the
redundant implementation and adjust powerpc64 ifunc code to use the
default one.
Checked on powerpc32-linux-gnu (default and power4) and ia64-linux build
and on powerpc64le-linux-gnu.
* sysdeps/ia64/strcat.c: Remove file.
* sysdeps/powerpc/strcat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat-power7.c: Use default
C implementation.
* sysdeps/powerpc/powerpc64/multiarch/strcat-power8.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat-ppc64.c: Likewise.
The P7 code is used for <=32B strings and for > 32B vectorized loops are used.
This shows as an average 25% improvement depending on the position of search
character. The performance is same for shorter strings.
Tested on ppc64 and ppc64le.
Current optimized powercp64/power7 memchr uses a strategy to check for
p versus align(p+n) (where 'p' is the input char pointer and n the
maximum size to check for the byte) without taking care for possible
overflow on the pointer addition in case of large 'n'.
It was triggered by 3038145ca2 where default rawmemchr (used to
created ppc64 rawmemchr in ifunc selection) now uses memchr (p, c, (size_t)-1)
on its implementation.
This patch fixes it by implement a satured addition where overflows
sets the maximum pointer size to UINTPTR_MAX.
Checked on powerpc64le-linux-gnu.
[BZ# 20971]
* sysdeps/powerpc/powerpc64/power7/memchr.S (__memchr): Avoid
overflow in pointer addition.
* string/test-memchr.c (do_test): Add an argument to pass as
the size on memchr.
(test_main): Add check for SIZE_MAX.
Commit c7debbdfac redirected the internal strrch to default powerpc64
implementation by redefining the weak_alias at
sysdeps/powerpc/powerpc64/multiarch/strchr-ppc64.c:
#undef weak_alias
#define weak_alias(name, aliasname) \
extern __typeof (__strrchr_ppc) aliasname \
__attribute__ ((weak, alias ("__strrchr_ppc")));
This creates a __GI_strchr alias that clashes with the IFUNC symbol in
stprchr.os. There is not need to define the default version for internal
version, since ifunc should work internally for powerpc64. This patch
removes the weak_alias indirection.
Checked on powerpc64le.
* sysdeps/powerpc/powerpc64/multiarch/strrchr-ppc64.c (weak_alias):
Remove redirection to __strrchr_ppc.
Commit 142e0a9953 redirected the internal stpcpy to default powerpc64
implementation by redefining the weak_alias at
sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.c:
#undef weak_alias
#define weak_alias(name, aliasname) \
extern __typeof (__stpcpy_ppc) aliasname \
__attribute__ ((weak, alias ("__stpcpy_ppc")));
This creates a __GI_stpcpy alias that clashes with the IFUNC symbol in
stpcpy.os. There is not need to define the default version for internal
version, since ifunc should work internally for powerpc64. This patch
removes the weak_alias indirection.
Checked on powerpc64le.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy-ppc64.c (weak_alias):
Remove redirection to __stpcpy_ppc.
Building glibc for powerpc64 with recent (2.27.51.20161012) binutils,
with multi-arch enabled, I get the error:
../sysdeps/powerpc/powerpc64/power6/memset.S: Assembler messages:
../sysdeps/powerpc/powerpc64/power6/memset.S:254: Error: operand out of range (5 is not between 0 and 1)
../sysdeps/powerpc/powerpc64/power6/memset.S:254: Error: operand out of range (128 is not between 0 and 31)
../sysdeps/powerpc/powerpc64/power6/memset.S:254: Error: missing operand
Indeed, cmpli is documented as a four-operand instruction, and looking
at nearby code it seems likely cmpldi was intended. This patch fixes
this powerpc64 code accordingly, and makes a corresponding change to
the powerpc32 code.
Tested for powerpc, powerpc64 and powerpc64le by Tulio Magno Quites
Machado Filho
* sysdeps/powerpc/powerpc32/power6/memset.S (memset): Use cmplwi
instead of cmpli.
* sysdeps/powerpc/powerpc64/power6/memset.S (memset): Use cmpldi
instead of cmpli.
The powerpc (hard-float) implementations of copysignl, both 32-bit and
64-bit, raise spurious "invalid" exceptions when the first argument is
a signaling NaN. copysign functions should never raise exceptions
even for signaling NaNs.
The problem is the use of an fcmpu instruction to test the sign of the
high part of the long double argument. This patch fixes the functions
to use fsel instead (as used for fabsl following my fixes for a
similar bug there), or to examine the integer representation for older
32-bit processors without fsel.
Tested for powerpc64 and powerpc32 (configurations with and without
fsel used).
[BZ #20718]
* sysdeps/powerpc/powerpc32/fpu/s_copysignl.S (__copysignl): Do
not use floating-point comparisons to test sign.
* sysdeps/powerpc/powerpc64/fpu/s_copysignl.S (__copysignl):
Likewise.
The current s390 ifunc resolver for vector optimized functions and the common
libc_ifunc macro in include/libc-symbols.h uses something like that to generate ifunc'ed functions:
extern void *__resolve___strlen(unsigned long int dl_hwcap) asm (strlen);
asm (".type strlen, %gnu_indirect_function");
This leads to false debug information:
objdump --dwarf=info libc.so:
...
<1><1e6424>: Abbrev Number: 43 (DW_TAG_subprogram)
<1e6425> DW_AT_external : 1
<1e6425> DW_AT_name : (indirect string, offset: 0x1146e): __resolve___strlen
<1e6429> DW_AT_decl_file : 1
<1e642a> DW_AT_decl_line : 23
<1e642b> DW_AT_linkage_name: (indirect string, offset: 0x1147a): strlen
<1e642f> DW_AT_prototyped : 1
<1e642f> DW_AT_type : <0x1e4ccd>
<1e6433> DW_AT_low_pc : 0x998e0
<1e643b> DW_AT_high_pc : 0x16
<1e6443> DW_AT_frame_base : 1 byte block: 9c (DW_OP_call_frame_cfa)
<1e6445> DW_AT_GNU_all_call_sites: 1
<1e6445> DW_AT_sibling : <0x1e6459>
<2><1e6449>: Abbrev Number: 44 (DW_TAG_formal_parameter)
<1e644a> DW_AT_name : (indirect string, offset: 0x1845): dl_hwcap
<1e644e> DW_AT_decl_file : 1
<1e644f> DW_AT_decl_line : 23
<1e6450> DW_AT_type : <0x1e4c8d>
<1e6454> DW_AT_location : 0x122115 (location list)
...
The debuginfo for the ifunc-resolver function contains the DW_AT_linkage_name
field, which names the real function name "strlen". If you perform an inferior
function call to strlen in lldb, then it fails due to something like that:
"error: no matching function for call to 'strlen'
candidate function not viable: no known conversion from 'const char [6]'
to 'unsigned long' for 1st argument"
The unsigned long is the dl_hwcap argument of the resolver function.
The strlen function itself has no debufinfo.
The s390 ifunc resolver for memset & co uses something like that:
asm (".globl FUNC"
".type FUNC, @gnu_indirect_function"
".set FUNC, __resolve_FUNC");
This way the debuginfo for the ifunc-resolver function does not conain the
DW_AT_linkage_name field and the real function has no debuginfo, too.
Using this strategy for the vector optimized functions leads to some troubles
for functions like strnlen. Here we have __strnlen and a weak alias strnlen.
The __strnlen function is the ifunc function, which is realized with the asm-
statement above. The weak_alias-macro can't be used here due to undefined symbol:
gcc ../sysdeps/s390/multiarch/strnlen.c -c ...
In file included from <command-line>:0:0:
../sysdeps/s390/multiarch/strnlen.c:28:24: error: ‘strnlen’ aliased to undefined symbol ‘__strnlen’
weak_alias (__strnlen, strnlen)
^
./../include/libc-symbols.h:111:26: note: in definition of macro ‘_weak_alias’
extern __typeof (name) aliasname __attribute__ ((weak, alias (#name)));
^
../sysdeps/s390/multiarch/strnlen.c:28:1: note: in expansion of macro ‘weak_alias’
weak_alias (__strnlen, strnlen)
^
make[2]: *** [build/string/strnlen.o] Error 1
As the __strnlen function is defined with asm-statements the function name
__strnlen isn't known by gcc. But the weak alias can also be done with an
asm statement to resolve this issue:
__asm__ (".weak strnlen\n\t"
".set strnlen,__strnlen\n");
In order to use the weak_alias macro, gcc needs to know the ifunc function. The
minimum gcc to build glibc is currently 4.7, which supports attribute((ifunc)).
See https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/Function-Attributes.html.
It is only supported if gcc is configured with --enable-gnu-indirect-function
or gcc supports it by default for at least intel and s390x architecture.
This patch uses the old behaviour if gcc support is not available.
Usage of attribute ifunc is something like that:
__typeof (FUNC) FUNC __attribute__ ((ifunc ("__resolve_FUNC")));
Then gcc produces the same .globl, .type, .set assembler instructions like above.
And the debuginfo does not contain the DW_AT_linkage_name field and there is no
debuginfo for the real function, too.
But in order to get it work, there is also some extra work to do.
Currently, the glibc internal symbol on s390x e.g. __GI___strnlen is not the
ifunc symbol, but the fallback __strnlen_c symbol. Thus I have to omit the
libc_hidden_def macro in strnlen.c (here is the ifunc function __strnlen)
because it is already handled in strnlen-c.c (here is __strnlen_c).
Due to libc_hidden_proto (__strnlen) in string.h, compiling fails:
gcc ../sysdeps/s390/multiarch/strnlen.c -c ...
In file included from <command-line>:0:0:
../sysdeps/s390/multiarch/strnlen.c:53:24: error: ‘strnlen’ aliased to undefined symbol ‘__strnlen’
weak_alias (__strnlen, strnlen)
^
./../include/libc-symbols.h:111:26: note: in definition of macro ‘_weak_alias’
extern __typeof (name) aliasname __attribute__ ((weak, alias (#name)));
^
../sysdeps/s390/multiarch/strnlen.c:53:1: note: in expansion of macro ‘weak_alias’
weak_alias (__strnlen, strnlen)
^
make[2]: *** [build/string/strnlen.os] Error 1
I have to redirect the prototypes for __strnlen in string.h and create a copy
of the prototype for using as ifunc function:
__typeof (__redirect___strnlen) __strnlen __attribute__ ((ifunc ("__resolve_strnlen")));
weak_alias (__strnlen, strnlen)
This way there is no trouble with the internal __GI_* symbols.
Glibc builds fine with this construct and the debuginfo is "correct".
For functions without a __GI_* symbol like memccpy this redirection is not needed.
This patch adjusts the common libc_ifunc and libm_ifunc macro to use gcc
attribute ifunc. Due to this change, the macro users where the __GI_* symbol
does not target the ifunc symbol have to be prepared with the redirection
construct.
Furthermore a configure check to test gcc support is added. If it is not supported,
the old behaviour is used.
This patch also prepares the libc_ifunc macro to be useable in s390-ifunc-macro.
The s390 ifunc-resolver-functions do have an hwcaps parameter and not all
resolvers need the same initialization code. The next patch in this series
changes the s390 ifunc macros to use this common one.
ChangeLog:
* include/libc-symbols.h (__ifunc_resolver):
New macro is used by __ifunc* macros.
(__ifunc): New macro uses gcc attribute ifunc or inline assembly
depending on HAVE_GCC_IFUNC.
(libc_ifunc, libm_ifunc): Use __ifunc as base macro.
(libc_ifunc_redirected, libc_ifunc_hidden, libm_ifunc_init): New macro.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finite.c:
Redirect ifunced function in header for using as type for ifunc function.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memmove.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/memset.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strlen.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finite.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_finitef.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isinff.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnan.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/rawmemchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcat.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strnlen.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strstr.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/wcschr.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_isnanf.c:
Add libc_hidden_def() and use libc_ifunc_hidden() macro
instead of libc_ifunc() macro.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_isnanf.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c: Likewise.
Commit a6a4395d fixed modf implementation by compiling s_modf.c and
s_modff.c with -fsignaling-nans. However these files are also included
from the pre-POWER5+ implementation, and thus these files should also
be compiled with -fsignaling-nans.
Changelog:
[BZ #20240]
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/Makefile
(CFLAGS-s_modf-ppc32.c): New variable.
(CFLAGS-s_modff-ppc32.c): Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
(CFLAGS-s_modf-ppc64.c): Likewise.
(CFLAGS-s_modff-ppc64.c): Likewise.
If the input values are unaligned and if there are null characters in the
memory before the starting address of the input values, strcasecmp
gives incorrect return code. Fixed it by adding mask the bits that
are not part of the string.
This implementation is based on the one already used at
sysdeps/x86_64/fpu/e_expf.S.
This implementation improves the performance by ~14% on average in synthetic
benchmarks at the cost of decreasing accuracy to 1 ULP.
atomic_compare_and_exchange_bool_rel and
catomic_compare_and_exchange_bool_rel are removed and replaced with the
new C11-like atomic_compare_exchange_weak_release. The concurrent code
in nscd/cache.c has not been reviewed yet, so this patch does not add
detailed comments.
* nscd/cache.c (cache_add): Use new C11-like atomic operation instead
of atomic_compare_and_exchange_bool_rel.
* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_full): Likewise.
* include/atomic.h (atomic_compare_and_exchange_bool_rel,
catomic_compare_and_exchange_bool_rel): Remove.
* sysdeps/aarch64/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/alpha/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/arm/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/mips/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
* sysdeps/tile/atomic-machine.h
(atomic_compare_and_exchange_bool_rel): Likewise.
Some architectures have their own versions of fdim functions, which
are missing errno setting (bug 6796) and may also return sNaN instead
of qNaN for sNaN input, in the case of the x86 / x86_64 long double
versions (bug 20256).
These versions are not actually doing anything that a compiler
couldn't generate, just straightforward comparisons / arithmetic (and,
in the x86 / x86_64 case, testing for NaNs with fxam, which isn't
actually needed once you use an unordered comparison and let the NaNs
pass through the same subtraction as non-NaN inputs). This patch
removes the x86 / x86_64 / powerpc versions, so that those
architectures use the generic C versions, which correctly handle
setting errno and deal properly with sNaN inputs. This seems better
than dealing with setting errno in lots of .S versions.
The i386 versions also return results with excess range and precision,
which is not appropriate for a function exactly defined by reference
to IEEE operations. For errno setting to work correctly on overflow,
it's necessary to remove excess range with math_narrow_eval, which
this patch duly does in the float and double versions so that the
tests can reliably pass on x86. For float, this avoids any double
rounding issues as the long double precision is more than twice that
of float. For double, double rounding issues will need to be
addressed separately, so this patch does not fully fix bug 20255.
Tested for x86_64, x86 and powerpc.
[BZ #6796]
[BZ #20255]
[BZ #20256]
* math/s_fdim.c: Include <math_private.h>.
(__fdim): Use math_narrow_eval on result.
* math/s_fdimf.c: Include <math_private.h>.
(__fdimf): Use math_narrow_eval on result.
* sysdeps/i386/fpu/s_fdim.S: Remove file.
* sysdeps/i386/fpu/s_fdimf.S: Likewise.
* sysdeps/i386/fpu/s_fdiml.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdim.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdimf.S: Likewise.
* sysdeps/i386/i686/fpu/s_fdiml.S: Likewise.
* sysdeps/powerpc/fpu/s_fdim.c: Likewise.
* sysdeps/powerpc/fpu/s_fdimf.c: Likewise.
* sysdeps/powerpc/powerpc32/fpu/s_fdim.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_fdim.c: Likewise.
* sysdeps/x86_64/fpu/s_fdiml.S: Likewise.
* math/libm-test.inc (fdim_test_data): Expect errno setting on
overflow. Add sNaN tests.
This implementation utilizes vectors to improve performance
compared to current byte by byte implementation for POWER7.
The performance improvement is upto 4x. This patch is tested
on powerpc64 and powerpc64le.
The powerpc64 versions of ceil, floor, round, trunc, rint, nearbyint
and their float versions return sNaN for sNaN input when they should
return qNaN. This patch fixes them to add a NaN argument to itself to
quiet sNaNs before returning.
Tested for powerpc64.
[BZ #20160]
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S (__ceil): Add NaN
argument to itself before returning the result.
* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S (__ceilf): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S (__floor): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floorf.S (__floorf): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S (__nearbyint):
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S (__nearbyintf):
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rint.S (__rint): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_rintf.S (__rintf): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_round.S (__round): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_roundf.S (__roundf): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S (__trunc): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S (__truncf): Likewise.
The powerpc implementations of fabsl for ldbl-128ibm (both powerpc32
and powerpc64) wrongly raise the "invalid" exception for sNaN
arguments. fabs functions should be quiet for all inputs including
signaling NaNs. The problem is the use of a comparison instruction
fcmpu to determine if the high part of the argument is negative and so
the low part needs to be negated; such instructions raise "invalid"
for sNaNs.
There is a pure integer implementation of fabsl in
sysdeps/ieee754/ldbl-128ibm/s_fabsl.c. However, it's not necessary to
use it to avoid such exceptions. The fsel instruction does not raise
exceptions for sNaNs, and can be used in place of the original
comparison. (Note that if the high part is zero or a NaN, it does not
matter whether the low part is negated; the choice of whether the low
part of a zero is +0 or -0 does not affect the value, and the low part
of a NaN does not affect the value / payload either.)
The condition in GCC for fsel to be available is TARGET_PPC_GFXOPT,
corresponding to the _ARCH_PPCGR predefined macro. fsel is available
on all 64-bit processors supported by GCC. A few 32-bit processors
supported by GCC do not have TARGET_PPC_GFXOPT despite having hard
float support. To support those processors, integer code (similar to
that in copysignl) is included for the !_ARCH_PPCGR case for
powerpc32.
Tested for powerpc32 (configurations with and without _ARCH_PPCGR) and
powerpc64.
[BZ #20157]
* sysdeps/powerpc/powerpc32/fpu/s_fabsl.S (__fabsl): Use fsel to
determine whether to negate low half if [_ARCH_PPCGR], and integer
comparison otherwise.
* sysdeps/powerpc/powerpc64/fpu/s_fabsl.S (__fabsl): Use fsel to
determine whether to negate low half.
Continuing fixes for ceil, floor and trunc functions not to raise the
"inexact" exception, this patch fixes the versions used on older
powerpc64 processors. As was done with the round implementations some
time ago, the save of floating-point state is moved after the first
floating-point operation on the input to ensure that any "invalid"
exception from signaling NaN input is included in the saved state, and
then the whole state gets restored rather than just the rounding mode.
This has no effect on configurations using the power5+ code, since
such processors can do these operations with a single instruction (and
those instructions do not set "inexact", so are correct for TS 18661-1
semantics).
Tested for powerpc64.
[BZ #15479]
* sysdeps/powerpc/powerpc64/fpu/s_ceil.S (__ceil): Move save of
floating-point state after first floating-point operation on
input. Restore full floating-point state instead of just rounding
mode.
* sysdeps/powerpc/powerpc64/fpu/s_ceilf.S (__ceilf): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floor.S (__floor): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_floorf.S (__floorf): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_trunc.S (__trunc): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_truncf.S (__truncf): Likewise.
The file sysdeps/powerpc/sysdeps.h defines aliases for condition register
operands. E.g.: 'cr7' means condition register 7. On the one hand, this
increases readability, as it makes it easier for readers to know whether the
operand is a condition register, a general purpose register or an immediate.
On the other hand, this permits that condition registers be written as if they
were general purpose, and vice-versa, thus reducing the readability of the
code.
This commit removes some of these unintentional misuses.
The changes have no effect on the final code. Checked with objdump.
Call __memset_power8 to pad, with zeros, the remaining bytes in the
dest string on __strncpy_power8 and __stpncpy_power8. This improves
performance when n is larger than the input string, giving ~30% gain for
larger strings without impacting much shorter strings.
This patch optimizes strcasestr function for power >= 8 systems. The average
improvement of this optimization is ~40% and compares 16 bytes at a time
using vector instructions. This patch is tested on powerpc64 and powerpc64le.
This utilizes vectors and bitmasks. For small needle, large
haystack, the performance improvement is upto 8x. For short
strings (0-4B), the cost of computing the bitmask dominates,
and is a tad slower.
This patch removes the powerpc64 optimized strspn, strcspn, and
strpbrk assembly implementation now that the default C one
implements the same strategy. On internal glibc benchtests
current implementations shows similar performance with -O2.
Tested on powerpc64le (POWER8).
* sysdeps/powerpc/powerpc64/strcspn.S: Remove file.
* sysdeps/powerpc/powerpc64/strpbrk.S: Remove file.
* sysdeps/powerpc/powerpc64/strspn.S: Remove file.
This patch adds a new feature for powerpc. In order to get faster access to
the HWCAP/HWCAP2 bits and platform number (i.e. for implementing
__builtin_cpu_is () / __builtin_cpu_supports () in GCC) without the overhead of
reading from the auxiliary vector, we now reserve space for them in the TCB.
This is an ABI change for GLIBC 2.23.
A new versioned symbol '__parse_hwcap_and_convert_at_platform' is available to
get the data from the auxiliary vector and parse it, and store it for later use
in the TLS initialization code. This function is called very early
(in _dl_sysdep_start () via DL_PLATFORM_INFO for the dynamic linking case, and
in __libc_start_main () for the static linking case) to make sure the data is
available at the time of TLS initialization.
* sysdeps/powerpc/Makefile (sysdep-dl-routines): Add hwcapinfo.
(sysdep_routines): Likewise.
(sysdep-rtld-routines): Likewise.
[$(subdir) = nptl](tests): Add test-get_hwcap and test-get_hwcap-static
[$(subdir) = nptl](tests-static): test-get_hwcap-static
* sysdeps/powerpc/Versions: Added new
__parse_hwcap_and_convert_at_platform symbol to GLIBC-2.23.
* sysdeps/powerpc/hwcapinfo.c: New file.
(__tcb_parse_hwcap_and_convert_at_platform): New function to initialize
and parse hwcap, hwcap2 and platform number information.
* sysdeps/powerpc/hwcapinfo.h: New file. Creates global variables
to store HWCAP+HWCAP2 and platform number.
* sysdeps/powerpc/nptl/tcb-offsets.sym: Added new offsets
for HWCAP+HWCAP2 and platform number in the TCB.
* sysdeps/powerpc/nptl/tls.h: New functionality. Stores
the HWCAP, HWCAP2 and platform number in the TCB.
(dtv): Added new fields for HWCAP+HWCAP2 and platform number.
(TLS_INIT_TP): Included calls to add the hwcap and
at_platform values in the TCB in TP initialization.
(TLS_DEFINE_INIT_TP): Likewise.
(THREAD_GET_HWCAP): New macro.
(THREAD_SET_HWCAP): Likewise.
(THREAD_GET_AT_PLATFORM): Likewise.
(THREAD_SET_AT_PLATFORM): Likewise.
* sysdeps/powerpc/powerpc32/dl-machine.h:
(dl_platform_init): New function that calls
__parse_hwcap_and_convert_at_platform for the dymanic linking case for
powerpc32.
* sysdeps/powerpc/powerpc64/dl-machine.h: Likewise, for powerpc64.
* sysdeps/powerpc/test-get_hwcap-static.c: New file. Testcase for
this functionality, static linking case.
* sysdeps/powerpc/test-get_hwcap.c: New file. Likewise, dynamic
linking case.
* sysdeps/unix/sysv/linux/powerpc/libc-start.c: Added call to
__parse_hwcap_and_convert_at_platform for the static linking case.
* sysdeps/unix/sysv/linux/powerpc/powerpc32/ld.abilist:
Included the new __parse_hwcap_and_convert_at_platform symbol in the
ABI list for GLIBC 2.23.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/ld-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/ld.abilist:
Likewise.
The powerpc hard-float round and roundf functions, both 32-bit and
64-bit, raise spurious "inexact" exceptions for integer arguments from
adding 0.5 and rounding to integer toward zero.
Since these functions already save and restore the rounding mode, it's
natural to make them restore the full floating-point state instead to
fix this bug, which this patch does. The save of the state is moved
after the first floating-point operation on the input so that any
"invalid" exceptions from signaling NaN inputs are properly
preserved. As a consequence of this approach to the fix, "inexact"
for noninteger arguments (disallowed by TS 18661-1 but not by C99/C11,
see bug 15479) is also avoided for these implementations; this is
*not* a general fix for bug 15479 since plenty of other
implementations of various functions still raise spurious "inexact"
for noninteger arguments.
This issue and fix do not apply to builds using power5+ versions of
round and roundf, which use the frin instruction and avoid "inexact"
exceptions that way.
This patch should get hard-float powerpc32 and powerpc64 (default
function implementations) back to a state where test-float and
test-double will pass after ulps regeneration.
Tested for powerpc32 and powerpc64.
[BZ #15479]
[BZ #19238]
* sysdeps/powerpc/powerpc32/fpu/s_round.S (__round): Save
floating-point state after first operation on input. Restore full
state rather than just rounding mode.
* sysdeps/powerpc/powerpc32/fpu/s_roundf.S (__roundf): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_round.S (__round): Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_roundf.S (__roundf): Likewise.
Similar to bug 19134 for powerpc32, the powerpc64 implementations of
lround, lroundf, llround, llroundf can raise spurious "inexact"
exceptions for integer arguments from adding 0.5 then converting to
integer (this does not apply to the power5+ version for double, which
uses the frin instruction which is defined never to raise "inexact"; I
don't know why power5+ doesn't use that version for float as well).
This patch fixes the bug in a similar way to the powerpc32 bug, by
testing for integers (adding and subtracting 2^52 and comparing with
the value before that addition and subtraction) and not adding 0.5 in
that case.
The powerpc maintainers may wish to look at making power5+ / power6x /
power8 use frin for float lround / llround as well as for double,
unless there's some reason I've missed that this isn't beneficial.
Tested for powerpc64.
[BZ #19235]
* sysdeps/powerpc/powerpc64/fpu/s_llround.S (__llround): Do not
add 0.5 to integer arguments.
* sysdeps/powerpc/powerpc64/fpu/s_llroundf.S (__llroundf):
Likewise.
(.LC2): New object.
Similar to bug 15491 recently fixed for x86_64 / x86, the powerpc
(both powerpc32 and powerpc64) hard-float implementations of
nearbyintf and nearbyint wrongly clear an "inexact" exception that was
raised before the function was called; this shows up as failure of the
test math/test-nearbyint-except added when that bug was fixed. They
also wrongly leave traps on "inexact" disabled if they were enabled
before the function was called.
This patch fixes the bugs similar to how the x86 bug was fixed: saving
and restoring the whole floating-point state, both to restore the
original "inexact" flag state and to restore the original state of
whether traps on "inexact" were enabled. Because there's a convenient
point in the powerpc implementations to save state after any sNaN
arguments will have raised "invalid" but before "inexact" traps need
to be disabled, no special handling for "invalid" is needed as in the
x86 version.
Tested for powerpc64 and powerpc32, where it fixes the
math/test-nearbyint-except failure as well as fixing the new test
math/test-nearbyint-except-2 added by this patch. Also tested for
x86_64 and x86 that the new test passes.
If powerpc experts see a more efficient way of doing this
(e.g. instruction positioning that's better for pipelines on typical
processors) then of course followups optimizing the fix are welcome.
[BZ #19228]
* sysdeps/powerpc/powerpc32/fpu/s_nearbyint.S (__nearbyint): Save
and restore full floating-point state.
* sysdeps/powerpc/powerpc32/fpu/s_nearbyintf.S (__nearbyintf):
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyint.S (__nearbyint):
Likewise.
* sysdeps/powerpc/powerpc64/fpu/s_nearbyintf.S (__nearbyintf):
Likewise.
* math/test-nearbyint-except-2.c: New file.
* math/Makefile (tests): Add test-nearbyint-except-2.
The file sysdeps/powerpc/sysdeps.h defines aliases for register operands,
which add the letter 'r' as a prefix to a register name. E.g.: register 20
can be written as 'r20', instead of '20'. On the one hand, this increases
readability, as it makes it easier for readers to know whether the operand is a
register or an immediate. On the other hand, this permits that immediate
operands be written as if they were registers, and vice-versa, thus reducing
the readability of the code.
This commit removes some of these unintentional misuses.
This commit also increases readability of the code by adding the prefix 'cr' to
some uses of the control register.
Both changes have no effect on the final code. Checked with objdump.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Remove or add register
prefix from operands.
It was noted in
<https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the
bits/*.h naming scheme should only be used for installed headers.
This patch renames bits/atomic.h to atomic-machine.h to follow that
convention.
This is the only change in this series that needs to change the
filename rather than simply removing a directory level (because both
atomic.h and bits/atomic.h exist at present).
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch).
[BZ #14912]
* sysdeps/aarch64/bits/atomic.h: Move to ...
* sysdeps/aarch64/atomic-machine.h: ...here.
(_AARCH64_BITS_ATOMIC_H): Rename macro to
_AARCH64_ATOMIC_MACHINE_H.
* sysdeps/alpha/bits/atomic.h: Move to ...
* sysdeps/alpha/atomic-machine.h: ...here.
* sysdeps/arm/bits/atomic.h: Move to ...
* sysdeps/arm/atomic-machine.h: ...here. Update comments.
* bits/atomic.h: Move to ...
* sysdeps/generic/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/i386/bits/atomic.h: Move to ...
* sysdeps/i386/atomic-machine.h: ...here.
* sysdeps/ia64/bits/atomic.h: Move to ...
* sysdeps/ia64/atomic-machine.h: ...here.
* sysdeps/m68k/coldfire/bits/atomic.h: Move to ...
* sysdeps/m68k/coldfire/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/m68k/m680x0/m68020/bits/atomic.h: Move to ...
* sysdeps/m68k/m680x0/m68020/atomic-machine.h: ...here.
* sysdeps/microblaze/bits/atomic.h: Move to ...
* sysdeps/microblaze/atomic-machine.h: ...here.
* sysdeps/mips/bits/atomic.h: Move to ...
* sysdeps/mips/atomic-machine.h: ...here.
(_MIPS_BITS_ATOMIC_H): Rename macro to _MIPS_ATOMIC_MACHINE_H.
* sysdeps/powerpc/bits/atomic.h: Move to ...
* sysdeps/powerpc/atomic-machine.h: ...here. Update comments.
* sysdeps/powerpc/powerpc32/bits/atomic.h: Move to ...
* sysdeps/powerpc/powerpc32/atomic-machine.h: ...here. Update
comments. Include <atomic-machine.h> instead of <bits/atomic.h>.
* sysdeps/powerpc/powerpc64/bits/atomic.h: Move to ...
* sysdeps/powerpc/powerpc64/atomic-machine.h: ...here. Include
<atomic-machine.h> instead of <bits/atomic.h>.
* sysdeps/s390/bits/atomic.h: Move to ...
* sysdeps/s390/atomic-machine.h: ...here.
* sysdeps/sparc/sparc32/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc32/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/sparc/sparc32/sparcv9/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc32/sparcv9/atomic-machine.h: ...here.
* sysdeps/sparc/sparc64/bits/atomic.h: Move to ...
* sysdeps/sparc/sparc64/atomic-machine.h: ...here.
* sysdeps/tile/bits/atomic.h: Move to ...
* sysdeps/tile/atomic-machine.h: ...here.
* sysdeps/tile/tilegx/bits/atomic.h: Move to ...
* sysdeps/tile/tilegx/atomic-machine.h: ...here. Include
<sysdeps/tile/atomic-machine.h> instead of
<sysdeps/tile/bits/atomic.h>.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/tile/tilepro/bits/atomic.h: Move to ...
* sysdeps/tile/tilepro/atomic-machine.h: ...here. Include
<sysdeps/tile/atomic-machine.h> instead of
<sysdeps/tile/bits/atomic.h>.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/arm/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/arm/atomic-machine.h: ...here. Include
<sysdeps/arm/atomic-machine.h> instead of
<sysdeps/arm/bits/atomic.h>.
* sysdeps/unix/sysv/linux/hppa/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/hppa/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/m68k/coldfire/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/m68k/coldfire/atomic-machine.h: ...here.
(_BITS_ATOMIC_H): Rename macro to _ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/nios2/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/nios2/atomic-machine.h: ...here.
(_NIOS2_BITS_ATOMIC_H): Rename macro to _NIOS2_ATOMIC_MACHINE_H.
* sysdeps/unix/sysv/linux/sh/bits/atomic.h: Move to ...
* sysdeps/unix/sysv/linux/sh/atomic-machine.h: ...here.
* sysdeps/x86_64/bits/atomic.h: Move to ...
* sysdeps/x86_64/atomic-machine.h: ...here.
* include/atomic.h: Include <atomic-machine.h> instead of
<bits/atomic.h>.
Fix usage of tabort in generated syscalls. r0 has special meaning
when used with this instruction, thus it will not generate
persistent errors, nor return an error code. This mitigates poor
CPU usage when performing elided critical sections.
Additionally, transactions should be aborted when entering a user
invoked syscall. Otherwise the results of the transaction may be
undefined.
2015-08-25 Paul E. Murphy <murphyp@linux.vnet.ibm.com>
* sysdeps/powerpc/powerpc32/sysdep.h (ABORT_TRANSACTION): Use
register other than r0 for tabort, it has special meaning.
* sysdeps/powerpc/powerpc64/sysdep.h (ABORT_TRANSACTION): Likewise
* sysdeps/unix.sysv/linux/powerpc/syscall.S (syscall): Abort
transaction before starting syscall.
Instead of checking needle length, constant 'n' number of comparisons
is checked to fall back to default implementation. This patch is tested
on powerpc64 and powerpc64le.
2015-08-25 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
* sysdeps/powerpc/powerpc64/power7/strstr.S: Handle worst case.
In powerpc64, memchr was always pointing to the internal __GI_memchr
implementation. This patch fixes that and makes it use the
optimized POWER7 version when adequate.
* sysdeps/powerpc/powerpc64/multiarch/memchr-ppc64.c: Make
memchr not point to the internal __GI_memchr implementation.