5 years ago, commit 8f1b841e45
unintentionally added an ifunc to the loader.
That modification has not caused any harm so far, but it doesn't add any
value either, because the hwcap information is available later during
libc initialization.
Suggested-by: Anton Blanchard <anton@ozlabs.org>
This patch was based on the __memcmp_power8 and the recent
__strlen_power10.
Improvements from __memcmp_power8:
1. Don't need alignment code.
On POWER10 lxvp and lxvl do not generate alignment interrupts, so
they are safe for use on caching-inhibited memory. Notice that the
comparison on the main loop will wait for both VSR to be ready.
Therefore aligning one of the input address does not improve
performance. In order to align both registers a vperm is necessary
which add too much overhead.
2. Uses new POWER10 instructions
This code uses lxvp to decrease contention on load by loading 32 bytes
per instruction.
The vextractbm is used to have a smaller tail code for calculating the
return value.
3. Performance improvement
This version has around 35% better performance on average. I saw no
performance regressions for any length or alignment.
Thanks Matheus for helping me out with some details.
Co-authored-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Reuse code for optimized strlen to implement a faster version of rawmemchr.
This takes advantage of the same benefits provided by the strlen implementation,
but needs some extra steps. __strlen_power10 code should be unchanged after this
change.
rawmemchr returns a pointer to the char found, while strlen returns only the
length, so we have to take that into account when preparing the return value.
To quickly check 64B, the loop on __strlen_power10 merges the whole block into
16B by using unsigned minimum vector operations (vminub) and checks if there are
any \0 on the resulting vector. The same code is used by rawmemchr if the char c
is 0. However, this approach does not work when c != 0. We first need to
subtract each byte by c, so that the value we are looking for is converted to a
0, then taking the minimum and checking for nulls works again.
The new code branches after it has compared ~256 bytes and chooses which of the
two strategies above will be used in the main loop, based on the char c. This
extra branch adds some overhead (~5%) for length ~256, but is quickly amortized
by the faster loop for larger sizes.
Compared to __rawmemchr_power9, this version is ~20% faster for length < 256.
Because of the optimized main loop, the improvement becomes ~35% for c != 0
and ~50% for c = 0 for strings longer than 256.
Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
The hwcap2 check for the aforementioned functions should check for
both PPC_FEATURE2_ARCH_3_1 and PPC_FEATURE2_HAS_ISEL but was
mistakenly checking for any one of them, enabling isa 3.1 version of
the functions in incompatible processors, like POWER8.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This implementation is based on __memset_power8 and integrates a lot
of suggestions from Anton Blanchard.
The biggest difference is that it makes extensive use of stxvl to
alignment and tail code to avoid branches and small stores. It has
three main execution paths:
a) "Short lengths" for lengths up to 64 bytes, avoiding as many
branches as possible.
b) "General case" for larger lengths, it has an alignment section
using stxvl to avoid branches, a 128 bytes loop and then a tail
code, again using stxvl with few branches.
c) "Zeroing cache blocks" for lengths from 256 bytes upwards and set
value being zero. It is mostly the __memset_power8 code but the
alignment phase was simplified because, at this point, address is
already 16-bytes aligned and also changed to use vector stores.
The tail code was also simplified to reuse the general case tail.
All unaligned stores use stxvl instructions that do not generate
alignment interrupts on POWER10, making it safe to use on
caching-inhibited memory.
On average, this implementation provides something around 30%
improvement when compared to __memset_power8.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This implementation is based on __memcpy_power8_cached and integrates
suggestions from Anton Blanchard.
It benefits from loads and stores with length for short lengths and for
tail code, simplifying the code.
All unaligned memory accesses use instructions that do not generate
alignment interrupts on POWER10, making it safe to use on
caching-inhibited memory.
The main loop has also been modified in order to increase instruction
throughput by reducing the dependency on updates from previous iterations.
On average, this implementation provides around 30% improvement when
compared to __memcpy_power7 and 10% improvement in comparison to
__memcpy_power8_cached.
This patch was initially based on the __memmove_power7 with some ideas
from strncpy implementation for Power 9.
Improvements from __memmove_power7:
1. Use lxvl/stxvl for alignment code.
The code for Power 7 uses branches when the input is not naturally
aligned to the width of a vector. The new implementation uses
lxvl/stxvl instead which reduces pressure on GPRs. It also allows
the removal of branch instructions, implicitly removing branch stalls
and mispredictions.
2. Use of lxv/stxv and lxvl/stxvl pair is safe to use on Cache Inhibited
memory.
On Power 10 vector load and stores are safe to use on CI memory for
addresses unaligned to 16B. This code takes advantage of this to
do unaligned loads.
The unaligned loads don't have a significant performance impact by
themselves. However doing so decreases register pressure on GPRs
and interdependence stalls on load/store pairs. This also improved
readability as there are now less code paths for different alignments.
Finally this reduces the overall code size.
3. Improved performance.
This version runs on average about 30% better than memmove_power7
for lengths larger than 8KB. For input lengths shorter than 8KB
the improvement is smaller, it has on average about 17% better
performance.
This version has a degradation of about 50% for input lengths
in the 0 to 31 bytes range when dest is unaligned.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Improvements compared to POWER9 version:
1. Take into account first 16B comparison for aligned strings
The previous version compares the first 16B and increments r4 by the number
of bytes until the address is 16B-aligned, then starts doing aligned loads at
that address. For aligned strings, this causes the first 16B to be compared
twice, because the increment is 0. Here we calculate the next 16B-aligned
address differently, which avoids that issue.
2. Use simple comparisons for the first ~192 bytes
The main loop is good for big strings, but comparing 16B each time is better
for smaller strings. So after aligning the address to 16 Bytes, we check
more 176B in 16B chunks. There may be some overlaps with the main loop for
unaligned strings, but we avoid using the more aggressive strategy too soon,
and also allow the loop to start at a 64B-aligned address. This greatly
benefits smaller strings and avoids overlapping checks if the string is
already aligned at a 64B boundary.
3. Reduce dependencies between load blocks caused by address calculation on loop
Doing a precise time tracing on the code showed many loads in the loop were
stalled waiting for updates to r4 from previous code blocks. This
implementation avoids that as much as possible by using 2 registers (r4 and
r5) to hold addresses to be used by different parts of the code.
Also, the previous code aligned the address to 16B, then to 64B by doing a
few 48B loops (if needed) until the address was aligned. The main loop could
not start until that 48B loop had finished and r4 was updated with the
current address. Here we calculate the address used by the loop very early,
so it can start sooner.
The main loop now uses 2 pointers 128B apart to make pointer updates less
frequent, and also unrolls 1 iteration to guarantee there is enough time
between iterations to update the pointers, reducing stalled cycles.
4. Use new P10 instructions
lxvp is used to load 32B with a single instruction, reducing contention in
the load queue.
vextractbm allows simplifying the tail code for the loop, replacing
vbpermq and avoiding having to generate a permute control vector.
Reviewed-by: Paul E Murphy <murphyp@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Add stpncpy support into the POWER9 strncpy.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Similar to the strcpy P9 optimization, this version uses VSX to improve
performance.
Reviewed-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This started as a trivial change to Anton's rawmemchr. I got
carried away. This is a hybrid between P8's asympotically
faster 64B checks with extremely efficient small string checks
e.g <64B (and sometimes a little bit more depending on alignment).
The second trick is to align to 64B by running a 48B checking loop
16B at a time until we naturally align to 64B (i.e checking 48/96/144
bytes/iteration based on the alignment after the first 5 comparisons).
This allieviates the need to check page boundaries.
Finally, explicly use the P7 strlen with the runtime loader when building
P9. We need to be cautious about vector/vsx extensions here on P9 only
builds.
This version uses vector instructions and is up to 60% faster on medium
matches and up to 90% faster on long matches, compared to the POWER7
version. A few examples:
__rawmemchr_power9 __rawmemchr_power7
Length 32, alignment 0: 2.27566 3.77765
Length 64, alignment 2: 2.46231 3.51064
Length 1024, alignment 0: 17.3059 32.6678
Add stpcpy support to the POWER9 strcpy. This is up to 40% faster on
small strings and up to 90% faster on long relatively unaligned strings,
compared to the POWER8 version. A few examples:
__stpcpy_power9 __stpcpy_power8
Length 20, alignments in bytes 4/ 4: 2.58246 4.8788
Length 1024, alignments in bytes 1/ 6: 24.8186 47.8528
This version uses VSX store vector with length instructions and is
significantly faster on small strings and relatively unaligned large
strings, compared to the POWER8 version. A few examples:
__strcpy_power9 __strcpy_power8
Length 16, alignments in bytes 0/ 0: 2.52454 4.62695
Length 412, alignments in bytes 4/ 0: 11.6 22.9185
This patch rewrites wcscat using wcslen and wcscpy. This is similar to
the optimization done on strcat by 6e46de42fe.
The strcpy changes are mainly to add the internal alias to avoid PLT
calls.
Checked on x86_64-linux-gnu and a build against the affected
architectures.
* include/wchar.h (__wcscpy): New prototype.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy-ppc32.c
(__wcscpy): Route internal symbol to generic implementation.
* sysdeps/powerpc/powerpc32/power4/multiarch/wcscpy.c (wcscpy):
Add internal __wcscpy alias.
* sysdeps/powerpc/powerpc64/multiarch/wcscpy.c (wcscpy): Likewise.
* sysdeps/s390/wcscpy.c (wcscpy): Likewise.
* sysdeps/x86_64/multiarch/wcscpy.c (wcscpy): Likewise.
* wcsmbs/wcscpy.c (wcscpy): Add
* sysdeps/x86_64/multiarch/wcscpy-c.c (WCSCPY): Adjust macro to
use generic implementation.
* wcsmbs/wcscat.c (wcscat): Rewrite using wcslen and wcscpy.
A single underscore was omitted in
sysdeps/powerpc/powerpc64/multiarch/strncmp.c, resulting in use of
power8 version of strncmp instead of power9 version, with significant
performance degradation.
* sysdeps/powerpc/powerpc64/multiarch/strncmp.c: Fix #ifdef.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
This patch moves little endian specific POWER9 optimization files to
sysdeps/powerpc/powerpc64/le and creates POWER9 ifunc functions
only for little endian.
On POWER8, unaligned memory accesses to cached memory has little impact
on performance as opposed to its ancestors.
It is disabled by default and will only be available when the tunable
glibc.tune.cached_memopt is set to 1.
__memcpy_power8_cached __memcpy_power7
============================================================
max-size=4096: 33325.70 ( 12.65%) 38153.00
max-size=8192: 32878.20 ( 11.17%) 37012.30
max-size=16384: 33782.20 ( 11.61%) 38219.20
max-size=32768: 33296.20 ( 11.30%) 37538.30
max-size=65536: 33765.60 ( 10.53%) 37738.40
* manual/tunables.texi (Hardware Capability Tunables): Document
glibc.tune.cached_memopt.
* sysdeps/powerpc/cpu-features.c: New file.
* sysdeps/powerpc/cpu-features.h: New file.
* sysdeps/powerpc/dl-procinfo.c [!IS_IN(ldconfig)]: Add
_dl_powerpc_cpu_features.
* sysdeps/powerpc/dl-tunables.list: New file.
* sysdeps/powerpc/ldsodefs.h: Include cpu-features.h.
* sysdeps/powerpc/powerpc32/power4/multiarch/init-arch.h
(INIT_ARCH): Initialize use_aligned_memopt.
* sysdeps/powerpc/powerpc64/dl-machine.h [defined(SHARED &&
IS_IN(rtld))]: Restrict dl_platform_init availability and
initialize CPU features used by tunables.
* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines):
Add memcpy-power8-cached.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Add
__memcpy_power8_cached.
* sysdeps/powerpc/powerpc64/multiarch/memcpy.c: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power8-cached.S:
New file.
Reviewed-by: Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
Update strcasestr-power8 to use power8 version of strnlen for
calculating length.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.vnet.ibm.com>
This is another one where we'll be wanting the base symbols for
powerpc64le rather than just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strncase_l-power7.c: Include
string/strncase_l.c, not string/strncase.c.
(USE_IN_EXTENDED_LOCALE_MODEL): Don't define.
(libc_hidden_def): Redefine.
The routine being assembled here is strcasecmp_l, so ask for that via
__STRCMP and STRCMP defines. That change means tweaking the power7
override. Needed for later powerpc64le changes where we want the base
symbols, not just a power7 variant.
* sysdeps/powerpc/powerpc64/multiarch/strcasecmp_l-power7.S:
(__STRCMP, STRCMP, __strcasecmp_l): Define.
(__strcasecmp): Don't define.
These functions aren't used in ld.so at the moment since we don't have
strcmp or strncmp ifuncs for them there. Remove the ld.so bloat.
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power8.S: Wrap in
IS_IN (libc).
* sysdeps/powerpc/powerpc64/multiarch/strcmp-power9.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strncmp-power9.S: Likewise.
USE_AS_STPNCPY is defined by sysdeps/powerpc/powerpc64/power8/stpncpy.S,
included by this file.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Don't define
USE_AS_STPNCPY.
Recent commit 59ba2d2b54 missed to add __memrchr_power8 in
ifunc list. Also handled discarding unwanted bytes for
unaligned inputs in power8 optimization.
2017-10-05 Rajalakshmi Srinivasaraghavan <raji@linux.vnet.ibm.com>
* sysdeps/powerpc/powerpc64/multiarch/memrchr-ppc64.c: Revert
back to powerpc32 file.
* sysdeps/powerpc/powerpc64/multiarch/memrchr.c
(memrchr): Add __memrchr_power8 to ifunc list.
* sysdeps/powerpc/powerpc64/power8/memrchr.S: Mask
extra bytes for unaligned inputs.
Vectorized loops are used for sizes greater than 32B to improve
performance over power7 optimization. This shows as an average
of 25% improvement depending on the position of search
character. The performance is same for shorter strings.
These machine-dependent inline string functions have never been on by
default, and even if they were a good idea at the time they were
introduced, they haven't really been touched in ten to fifteen years
and probably aren't a good idea on current-gen processors. Current
thinking is that this class of optimization is best left to the
compiler.
* bits/string.h, string/bits/string.h
* sysdeps/aarch64/bits/string.h
* sysdeps/m68k/m680x0/m68020/bits/string.h
* sysdeps/s390/bits/string.h, sysdeps/sparc/bits/string.h
* sysdeps/x86/bits/string.h: Delete file.
* string/string.h: Don't include bits/string.h.
* string/bits/string3.h: Rename to bits/string_fortified.h.
No need to undef various symbols that the removed headers
might have defined as macros.
* string/Makefile (headers): Remove bits/string.h, change
bits/string3.h to bits/string_fortified.h.
* string/string-inlines.c: Update commentary. Remove definitions
of various macros that nothing looks at anymore. Don't directly
include bits/string.h. Set _STRING_INLINE_unaligned here, based on
compiler-predefined macros.
* string/strncat.c: If STRNCAT is not defined, or STRNCAT_PRIMARY
_is_ defined, provide internal hidden alias __strncat.
* include/string.h: Declare internal hidden alias __strncat.
Only forward __stpcpy to __builtin_stpcpy if __NO_STRING_INLINES is
not defined.
* include/bits/string3.h: Rename to bits/string_fortified.h,
update to match above.
* sysdeps/i386/string-inlines.c: Define compat symbols for
everything formerly defined by sysdeps/x86/bits/string.h.
Make existing definitions into compat symbols as well.
Remove some no-longer-necessary messing around with macros.
* sysdeps/powerpc/powerpc32/power4/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/mempcpy.c
* sysdeps/powerpc/powerpc64/multiarch/stpcpy.c
* sysdeps/s390/multiarch/mempcpy.c
No need to define _HAVE_STRING_ARCH_mempcpy.
Do define __NO_STRING_INLINES and NO_MEMPCPY_STPCPY_REDIRECT.
* sysdeps/i386/i686/multiarch/strncat-c.c
* sysdeps/s390/multiarch/strncat-c.c
* sysdeps/x86_64/multiarch/strncat-c.c
Define STRNCAT_PRIMARY. Don't change definition of libc_hidden_def.
Makes __stpncpy_power8 call __memset_power8 directly rather than via an
IFUNC. Fixes a missing _mcount, and removes some redundant NOPS. The
*_is_local defines are also used in a followup patch.
* sysdeps/powerpc/powerpc64/multiarch/strncpy-power7.S: Define
MEMSET_is_local.
* sysdeps/powerpc/powerpc64/multiarch/strncpy-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/stpncpy-power8.S: Likewise.
Define MEMSET.
* sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: Define
STRLEN_is_local, STRNLEN_is_local, and STRCHR_is_local.
* sysdeps/powerpc/powerpc64/power7/strstr.S: Likewise. Don't add
nop after local calls.
* sysdeps/powerpc/powerpc64/power7/strncpy.S: Define MEMSET_is_local.
Don't add nop after local call.
* sysdeps/powerpc/powerpc64/power8/strncpy.S: Likewise. Add missing
CALL_MCOUNT.
.align on some targets takes a byte alignment, on others like powerpc,
log2 of the byte alignment. It's a good idea to avoid .align,
particularly since x86 and powerpc are different. This patch fixes
the occurrences of .align in powerpc64/sysdep.h, renames DOT_LABEL
since the macro doesn't have anything to do with adding dots, removes
extraneous semicolons, and fixes some formatting.
* sysdeps/powerpc/powerpc64/sysdep.h: Formatting.
(FUNC_LABEL): Rename from DOT_LABEL.
(ENTRY_1): Use FUNC_LABEL and remove leading space from label.
Use .p2align rather than .align.
(TRACEBACK, TRACEBACK_MASK): Use .p2align rather than .align.
(ABORT_TRANSACTION): Likewise.
(ENTRY_1, ENTRY_2, END_2, LOCALENTRY): Remove unnecessary semicolons,
particularly at end. Add semicolon at invocation as necessary.
(TRACEBACK, TRACEBACK_MASK, PSEUDO, PSEUDO_NOERRNO): Likewise.
(PSEUDO_ERRVAL, PPC64_LOAD_FUNCPTR, OPD_ENT): Likewise.
* sysdeps/powerpc/powerpc64/multiarch/strrchr-power8.S (ENTRY,
END): Adjust to suit.
P7 code is used for <=32B strings and for > 32B vectorized loops are used.
This shows as an average 25% improvement depending on the position of search
character. The performance is same for shorter strings.
Tested on ppc64 and ppc64le.
With new optimized strnlen for POWER8 [1], this patch adds
strncat for power8 to make use of optimized strlen and strnlen.
This is faster than POWER7 current implementation for larger strings.
Tested on powerpc64 and powerpc64le.
[1] https://sourceware.org/ml/libc-alpha/2017-03/msg00491.html
* sysdeps/powerpc/powerpc64/multiarch/Makefile (sysdep_routines): Add
strncat-power8.
* sysdeps/powerpc/powerpc64/multiarch/strncat.c (strncat): Add
__strncat_power8 to ifunc list.
* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c
(strncat): Add __strncat_power8 to list of strncat functions.
* sysdeps/powerpc/powerpc64/multiarch/strncat-power8.c: New file.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memcmp-power4.S: Define the
implementation-specific function name and remove unneeded
macros definition.
* sysdeps/powerpc/powerpc64/multiarch/memcmp-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memmove-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcmp.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power7/memcmp.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memmove.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-a2.S: Define the
implementation-specific function name and remove unneeded
macros definition.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-cell.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power4.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memcpy-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/mempcpy-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/a2/memcpy.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/cell/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power4/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memcpy.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/mempcpy.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memchr-power7.S: Define the
implementation-specific function name and remove unneeded macros
definition.
* sysdeps/powerpc/powerpc64/multiarch/memrchr-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/rawmemchr-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memchr.S: Set a default
function name if not defined and pass as parameter to macros
accordingly.
* sysdeps/powerpc/powerpc64/power7/memrchr.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/rawmemchr.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/memset-power4.S: Define the
implementation-specific function name and remove unneeded macros
definition.
* sysdeps/powerpc/powerpc64/multiarch/memset-power6.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-power7.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-power8.S: Likewise.
* sysdeps/powerpc/powerpc64/multiarch/memset-ppc64.S: Likewise.
* sysdeps/powerpc/powerpc64/memset.S: Set a default function name if
not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power4/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power6/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power7/memset.S: Likewise.
* sysdeps/powerpc/powerpc64/power8/memset.S: Likewise.
Clean up the IFUNC implementations for powerpc in order to remove
unneeded macro definitions.
Tested on ppc64le with and without --disable-multi-arch flag.
* sysdeps/powerpc/powerpc64/multiarch/strcasestr-power8.S: Define the
strcasestr implementation name and remove unneeded macros definition.
* sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: Define
strstr implementation name and remove unneeded macros definition.
* sysdeps/powerpc/powerpc64/power7/strstr.S: Set a default function
name if not defined and pass as parameter to macros accordingly.
* sysdeps/powerpc/powerpc64/power8/strcasestr.S: Likewise.