Commit Graph

43 Commits

Author SHA1 Message Date
Rajalakshmi Srinivasaraghavan
b42f8cad52 powerpc: strstr optimization
This patch optimizes strstr function for power >= 7 systems.  Performance
gain is obtained using aligned memory access and usage of cmpb
instruction for quicker comparison.  The average improvement of this
optimization is ~40%.  Tested on ppc64 and ppc64le.

2015-07-16  Rajalakshmi Srinivasaraghavan  <raji@linux.vnet.ibm.com>

	* sysdeps/powerpc/powerpc64/multiarch/Makefile: Add strstr().
	* sysdeps/powerpc/powerpc64/multiarch/ifunc-impl-list.c: Likewise.
	* sysdeps/powerpc/powerpc64/power7/strstr.S: New File.
	* sysdeps/powerpc/powerpc64/multiarch/strstr-power7.S: New File.
	* sysdeps/powerpc/powerpc64/multiarch/strstr-ppc64.c: New File.
	* sysdeps/powerpc/powerpc64/multiarch/strstr.c: New File.
2015-07-16 13:43:51 -03:00
Adhemerval Zanella
b269211467 powerpc: wordcopy/memmove cleanup for ppc64
This patch cleanup some multiarch code related to memmmove
optimization. Initial IFUNC support added specialized wordcopy
symbols which turned in local IFUNC calls used by memmove default
implementation.

This change by removing then and used the optimized memmove instead
for supported chips.
2015-02-09 06:42:28 -05:00
Adhemerval Zanella
18e270aada powerpc: Remove POWER7 wordcopy ifunc
This patch remove the POWER7 ifunc wordcopy function
(_wordcopy_*_power7), since now GLIBC provides a optimized memmove/bcopy
for POWER7.
2015-02-09 06:42:28 -05:00
Adhemerval Zanella
3001e54c57 powerpc: multiarch Makefile cleanup for powerpc64
This patch cleanups the multiarch Makefile by putting the wide chars
implementation to correct wcsmbs rule.
2015-02-09 06:42:27 -05:00
Adhemerval Zanella
d3b00f468b powerpc: Optimized strncmp for POWER8/PPC64
This patch adds an optimized POWER8 strncmp.  The implementation focus
on speeding up unaligned cases follwing the ideas of power8 strcmp.

The algorithm first check the initial 16 bytes, then align the first
function source and uses unaligned loads on second argument only.
Aditional checks for page boundaries are done for unaligned cases
(where sources alignment are different).
2015-01-13 14:35:40 -05:00
Adhemerval Zanella
8bedcb5f03 powerpc: Optimized strcmp for POWER8/PPC64
This patch adds an optimized POWER8 strcmp using unaligned accesses.
The algorithm first check the initial 16 bytes, then align the first
function source and uses unaligned loads on second argument only.
Aditional checks for page boundaries are done for unaligned cases
2015-01-13 11:28:58 -05:00
Adhemerval Zanella
f06a4faf8a powerpc: Optimized st{r,p}ncpy for POWER8/PPC64
This patch adds an optimized POWER8 st{r,p}ncpy using unaligned accesses.
It shows 10%-80% improvement over the optimized POWER7 one that uses
only aligned accesses, specially on unaligned inputs.

The algorithm first read and check 16 bytes (if inputs do not cross a 4K
page size).  The it realign source to 16-bytes and issue a 16 bytes read
and compare loop to speedup null byte checks for large strings.  Also,
different from POWER7 optimization, the null pad is done inline in the
implementation using possible unaligned accesses, instead of realying on
a memset call.  Special case is added for page cross reads.
2015-01-13 11:28:44 -05:00
Adhemerval Zanella
94c9680945 powerpc: Optimized strcat for POWER8/PPC64
With new optimized strcpy for POWER8, this patch adds an optimized
strcat which uses it along with default implementation at strings/.
2015-01-13 11:28:36 -05:00
Adhemerval Zanella
96d6fd6c40 powerpc: Optimized st{r,p}cpy for POWER8/PPC64
This patch adds an optimized POWER8 strcpy using unaligned accesses.
For strings up to 16 bytes the implementation first calculate the
string size, like strlen, and issues a memcpy.  For larger strings,
source is first aligned to 16 bytes and then tested over a loop that
reads 16 bytes am combine the cmpb results for speedup.  Special case is
added for page cross reads.

It shows 30%-60% improvement over the optimized POWER7 one that uses
only aligned accesses.
2015-01-13 11:28:30 -05:00
Adhemerval Zanella
0f0a1c82f5 powerpc: Add powerpc64 strpbrk optimization
This patch makes the POWER7 optimized strpbrk generic by using
default doubleword stores to zero the hash, instead of VSX
instructions.  Performance on POWER7/POWER8 does not change.
2014-12-02 13:34:02 -05:00
Adhemerval Zanella
bb2542e0ae powerpc: Add powerpc64 strcspn optimization
This patch makes the POWER7 optimized strcspn generic by using
default doubleword stores to zero the hash, instead of VSX
instructions.  Performance on POWER7/POWER8 does not change.
2014-12-02 07:16:24 -05:00
Adhemerval Zanella
2e8a2de2da powerpc: Add powerpc64 strspn optimization
This patch makes the POWER7 optimized strspn generic by using
default doubleword stores to zero the hash, instead of VSX
instructions. Performance on POWER7/POWER8 machines does not changed.
2014-12-02 07:15:58 -05:00
Adhemerval Zanella
71ae86478e PowerPC: memset optimization for POWER8/PPC64
This patch adds an optimized memset implementation for POWER8.  For
sizes from 0 to 255 bytes, a word/doubleword algorithm similar to
POWER7 optimized one is used.

For size higher than 255 two strategies are used:

1. If the constant is different than 0, the memory is written with
   altivec vector instruction;

2. If constant is 0, dbcz instructions are used.  The loop is unrolled
   to clear 512 byte at time.

Using vector instructions increases throughput considerable, with a
double performance for sizes larger than 1024.  The dcbz loops unrolls
also shows performance improvement, by doubling throughput for sizes
larger than 8192 bytes.
2014-09-10 07:39:46 -04:00
Adhemerval Zanella
3b473fecdf PowerPC: multiarch bzero cleanup for PPC64
This patch cleanups the multiarch bzero for powerpc64 by remove
the multiarch objects and use instead the the memset embedded
implementation presented in each multiarch optimization.  The
code generate is essentially the same, but the TB_TOCLESS (which
is not essential).
2014-09-10 07:39:46 -04:00
Adhemerval Zanella
17762f6625 PowerPC: optimized memmove for POWER7/PPC64
This patch adds an optimized memmove optimization for POWER7/powerpc64.
Basically the idea is to use the memcpy for POWER7 on non-overlapped
memory regions and a optimized backward memcpy for memory regions
that overlap (similar to the idea of string/memmove.c).

The backward memcpy algorithm used is similar the one use for memcpy for
POWER7, with adjustments done for alignment.  The difference is memory
is always aligned to 16 bytes before using VSX/altivec instructions.
2014-07-07 15:41:21 -05:00
Vidya Ranganathan
bc8ea38590 PowerPC: strcat optimization for PPC64/POWER7
This patch adds an ifunc power7 strcat symbol that uses the logic on
sysdeps/powerpc/strcat.c but call power7 strlen/strcpy symbols instead
of default ones.
2014-07-02 14:04:21 -05:00
Vidya Ranganathan
e23d3d2690 PowerPC: Optimized strcmp for PPC64/POWER7
Optimization is achieved on 8 byte aligned strings with double word
comparison using cmpb instruction. On unaligned strings loop unrolling
is applied for Power7 gain.
2014-06-11 08:39:31 -05:00
Vidya Ranganathan
f360f94a05 PowerPC: strncpy/stpncpy optimization for PPC64/POWER7
The optimization is achieved by following techniques:
  > data alignment [gain from aligned memory access on read/write]
  > POWER7 gains performance with loop unrolling/unwinding
    [gain by reduction of branch penalty].
  > zero padding done by calling optimized memset
2014-05-06 09:54:25 -05:00
Adhemerval Zanella
6f23d0939e PowerPC: optimized strpbrk for POWER7
This patch add an optimized strpbrk for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance on
the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and memory
clear using VSX instructions.
2014-03-20 19:46:13 -05:00
Adhemerval Zanella
6eaf95cbfa PowerPC: optimized strcspn for PPC64/POWER7
This patch add a optimized strcspn for POWER7 by using a different
algorithm than default implementation: it constructs a table based on
the 'accept' argument and use this table to check for any occurance
on the input string. The idea is similar as x86_64 uses.
For PowerPC some tunings were added, such as unroll loops and align
stack memory to table to 16 bytes (so VSX clean can ran without
alignment issues).
2014-03-20 11:24:52 -05:00
Vidya Ranganathan
e65caf1f1d PowerPC: strspn optimization for PPC64/POWER7
The optimization is achieved by following techniques:
  > hashing of needle.
  > hashing avoids scanning of duplicate entries in needle across the string.
  > initializing the hash table with Vector instructions (VSX) by quadword access.
  > unrolling when scanning for character in string across hash table.
2014-03-11 08:54:33 -05:00
Adhemerval Zanella
ba9cc0714e PowerPC: strncat optimization for PPC64
The optimization is achieved by following techniques:
1. Doubleword aligned memory access and compares using
   cmpb instruction.
2. Loop unrolling for byte load/store.
3. CPU pre-fetch to avoid cache miss.
2014-03-10 07:25:09 -05:00
Rajalakshmi Srinivasaraghavan
c7debbdfac PowerPC: strrchr optimization for POWER7/PPC64
This patch optimizes strrchr() for ppc64. It uses aligned memory
access along with cmpb instruction and CPU prefetch to avoid
cache misses for speed improvement.
2014-03-03 08:06:41 -06:00
Adhemerval Zanella
a52374e82b PowerPC: multiarch stpcpy for PowerPC64 2013-12-13 14:55:22 -05:00
Adhemerval Zanella
7f5ec11336 PowerPC: multiarch strcpy for PowerPC64 2013-12-13 14:54:41 -05:00
Adhemerval Zanella
e28bcd427b PowerPC: multiarch wordcopy for PowerPC64 2013-12-13 14:54:08 -05:00
Adhemerval Zanella
92cacfce7d PowerPC: multiarch wcscpy for PowerPC64. 2013-12-13 14:53:25 -05:00
Adhemerval Zanella
7b714620a7 PowerPC: multiarch wcsrchr for PowerPC64 2013-12-13 14:52:48 -05:00
Adhemerval Zanella
16fd2ae37c PowerPC: multiarch wcschr for PowerPC64 2013-12-13 14:51:36 -05:00
Adhemerval Zanella
9ee2969b05 PowerPC: multiarch strchrnul for PowerPC64 2013-12-13 14:50:26 -05:00
Adhemerval Zanella
372dc060e0 PowerPC: multiarch strchr for PowerPC64 2013-12-13 14:49:54 -05:00
Adhemerval Zanella
24c2c3b996 PowerPC: multiarch strncmp for PowerPC64 2013-12-13 14:48:48 -05:00
Adhemerval Zanella
1c92d9a0e0 PowerPC: multiarch strncasecmp for PowerPC64 2013-12-13 14:40:28 -05:00
Adhemerval Zanella
17de3ee3c1 PowerPC: multiarch strcasecmp for PowerPC64 2013-12-13 14:39:51 -05:00
Adhemerval Zanella
62982bf978 PowerPC: multiarch strnlen for PowerPC64 2013-12-13 14:38:50 -05:00
Adhemerval Zanella
a65f4904ab PowerPC: multiarch strlen for PowerPC64 2013-12-13 14:38:17 -05:00
Adhemerval Zanella
1fd005ad2f PowerPC: multiarch rawmemchr for PowerPC64 2013-12-13 14:37:26 -05:00
Adhemerval Zanella
cd05ba9135 PowerPC: multiarch memrchr for PowerPC64 2013-12-13 14:36:50 -05:00
Adhemerval Zanella
870f867648 PowerPC: multiarch memchr for PowerPC64 2013-12-13 14:35:28 -05:00
Adhemerval Zanella
f00be62b08 PowerPC: multiarch mempcpy for PowerPC64 2013-12-13 14:34:06 -05:00
Adhemerval Zanella
8a29a3d00b PowerPC: multiarch memset/bzero for PowerPC64 2013-12-13 14:33:16 -05:00
Adhemerval Zanella
07253fcf7b PowerPC: multirach memcmp for PowerPC64 2013-12-13 14:32:31 -05:00
Adhemerval Zanella
b5beafbcee PowerPC: multiarch memcpy for PowerPC64 2013-12-13 14:31:41 -05:00