glibc/sysdeps/powerpc/powerpc64
Matheus Castanho 10624a97e8 powerpc: Add optimized strlen for POWER10
Improvements compared to POWER9 version:

1. Take into account first 16B comparison for aligned strings

   The previous version compares the first 16B and increments r4 by the number
   of bytes until the address is 16B-aligned, then starts doing aligned loads at
   that address. For aligned strings, this causes the first 16B to be compared
   twice, because the increment is 0. Here we calculate the next 16B-aligned
   address differently, which avoids that issue.
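
As a rough illustration of item 1, the two ways of computing the next aligned address can be modelled in C. This is only a sketch: the function names are hypothetical, and the "new" formula is an assumed way of avoiding the double comparison, not a transcription of the assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Old scheme as described above: advance by the number of bytes needed to
   reach 16B alignment.  For an already aligned address that number is 0,
   so the first 16B block is visited again.  */
static uintptr_t next_aligned_old(uintptr_t addr)
{
    uintptr_t bytes_to_alignment = (0 - addr) & 15;
    return addr + bytes_to_alignment;
}

/* Assumed alternative (hypothetical, not copied from the commit): round the
   address down to its 16B block, then step to the following block.  */
static uintptr_t next_aligned_new(uintptr_t addr)
{
    return (addr & ~(uintptr_t)15) + 16;
}

/* Small usage example for both helpers.  */
void alignment_examples(void)
{
    /* Aligned start: the old scheme stays on the block already compared.  */
    assert(next_aligned_old(0x1000) == 0x1000);
    assert(next_aligned_new(0x1000) == 0x1010);

    /* Unaligned start: both schemes agree.  */
    assert(next_aligned_old(0x1003) == 0x1010);
    assert(next_aligned_new(0x1003) == 0x1010);
}
```

For unaligned addresses both helpers return the same block; they differ only when the starting address is already 16B-aligned.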

2. Use simple comparisons for the first ~192 bytes

   The main loop is good for big strings, but comparing 16B each time is better
   for smaller strings.  So after aligning the address to 16B, we check 176B
   more in 16B chunks.  There may be some overlap with the main loop for
   unaligned strings, but we avoid using the more aggressive strategy too soon,
   and also allow the loop to start at a 64B-aligned address.  This greatly
   benefits smaller strings and avoids overlapping checks if the string is
   already aligned at a 64B boundary.
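
The arithmetic behind item 2 can be sketched in C. The 16B + 176B = 192B figure comes from the text above; the way the loop start is derived here (rounding the end of the head region down to a 64B boundary) is an assumption made for illustration, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes checked 16B at a time before the main loop:
   the first 16B block plus 176B more.  */
enum { HEAD_BYTES = 16 + 176 };   /* 192 */

/* End of the region covered by the 16B checks, counted from the 16B block
   that contains the start of the string.  */
static uintptr_t head_end(uintptr_t s)
{
    return (s & ~(uintptr_t)15) + HEAD_BYTES;
}

/* ASSUMPTION: the main loop starts at the end of the head region rounded
   down to a 64B boundary.  The commit message does not give the formula.  */
static uintptr_t loop_start(uintptr_t s)
{
    return head_end(s) & ~(uintptr_t)63;
}

/* Bytes already checked by the head that the loop would look at again.  */
static unsigned overlap_bytes(uintptr_t s)
{
    return (unsigned)(head_end(s) - loop_start(s));
}

/* Small usage example.  */
void loop_start_examples(void)
{
    assert(overlap_bytes(0x1000) == 0);    /* 64B-aligned string: no overlap */
    assert(overlap_bytes(0x1013) == 16);   /* unaligned: some overlap */
    assert(overlap_bytes(0x1035) == 48);
}
```

Under this assumption the overlap is at most 48B and disappears when the string starts on a 64B boundary, which is consistent with the behaviour described above.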

3. Reduce dependencies between load blocks caused by address calculation in the loop

   Precise time tracing of the code showed that many loads in the loop were
   stalled waiting for updates to r4 from previous code blocks.  This
   implementation avoids that as much as possible by using 2 registers (r4 and
   r5) to hold addresses to be used by different parts of the code.

   Also, the previous code aligned the address to 16B, then to 64B by doing a
   few 48B loops (if needed) until the address was aligned. The main loop could
   not start until that 48B loop had finished and r4 was updated with the
   current address. Here we calculate the address used by the loop very early,
   so it can start sooner.

   The main loop now uses 2 pointers 128B apart to make pointer updates less
   frequent, and also unrolls 1 iteration to guarantee there is enough time
   between iterations to update the pointers, reducing stalled cycles.
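
The control structure of a loop driven by two cursors 128B apart can be modelled in portable C. This is a sketch of the idea only: it scans byte by byte instead of using vector loads, the function names are hypothetical, and the exact split of work between the two cursors is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Index of the first NUL in the 64B block at p, or -1 if there is none.
   Stops at the NUL, so it never reads past the end of a C string.  */
static int first_nul_in_64(const char *p)
{
    for (int i = 0; i < 64; i++)
        if (p[i] == '\0')
            return i;
    return -1;
}

/* strlen model with two cursors kept 128B apart.  Each cursor serves two
   64B checks and is then advanced by 256B, so a cursor update is followed
   by the other cursor's checks before the updated value is needed.  */
size_t strlen_two_cursor_model(const char *s)
{
    size_t a = 0;     /* stands in for the first address register */
    size_t b = 128;   /* stands in for the second address register */

    for (;;) {
        int i;

        if ((i = first_nul_in_64(s + a)) >= 0)
            return a + (size_t)i;
        if ((i = first_nul_in_64(s + a + 64)) >= 0)
            return a + 64 + (size_t)i;
        a += 256;

        if ((i = first_nul_in_64(s + b)) >= 0)
            return b + (size_t)i;
        if ((i = first_nul_in_64(s + b + 64)) >= 0)
            return b + 64 + (size_t)i;
        b += 256;
    }
}

/* Small usage example.  */
void two_cursor_examples(void)
{
    assert(strlen_two_cursor_model("") == 0);
    assert(strlen_two_cursor_model("abc") == 3);
}
```

Each cursor is only dereferenced after every earlier block has been found free of NUL bytes, so the model stays within the string even though the second cursor starts 128B ahead.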

4. Use new P10 instructions

   lxvp is used to load 32B with a single instruction, reducing contention in
   the load queue.

   vextractbm allows simplifying the tail code for the loop, replacing
   vbpermq and avoiding having to generate a permute control vector.
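
To show why a byte-mask extraction simplifies the tail code, here is a scalar C model of the idea: compare each byte against zero, collect one bit per byte into a 16-bit mask, and locate the first set bit. The helper names are hypothetical, and the bit order (byte i maps to bit i) is an assumption chosen for convenience; it is not claimed to match the element order used by the hardware instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a byte-wise compare against zero:
   out[i] is 0xFF where in[i] is NUL and 0x00 elsewhere.  */
static void compare_eq_zero(const uint8_t in[16], uint8_t out[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = (in[i] == 0) ? 0xFF : 0x00;
}

/* Model of a byte-mask extraction: bit i of the result is the most
   significant bit of byte i (ASSUMED bit order, see the text above).  */
static uint16_t extract_byte_mask(const uint8_t v[16])
{
    uint16_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint16_t)((v[i] >> 7) << i);
    return mask;
}

/* Offset of the first NUL in a 16B chunk, or 16 if the chunk has none.  */
static int first_nul_offset(const uint8_t chunk[16])
{
    uint8_t eq[16];
    compare_eq_zero(chunk, eq);

    uint16_t mask = extract_byte_mask(eq);
    int off = 0;
    while (off < 16 && ((mask >> off) & 1) == 0)
        off++;
    return off;
}

/* Small usage example.  */
void byte_mask_examples(void)
{
    /* Remaining elements are zero-initialized; the first NUL is at index 5.  */
    const uint8_t chunk[16] = { 'h', 'e', 'l', 'l', 'o', 0, 'x' };
    assert(first_nul_offset(chunk) == 5);
}
```

With the mask in a general-purpose register, the position of the first NUL follows from a single bit scan, with no permute control vector involved.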

Reviewed-by: Paul E Murphy <murphyp@linux.ibm.com>
Reviewed-by: Raphael M Zinsly <rzinsly@linux.ibm.com>
Reviewed-by: Lucas A. M. Magalhaes <lamm@linux.ibm.com>
2021-04-22 16:18:06 -03:00
a2 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
be Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
bits Define wordsize.h macros everywhere 2016-11-04 09:37:44 -07:00
cell Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
fpu Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
le powerpc: Add optimized strlen for POWER10 2021-04-22 16:18:06 -03:00
multiarch powerpc: Add optimized strlen for POWER10 2021-04-22 16:18:06 -03:00
power4 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
power6 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
power7 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
power8 Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
__longjmp-common.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
__longjmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
addmul_1.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
atomic-machine.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
backtrace.c powerpc64: Workaround sigtramp vdso return call 2021-01-28 13:57:50 -03:00
bsd-_setjmp.S
bsd-setjmp.S
bzero.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
configure powerpc64: Fix calls when r2 is not used [BZ #26173] 2020-07-10 19:41:06 -03:00
configure.ac powerpc64: Fix calls when r2 is not used [BZ #26173] 2020-07-10 19:41:06 -03:00
crti.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
crtn.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-dtprocnum.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-irel.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-machine.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-machine.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-trampoline.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
entry.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
ffsll.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
hp-timing.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
Implies
lshift.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
Makefile powerpc64: apply -mabi=ibmlongdouble to special files 2020-03-25 14:34:23 -05:00
memcpy.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
memset.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
mul_1.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
ppc-mcount.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
register-dump.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
rtld-memset.c
setjmp-bug21895.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
setjmp-common.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
setjmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
stackguard-macros.h
start.S Reduce the statically linked startup code [BZ #23323] 2021-02-25 12:13:02 +01:00
strchr.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
strcmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
strlen.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
strncmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
submul_1.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
sysdep.h powerpc64: Select POWER9 machine for the scv instruction 2021-01-22 10:45:27 +01:00
tls-macros.h tst-tlsopt-powerpc as a shared lib 2017-08-03 15:39:21 +09:30
tst-audit.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-setjmp-bug21895-static.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-ucontext-ppc64-vscr.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00