AArch64: Improve backwards memmove performance

On some microarchitectures, backwards memmove performs better if the
stores are issued as individual STRs to decreasing addresses.  So change
the memmove loop in memcpy_advsimd.S to use 2x STR rather than STP, and
fold the dstend decrement into the last store using pre-index writeback.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Wilco Dijkstra 2020-08-28 17:51:40 +01:00
parent 567b170501
commit bd394d131c

@@ -223,12 +223,13 @@ L(copy_long_backwards):
 b.ls L(copy64_from_start)
 L(loop64_backwards):
-stp A_q, B_q, [dstend, -32]
+str B_q, [dstend, -16]
+str A_q, [dstend, -32]
 ldp A_q, B_q, [srcend, -96]
-stp C_q, D_q, [dstend, -64]
+str D_q, [dstend, -48]
+str C_q, [dstend, -64]!
 ldp C_q, D_q, [srcend, -128]
 sub srcend, srcend, 64
-sub dstend, dstend, 64
 subs count, count, 64
 b.hi L(loop64_backwards)
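As a rough illustration, here is a portable C model of the new loop's store order. It assumes the L(copy_long_backwards) case, where the destination either does not overlap the source or lies above it. The function name memmove_backwards_model and the chunk16 type, which stands in for one 128-bit Q register, are hypothetical and are not glibc code. Unlike the assembly, the model is not software-pipelined: it loads a whole 64-byte block before storing that block.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit Q register (A_q..D_q). */
typedef struct { unsigned char b[16]; } chunk16;

/* Hypothetical model, not glibc code: copies count bytes from srcin to
   dstin, walking backwards from the end. Only valid when the regions do
   not overlap or dstin lies above srcin, as in L(copy_long_backwards). */
void *memmove_backwards_model(void *dstin, const void *srcin, size_t count)
{
    unsigned char *dstend = (unsigned char *)dstin + count;
    const unsigned char *srcend = (const unsigned char *)srcin + count;

    while (count >= 64) {
        chunk16 a, b, c, d;
        /* Load the whole block first, mirroring the two LDP pairs. */
        memcpy(&a, srcend - 32, 16);
        memcpy(&b, srcend - 16, 16);
        memcpy(&c, srcend - 64, 16);
        memcpy(&d, srcend - 48, 16);
        /* Four 16-byte stores at strictly decreasing addresses,
           matching the new STR sequence: -16, -32, -48, -64. */
        memcpy(dstend - 16, &b, 16);
        memcpy(dstend - 32, &a, 16);
        memcpy(dstend - 48, &d, 16);
        memcpy(dstend - 64, &c, 16);
        srcend -= 64;
        dstend -= 64; /* the assembly gets this from the '!' writeback */
        count -= 64;
    }
    /* Remaining head bytes (< 64), copied backwards one at a time. */
    while (count > 0) {
        *--dstend = *--srcend;
        count--;
    }
    return dstin;
}
```

A C compiler may merge or reorder these memcpy calls. The model therefore only documents the intended order. The assembly is what actually pins each iteration to four STRs at decreasing addresses.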