mirror of
https://sourceware.org/git/glibc.git
synced 2024-11-24 14:00:30 +00:00
AArch64: Update A64FX memset not to degrade at 16KB
This patch updates the unroll8 code so that memset performance does not degrade at the 16KB peak-performance point on either FX1000 or FX700. The two instructions inserted at the beginning of the unroll8 loop, cmp and a branch, are a workaround that was found heuristically.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
parent 69623c0db0
commit 23777232c2
@@ -96,7 +96,14 @@ L(vl_agnostic): // VL Agnostic
 L(unroll8):
 	sub	count, count, tmp1
 	.p2align 4
-1:	st1b_unroll 0, 7
+	// The 2 instructions at the beginning of the following loop,
+	// cmp and branch, are a workaround so as not to degrade at
+	// the peak performance 16KB.
+	// It is found heuristically and the branch condition, b.ne,
+	// is chosen intentionally never to jump.
+1:	cmp	xzr, xzr
+	b.ne	1b
+	st1b_unroll 0, 7
 	add	dst, dst, tmp1
 	subs	count, count, tmp1
 	b.hi	1b
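
The control flow of the patched loop can be modeled in C to show why the added compare-and-branch pair leaves the result unchanged. This is only an illustrative sketch, not glibc code: `unroll8_model` and `block` are hypothetical names, `block` stands in for `tmp1` (assumed here to be the number of bytes one `st1b_unroll 0, 7` stores), a plain `memset` call stands in for the SVE stores, and returning the leftover byte count is an assumption about how the tail is handled after the loop.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the unroll8 loop. Returns the number of bytes
   left for the caller to store after the loop exits (assumption). */
static size_t unroll8_model(unsigned char *dst, int value,
                            size_t count, size_t block)
{
    /* Assumed precondition: at least one full block remains. */
    assert(block > 0 && count >= block);

    count -= block;                    /* sub  count, count, tmp1 */
    for (;;) {
        /* cmp xzr, xzr ; b.ne 1b
           Zero compared with zero always sets the Z flag, so the
           "not equal" branch is never taken. */
        const size_t zero = 0;
        if (zero != zero)
            continue;

        memset(dst, value, block);     /* st1b_unroll 0, 7 */
        dst += block;                  /* add  dst, dst, tmp1 */

        size_t before = count;
        count -= block;                /* subs count, count, tmp1 */
        if (!(before > block))         /* b.hi 1b (unsigned higher) */
            break;
    }
    /* count went one block too low; undo that to get the remainder. */
    return count + block;
}
```

With `count = 20` and `block = 8`, the model stores 16 bytes and reports 4 remaining, whether or not the never-taken branch is present. The workaround therefore affects only how the hardware executes the loop, not what the loop computes.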