mirror of
https://sourceware.org/git/glibc.git
synced 2025-01-14 21:10:19 +00:00
642933158e
Optimizations are: 1. Use more overlapping stores to avoid branches. 2. Reduce how unrolled the aligning copies are (this is more of a code-size save, its a negative for some sizes in terms of perf). 3. For st{r|p}n{cat|cpy} re-order the branches to minimize the number that are taken. Performance Changes: Times are from N = 10 runs of the benchmark suite and are reported as geometric mean of all ratios of New Implementation / Old Implementation. strcat-avx2 -> 0.998 strcpy-avx2 -> 0.937 stpcpy-avx2 -> 0.971 strncpy-avx2 -> 0.793 stpncpy-avx2 -> 0.775 strncat-avx2 -> 0.962 Code Size Changes: function -> Bytes New / Bytes Old -> Ratio strcat-avx2 -> 685 / 1639 -> 0.418 strcpy-avx2 -> 560 / 903 -> 0.620 stpcpy-avx2 -> 592 / 939 -> 0.630 strncpy-avx2 -> 1176 / 2390 -> 0.492 stpncpy-avx2 -> 1268 / 2438 -> 0.520 strncat-avx2 -> 1042 / 2563 -> 0.407 Notes: 1. Because of the significant difference between the implementations they are split into three files. strcpy-avx2.S -> strcpy, stpcpy, strcat strncpy-avx2.S -> strncpy strncat-avx2.S > strncat I couldn't find a way to merge them without making the ifdefs incredibly difficult to follow. Full check passes on x86-64 and build succeeds for all ISA levels w/ and w/o multiarch. |
||
---|---|---|
.. | ||
aarch64 | ||
alpha | ||
arc | ||
arm | ||
csky | ||
generic | ||
gnu | ||
hppa | ||
htl | ||
hurd | ||
i386 | ||
ia64 | ||
ieee754 | ||
loongarch | ||
m68k | ||
mach | ||
microblaze | ||
mips | ||
nios2 | ||
nptl | ||
or1k | ||
posix | ||
powerpc | ||
pthread | ||
riscv | ||
s390 | ||
sh | ||
sparc | ||
unix | ||
wordsize-32 | ||
wordsize-64 | ||
x86 | ||
x86_64 |