AuroraMiddleware/glibc
Mirror of https://sourceware.org/git/glibc.git, synced 2025-01-07 18:10:07 +00:00
glibc/sysdeps/x86_64/multiarch/strcat-avx2-rtm.S @ 642933158e (4 lines, 88 B)
x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions

Optimizations are:
    1. Use more overlapping stores to avoid branches.
    2. Reduce how unrolled the aligning copies are (this is more of a
       code-size save, it's a negative for some sizes in terms of
       perf).
    3. For st{r|p}n{cat|cpy} re-order the branches to minimize the
       number that are taken.

Performance Changes:

    Times are from N = 10 runs of the benchmark suite and are
    reported as geometric mean of all ratios of
    New Implementation / Old Implementation.

    strcat-avx2      -> 0.998
    strcpy-avx2      -> 0.937
    stpcpy-avx2      -> 0.971

    strncpy-avx2     -> 0.793
    stpncpy-avx2     -> 0.775

    strncat-avx2     -> 0.962

Code Size Changes:

    function         -> Bytes New / Bytes Old -> Ratio

    strcat-avx2      ->  685 / 1639 -> 0.418
    strcpy-avx2      ->  560 /  903 -> 0.620
    stpcpy-avx2      ->  592 /  939 -> 0.630

    strncpy-avx2     -> 1176 / 2390 -> 0.492
    stpncpy-avx2     -> 1268 / 2438 -> 0.520

    strncat-avx2     -> 1042 / 2563 -> 0.407

Notes:
    1. Because of the significant difference between the
       implementations they are split into three files.

           strcpy-avx2.S    -> strcpy, stpcpy, strcat
           strncpy-avx2.S   -> strncpy
           strncat-avx2.S   -> strncat

       I couldn't find a way to merge them without making the
       ifdefs incredibly difficult to follow.

Full check passes on x86-64 and build succeeds for all ISA levels w/
and w/o multiarch.
2022-11-09 01:38:39 +00:00
#define STRCAT	__strcat_avx2_rtm
#include "x86-avx-rtm-vecs.h"
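The first optimization in the 2022 commit message, "use more overlapping stores to avoid branches", can be illustrated in portable C. The sketch below is not the glibc assembly. The function name `copy_8_to_16` and the 8-byte width are assumptions chosen for illustration; the real code works on AVX2 vector registers. The idea is that any length from 8 to 16 bytes can be copied with two fixed-size moves, one anchored at the start and one at the end, so no per-length branch is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes, where 8 <= n <= 16, using two
   8-byte moves that may overlap in the middle.  Both loads happen
   before either store, so every length in the range takes the same
   straight-line path.  */
static void copy_8_to_16(unsigned char *dst, const unsigned char *src,
                         size_t n)
{
    uint64_t head, tail;

    assert(n >= 8 && n <= 16);

    memcpy(&head, src, 8);           /* first 8 bytes */
    memcpy(&tail, src + n - 8, 8);   /* last 8 bytes, may overlap head */
    memcpy(dst, &head, 8);
    memcpy(dst + n - 8, &tail, 8);
}
```

For n = 16 the two moves are disjoint; for n = 8 they cover the same bytes; for anything in between, the overlapping bytes are simply written twice with the same values.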
x86-64: Add AVX optimized string/memory functions for RTM

Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with

	xtest
	jz	1f
	vzeroall
	ret
1:
	vzeroupper
	ret

at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
2021-03-05 15:26:42 +00:00
#include "strcat-avx2.S"
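The selection rule stated in the 2021 commit message (use the RTM variant "on processors with usable RTM, but without 256-bit EVEX instructions") can be modelled as a small decision function. This is a simplified sketch, not glibc's actual ifunc selector: the `cpu_caps` struct, the enum, and `pick_strcat` are hypothetical names, and preferring an EVEX variant when 256-bit EVEX is available is an assumption read into the message's wording.

```c
#include <stdbool.h>

/* Hypothetical feature flags; the real selector consults glibc's
   internal CPU feature data, which is not reproduced here.  */
struct cpu_caps {
    bool avx2;     /* AVX2 usable */
    bool evex256;  /* 256-bit EVEX instructions usable */
    bool rtm;      /* RTM usable */
};

enum strcat_impl {
    STRCAT_BASELINE,
    STRCAT_AVX2,
    STRCAT_AVX2_RTM,
    STRCAT_EVEX
};

/* Simplified model of the rule in the commit message: the RTM-safe
   AVX2 variant is chosen only when RTM is usable and 256-bit EVEX is
   not.  The EVEX branch is an assumption for illustration.  */
static enum strcat_impl pick_strcat(struct cpu_caps c)
{
    if (!c.avx2)
        return STRCAT_BASELINE;
    if (c.evex256)
        return STRCAT_EVEX;
    if (c.rtm)
        return STRCAT_AVX2_RTM;  /* exits via xtest + vzeroall/vzeroupper */
    return STRCAT_AVX2;          /* plain vzeroupper at exit */
}
```

In this model the file shown above would correspond to the `STRCAT_AVX2_RTM` case: it renames the function to `__strcat_avx2_rtm` and reuses the body of `strcat-avx2.S` with the RTM-aware exit sequence.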