mirror of
https://sourceware.org/git/glibc.git
synced 2024-11-25 06:20:06 +00:00
330881763e
Optimizations are:
1. Use the fact that tzcnt(0) -> VEC_SIZE for memchr to save a branch
   in the short string case.
2. Restructure code so that small strings are given the hot path.
    - This is a net-zero on the benchmark suite but in general makes
      sense as smaller sizes are far more common.
3. Use more code-size efficient instructions.
    - tzcnt ...     -> bsf ...
    - vpcmpb $0 ... -> vpcmpeq ...
4. Align labels less aggressively, especially if it doesn't save fetch
   blocks / causes the basic-block to span extra cache-lines.

The optimizations (especially for point 2) make the memchr and
rawmemchr code essentially incompatible so split rawmemchr-evex
to a new file.

Code Size Changes:
memchr-evex.S    : -107 bytes
rawmemchr-evex.S :  -53 bytes

Net perf changes:

Reported as geometric mean of all improvements / regressions from N=10
runs of the benchtests. Value as New Time / Old Time so < 1.0 is
improvement and > 1.0 is regression.

memchr-evex.S    : 0.928
rawmemchr-evex.S : 0.986 (Fewer targets cross cache lines)

Full results attached in email.

Full check passes on x86-64.
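Point 1 can be illustrated with a small C sketch. This is not the glibc implementation (which is hand-written EVEX assembly); it only models the idea that a trailing-zero count which returns VEC_SIZE for an all-zero match mask lets a single length comparison cover both "no match in the vector" and "match lies past the end of the buffer". The names `tzcnt32`, `match_mask` and `short_memchr` are hypothetical, VEC_SIZE is assumed to be 32, and the sketch assumes the caller guarantees VEC_SIZE readable bytes at `s`.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32 /* assumed vector width in bytes */

/* Portable stand-in for tzcnt on a 32-bit mask.
   Like the hardware instruction, it returns the operand width
   (here VEC_SIZE) when the input is zero. */
static unsigned tzcnt32(uint32_t x)
{
    unsigned n = 0;
    if (x == 0)
        return VEC_SIZE;
    while ((x & 1u) == 0) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Scalar model of a vector compare: bit i is set when p[i] == c.
   Always reads VEC_SIZE bytes, as a vector load would. */
static uint32_t match_mask(const unsigned char *p, unsigned char c)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < VEC_SIZE; i++)
        if (p[i] == c)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Short-string path for len <= VEC_SIZE.
   One comparison rejects both "no match" (pos == VEC_SIZE >= len)
   and "match beyond len" (pos >= len). */
static const void *short_memchr(const void *s, int c, size_t len)
{
    const unsigned char *p = s;
    unsigned pos = tzcnt32(match_mask(p, (unsigned char)c));
    if (len <= pos)
        return NULL;
    return p + pos;
}
```

Without the tzcnt(0) == VEC_SIZE property, the short path would need a separate test for an empty mask before the length comparison.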
7 lines
122 B
ArmAsm
#define RAWMEMCHR __rawmemchr_evex_rtm

#define USE_IN_RTM 1
#define SECTION(p) p##.evex.rtm

#include "rawmemchr-evex.S"
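The wrapper above follows a common preprocessor pattern: set a few macros, then include a shared implementation that uses them. The C sketch below models that pattern in a single translation unit. It assumes, without confirmation from this page, that the shared file supplies a default name when RAWMEMCHR is not already defined; the identifiers `rawmemchr_variant_demo` and `rawmemchr_default_demo` are hypothetical and the function body is a plain byte loop, not the EVEX code.

```c
#include <stddef.h>

/* Wrapper side (hypothetical): pick the symbol name before the
   shared body is seen, as the #define RAWMEMCHR line does. */
#define RAWMEMCHR rawmemchr_variant_demo

/* Shared-body side (assumed shape): fall back to a default name
   only if the including file did not choose one. */
#ifndef RAWMEMCHR
# define RAWMEMCHR rawmemchr_default_demo
#endif

/* The body is written once against the macro name, so each wrapper
   that includes it produces a differently named function.
   Like rawmemchr, it assumes c is present and does not bound the scan. */
const void *RAWMEMCHR(const void *s, int c)
{
    const unsigned char *p = s;
    while (*p != (unsigned char)c)
        p++;
    return p;
}
```

Here the function is emitted as `rawmemchr_variant_demo`; removing the first `#define` would emit `rawmemchr_default_demo` from the same body.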