mirror of
https://sourceware.org/git/glibc.git
synced 2025-01-07 18:10:07 +00:00
af5306a735
The new code:
    1. prioritizes smaller user-arg lengths more.
    2. optimizes target placement more carefully.
    3. reuses logic more.
    4. fixes up various inefficiencies in the logic. The biggest case
       here is the `lzcnt` logic for checking returns, which saves
       either a branch or multiple instructions.

The total code size saving is: 306 bytes
Geometric Mean of all benchmarks New / Old: 0.760

Regressions:
There are some regressions, particularly where the length (user arg
length) is large but the position of the match char is near the
beginning of the string (in first VEC). This case has roughly a
10-20% regression.

This is because the new logic gives the hot path for immediate matches
to shorter lengths (the more common input). The shorter-length case
has roughly a 15-45% speedup.

Full xcheck passes on x86_64.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
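The `lzcnt` remark in the commit message can be illustrated with a small C sketch. It is not taken from the glibc source. It only shows one way a leading-zero count can fold the "no match" check and the "match lies before the start of the buffer" check into a single comparison. The names `match_mask32`, `lzcnt32` and `last_match_in_tail` are hypothetical, the scalar loop stands in for an AVX2 compare plus move-mask, and `__builtin_clz` is a GCC/Clang builtin used here as a portable substitute for the `lzcnt` instruction.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bit i of the result is set when block[i] == c.
   Stands in for a 32-byte vector compare followed by a move-mask. */
static uint32_t match_mask32(const unsigned char block[32], unsigned char c)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < 32; i++)
        if (block[i] == c)
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Portable stand-in for lzcnt: returns 32 for a zero input,
   which __builtin_clz alone does not define. */
static unsigned lzcnt32(uint32_t x)
{
    return x ? (unsigned)__builtin_clz(x) : 32u;
}

/* Hypothetical: only the LAST len bytes of the 32-byte block belong to
   the caller's buffer (1 <= len <= 32). Returns the last occurrence of
   c within those bytes, or NULL. */
static const unsigned char *
last_match_in_tail(const unsigned char block[32], size_t len, unsigned char c)
{
    assert(len >= 1 && len <= 32);
    unsigned lz = lzcnt32(match_mask32(block, c));
    /* One comparison covers both failure cases:
       lz == 32 (no match at all) and
       lz >= len (highest match is outside the valid tail). */
    if (lz >= len)
        return NULL;
    return block + (31 - lz);
}
```

With a zero mask the count is 32, which can never be smaller than a length of at most 32, so no separate test for an empty mask is needed in this sketch.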
14 lines
300 B
ArmAsm
#ifndef MEMRCHR
# define MEMRCHR __memrchr_avx2_rtm
#endif

#define COND_VZEROUPPER COND_VZEROUPPER_XTEST
#define ZERO_UPPER_VEC_REGISTERS_RETURN \
  ZERO_UPPER_VEC_REGISTERS_RETURN_XTEST

#define VZEROUPPER_RETURN jmp L(return_vzeroupper)

#define SECTION(p) p##.avx.rtm

#include "memrchr-avx2.S"