The AVX2 str(n)casecmp implementations use the 'bzhi' instruction, which
belongs to the BMI2 CPU feature.
NB: It also uses the 'tzcnt' BMI1 instruction, but it is executed as BSF
as BSF if the CPU doesn't support TZCNT, and produces the same result
for non-zero input.
Partially fixes: b77b06e0e2 ("x86: Optimize strcmp-avx2.S")
Partially resolves: BZ #29611
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
1. Add default ISA level selection in non-multiarch/rtld
implementations.
2. Add ISA level build guards to different implementations.
- I.e strcmp-avx2.S which is ISA level 3 will only build if
compiled ISA level <= 3. Otherwise there is no reason to
include it as we will always use one of the ISA level 4
implementations (strcmp-evex.S).
3. Refactor the ifunc selector and ifunc implementation list to use
the ISA level aware wrapper macros that allow functions below the
compiled ISA level (with a guranteed replacement) to be skipped.
Tested with and without multiarch on x86_64 for ISA levels:
{generic, x86-64-v2, x86-64-v3, x86-64-v4}
And m32 with and without multiarch.
With SSE2, SSE4.1, AVX2, and EVEX versions very few targets prefer
SSSE3. As a result it is no longer worth it to keep the SSSE3
versions given the code size cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
The rational is:
1. SSE42 has nearly identical logic so any benefit is minimal (3.4%
regression on Tigerlake using SSE42 versus AVX across the
benchtest suite).
2. AVX2 version covers the majority of targets that previously
prefered it.
3. The targets where AVX would still be best (SnB and IVB) are
becoming outdated.
All in all the saving the code size is worth it.
All string/memory tests pass.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.
I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah. I don't
know why I run into these diagnostics whereas others evidently do not.
remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Support usable check for all CPU features with the following changes:
1. Change struct cpu_features to
struct cpuid_features
{
struct cpuid_registers cpuid;
struct cpuid_registers usable;
};
struct cpu_features
{
struct cpu_features_basic basic;
struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};
so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.
The results are
1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
Implement strcmp family IFUNC selectors in C.
All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register. For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized. Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for strcmp family functions within libc.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
strcmp-sse2, strcmp-sse4_2, strncmp-sse2, strncmp-sse4_2,
strcasecmp_l-sse2, strcasecmp_l-sse4_2, strcasecmp_l-avx,
strncase_l-sse2, strncase_l-sse4_2 and strncase_l-avx.
* sysdeps/x86_64/multiarch/ifunc-strcasecmp.h: New file.
* sysdeps/x86_64/multiarch/strcasecmp.c: Likewise.
* sysdeps/x86_64/multiarch/strcasecmp_l-avx.S: Likewise.
* sysdeps/x86_64/multiarch/strcasecmp_l-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strcasecmp_l-sse4_2.S: Likewise.
* sysdeps/x86_64/multiarch/strcasecmp_l.c: Likewise.
* sysdeps/x86_64/multiarch/strcmp-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp-sse4_2.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp.c: Likewise.
* sysdeps/x86_64/multiarch/strncase.c: Likewise.
* sysdeps/x86_64/multiarch/strncase_l-avx.S : Likewise.
* sysdeps/x86_64/multiarch/strncase_l-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strncase_l-sse4_2.S: Likewise.
* sysdeps/x86_64/multiarch/strncase_l.c: Likewise.
* sysdeps/x86_64/multiarch/strncmp-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strncmp-sse4_2.S: Likewise.
* sysdeps/x86_64/multiarch/strncmp.c: Likewise.
* sysdeps/x86_64/multiarch/strcasecmp_l.S: Removed.
* sysdeps/x86_64/multiarch/strcmp.S: Likewise.
* sysdeps/x86_64/multiarch/strncase_l.S: Likewise.
* sysdeps/x86_64/multiarch/strncmp.S: Likewise.
* sysdeps/x86_64/multiarch/strcmp-sse42.S: Include <sysdep.h>.
(STRCMP_SSE42): New. Defined to __strcmp_sse42 if not defined.
[USE_AS_STRCASECMP_L || USE_AS_STRNCASECMP_L]: Include
"locale-defines.h".
(UPDATE_STRNCMP_COUNTER): New.
(SECTION): Likewise.
(GLABEL): Likewise.
(LABEL): Likewise.
* sysdeps/x86_64/multiarch/strncmp-ssse3.S: Rewrite and enable
for libc.a.