Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with
xtest
jz 1f
vzeroall
ret
1:
vzeroupper
ret
at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.
For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
No bug. Just seemed the performance could be improved a bit. Observed
and expected behavior are unchanged. Optimized body of main
loop. Updated page cross logic and optimized accordingly. Made a few
minor instruction selection modifications. No regressions in test
suite. Both test-strchrnul and test-strchr passed.
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Support usable check for all CPU features with the following changes:
1. Change struct cpu_features to
struct cpuid_features
{
struct cpuid_registers cpuid;
struct cpuid_registers usable;
};
struct cpu_features
{
struct cpu_features_basic basic;
struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};
so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.
The results are
1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
GCC 9 has gained an enhancement to help detect attribute mismatches
between alias declarations and their targets. It consists of a new
warning, -Wattribute-alias, an enhancement to an existing warning,
-Wmissing-attributes, and a new attribute called copy.
The purpose of the warnings is to help identify either possible bugs
(an alias declared with more restrictive attributes than its target
promises) or optimization or diagnostic opportunities (an alias target
missing some attributes that it could be declared with that might
benefit analysis and code generation). The purpose of the new
attribute is to easily apply (almost) the same set of attributes
to one declaration as those already present on another.
As expected (and intended) the enhancement triggers warnings for
many alias declarations in Glibc code. This change, tested on
x86_64-linux, avoids all instances of the new warnings by making
use of the attribute where appropriate. To fully benefit from
the enhancement Glibc will need to be compiled with
-Wattribute-alias=2 and remaining warnings reviewed and dealt with
(there are a couple of thousand but most should be straightforward
to deal with).
ChangeLog:
* include/libc-symbols.h (__attribute_copy__): Define macro unless
it's already defined.
(_strong_alias): Use __attribute_copy__.
(_weak_alias, __hidden_ver1, __hidden_nolink2): Same.
* misc/sys/cdefs.h (__attribute_copy__): New macro.
* sysdeps/x86_64/multiarch/memchr.c (memchr): Use __attribute_copy__.
* sysdeps/x86_64/multiarch/memcmp.c (memcmp): Same.
* sysdeps/x86_64/multiarch/mempcpy.c (mempcpy): Same.
* sysdeps/x86_64/multiarch/memset.c (memset): Same.
* sysdeps/x86_64/multiarch/stpcpy.c (stpcpy): Same.
* sysdeps/x86_64/multiarch/strcat.c (strcat): Same.
* sysdeps/x86_64/multiarch/strchr.c (strchr): Same.
* sysdeps/x86_64/multiarch/strcmp.c (strcmp): Same.
* sysdeps/x86_64/multiarch/strcpy.c (strcpy): Same.
* sysdeps/x86_64/multiarch/strcspn.c (strcspn): Same.
* sysdeps/x86_64/multiarch/strlen.c (strlen): Same.
* sysdeps/x86_64/multiarch/strncmp.c (strncmp): Same.
* sysdeps/x86_64/multiarch/strncpy.c (strncpy): Same.
* sysdeps/x86_64/multiarch/strnlen.c (strnlen): Same.
* sysdeps/x86_64/multiarch/strpbrk.c (strpbrk): Same.
* sysdeps/x86_64/multiarch/strrchr.c (strrchr): Same.
* sysdeps/x86_64/multiarch/strspn.c (strspn): Same.
Optimize strchr/strchrnul/wcschr with AVX2 to search 32 bytes with vector
instructions. It is as fast as SSE2 versions for size <= 16 bytes and up
to 1X faster for or size > 16 bytes on Haswell. Select AVX2 version on
AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast.
NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input. TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
strchr-sse2, strchrnul-sse2, strchr-avx2, strchrnul-avx2,
wcschr-sse2 and wcschr-avx2.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add tests for __strchr_avx2,
__strchrnul_avx2, __strchrnul_sse2, __wcschr_avx2 and
__wcschr_sse2.
* sysdeps/x86_64/multiarch/strchr-avx2.S: New file.
* sysdeps/x86_64/multiarch/strchr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strchr.c: Likewise.
* sysdeps/x86_64/multiarch/strchrnul-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/strchrnul-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strchrnul.c: Likewise.
* sysdeps/x86_64/multiarch/wcschr-avx2.S: Likewise.
* sysdeps/x86_64/multiarch/wcschr-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/wcschr.c: Likewise.
* sysdeps/x86_64/multiarch/strchr.S: Removed.