glibc/sysdeps/x86_64/multiarch
H.J. Lu 935971ba6b x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE
Optimize x86-64 memcmp/wmemcmp with AVX2.  It uses vector compare as
much as possible.  It is as fast as SSE4 memcmp for size <= 16 bytes
and up to 2X faster for size > 16 bytes on Haswell and Skylake.  Select
AVX2 memcmp/wmemcmp on AVX2 machines where vzeroupper is preferred and
AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.

Key features:

1. For size from 2 to 7 bytes, load as big endian with movbe and bswap
   to avoid branches.
2. Use overlapping compare to avoid branch.
3. Use vector compare when size >= 4 bytes for memcmp or size >= 8
   bytes for wmemcmp.
4. If size is 8 * VEC_SIZE or less, unroll the loop.
5. Compare 4 * VEC_SIZE at a time with the aligned first memory area.
6. Use 2 vector compares when size is 2 * VEC_SIZE or less.
7. Use 4 vector compares when size is 4 * VEC_SIZE or less.
8. Use 8 vector compares when size is 8 * VEC_SIZE or less.

	* sysdeps/x86/cpu-features.h (index_cpu_MOVBE): New.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memcmp-avx2 and wmemcmp-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test __memcmp_avx2 and __wmemcmp_avx2.
	* sysdeps/x86_64/multiarch/memcmp-avx2.S: New file.
	* sysdeps/x86_64/multiarch/wmemcmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/memcmp.S: Use __memcmp_avx2 on AVX
	2 machines if AVX unaligned load is fast and vzeroupper is
	preferred.
	* sysdeps/x86_64/multiarch/wmemcmp.S: Use __wmemcmp_avx2 on AVX
	2 machines if AVX unaligned load is fast and vzeroupper is
	preferred.
2017-06-05 12:52:55 -07:00
..
bcopy.S Use IFUNC memmove/memset in x86-64 bcopy/bzero 2012-10-11 13:58:16 -07:00
ifunc-impl-list.c x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
Makefile x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
memcmp-avx2-movbe.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
memcmp-sse4.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcmp-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcmp.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
memcpy_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memcpy-ssse3-back.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcpy.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memmove_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memmove-avx512-no-vzeroupper.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memmove-avx512-unaligned-erms.S Require binutils 2.24 to build x86-64 glibc [BZ #20139] 2016-07-01 06:03:05 -07:00
memmove-avx-unaligned-erms.S X86-64: Use non-temporal store in memcpy on large data 2016-04-12 08:10:47 -07:00
memmove-ssse3-back.S
memmove-ssse3.S
memmove-vec-unaligned-erms.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memmove.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
mempcpy_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
mempcpy.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memset_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memset-avx2-unaligned-erms.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
memset-avx512-no-vzeroupper.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memset-avx512-unaligned-erms.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
memset-vec-unaligned-erms.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
memset.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
sched_cpucount.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
stpcpy-sse2-unaligned.S
stpcpy-ssse3.S
stpcpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
stpncpy-c.c
stpncpy-sse2-unaligned.S
stpncpy-ssse3.S
stpncpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strcasecmp_l-ssse3.S
strcasecmp_l.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strcat-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcat-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcat.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strchr-sse2-no-bsf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strchr.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcmp-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcmp-sse42.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcmp-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcpy-sse2-unaligned.S Fix x86 strncat optimized implementation for large sizes 2017-01-03 14:24:53 -02:00
strcpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcpy.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcspn-c.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcspn.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strncase_l-ssse3.S
strncase_l.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncat-c.c
strncat-sse2-unaligned.S
strncat-ssse3.S
strncat.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncmp-ssse3.S Don't define x86-64 __strncmp_ssse3 in libc.a 2012-09-27 07:43:03 -07:00
strncmp.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncpy-c.c
strncpy-sse2-unaligned.S
strncpy-ssse3.S
strncpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strpbrk-c.c
strpbrk.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strspn-c.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strspn.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strstr-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strstr.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
test-multiarch.c Suppress internal declarations for most of the testsuite. 2017-05-11 19:27:59 -04:00
varshift.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
varshift.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcscpy-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcscpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcscpy.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wmemcmp-avx2-movbe.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
wmemcmp-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wmemcmp-sse4.S
wmemcmp-ssse3.S
wmemcmp.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
wmemset_chk-nonshared.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wmemset_chk.c x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wmemset.c x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wmemset.h x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00