glibc/sysdeps/x86_64/multiarch
H.J. Lu 8fe57365bf x86-64: Optimize strchr/strchrnul/wcschr with AVX2
Optimize strchr/strchrnul/wcschr with AVX2 to search 32 bytes with vector
instructions.  It is as fast as SSE2 versions for size <= 16 bytes and up
to 1X faster for or size > 16 bytes on Haswell.  Select AVX2 version on
AVX2 machines where vzeroupper is preferred and AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if machine doesn't support TZCNT.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strchr-sse2, strchrnul-sse2, strchr-avx2, strchrnul-avx2,
	wcschr-sse2 and wcschr-avx2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __strchr_avx2,
	__strchrnul_avx2, __strchrnul_sse2, __wcschr_avx2 and
	__wcschr_sse2.
	* sysdeps/x86_64/multiarch/strchr-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strchr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strchr.c: Likewise.
	* sysdeps/x86_64/multiarch/strchrnul-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/strchrnul-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/strchrnul.c: Likewise.
	* sysdeps/x86_64/multiarch/wcschr-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcschr-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcschr.c: Likewise.
	* sysdeps/x86_64/multiarch/strchr.S: Removed.
2017-06-09 05:42:29 -07:00
..
bcopy.S Use IFUNC memmove/memset in x86-64 bcopy/bzero 2012-10-11 13:58:16 -07:00
ifunc-avx2.h x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
ifunc-impl-list.c x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
ifunc-wmemset.h x86-64: Rename wmemset.h to ifunc-wmemset.h 2017-06-07 14:48:34 -07:00
Makefile x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
memchr-avx2.S x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
memchr-sse2.S x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
memchr.c x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
memcmp-avx2-movbe.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
memcmp-sse4.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcmp-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcmp.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
memcpy_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memcpy-ssse3-back.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memcpy.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memmove_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memmove-avx512-no-vzeroupper.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memmove-avx512-unaligned-erms.S Require binutils 2.24 to build x86-64 glibc [BZ #20139] 2016-07-01 06:03:05 -07:00
memmove-avx-unaligned-erms.S X86-64: Use non-temporal store in memcpy on large data 2016-04-12 08:10:47 -07:00
memmove-ssse3-back.S Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 2010-06-30 08:26:11 -07:00
memmove-ssse3.S Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7 2010-06-30 08:26:11 -07:00
memmove-vec-unaligned-erms.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memmove.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
mempcpy_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
mempcpy.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memset_chk.S x86: Use AVX2 memcpy/memset on Skylake server [BZ #21396] 2017-04-18 14:01:45 -07:00
memset-avx2-unaligned-erms.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
memset-avx512-no-vzeroupper.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
memset-avx512-unaligned-erms.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
memset-vec-unaligned-erms.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
memset.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
rawmemchr-avx2.S x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
rawmemchr-sse2.S x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
rawmemchr.c x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
sched_cpucount.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
stpcpy-sse2-unaligned.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpcpy-ssse3.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpcpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
stpncpy-c.c SSSE3 strcpy/stpcpy for x86-64 2009-07-02 03:39:03 -07:00
stpncpy-sse2-unaligned.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpncpy-ssse3.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
stpncpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strcasecmp_l-ssse3.S Fix x86-64 build without multiarch. 2010-08-14 14:56:32 -07:00
strcasecmp_l.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strcat-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcat-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcat.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strchr-avx2.S x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
strchr-sse2-no-bsf.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strchr-sse2.S x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
strchr.c x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
strchrnul-avx2.S x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
strchrnul-sse2.S x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
strchrnul.c x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
strcmp-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcmp-sse42.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcmp-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcmp.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcpy-sse2-unaligned.S Fix x86 strncat optimized implementation for large sizes 2017-01-03 14:24:53 -02:00
strcpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcpy.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcspn-c.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strcspn.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strlen-avx2.S x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
strlen-sse2.S x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
strlen.c x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
strncase_l-ssse3.S Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
strncase_l.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncat-c.c Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
strncat-sse2-unaligned.S Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
strncat-ssse3.S Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
strncat.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncmp-ssse3.S Don't define x86-64 __strncmp_ssse3 in libc.a 2012-09-27 07:43:03 -07:00
strncmp.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strncpy-c.c SSSE3 strcpy/stpcpy for x86-64 2009-07-02 03:39:03 -07:00
strncpy-sse2-unaligned.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
strncpy-ssse3.S Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
strncpy.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strnlen-avx2.S x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
strnlen-sse2.S x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
strnlen.c x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
strpbrk-c.c Don't define __strpbrk_sse42 in static library 2010-03-24 12:16:24 -07:00
strpbrk.S Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
strspn-c.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strspn.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strstr-sse2-unaligned.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
strstr.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
test-multiarch.c Suppress internal declarations for most of the testsuite. 2017-05-11 19:27:59 -04:00
varshift.c Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
varshift.h Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcschr-avx2.S x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
wcschr-sse2.S x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
wcschr.c x86-64: Optimize strchr/strchrnul/wcschr with AVX2 2017-06-09 05:42:29 -07:00
wcscpy-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcscpy-ssse3.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcscpy.S Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
wcslen-avx2.S x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
wcslen-sse2.S x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
wcslen.c x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
wcsnlen-avx2.S x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
wcsnlen-c.c x86-64: Move wcsnlen.S to multiarch/wcsnlen-sse4_1.S 2017-06-06 06:12:32 -07:00
wcsnlen-sse4_1.S x86-64: Move wcsnlen.S to multiarch/wcsnlen-sse4_1.S 2017-06-06 06:12:32 -07:00
wcsnlen.c x86-64: Optimize strlen/strnlen/wcslen/wcsnlen with AVX2 2017-06-09 05:18:18 -07:00
wmemchr-avx2.S x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
wmemchr-sse2.S x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
wmemchr.c x86-64: Optimize memchr/rawmemchr/wmemchr with SSE2/AVX2 2017-06-09 05:13:31 -07:00
wmemcmp-avx2-movbe.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
wmemcmp-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wmemcmp-sse4.S Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
wmemcmp-ssse3.S Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
wmemcmp.S x86-64: Optimize memcmp/wmemcmp with AVX2 and MOVBE 2017-06-05 12:52:55 -07:00
wmemset_chk-nonshared.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wmemset_chk.c x86-64: Rename wmemset.h to ifunc-wmemset.h 2017-06-07 14:48:34 -07:00
wmemset.c x86-64: Rename wmemset.h to ifunc-wmemset.h 2017-06-07 14:48:34 -07:00