glibc/sysdeps/x86_64
Leonardo Sandoval 1457016337 x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2
Optimize x86-64 strcmp/wcscmp and strncmp/wcsncmp with AVX2. It uses vector
comparison as much as possible. Peak performance observed on a Skylake
machine: 9x, 3x, 2.5x and 5.5x for strcmp, strncmp, wcscmp and wcsncmp,
respectively. The benefit of the AVX2 functions grows with the comparison
length, except for strcmp, where the peak is observed at a length of
32 bytes. Select AVX2 strcmp/wcscmp on AVX2 machines where vzeroupper
is preferred and AVX unaligned load is fast.

NB: It uses TZCNT instead of BSF since TZCNT produces the same result
as BSF for non-zero input.  TZCNT is faster than BSF and is executed
as BSF if the machine doesn't support TZCNT.
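
As a rough illustration of why a trailing-zero count is useful here, the
sketch below mimics one 32-byte comparison step in portable C. It is not the
code from strcmp-avx2.S: the helper names (chunk_mask, first_set_bit,
compare_chunk) are hypothetical, the vector compare is replaced by a scalar
loop, and __builtin_ctz (a GCC/Clang builtin) stands in for TZCNT. Like BSF,
__builtin_ctz is only meaningful for non-zero input, so the mask is checked
first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 32  /* bytes in one AVX2 vector; handled here by a scalar loop */

/* Hypothetical helper: bit i of the result is set when byte i differs
   between the two buffers or when s1[i] is the terminating NUL.
   Both buffers must have at least CHUNK readable bytes.  */
static uint32_t chunk_mask(const unsigned char *s1, const unsigned char *s2)
{
    uint32_t mask = 0;
    for (unsigned i = 0; i < CHUNK; i++)
        if (s1[i] != s2[i] || s1[i] == '\0')
            mask |= (uint32_t)1 << i;
    return mask;
}

/* Index of the lowest set bit.  The result is undefined for mask == 0,
   which is why callers must test the mask before calling this.  */
static unsigned first_set_bit(uint32_t mask)
{
    assert(mask != 0);
    return (unsigned)__builtin_ctz(mask);
}

/* Compare one chunk.  Sets *done to 1 and returns a strcmp-style result
   when the chunk contains a mismatch or a NUL; otherwise sets *done to 0
   and returns 0, meaning the caller should move on to the next chunk.  */
static int compare_chunk(const unsigned char *s1, const unsigned char *s2,
                         int *done)
{
    uint32_t mask = chunk_mask(s1, s2);
    if (mask == 0) {
        *done = 0;
        return 0;
    }
    unsigned i = first_set_bit(mask);
    *done = 1;
    return (int)s1[i] - (int)s2[i];
}
```

The mask is guaranteed non-zero on the path that reaches first_set_bit, which
is the same condition under which the note above says TZCNT and BSF agree.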

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	strcmp-avx2, strncmp-avx2, wcscmp-avx2, wcscmp-sse2, wcsncmp-avx2 and
	wcsncmp-sse2.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add tests for __strcmp_avx2,
	__strncmp_avx2, __wcscmp_avx2, __wcsncmp_avx2, __wcscmp_sse2
	and __wcsncmp_sse2.
	* sysdeps/x86_64/multiarch/strcmp.c (OPTIMIZE (avx2)):
	(IFUNC_SELECTOR): Return OPTIMIZE (avx2) on AVX2 machines if
	AVX unaligned load is fast and vzeroupper is preferred.
	* sysdeps/x86_64/multiarch/strncmp.c: Likewise.
	* sysdeps/x86_64/multiarch/strcmp-avx2.S: New file.
	* sysdeps/x86_64/multiarch/strncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp-sse2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcscmp.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-avx2.S: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp-sse2.c: Likewise.
	* sysdeps/x86_64/multiarch/wcsncmp.c: Likewise.
	* sysdeps/x86_64/wcscmp.S (__wcscmp): Add alias only if __wcscmp
	is undefined.
2018-06-01 16:32:43 -05:00
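
The selection rule stated in the commit message (use the AVX2 variant only on
AVX2 machines where AVX unaligned load is fast and vzeroupper is preferred)
can be sketched in plain C as follows. This is a simplified model, not the
selector from strcmp.c: struct cpu_caps, its fields, enum strcmp_impl and
select_strcmp are hypothetical names, and the real selector returns a
function pointer chosen at load time rather than an enum value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of the CPU properties the commit message names. */
struct cpu_caps {
    bool avx2_usable;              /* AVX2 is available and usable */
    bool avx_fast_unaligned_load;  /* unaligned AVX loads are fast */
    bool prefer_vzeroupper;        /* vzeroupper is preferred on this CPU */
};

/* Hypothetical tags for the candidate implementations. */
enum strcmp_impl {
    IMPL_BASELINE,  /* whatever non-AVX2 variant would otherwise be chosen */
    IMPL_AVX2
};

/* Pick the AVX2 variant only when all three conditions hold. */
static enum strcmp_impl select_strcmp(const struct cpu_caps *caps)
{
    assert(caps != NULL);
    if (caps->avx2_usable
        && caps->avx_fast_unaligned_load
        && caps->prefer_vzeroupper)
        return IMPL_AVX2;
    return IMPL_BASELINE;
}
```

Requiring all three conditions means a machine that has AVX2 but slow
unaligned loads, or one where vzeroupper is not preferred, stays on the
baseline variant.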
64
fpu Update ulps with "make regen-ulps" on AMD Ryzen 7 1800X. 2018-05-30 09:17:47 -07:00
multiarch x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2 2018-06-01 16:32:43 -05:00
nptl nptl: Remove __ASSUME_PRIVATE_FUTEX 2018-05-17 04:25:10 -07:00
x32 Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
____longjmp_chk.S
__longjmp.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
_mcount.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
abort-instr.h
add_n.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
addmul_1.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
atomic-machine.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
bsd-_setjmp.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
bsd-setjmp.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
bzero.S
configure Add --enable-static-pie configure option to build static PIE [BZ #19574] 2017-12-15 17:12:14 -08:00
configure.ac Add --enable-static-pie configure option to build static PIE [BZ #19574] 2017-12-15 17:12:14 -08:00
crti.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
crtn.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
dl-irel.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
dl-lookupcfg.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
dl-machine.h elf: Unify symbol address run-time calculation [BZ #19818] 2018-04-04 23:09:37 +01:00
dl-procinfo.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
dl-runtime.c
dl-tls.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
dl-tls.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
dl-tlsdesc.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
dl-tlsdesc.S hurd: Fix build without NO_HIDDEN 2018-01-06 18:20:18 +01:00
dl-trampoline.h x86-64: Properly align La_x86_64_retval to VEC_SIZE [BZ #22715] 2018-01-17 04:32:04 -08:00
dl-trampoline.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
ffs.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
ffsll.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
hp-timing.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
htonl.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
ifuncmain8.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
ifuncmod8.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
Implies Add float128 support for x86_64, x86. 2017-06-26 22:02:24 +00:00
jmpbuf-offsets.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
jmpbuf-unwind.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
l10nflist.c
ldbl2mpn.c
link-defines.sym
locale-defines.sym
localplt.data ld.so: Introduce struct dl_exception 2017-08-10 16:54:57 +02:00
lshift.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
machine-gmon.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
Makefile x86-64: Use fxsave/xsave/xsavec in _dl_runtime_resolve [BZ #21265] 2017-10-20 11:00:34 -07:00
memchr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
memcmp.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
memcopy.h
memcpy_chk.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
memcpy.S
memmove_chk.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
memmove.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
mempcpy_chk.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
mempcpy.S
memrchr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
memset_chk.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
memset.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
memusage.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
mp_clz_tab.c
mul_1.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
preconfigure
preconfigure.ac
rawmemchr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
rshift.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
rtld-offsets.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
sched_cpucount.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
setjmp.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
stack-aliasing.h
stackguard-macros.h
stackinfo.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
start.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
stpcpy.S
strcasecmp_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strcasecmp_l.S
strcasecmp.S
strcat.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strchr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strchrnul.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strcmp.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strcpy.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strcspn.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strlen.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strncase_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strncase_l.S
strncase.S
strncmp.S
strnlen.S
strpbrk.S x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C 2017-06-15 08:59:05 -07:00
strrchr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
strspn.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
sub_n.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
submul_1.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
sysdep.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tls_get_addr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tls-macros.h
tlsdesc.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tlsdesc.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
tst-audit3.c
tst-audit4-aux.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-audit4.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-audit5.c
tst-audit6.c
tst-audit7.c
tst-audit10-aux.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-audit10.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-audit.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-auditmod3a.c
tst-auditmod3b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod4a.c
tst-auditmod4b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod5a.c
tst-auditmod5b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6a.c
tst-auditmod6b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6c.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod7a.c
tst-auditmod7b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod10a.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-auditmod10b.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-avx512-aux.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-avx512.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-avx512mod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avx-aux.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-avx.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-avxmod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-mallocalign1.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-platform-1.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-platformmod-1.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-platformmod-2.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-quad1.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-quad1pie.c
tst-quad2.c
tst-quad2pie.c
tst-quadmod1.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-quadmod1pie.S
tst-quadmod2.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-quadmod2pie.S
tst-split-dynreloc.c
tst-split-dynreloc.lds
tst-sse.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-ssemod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-stack-align.h Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-x86_64-1.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
tst-x86_64mod-1.c Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
Versions
wcschr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
wcscmp.S x86-64: Optimize strcmp/wcscmp and strncmp/wcsncmp with AVX2 2018-06-01 16:32:43 -05:00
wcslen.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
wcsrchr.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
wmemset_chk.S Update copyright dates with scripts/update-copyrights. 2018-01-01 00:32:25 +00:00
wmemset.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wordcopy.c