glibc/sysdeps/i386/i686/multiarch
Andrew Senkevich 8b4416d83c i386: memcpy functions with SSE2 unaligned load/store
These new memcpy functions are the 32-bit versions of the x86_64 SSE2
unaligned memcpy.  The average memcpy performance benefit is 18% on
Silvermont; other platforms also improved, by about 35%.  Benchmarks were
run on Silvermont, Haswell, Ivy Bridge, Sandy Bridge and Westmere.  The
performance results are attached at

https://sourceware.org/ml/libc-alpha/2014-07/msg00157.html

	* sysdeps/i386/i686/multiarch/bcopy-sse2-unaligned.S: New file.
	* sysdeps/i386/i686/multiarch/memcpy-sse2-unaligned.S: Likewise.
	* sysdeps/i386/i686/multiarch/memmove-sse2-unaligned.S: Likewise.
	* sysdeps/i386/i686/multiarch/mempcpy-sse2-unaligned.S: Likewise.
	* sysdeps/i386/i686/multiarch/bcopy.S: Select the sse2_unaligned
	version if bit_Fast_Unaligned_Load is set.
	* sysdeps/i386/i686/multiarch/memcpy.S: Likewise.
	* sysdeps/i386/i686/multiarch/memcpy_chk.S: Likewise.
	* sysdeps/i386/i686/multiarch/memmove.S: Likewise.
	* sysdeps/i386/i686/multiarch/memmove_chk.S: Likewise.
	* sysdeps/i386/i686/multiarch/mempcpy.S: Likewise.
	* sysdeps/i386/i686/multiarch/mempcpy_chk.S: Likewise.
	* sysdeps/i386/i686/multiarch/Makefile (sysdep_routines): Add
	bcopy-sse2-unaligned, memcpy-sse2-unaligned,
	memmove-sse2-unaligned and mempcpy-sse2-unaligned.
	* sysdeps/i386/i686/multiarch/ifunc-impl-list.c (MAX_IFUNC): Set
	to 4.
	(__libc_ifunc_impl_list): Test __bcopy_sse2_unaligned,
	__memmove_chk_sse2_unaligned, __memmove_sse2_unaligned,
	__memcpy_chk_sse2_unaligned, __memcpy_sse2_unaligned,
	__mempcpy_chk_sse2_unaligned, and __mempcpy_sse2_unaligned.
2014-12-30 07:19:38 -08:00
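
The ChangeLog entry above says the selectors pick the sse2_unaligned
version when bit_Fast_Unaligned_Load is set.  The following is a minimal C
sketch of that kind of dispatch, not the code from this commit: the real
selectors are written in assembly in memcpy.S and its siblings.  Every
name prefixed with demo_ or DEMO_ is hypothetical, the feature word is a
plain integer rather than glibc's cpu-features structure, and the priority
of the SSSE3 variant relative to the others is an assumption made only for
illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical feature bits, used only for this sketch.  They stand in
   for the bits glibc tests, such as bit_Fast_Unaligned_Load.  */
#define DEMO_BIT_FAST_UNALIGNED_LOAD (1u << 0)
#define DEMO_BIT_SSSE3               (1u << 1)

typedef void *(*demo_memcpy_fn) (void *, const void *, size_t);

/* Stand-ins for the per-CPU implementations.  Each one forwards to the
   C library memcpy so that the sketch is portable.  */
static void *
demo_memcpy_ia32 (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static void *
demo_memcpy_ssse3 (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

static void *
demo_memcpy_sse2_unaligned (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Return the implementation to use for the given feature word.  The
   unaligned SSE2 variant wins whenever the fast-unaligned-load bit is
   set.  The remaining order is an assumption of this sketch.  */
static demo_memcpy_fn
demo_select_memcpy (unsigned int features)
{
  if (features & DEMO_BIT_FAST_UNALIGNED_LOAD)
    return demo_memcpy_sse2_unaligned;
  if (features & DEMO_BIT_SSSE3)
    return demo_memcpy_ssse3;
  return demo_memcpy_ia32;
}
```

A caller would run demo_select_memcpy once with the detected feature word
and then call through the returned pointer.  With GNU indirect functions
the dynamic linker does the equivalent when it resolves the symbol, so
the check is not repeated on every call.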
bcopy-sse2-unaligned.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
bcopy-ssse3-rep.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
bcopy-ssse3.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
bcopy.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
bzero-sse2-rep.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
bzero-sse2.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
bzero.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
ifunc-defines.sym Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
ifunc-impl-list.c i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
init-arch.c Remove ENABLE_SSSE3_ON_ATOM. 2009-08-28 14:54:46 -07:00
init-arch.h
locale-defines.sym SSSE3 optimized strcasecmp and strncasecmp for x86-32 2011-11-13 09:50:13 -05:00
Makefile i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
memchr-sse2-bsf.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memchr-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memchr.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memcmp-sse4.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memcmp-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memcmp.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memcpy_chk.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
memcpy-sse2-unaligned.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
memcpy-ssse3-rep.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memcpy-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memcpy.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
memmove_chk.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
memmove-sse2-unaligned.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
memmove-ssse3-rep.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
memmove-ssse3.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
memmove.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
mempcpy_chk.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
mempcpy-sse2-unaligned.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
mempcpy-ssse3-rep.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
mempcpy-ssse3.S Optimize 32bit memset/memcpy with SSE2/SSSE3. 2010-01-12 11:22:03 -08:00
mempcpy.S i386: memcpy functions with SSE2 unaligned load/store 2014-12-30 07:19:38 -08:00
memrchr-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memrchr-sse2-bsf.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memrchr-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memrchr.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memset_chk.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memset-sse2-rep.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memset-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
memset.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
rawmemchr-sse2-bsf.S Optimized memchr, memrchr, rawmemchr for x86-32 2011-10-12 11:42:04 -04:00
rawmemchr-sse2.S Optimized memchr, memrchr, rawmemchr for x86-32 2011-10-12 11:42:04 -04:00
rawmemchr.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
rtld-strnlen.c Fix strnlen change 2011-10-23 16:30:40 -04:00
s_fma-fma.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
s_fma.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
s_fmaf-fma.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
s_fmaf.c Update copyright notices with scripts/update-copyrights 2014-01-01 22:00:23 +10:00
sched_cpucount.c
stpcpy-sse2.S Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
stpcpy-ssse3.S Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
stpcpy.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
stpncpy-sse2.S Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
stpncpy-ssse3.S Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
stpncpy.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strcasecmp_l-c.c Fix x86 strcasecmp_l (bug 13786). 2012-02-29 22:37:38 +00:00
strcasecmp_l-sse4.S Add SSE4.2 support for strcasecmp and strncasecmp on x86-32 2011-11-14 18:24:35 -05:00
strcasecmp_l-ssse3.S SSSE3 optimized strcasecmp and strncasecmp for x86-32 2011-11-13 09:50:13 -05:00
strcasecmp_l.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strcasecmp-c.c SSSE3 optimized strcasecmp and strncasecmp for x86-32 2011-11-13 09:50:13 -05:00
strcasecmp.S Fix misdetected Slow_SSE4_2 cpu feature bit (bug 17501) 2014-10-27 10:44:28 +01:00
strcasestr-c.c Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strcat-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcat-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcat.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strchr-sse2-bsf.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strchr-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strchr.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcmp-sse4.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcmp-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcmp.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcpy-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcpy-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcpy.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strcspn-c.c Add x86 32-bit SSE4.2 string functions. 2009-08-04 12:13:43 -07:00
strcspn.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strlen-sse2-bsf.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strlen-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strlen.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strncase_l-c.c Fix x86 strcasecmp_l (bug 13786). 2012-02-29 22:37:38 +00:00
strncase_l-sse4.S Add SSE4.2 support for strcasecmp and strncasecmp on x86-32 2011-11-14 18:24:35 -05:00
strncase_l-ssse3.S SSSE3 optimized strcasecmp and strncasecmp for x86-32 2011-11-13 09:50:13 -05:00
strncase_l.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strncase-c.c SSSE3 optimized strcasecmp and strncasecmp for x86-32 2011-11-13 09:50:13 -05:00
strncase.S Fix misdetected Slow_SSE4_2 cpu feature bit (bug 17501) 2014-10-27 10:44:28 +01:00
strncat-c.c Improve x86-32 strcat functions with SSE2/SSSE3 2011-08-04 15:33:38 -04:00
strncat-sse2.S Improve x86-32 strcat functions with SSE2/SSSE3 2011-08-04 15:33:38 -04:00
strncat-ssse3.S Improve x86-32 strcat functions with SSE2/SSSE3 2011-08-04 15:33:38 -04:00
strncat.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strncmp-c.c 32bit memcmp/strcmp/strncmp optimized for SSSE3/SSS4.2 2010-02-15 11:17:50 -08:00
strncmp-sse4.S 32bit memcmp/strcmp/strncmp optimized for SSSE3/SSS4.2 2010-02-15 11:17:50 -08:00
strncmp-ssse3.S 32bit memcmp/strcmp/strncmp optimized for SSSE3/SSS4.2 2010-02-15 11:17:50 -08:00
strncmp.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strncpy-c.c Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
strncpy-sse2.S Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
strncpy-ssse3.S Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
strncpy.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strnlen-c.c Fix some warning nits 2011-10-28 12:02:08 +02:00
strnlen-sse2.S Add optimized wcslen and strnlen for x86-32 2011-10-23 15:17:23 -04:00
strnlen.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strpbrk-c.c Add x86 32-bit SSE4.2 string functions. 2009-08-04 12:13:43 -07:00
strpbrk.S Add i686 __libc_ifunc_impl_list 2012-10-11 16:40:02 -07:00
strrchr-sse2-bsf.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strrchr-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strrchr.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
strspn-c.c Add x86 32-bit SSE4.2 string functions. 2009-08-04 12:13:43 -07:00
strspn.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
test-multiarch.c BZ#14059: Fix AVX and FMA4 detection. 2012-05-17 06:59:28 -07:00
varshift.c Fixup x86 after x86-64 varshift change. 2010-08-27 12:10:11 -07:00
varshift.h Clean up SSE variable shifts 2010-08-24 11:35:01 -07:00
Versions Add x86-32 FMA support 2010-04-14 22:27:59 -07:00
wcschr-c.c Fix strftime wcschr namespace (bug 17634). 2014-12-10 16:59:02 +00:00
wcschr-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcschr.S Fix strftime wcschr namespace (bug 17634). 2014-12-10 16:59:02 +00:00
wcscmp-c.c Fix warnings in fallback C code of x86-32 wide memory functions 2011-11-12 00:50:26 -05:00
wcscmp-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcscmp.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcscpy-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcscpy-ssse3.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcscpy.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcslen-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcslen-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcslen.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcsrchr-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcsrchr-sse2.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wcsrchr.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wmemcmp-c.c Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30
wmemcmp-sse4.S Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
wmemcmp-ssse3.S Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
wmemcmp.S Remove NOT_IN_libc 2014-11-24 15:03:45 +05:30