glibc/sysdeps
H.J. Lu ef9c4cb6c7 x86-64: Optimize wmemset with SSE2/AVX2/AVX512
memset and wmemset differ only in element size: byte vs int.  Add
wmemset stubs to the SSE2/AVX2/AVX512 memset implementations that set up
the broadcast constant for an int and scale the size to bytes:

SSE2 wmemset:
	shl    $0x2,%rdx
	movd   %esi,%xmm0
	mov    %rdi,%rax
	pshufd $0x0,%xmm0,%xmm0
	jmp	entry_from_wmemset

SSE2 memset:
	movd   %esi,%xmm0
	mov    %rdi,%rax
	punpcklbw %xmm0,%xmm0
	punpcklwd %xmm0,%xmm0
	pshufd $0x0,%xmm0,%xmm0
entry_from_wmemset:

Since the ERMS versions of wmemset require "rep stosl" instead of
"rep stosb", only the vector store stubs of SSE2/AVX2/AVX512 wmemset
are added.  The SSE2 wmemset is about 3X faster and the AVX2 wmemset
is about 6X faster on Haswell.
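
The stub arrangement can be illustrated in portable C.  This is only a
sketch of the idea, not the glibc implementation: vec16, broadcast_byte,
broadcast_u32, store_pattern, sketch_memset and sketch_wmemset are
hypothetical names, a 16-byte struct stands in for %xmm0, and uint32_t
stands in for the 4-byte wchar_t of x86-64.  Overflow of the scaled
size is not handled here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte SSE2 register.  */
typedef struct { unsigned char b[16]; } vec16;

/* memset stub: replicate the low byte of C into all 16 lanes
   (the effect of punpcklbw + punpcklwd + pshufd $0).  */
static vec16 broadcast_byte (int c)
{
  vec16 v;
  memset (v.b, (unsigned char) c, sizeof v.b);
  return v;
}

/* wmemset stub: replicate the 32-bit value into all four lanes
   (the effect of pshufd $0 alone).  */
static vec16 broadcast_u32 (uint32_t c)
{
  vec16 v;
  for (size_t i = 0; i < 4; i++)
    memcpy (v.b + 4 * i, &c, sizeof c);
  return v;
}

/* Shared body, standing in for the code after entry_from_wmemset.
   NBYTES is always a byte count.  */
static void *store_pattern (void *dst, vec16 v, size_t nbytes)
{
  unsigned char *p = dst;
  size_t i = 0;
  for (; i + 16 <= nbytes; i += 16)
    memcpy (p + i, v.b, 16);
  /* Tail: I is a multiple of 16, so a prefix of the pattern stays
     aligned to the 4-byte element in the wmemset case.  */
  memcpy (p + i, v.b, nbytes - i);
  return dst;
}

static void *sketch_memset (void *dst, int c, size_t n)
{
  return store_pattern (dst, broadcast_byte (c), n);
}

static uint32_t *sketch_wmemset (uint32_t *dst, uint32_t c, size_t n)
{
  /* n << 2 mirrors "shl $0x2,%rdx": element count to byte count.  */
  store_pattern (dst, broadcast_u32 (c), n << 2);
  return dst;
}
```

Both entry points differ only in how they build the pattern and in
whether the size is scaled, which is why a short stub that jumps into
the existing memset body is sufficient.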

	* include/wchar.h (__wmemset_chk): New.
	* sysdeps/x86_64/memset.S (VDUP_TO_VEC0_AND_SET_RETURN): Renamed
	to MEMSET_VDUP_TO_VEC0_AND_SET_RETURN.
	(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
	(WMEMSET_CHK_SYMBOL): Likewise.
	(WMEMSET_SYMBOL): Likewise.
	(__wmemset): Add hidden definition.
	(wmemset): Add weak hidden definition.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	wmemset_chk-nonshared.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add __wmemset_sse2_unaligned,
	__wmemset_avx2_unaligned, __wmemset_avx512_unaligned,
	__wmemset_chk_sse2_unaligned, __wmemset_chk_avx2_unaligned
	and __wmemset_chk_avx512_unaligned.
	* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S
	(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
	(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
	(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
	(WMEMSET_SYMBOL): Likewise.
	* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S
	(VDUP_TO_VEC0_AND_SET_RETURN): Renamed to ...
	(MEMSET_VDUP_TO_VEC0_AND_SET_RETURN): This.
	(WMEMSET_VDUP_TO_VEC0_AND_SET_RETURN): New.
	(WMEMSET_SYMBOL): Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Updated.
	(WMEMSET_CHK_SYMBOL): New.
	(WMEMSET_CHK_SYMBOL (__wmemset_chk, unaligned)): Likewise.
	(WMEMSET_SYMBOL (__wmemset, unaligned)): Likewise.
	* sysdeps/x86_64/multiarch/memset.S (WMEMSET_SYMBOL): New.
	(libc_hidden_builtin_def): Also define __GI_wmemset and
	__GI___wmemset.
	(weak_alias): New.
	* sysdeps/x86_64/multiarch/wmemset.c: New file.
	* sysdeps/x86_64/multiarch/wmemset.h: Likewise.
	* sysdeps/x86_64/multiarch/wmemset_chk-nonshared.S: Likewise.
	* sysdeps/x86_64/multiarch/wmemset_chk.c: Likewise.
	* sysdeps/x86_64/wmemset.c: Likewise.
	* sysdeps/x86_64/wmemset_chk.c: Likewise.
2017-06-05 11:09:59 -07:00
aarch64 aarch64: Thunderx specific memcpy and memmove 2017-05-24 16:46:48 -07:00
alpha Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
arm Fix more namespace issues in sys/ucontext.h (bug 21457). 2017-06-01 14:07:40 +00:00
generic Fix sys/ucontext.h namespace from signal.h etc. inclusion (bug 21457). 2017-05-23 11:49:48 +00:00
gnu Regenerate sysdeps/gnu/errlist.c. 2017-06-04 15:27:14 -04:00
hppa Remove wrong definitions from pthread header refactor 2017-05-11 10:46:03 -03:00
i386 Fix more namespace issues in sys/ucontext.h (bug 21457). 2017-06-01 14:07:40 +00:00
ia64 Suppress internal declarations for most of the testsuite. 2017-05-11 19:27:59 -04:00
ieee754 float128: Add wrappers to override ldbl-128 as float128. 2017-05-25 09:01:37 -03:00
init_array Update copyright dates with scripts/update-copyrights. 2017-01-01 00:14:16 +00:00
m68k Fix more namespace issues in sys/ucontext.h (bug 21457). 2017-06-01 14:07:40 +00:00
mach Fix struct sigaltstack namespace (bug 21517). 2017-06-05 10:17:46 +00:00
microblaze Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
mips Fix more namespace issues in sys/ucontext.h (bug 21457). 2017-06-01 14:07:40 +00:00
nios2 Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
nptl fork: Remove bogus parent PID assertions [BZ #21386] 2017-05-12 16:04:16 +02:00
posix getaddrinfo: Eliminate another strdup call 2017-06-03 08:37:31 +02:00
powerpc Remove __need macros from signal.h. 2017-05-20 19:04:43 -04:00
pthread Remove __need macros from signal.h. 2017-05-20 19:04:43 -04:00
s390 Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
sh Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
sparc Remove useless comment from sysdeps/sparc/sparc32/dl-machine.h 2017-05-23 01:10:29 +05:30
tile Move shared pthread definitions to common headers 2017-05-09 17:49:17 -03:00
unix x86-64: Update LO_HI_LONG for p{readv,writev}{64}v2 2017-06-05 07:21:57 -07:00
wordsize-32 Build divdi3 only for architecture that required it 2017-04-06 15:14:34 -03:00
wordsize-64 Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
x86 x86: Add macros to implement ifunce selection in C 2017-06-05 08:28:13 -07:00
x86_64 x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00