glibc/sysdeps
H.J. Lu 830566307f Add x86-64 memset with unaligned store and rep stosb
Implement x86-64 memset with unaligned store and rep stosb.  Support
16-byte, 32-byte and 64-byte vector register sizes.  A single file
provides 2 implementations of memset, one with rep stosb and the other
without rep stosb.  They share the same code when the size is between 2
times the vector register size and REP_STOSB_THRESHOLD, which defaults
to 2KB.
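A minimal sketch of the size dispatch described above follows. It is written in C rather than in the assembly of memset-vec-unaligned-erms.S. Several parts are assumptions:

- VEC_SIZE = 32 (the AVX2 case) is assumed; the 2KB threshold comes from the text.
- The enum and classify_memset are hypothetical names.
- The exact comparison used at each boundary (< versus <=) is assumed.
- The paths below 2 times VEC_SIZE are shown as common to both variants for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: AVX2-sized vectors; the commit also supports 16 and 64. */
#define VEC_SIZE 32
/* Default threshold stated in the commit message (2KB). */
#define REP_STOSB_THRESHOLD 2048

/* Hypothetical labels for the code paths a call can take. */
enum memset_path {
  PATH_LESS_VEC,    /* n < VEC_SIZE: small-size stores */
  PATH_2X_VEC,      /* VEC_SIZE <= n <= 2 * VEC_SIZE: two overlapping vector stores */
  PATH_MORE_2X_VEC, /* vector store path shared by both variants */
  PATH_REP_STOSB    /* large sizes, only in the rep stosb ("erms") variant */
};

/* Return the path a memset of n bytes would take in the given variant. */
enum memset_path classify_memset(size_t n, bool erms_variant) {
  if (n < VEC_SIZE)
    return PATH_LESS_VEC;
  if (n <= 2 * VEC_SIZE)
    return PATH_2X_VEC;
  /* Above 2 * VEC_SIZE the variants differ only past the threshold. */
  if (erms_variant && n > REP_STOSB_THRESHOLD)
    return PATH_REP_STOSB;
  return PATH_MORE_2X_VEC;
}
```

With these constants, a 1000-byte memset takes the shared path in both variants. A 4096-byte memset switches to rep stosb only in the erms variant.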

Key features:

1. Use overlapping stores to avoid branches.
2. For sizes <= 4 times the vector register size, fully unroll the loop.
3. For sizes > 4 times the vector register size, store 4 times the vector
register size at a time.
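The three features can be modeled in portable C as below. This is a sketch, not the glibc code, and it departs from the real implementation in three ways:

- vec_store is a hypothetical helper that stands in for one unaligned vector store by copying a 16-byte buffer.
- VEC = 16 mirrors the SSE2 case.
- Sizes below VEC use a plain byte loop instead of the narrower overlapping stores an assembly version would use.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in vector size (16 bytes, like an SSE2 register). */
#define VEC 16

/* One unaligned "vector store": copy VEC bytes of the broadcast value to p.
   A fixed-size memcpy is typically compiled to a single unaligned move. */
static void vec_store(unsigned char *p, const unsigned char *v) {
  memcpy(p, v, VEC);
}

void *memset_sketch(void *dst, int c, size_t n) {
  unsigned char *d = dst;
  unsigned char v[VEC];
  for (size_t i = 0; i < VEC; i++)  /* broadcast c into a "vector" */
    v[i] = (unsigned char)c;

  if (n < VEC) {                    /* small sizes: byte loop in this sketch */
    for (size_t i = 0; i < n; i++)
      d[i] = (unsigned char)c;
    return dst;
  }
  if (n <= 2 * VEC) {               /* two stores that overlap when n < 2 * VEC */
    vec_store(d, v);
    vec_store(d + n - VEC, v);
    return dst;
  }
  if (n <= 4 * VEC) {               /* fully unrolled: 2 from the front, 2 from the back */
    vec_store(d, v);
    vec_store(d + VEC, v);
    vec_store(d + n - 2 * VEC, v);
    vec_store(d + n - VEC, v);
    return dst;
  }
  /* n > 4 * VEC: store 4 * VEC bytes per iteration ... */
  unsigned char *end = d + n;
  while ((size_t)(end - d) > 4 * VEC) {
    vec_store(d, v);
    vec_store(d + VEC, v);
    vec_store(d + 2 * VEC, v);
    vec_store(d + 3 * VEC, v);
    d += 4 * VEC;
  }
  /* ... then cover the remaining 1..4*VEC bytes with 4 stores ending at
     `end`; they may rewrite bytes the loop already set, which is harmless. */
  vec_store(end - 4 * VEC, v);
  vec_store(end - 3 * VEC, v);
  vec_store(end - 2 * VEC, v);
  vec_store(end - VEC, v);
  return dst;
}
```

Overlap is what removes the branches. For any n from VEC to 2*VEC, the same two stores cover the whole range, so the code never has to branch on the exact size within that range.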

	[BZ #19881]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
	memset-avx512-unaligned-erms.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
	__memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
	__memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
	__memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
	__memset_sse2_unaligned_erms, __memset_erms,
	__memset_avx2_unaligned, __memset_avx2_unaligned_erms,
	__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
	* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
	file.
	* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
	Likewise.
2016-03-31 10:06:07 -07:00
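The ChangeLog above registers SSE2, AVX2 and AVX-512 variants, each with and without the _erms suffix, so one of them has to be chosen at run time. One possible policy is sketched below. It is not glibc's actual selector, which uses ifunc resolvers and its own internal CPU feature data. The names enum memset_impl, struct cpu_flags, pick_memset and cpu_has_erms are hypothetical. The ERMS bit itself is CPUID leaf 7, subleaf 0, EBX bit 9. The sketch assumes __get_cpuid_count from <cpuid.h> is available, as it is in recent GCC and Clang.

```c
#include <stdbool.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <cpuid.h>
#endif

/* Variants named in the ChangeLog (the _chk entries and __memset_erms are
   omitted). Ordering matters: each _ERMS entry follows its base variant. */
enum memset_impl {
  SSE2_UNALIGNED, SSE2_UNALIGNED_ERMS,
  AVX2_UNALIGNED, AVX2_UNALIGNED_ERMS,
  AVX512_UNALIGNED, AVX512_UNALIGNED_ERMS
};

const char *const memset_impl_name[] = {
  "__memset_sse2_unaligned", "__memset_sse2_unaligned_erms",
  "__memset_avx2_unaligned", "__memset_avx2_unaligned_erms",
  "__memset_avx512_unaligned", "__memset_avx512_unaligned_erms",
};

/* Hypothetical feature summary used by the policy below. */
struct cpu_flags { bool avx512f, avx2, erms; };

/* ERMS is reported by CPUID leaf 7, subleaf 0, EBX bit 9. */
bool cpu_has_erms(void) {
#if defined(__x86_64__) && defined(__GNUC__)
  unsigned int a, b, c, d;
  if (!__get_cpuid_count(7, 0, &a, &b, &c, &d))
    return false;                   /* leaf 7 not supported */
  return (b >> 9) & 1u;
#else
  return false;                     /* not x86-64: no ERMS */
#endif
}

/* Hypothetical policy: widest available vectors, plus the _erms variant
   when the CPU advertises ERMS. */
enum memset_impl pick_memset(struct cpu_flags f) {
  int base = f.avx512f ? AVX512_UNALIGNED
           : f.avx2    ? AVX2_UNALIGNED
                       : SSE2_UNALIGNED;
  return (enum memset_impl)(base + (f.erms ? 1 : 0));
}
```

A real selector would also check that the OS has enabled the wider register state and may prefer a narrower variant on some CPUs. This sketch ignores both.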
aarch64 Add _STRING_INLINE_unaligned and string_private.h 2016-02-18 14:55:29 -02:00
alpha Update Alpha libm-test-ulps 2016-01-25 10:43:41 -08:00
arm Fix building glibc master with NDEBUG and --with-cpu. 2016-03-15 23:23:24 -04:00
generic hurd: Do not hide rtld symbols which need to be preempted 2016-03-20 19:51:42 +01:00
gnu Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
hppa hppa: fix dladdr [BZ #19415] 2016-01-08 02:19:26 -05:00
i386 Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848). 2016-03-24 01:32:52 +00:00
ia64 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
ieee754 Fix ldbl-128ibm nearbyintl in non-default rounding modes (bug 19790). 2016-03-09 00:30:59 +00:00
init_array Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
m68k Add _STRING_INLINE_unaligned and string_private.h 2016-02-18 14:55:29 -02:00
mach hurd: Add c++-types expected result 2016-03-20 22:16:34 +01:00
microblaze Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
mips Fix MIPS64 memcpy regression. 2016-01-28 01:52:05 +00:00
nacl NaCl: Fix unused variable errors in lowlevellock-futex.h macros. 2016-01-20 13:57:14 -08:00
nios2 Maintenance patch for nios2: update ULPS file and localplt.data changes. 2016-01-21 22:58:03 -08:00
nptl New pthread_barrier algorithm to fulfill barrier destruction requirements. 2016-01-15 21:20:34 +01:00
posix Fix flag test in waitid compatibility layer 2016-03-13 21:44:09 +01:00
powerpc powerpc: Rearrange cfi_offset calls 2016-03-11 11:31:58 -03:00
pthread Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
s390 S390: Extend structs La_s390_regs / La_s390_retval with vector-registers. 2016-03-31 17:37:16 +02:00
sh Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
sparc Add _STRING_INLINE_unaligned and string_private.h 2016-02-18 14:55:29 -02:00
tile Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
unix [microblaze] Remove __ASSUME_FUTIMESAT. 2016-03-29 22:13:36 +00:00
wordsize-32 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
wordsize-64 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
x86 Initial Enhanced REP MOVSB/STOSB (ERMS) support 2016-03-28 19:23:31 -07:00
x86_64 Add x86-64 memset with unaligned store and rep stosb 2016-03-31 10:06:07 -07:00