glibc/sysdeps
H.J. Lu 88b57b8ed4 Add x86-64 memmove with unaligned load/store and rep movsb
Implement x86-64 memmove with unaligned load/store and rep movsb.
Support 16-byte, 32-byte and 64-byte vector register sizes.  When
size <= 8 times of vector register size, there is no check for
address overlap bewteen source and destination.  Since overhead for
overlap check is small when size > 8 times of vector register size,
memcpy is an alias of memmove.

A single file provides 2 implementations of memmove, one with rep movsb
and the other without rep movsb.  They share the same codes when size is
between 2 times of vector register size and REP_MOVSB_THRESHOLD which
is 2KB for 16-byte vector register size and scaled up by large vector
register size.

Key features:

1. Use overlapping load and store to avoid branch.
2. For size <= 8 times of vector register size, load  all sources into
registers and store them together.
3. If there is no address overlap bewteen source and destination, copy
from both ends with 4 times of vector register size at a time.
4. If address of destination > address of source, backward copy 8 times
of vector register size at a time.
5. Otherwise, forward copy 8 times of vector register size at a time.
6. Use rep movsb only for forward copy.  Avoid slow backward rep movsb
by fallbacking to backward copy 8 times of vector register size at a
time.
7. Skip when address of destination == address of source.

	[BZ #19776]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
	memmove-avx512-unaligned-erms.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test
	__memmove_chk_avx512_unaligned_2,
	__memmove_chk_avx512_unaligned_erms,
	__memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
	__memmove_chk_sse2_unaligned_2,
	__memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
	__memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
	__memmove_avx512_unaligned_erms, __memmove_erms,
	__memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
	__memcpy_chk_avx512_unaligned_2,
	__memcpy_chk_avx512_unaligned_erms,
	__memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
	__memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
	__memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
	__memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
	__memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
	__memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
	__mempcpy_chk_avx512_unaligned_erms,
	__mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
	__mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
	__mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
	__mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
	__mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
	__mempcpy_erms.
	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
	file.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
	Likwise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
	Likwise.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
	Likwise.
2016-03-31 10:04:40 -07:00
..
aarch64 Add _STRING_INLINE_unaligned and string_private.h 2016-02-18 14:55:29 -02:00
alpha Update Alpha libm-test-ulps 2016-01-25 10:43:41 -08:00
arm Fix building glibc master with NDEBUG and --with-cpu. 2016-03-15 23:23:24 -04:00
generic hurd: Do not hide rtld symbols which need to be preempted 2016-03-20 19:51:42 +01:00
gnu Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
hppa hppa: fix dladdr [BZ #19415] 2016-01-08 02:19:26 -05:00
i386 Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848). 2016-03-24 01:32:52 +00:00
ia64 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
ieee754 Fix ldbl-128ibm nearbyintl in non-default rounding modes (bug 19790). 2016-03-09 00:30:59 +00:00
init_array Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
m68k Add _STRING_INLINE_unaligned and string_private.h 2016-02-18 14:55:29 -02:00
mach hurd: Add c++-types expected result 2016-03-20 22:16:34 +01:00
microblaze Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
mips Fix MIPS64 memcpy regression. 2016-01-28 01:52:05 +00:00
nacl NaCl: Fix unused variable errors in lowlevellock-futex.h macros. 2016-01-20 13:57:14 -08:00
nios2 Maintainence patch for nios2: update ULPS file and localplt.data changes. 2016-01-21 22:58:03 -08:00
nptl New pthread_barrier algorithm to fulfill barrier destruction requirements. 2016-01-15 21:20:34 +01:00
posix Fix flag test in waitid compatibility layer 2016-03-13 21:44:09 +01:00
powerpc powerpc: Rearrange cfi_offset calls 2016-03-11 11:31:58 -03:00
pthread Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
s390 S390: Extend structs La_s390_regs / La_s390_retval with vector-registers. 2016-03-31 17:37:16 +02:00
sh Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
sparc Add _STRING_INLINE_unaligned and string_private.h 2016-02-18 14:55:29 -02:00
tile Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
unix [microblaze] Remove __ASSUME_FUTIMESAT. 2016-03-29 22:13:36 +00:00
wordsize-32 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
wordsize-64 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
x86 Initial Enhanced REP MOVSB/STOSB (ERMS) support 2016-03-28 19:23:31 -07:00
x86_64 Add x86-64 memmove with unaligned load/store and rep movsb 2016-03-31 10:04:40 -07:00