#include <isa-level.h>

#if ISA_SHOULD_BUILD (3)
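/* Parameterize the shared memmove-vec-unaligned-erms.S implementation for
   AVX: 32-byte ymm vectors, VEX-encoded moves, and symbol names such as
   MEMMOVE_SYMBOL (__memmove, unaligned_erms) == __memmove_avx_unaligned_erms.  */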
# define VEC_SIZE 32
# define VEC(i) ymm##i
# define VMOVNT vmovntdq
# define VMOVU vmovdqu
# define VMOVA vmovdqa
# define MOV_SIZE 4

# define SECTION(p) p##.avx

# ifndef MEMMOVE_SYMBOL
# define MEMMOVE_SYMBOL(p,s) p##_avx_##s
# endif
Add x86-64 memmove with unaligned load/store and rep movsb

Implement x86-64 memmove with unaligned load/store and rep movsb.
Support 16-byte, 32-byte and 64-byte vector register sizes.  When
size <= 8 times the vector register size, there is no check for
address overlap between source and destination.  Since the overhead of
the overlap check is small when size > 8 times the vector register
size, memcpy is an alias of memmove.

A single file provides 2 implementations of memmove, one with rep movsb
and the other without rep movsb.  They share the same code when size is
between 2 times the vector register size and REP_MOVSB_THRESHOLD, which
is 2KB for the 16-byte vector register size and is scaled up for larger
vector register sizes (4KB for the 32-byte vectors configured here).

Key features:
1. Use overlapping loads and stores to avoid branches (see the C sketch
   after this list).
2. For size <= 8 times the vector register size, load all sources into
   registers and store them together.
3. If there is no address overlap between source and destination, copy
   from both ends, 4 times the vector register size at a time.
4. If the destination address > the source address, copy backward 8
   times the vector register size at a time.
5. Otherwise, copy forward 8 times the vector register size at a time.
6. Use rep movsb only for forward copying.  Avoid slow backward rep
   movsb by falling back to backward copying 8 times the vector register
   size at a time.
7. Skip the copy when the destination address == the source address.
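For illustration only, a minimal C sketch of two of the ideas above; the
helper names copy_le_2x_vec and copy_large are invented here, and the
byte-wise loops stand in for the 4x/8x vector-register moves of the real
code:

#include <string.h>
#include <stddef.h>

/* Features 1 and 2: for VEC_SIZE < n <= 2 * VEC_SIZE, load both parts
   before storing either, so overlapping src/dst need no branch.
   VEC_SIZE stands in for the 32-byte ymm width configured by this file.  */
#define VEC_SIZE 32

static void
copy_le_2x_vec (unsigned char *dst, const unsigned char *src, size_t n)
{
  unsigned char head[VEC_SIZE], tail[VEC_SIZE];
  memcpy (head, src, VEC_SIZE);
  memcpy (tail, src + n - VEC_SIZE, VEC_SIZE);
  memcpy (dst, head, VEC_SIZE);
  memcpy (dst + n - VEC_SIZE, tail, VEC_SIZE);
}

/* Features 4, 5 and 7: choose the copy direction so an overlapping
   destination is never clobbered before its source bytes are read.  */
static void
copy_large (unsigned char *dst, const unsigned char *src, size_t n)
{
  if (dst == src)
    return;			/* Feature 7: nothing to do.  */
  if (dst > src && dst < src + n)
    while (n--)
      dst[n] = src[n];		/* Backward copy: dst overlaps the end of src.  */
  else
    for (size_t i = 0; i < n; i++)
      dst[i] = src[i];		/* Forward copy.  */
}

The real implementation moves whole vector registers (VMOVU/VMOVA)
instead of bytes on both paths, and switches to rep movsb or
non-temporal stores (VMOVNT) above the size thresholds.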
[BZ #19776]
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
memmove-avx512-unaligned-erms.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Test
__memmove_chk_avx512_unaligned_2,
__memmove_chk_avx512_unaligned_erms,
__memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
__memmove_chk_sse2_unaligned_2,
__memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
__memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
__memmove_avx512_unaligned_erms, __memmove_erms,
__memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
__memcpy_chk_avx512_unaligned_2,
__memcpy_chk_avx512_unaligned_erms,
__memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
__memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
__memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
__memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
__memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
__memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
__mempcpy_chk_avx512_unaligned_erms,
__mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
__mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
__mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
__mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
__mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
__mempcpy_erms.
* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
file.
* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
Likewise.
* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
Likwise.
# include "memmove-vec-unaligned-erms.S"
# if MINIMUM_X86_ISA_LEVEL == 3
# include "memmove-shlib-compat.h"
# endif

#endif