glibc/sysdeps/x86_64
Noah Goldstein e59ced2384 x86: Optimize memset-vec-unaligned-erms.S
No bug.

Optimization are

1. change control flow for L(more_2x_vec) to fall through to loop and
   jump for L(less_4x_vec) and L(less_8x_vec). This uses less code
   size and saves jumps for length > 4x VEC_SIZE.

2. For EVEX/AVX512 move L(less_vec) closer to entry.

3. Avoid complex address mode for length > 2x VEC_SIZE

4. Slightly better aligning code for the loop from the perspective of
   code size and uops.

5. Align targets so they make full use of their fetch block and if
   possible cache line.

6. Try and reduce total number of icache lines that will need to be
   pulled in for a given length.

7. Include "local" version of stosb target. For AVX2/EVEX/AVX512
   jumping to the stosb target in the sse2 code section will almost
   certainly be to a new page. The new version does increase code size
   marginally by duplicating the target but should get better iTLB
   behavior as a result.

test-memset, test-wmemset, and test-bzero are all passing.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-12 13:38:02 -05:00
..
64 Move architecture shlib-versions files to Linux-specific directories. 2014-07-17 14:31:12 +00:00
fpu Add narrowing fma functions 2021-09-22 21:25:31 +00:00
multiarch x86: Optimize memset-vec-unaligned-erms.S 2021-10-12 13:38:02 -05:00
nptl elf: Remove THREAD_GSCOPE_IN_TCB 2021-09-16 01:04:20 +02:00
x32 mcheck: Align struct hdr to MALLOC_ALIGNMENT bytes [BZ #28068] 2021-07-12 18:13:32 -07:00
____longjmp_chk.S
__longjmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
_mcount.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
abort-instr.h
add_n.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
addmul_1.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
bsd-_setjmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
bsd-setjmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
bzero.S
configure elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) 2021-10-11 11:14:02 -07:00
configure.ac elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) 2021-10-11 11:14:02 -07:00
crti.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
crtn.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-hwcaps-subdirs.c <sys/platform/x86.h>: Remove the C preprocessor magic 2021-01-21 05:58:17 -08:00
dl-irel.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-machine.h elf: Fix elf_get_dynamic_info definition 2021-10-12 13:25:43 -03:00
dl-procinfo.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-runtime.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-tls.c elf: Use relaxed atomics for racy accesses [BZ #19329] 2021-05-11 17:16:37 +01:00
dl-tls.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
dl-tlsdesc.h x86_64: Remove lazy tlsdesc relocation related code 2021-04-15 09:47:47 +01:00
dl-tlsdesc.S x86_64: Remove lazy tlsdesc relocation related code 2021-04-15 09:47:47 +01:00
dl-trampoline.h elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) 2021-10-11 11:14:02 -07:00
dl-trampoline.S elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) 2021-10-11 11:14:02 -07:00
ffs.c Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
ffsll.c Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
htonl.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
ifuncmain8.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
ifuncmod8.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
Implies Remove dbl-64/wordsize-64 (part 2) 2021-01-07 15:26:26 +00:00
isa.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
jmpbuf-offsets.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
jmpbuf-unwind.h Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
l10nflist.c
link-defines.sym elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) 2021-10-11 11:14:02 -07:00
locale-defines.sym
localplt.data mtrace: Wean away from malloc hooks 2021-07-22 18:38:06 +05:30
lshift.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
machine-gmon.h Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
Makefile x86-64: Remove compiler -mavx512f check 2021-08-24 07:05:35 -07:00
memchr.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
memcmp.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
memcpy_chk.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
memcpy.S X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove 2016-06-08 13:58:08 -07:00
memmove_chk.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
memmove.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
mempcpy_chk.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
mempcpy.S X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove 2016-06-08 13:58:08 -07:00
memrchr.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
memset_chk.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
memset.S x86: Optimize memset-vec-unaligned-erms.S 2021-10-12 13:38:02 -05:00
memusage.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
mp_clz_tab.c
mul_1.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
preconfigure rename configure.in to configure.ac 2013-10-30 17:32:08 +10:00
preconfigure.ac rename configure.in to configure.ac 2013-10-30 17:32:08 +10:00
rawmemchr.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
rshift.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
rtld-offsets.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
setjmp.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
stackguard-macros.h BZ #15754: CVE-2013-4788 2013-09-23 00:52:09 -04:00
stackinfo.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
start.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
stpcpy.S
strcasecmp_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strcasecmp_l.S
strcasecmp.S
strcat.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
strchr.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
strchrnul.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
strcmp.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
strcpy.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
strcspn.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
strlen.S x86-64: Move strlen.S to multiarch/strlen-vec.S 2021-06-23 10:24:35 -07:00
strncase_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strncase_l.S
strncase.S
strncmp.S
strnlen.S
strpbrk.S x86-64: Implement strcspn/strpbrk/strspn IFUNC selectors in C 2017-06-15 08:59:05 -07:00
strrchr.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
strspn.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
sub_n.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
submul_1.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
sysdep.h x86-64: Add AVX optimized string/memory functions for RTM 2021-03-29 07:40:17 -07:00
tls_get_addr.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tlsdesc.c elf: Remove lazy tlsdesc relocation related code 2021-04-21 14:35:53 +01:00
tlsdesc.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
tst-audit3.c Modify several tests to use test-skeleton.c 2014-11-05 15:24:08 +05:30
tst-audit4-aux.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-audit4.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-audit5.c Modify several tests to use test-skeleton.c 2014-11-05 15:24:08 +05:30
tst-audit6.c Modify several tests to use test-skeleton.c 2015-07-15 15:10:23 +05:30
tst-audit7.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-audit10-aux.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-audit10.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-audit.h Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-auditmod3a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod3b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod4a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod4b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod5a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod5b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod6b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6c.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod7a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod7b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod10a.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-auditmod10b.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-avx512-aux.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-avx512.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-avx512mod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avx-aux.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-avx.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-avxmod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-glibc-hwcaps.c <sys/platform/x86.h>: Remove the C preprocessor magic 2021-01-21 05:58:17 -08:00
tst-platform-1.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-platformmod-1.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-platformmod-2.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-quad1.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-quad1pie.c
tst-quad2.c
tst-quad2pie.c
tst-quadmod1.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-quadmod1pie.S
tst-quadmod2.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-quadmod2pie.S
tst-rsi-strlen.c x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064] 2021-07-08 18:55:40 -04:00
tst-rsi-wcslen.c x86-64: Test strlen and wcslen with 0 in the RSI register [BZ #28064] 2021-07-08 18:55:40 -04:00
tst-split-dynreloc.c Fix dynamic linker issue with bind-now 2015-08-19 05:37:01 -07:00
tst-split-dynreloc.lds Fix dynamic linker issue with bind-now 2015-08-19 05:37:01 -07:00
tst-sse.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-ssemod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-x86_64-1.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-x86_64mod-1.c Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
tst-x86-64-tls-1.c x86_64: Correct THREAD_SETMEM/THREAD_SETMEM_NC for movq [BZ #27591] 2021-04-01 07:00:22 -07:00
Versions Move __fentry__ version definition to sysdeps/{i386,x86_64} 2018-08-10 09:07:44 +02:00
wcschr.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
wcscmp.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
wcslen.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
wcsrchr.S Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
wmemset_chk.S Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
wmemset.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wordcopy.c X86-64: Add dummy memcopy.h and wordcopy.c 2016-06-09 04:38:34 -07:00