glibc/sysdeps/x86_64
Noah Goldstein 2d2493a644 x86: Use VMM API in memcmpeq-evex.S and minor changes
Changes to generated code are:
    1. In a few places use `vpcmpeqb` instead of `vpcmpneq` to save a
       byte of code size.
    2. Add a branch for length <= (VEC_SIZE * 6) as opposed to doing
       the entire block of [VEC_SIZE * 4 + 1, VEC_SIZE * 8] in a
       single basic-block (the space to add the extra branch without
       changing code size is bought with the above change).
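
A rough C sketch of the idea behind change (2), not the actual assembly: it assumes the routine checks four vectors from the start of the buffers and then covers the remainder with overlapping vectors taken from the end. The names `vec_differs` and `memcmpeq_4x_to_8x` and the fixed `VEC_SIZE` of 32 are illustrative assumptions, and plain `memcmp` stands in for a vector compare.

```c
#include <stddef.h>
#include <string.h>

/* Assumed vector width in bytes (32 would match a 256-bit vector). */
#define VEC_SIZE 32

/* Hypothetical helper: nonzero if the VEC_SIZE-byte chunks differ. */
static int vec_differs(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, VEC_SIZE) != 0;
}

/* Sketch for lengths in [VEC_SIZE * 4 + 1, VEC_SIZE * 8].
   Returns 0 if the buffers are equal, nonzero otherwise. */
int memcmpeq_4x_to_8x(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;
    int diff = 0;

    /* Four vectors from the start cover bytes [0, VEC_SIZE * 4). */
    for (size_t i = 0; i < 4; i++)
        diff |= vec_differs(a + i * VEC_SIZE, b + i * VEC_SIZE);

    /* The added branch: for n <= VEC_SIZE * 6 the last two vectors
       (which may overlap the first four) already cover the rest, so
       only the larger sizes need four vectors from the end. */
    size_t tail = (n <= VEC_SIZE * 6) ? 2 : 4;
    for (size_t i = 1; i <= tail; i++)
        diff |= vec_differs(a + n - i * VEC_SIZE, b + n - i * VEC_SIZE);

    return diff;
}
```

Under these assumptions, sizes up to VEC_SIZE * 6 do six vector compares instead of eight, which is consistent with the speedup being limited to that range.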

Change (2) yields roughly a 20-25% speedup for sizes in [VEC_SIZE * 4 +
1, VEC_SIZE * 6] and negligible to no cost for [VEC_SIZE * 6 + 1,
VEC_SIZE * 8].

From N=10 runs on Tigerlake:

align1,align2 ,length ,result               ,New Time ,Cur Time ,New Time / Cur Time
0     ,0      ,129    ,0                    ,5.404    ,6.887    ,0.785
0     ,0      ,129    ,1                    ,5.308    ,6.826    ,0.778
0     ,0      ,129    ,18446744073709551615 ,5.359    ,6.823    ,0.785
0     ,0      ,161    ,0                    ,5.284    ,6.827    ,0.774
0     ,0      ,161    ,1                    ,5.317    ,6.745    ,0.788
0     ,0      ,161    ,18446744073709551615 ,5.406    ,6.778    ,0.798

0     ,0      ,193    ,0                    ,6.804    ,6.802    ,1.000
0     ,0      ,193    ,1                    ,6.950    ,6.754    ,1.029
0     ,0      ,193    ,18446744073709551615 ,6.792    ,6.719    ,1.011
0     ,0      ,225    ,0                    ,6.625    ,6.699    ,0.989
0     ,0      ,225    ,1                    ,6.776    ,6.735    ,1.003
0     ,0      ,225    ,18446744073709551615 ,6.758    ,6.738    ,0.992
0     ,0      ,256    ,0                    ,5.402    ,5.462    ,0.989
0     ,0      ,256    ,1                    ,5.364    ,5.483    ,0.978
0     ,0      ,256    ,18446744073709551615 ,5.341    ,5.539    ,0.964

Rewriting with VMM API allows for memcmpeq-evex to be used with
evex512 by including "x86-evex512-vecs.h" at the top.
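
As a loose illustration of why writing against a vector-macro layer makes the width swappable, the C preprocessor sketch below picks a width and a register-name prefix in one place. The `USE_EVEX512` switch, the string-valued `VMM(i)` and the two helper functions are hypothetical and are not the contents of the glibc headers.

```c
#include <string.h>

/* Hypothetical selector: define USE_EVEX512 to switch to 64-byte vectors. */
#ifdef USE_EVEX512
# define VEC_SIZE 64
# define VMM(i) "zmm" #i
#else
# define VEC_SIZE 32
# define VMM(i) "ymm" #i
#endif

/* Width-agnostic users: written once against VEC_SIZE and VMM(). */
static const char *first_vec_reg(void)
{
    return VMM(16);          /* register name for the selected width */
}

static unsigned block_bytes(void)
{
    return VEC_SIZE * 4;     /* four-vector block size in bytes */
}
```

In this sketch only the selection block at the top changes between widths; everything written in terms of `VEC_SIZE` and `VMM()` stays the same.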

Complete check passes on x86-64.
2022-11-08 19:22:08 -08:00
64 Move architecture shlib-versions files to Linux-specific directories. 2014-07-17 14:31:12 +00:00
fpu x86: Remove .tfloat usage 2022-10-03 14:03:21 -03:00
multiarch x86: Use VMM API in memcmpeq-evex.S and minor changes 2022-11-08 19:22:08 -08:00
nptl elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
x32 Fix build with GCC 13 _FloatN, _FloatNx built-in functions 2022-10-31 23:20:08 +00:00
____longjmp_chk.S ____longjmp_chk is now OS-specific. 2009-07-30 21:42:27 -07:00
__longjmp.S Introduce <pointer_guard.h>, extracted from <sysdep.h> 2022-10-18 17:03:55 +02:00
_mcount.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
abort-instr.h
add_n.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
addmul_1.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
bsd-_setjmp.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
bsd-setjmp.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
configure x86/configure.ac: Define PI_STATIC_AND_HIDDEN/SUPPORT_STATIC_PIE 2022-02-14 07:34:54 -08:00
configure.ac x86/configure.ac: Define PI_STATIC_AND_HIDDEN/SUPPORT_STATIC_PIE 2022-02-14 07:34:54 -08:00
crti.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
crtn.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-hwcaps-subdirs.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-irel.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-machine.h x86-64: Only define used SSE/AVX/AVX512 run-time resolvers 2022-06-27 14:17:52 -07:00
dl-procinfo.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-runtime.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-tls.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-tls.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-tlsdesc.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-tlsdesc.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
dl-trampoline.h x86-64: Small improvements to dl-trampoline.S 2022-06-29 19:47:52 -07:00
dl-trampoline.S x86-64: Small improvements to dl-trampoline.S 2022-06-29 19:47:52 -07:00
ffs.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
ffsll.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
ifuncmain8.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
ifuncmod8.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
Implies Remove dbl-64/wordsize-64 (part 2) 2021-01-07 15:26:26 +00:00
isa-default-impl.h x86: Remove faulty sanity tests for RTLD build with no multiarch 2022-06-23 11:14:08 -07:00
isa.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
jmpbuf-offsets.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
jmpbuf-unwind.h Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources 2022-10-18 17:04:10 +02:00
l10nflist.c Minor optimization of popcount in l10nflist 2011-08-11 14:07:04 -04:00
link-defines.sym elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT) 2021-10-11 11:14:02 -07:00
locale-defines.sym Implement optimized strcaecmp for x86-64. 2010-07-30 00:14:04 -07:00
localplt.data elf: Rework exception handling in the dynamic loader [BZ #25486] 2022-11-03 09:39:31 +01:00
lshift.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
machine-gmon.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
Makefile x86_64: Remove platform directory library loading test 2022-10-06 07:59:48 -03:00
memchr.S x86: Add support for compiling {raw|w}memchr with high ISA level 2022-06-22 19:41:35 -07:00
memcmp-isa-default-impl.h x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
memcmp.S x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
memcmpeq.S x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
memcpy_chk.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
memcpy.S X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove 2016-06-08 13:58:08 -07:00
memmove_chk.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
memmove.S x86: Add support for building {w}memmove{_chk} with explicit ISA level 2022-07-05 16:42:42 -07:00
mempcpy_chk.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
mempcpy.S X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove 2016-06-08 13:58:08 -07:00
memrchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
memset_chk.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
memset.S x86: Add support for building {w}memset{_chk} with explicit ISA level 2022-07-05 16:42:42 -07:00
mp_clz_tab.c * sysdeps/x86_64/mp_clz_tab.c: New file. 2009-04-15 04:30:41 +00:00
mul_1.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
preconfigure rename configure.in to configure.ac 2013-10-30 17:32:08 +10:00
preconfigure.ac rename configure.in to configure.ac 2013-10-30 17:32:08 +10:00
rawmemchr.S x86: Add support for compiling {raw|w}memchr with high ISA level 2022-06-22 19:41:35 -07:00
rshift.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
rtld-offsets.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
setjmp.S Introduce <pointer_guard.h>, extracted from <sysdep.h> 2022-10-18 17:03:55 +02:00
stackguard-macros.h BZ #15754: CVE-2013-4788 2013-09-23 00:52:09 -04:00
stackinfo.h nptl: x86_64: Use same code for CURRENT_STACK_FRAME and stackinfo_get_sp 2022-08-31 09:04:27 -03:00
start.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
stpcpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
stpncpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strcasecmp_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strcasecmp_l.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strcasecmp.S Implement optimized strcaecmp for x86-64. 2010-07-30 00:14:04 -07:00
strcat.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strchr-isa-default-impl.h x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strchrnul.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strcmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strcpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strcspn-generic.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strcspn.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strlen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strncase_l-nonascii.c Use locale_t, not __locale_t, throughout glibc 2017-06-20 20:30:06 -04:00
strncase_l.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strncase.S Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
strncat.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strncmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strncpy.S x86: Add support to build st{p|r}{n}{cpy|cat} with explicit ISA level 2022-07-16 03:07:59 -07:00
strnlen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strpbrk-generic.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strpbrk.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strrchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
strspn-generic.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
strspn.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
sub_n.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
submul_1.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
sysdep.h x86-64: Move LP_SIZE definition to its own header 2022-10-18 17:02:08 +02:00
tls_get_addr.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tlsdesc.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tlsdesc.sym x86-64: Align the stack in __tls_get_addr [BZ #21609] 2017-07-06 04:43:20 -07:00
tst-audit3.c Modify several tests to use test-skeleton.c 2014-11-05 15:24:08 +05:30
tst-audit4-aux.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-audit4.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-audit5.c Modify several tests to use test-skeleton.c 2014-11-05 15:24:08 +05:30
tst-audit6.c Modify several tests to use test-skeleton.c 2015-07-15 15:10:23 +05:30
tst-audit7.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-audit10-aux.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-audit10.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-audit.h Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-auditmod3a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod3b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod4a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod4b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod5a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod5b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod6b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod6c.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod7a.c Move x86_64-specific audit tests to sysdeps/x86_64/. 2013-04-25 19:23:11 +00:00
tst-auditmod7b.c Add missing header files throughout the testsuite. 2017-02-16 17:33:18 -05:00
tst-auditmod10a.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-auditmod10b.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-avx512-aux.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-avx512.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-avx512mod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-avx-aux.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-avx.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-avxmod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-glibc-hwcaps.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-platform-1.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-platformmod-1.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-platformmod-2.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-quad1.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-quad1pie.c Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quad2.c Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quad2pie.c Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quadmod1.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-quadmod1pie.S Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-quadmod2.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-quadmod2pie.S Handle R_X86_64_RELATIVE64 and R_X86_64_64 for x32 2012-05-10 17:05:06 -07:00
tst-rsi-strlen.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-rsi-wcslen.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-split-dynreloc.c Fix dynamic linker issue with bind-now 2015-08-19 05:37:01 -07:00
tst-split-dynreloc.lds Fix dynamic linker issue with bind-now 2015-08-19 05:37:01 -07:00
tst-sse.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
tst-ssemod.c x86-64: Verify that _dl_runtime_resolve preserves vector registers 2017-02-09 12:19:58 -08:00
tst-x86-64-tls-1.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
varshift.c x86: Add support for building str{c|p}{brk|spn} with explicit ISA level 2022-07-05 16:42:42 -07:00
Versions Move __fentry__ version definition to sysdeps/{i386,x86_64} 2018-08-10 09:07:44 +02:00
wcschr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcscmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcscpy-generic.c x86: Add support to build wcscpy with explicit ISA level 2022-07-16 03:07:59 -07:00
wcscpy.S x86: Add support to build wcscpy with explicit ISA level 2022-07-16 03:07:59 -07:00
wcslen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsncmp-generic.c x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsncmp.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsnlen-generic.c x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsnlen.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wcsrchr.S x86: Add support to build strcmp/strlen/strchr with explicit ISA level 2022-07-16 03:07:59 -07:00
wmemchr.S x86: Add support for compiling {raw|w}memchr with high ISA level 2022-06-22 19:41:35 -07:00
wmemcmp.S x86: Add support for building {w}memcmp{eq} with explicit ISA level 2022-07-05 16:42:42 -07:00
wmemset_chk.S Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
wmemset.S x86-64: Optimize wmemset with SSE2/AVX2/AVX512 2017-06-05 11:09:59 -07:00
wordcopy.c X86-64: Add dummy memcopy.h and wordcopy.c 2016-06-09 04:38:34 -07:00
x86-lp_size.h x86-64: Move LP_SIZE definition to its own header 2022-10-18 17:02:08 +02:00