glibc/sysdeps
Noah Goldstein 2d2493a644 x86: Use VMM API in memcmpeq-evex.S and minor changes
Changes to generated code are:
    1. In a few places use `vpcmpeqb` instead of `vpcmpneqb` to save a
       byte of code size.
    2. Add a branch for length <= (VEC_SIZE * 6) as opposed to doing
       the entire block of [VEC_SIZE * 4 + 1, VEC_SIZE * 8] in a
       single basic-block (the space to add the extra branch without
       changing code size is bought with the above change).

Change (2) gives roughly a 20-25% speedup for sizes in [VEC_SIZE * 4 +
1, VEC_SIZE * 6] and has negligible to no cost for sizes in
[VEC_SIZE * 6 + 1, VEC_SIZE * 8].
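
The size-class split in change (2) can be modelled in portable C. This is only an illustrative sketch, not the assembly in memcmpeq-evex.S: it assumes VEC_SIZE is 32, and `vec_differs` and `memcmpeq_4x_to_8x` are hypothetical names in which a plain `memcmp` stands in for one vector load and compare. It shows why a length of at most VEC_SIZE * 6 needs only two tail vectors instead of four.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define VEC_SIZE 32 /* Assumption: 256-bit vectors. */

/* Hypothetical helper: nonzero iff the VEC_SIZE bytes at a and b differ.
   Stands in for one vector load plus compare. */
static int vec_differs(const unsigned char *a, const unsigned char *b)
{
    return memcmp(a, b, VEC_SIZE) != 0;
}

/* Models only lengths in [VEC_SIZE * 4 + 1, VEC_SIZE * 8].
   Returns zero iff the two buffers are equal (memcmpeq semantics). */
static int memcmpeq_4x_to_8x(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;
    int diff = 0;

    assert(n > VEC_SIZE * 4 && n <= VEC_SIZE * 8);

    /* The first four vectors are always fully in range. */
    for (int i = 0; i < 4; i++)
        diff |= vec_differs(a + i * VEC_SIZE, b + i * VEC_SIZE);

    /* Extra branch: for n <= VEC_SIZE * 6 the last two vectors
       (possibly overlapping the first four) already cover the tail. */
    if (n <= VEC_SIZE * 6) {
        diff |= vec_differs(a + n - 2 * VEC_SIZE, b + n - 2 * VEC_SIZE);
        diff |= vec_differs(a + n - VEC_SIZE, b + n - VEC_SIZE);
        return diff;
    }

    /* Otherwise the last four vectors are needed to cover the tail. */
    for (int i = 1; i <= 4; i++)
        diff |= vec_differs(a + n - i * VEC_SIZE, b + n - i * VEC_SIZE);
    return diff;
}
```

In this model the short path does six vector compares rather than eight, which is consistent with the direction of the speedup reported for lengths 129 and 161 below, while lengths above VEC_SIZE * 6 pay only for the added branch.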

From N=10 runs on Tigerlake:

align1,align2 ,length ,result               ,New Time ,Cur Time ,New Time / Cur Time
0     ,0      ,129    ,0                    ,5.404    ,6.887    ,0.785
0     ,0      ,129    ,1                    ,5.308    ,6.826    ,0.778
0     ,0      ,129    ,18446744073709551615 ,5.359    ,6.823    ,0.785
0     ,0      ,161    ,0                    ,5.284    ,6.827    ,0.774
0     ,0      ,161    ,1                    ,5.317    ,6.745    ,0.788
0     ,0      ,161    ,18446744073709551615 ,5.406    ,6.778    ,0.798

0     ,0      ,193    ,0                    ,6.804    ,6.802    ,1.000
0     ,0      ,193    ,1                    ,6.950    ,6.754    ,1.029
0     ,0      ,193    ,18446744073709551615 ,6.792    ,6.719    ,1.011
0     ,0      ,225    ,0                    ,6.625    ,6.699    ,0.989
0     ,0      ,225    ,1                    ,6.776    ,6.735    ,1.003
0     ,0      ,225    ,18446744073709551615 ,6.758    ,6.738    ,0.992
0     ,0      ,256    ,0                    ,5.402    ,5.462    ,0.989
0     ,0      ,256    ,1                    ,5.364    ,5.483    ,0.978
0     ,0      ,256    ,18446744073709551615 ,5.341    ,5.539    ,0.964

Rewriting with the VMM API allows memcmpeq-evex to be used with
evex512 by including "x86-evex512-vecs.h" at the top.

Complete check passes on x86-64.
2022-11-08 19:22:08 -08:00
..
aarch64 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
alpha elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
arc elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
arm configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
csky elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
generic elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
gnu errlist: add missing entry for EDEADLOCK (bug 29545) 2022-09-08 11:40:24 +02:00
hppa elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
htl htl: Make pthread*_cond_timedwait register wref before releasing mutex 2022-08-22 22:27:24 +02:00
hurd hurd: Fix pthread_kill on exiting/ted thread 2022-01-15 15:11:54 +01:00
i386 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
ia64 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
ieee754 Fix build with GCC 13 _FloatN, _FloatNx built-in functions 2022-10-31 23:20:08 +00:00
loongarch elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
m68k elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
mach hurd: Add sigtimedwait and sigwaitinfo support 2022-11-07 21:16:26 +01:00
microblaze elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
mips elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
nios2 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
nptl Use atomic_exchange_release/acquire 2022-09-26 16:58:08 +01:00
or1k elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
posix get_nscd_addresses: Fix subscript typos [BZ #29605] 2022-09-28 12:47:10 -04:00
powerpc elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
pthread Do not define static_assert or thread_local in headers for C2x 2022-09-07 18:39:28 +00:00
riscv elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
s390 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
sh elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
sparc elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
unix Linux: Add ppoll fortify symbol for 64 bit time_t (BZ# 29746) 2022-11-08 13:37:06 -03:00
wordsize-32 Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
wordsize-64 configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
x86 elf: Remove _dl_string_hwcap 2022-10-06 07:59:48 -03:00
x86_64 x86: Use VMM API in memcmpeq-evex.S and minor changes 2022-11-08 19:22:08 -08:00