glibc/sysdeps
H.J. Lu f35ad30da4 x86-64: Improve EVEX strcmp with masked load
In strcmp-evex.S, to compare two 32-byte vectors of string data, replace

        VMOVU   (%rdi, %rdx), %YMM0
        VMOVU   (%rsi, %rdx), %YMM1
        /* Each bit in K0 represents a mismatch in YMM0 and YMM1.  */
        VPCMP   $4, %YMM0, %YMM1, %k0
        VPCMP   $0, %YMMZERO, %YMM0, %k1
        VPCMP   $0, %YMMZERO, %YMM1, %k2
        /* Each bit in K1 represents a NULL in YMM0 or YMM1.  */
        kord    %k1, %k2, %k1
        /* Each bit in K1 represents a NULL or a mismatch.  */
        kord    %k0, %k1, %k1
        kmovd   %k1, %ecx
        testl   %ecx, %ecx
        jne     L(last_vector)

with

        VMOVU   (%rdi, %rdx), %YMM0
        VPTESTM %YMM0, %YMM0, %k2
        /* Each bit cleared in K1 represents a mismatch or a null CHAR
           in YMM0 and 32 bytes at (%rsi, %rdx).  */
        VPCMP   $0, (%rsi, %rdx), %YMM0, %k1{%k2}
        kmovd   %k1, %ecx
        incl    %ecx
        jne     L(last_vector)

The new sequence makes EVEX strcmp up to 40% faster than AVX2 strcmp on
Tiger Lake and Ice Lake.
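
The equivalence of the two sequences can be checked with a scalar model. The following is only an illustrative sketch, not code from strcmp-evex.S: it assumes byte-sized CHARs and a 32-bit mask, and the names old_mask, new_k1, new_test and check_equivalence are hypothetical helpers introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32  /* Bytes per YMM register; one mask bit per byte.  */

/* Model of the old sequence: three compares and two mask ORs.
   A set bit marks a mismatch or a null byte at that position.  */
static uint32_t
old_mask (const unsigned char *a, const unsigned char *b)
{
  uint32_t k0 = 0, k1 = 0, k2 = 0;
  for (size_t i = 0; i < VEC_SIZE; i++)
    {
      k0 |= (uint32_t) (a[i] != b[i]) << i;  /* VPCMP $4: not equal.  */
      k1 |= (uint32_t) (a[i] == 0) << i;     /* VPCMP $0 against zero.  */
      k2 |= (uint32_t) (b[i] == 0) << i;     /* VPCMP $0 against zero.  */
    }
  return k0 | k1 | k2;
}

/* Model of the new sequence up to the kmovd.  A cleared bit marks a
   mismatch or a null byte at that position.  */
static uint32_t
new_k1 (const unsigned char *a, const unsigned char *b)
{
  uint32_t k2 = 0, k1 = 0;
  for (size_t i = 0; i < VEC_SIZE; i++)
    {
      k2 |= (uint32_t) (a[i] != 0) << i;     /* VPTESTM: non-null in A.  */
      k1 |= (uint32_t) (a[i] == b[i]) << i;  /* VPCMP $0: equal.  */
    }
  /* The {%k2} write mask zeroes result bits where A holds a null.  */
  return k1 & k2;
}

/* Model of incl: the result is zero only if all 32 bits of K1 are set.  */
static uint32_t
new_test (const unsigned char *a, const unsigned char *b)
{
  return new_k1 (a, b) + 1u;
}

/* The new mask is the bitwise complement of the old one, so both
   sequences take the branch in exactly the same cases.  */
static void
check_equivalence (const unsigned char *a, const unsigned char *b)
{
  uint32_t old = old_mask (a, b);
  uint32_t k1 = new_k1 (a, b);
  assert ((uint32_t) ~k1 == old);
  assert ((old != 0) == ((uint32_t) (k1 + 1u) != 0));
}
```

A null in the second operand needs no separate compare: where the first operand is non-null, a null in the second is already a mismatch, and where both are null the VPTESTM mask clears the bit. As a matter of arithmetic, adding 1 to K1 also turns the lowest cleared bit into the lowest set bit, so the position of the first mismatch or null can still be read from the incremented value. Whether the L(last_vector) path relies on this is not shown in the excerpt above.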

Co-Authored-By: Noah Goldstein <goldstein.w.n@gmail.com>
(cherry picked from commit c46e9afb2d)
2022-04-26 18:18:16 -07:00
Name        | Last commit | Date
aarch64     | elf: Fix runtime linker auditing on aarch64 (BZ #26643) | 2022-04-12 13:33:10 -04:00
alpha       | elf: Add _dl_audit_pltexit | 2022-04-08 14:18:12 -04:00
arc         | elf: Fix dynamic-link.h usage on rtld.c | 2022-04-08 14:18:11 -04:00
arm         | elf: Add _dl_audit_pltexit | 2022-04-08 14:18:12 -04:00
csky        | elf: Fix dynamic-link.h usage on rtld.c | 2022-04-08 14:18:11 -04:00
generic     | elf: Fix runtime linker auditing on aarch64 (BZ #26643) | 2022-04-12 13:33:10 -04:00
gnu         | hurd: Fix glob lstat compatibility | 2021-07-22 20:31:52 +02:00
hppa        | hppa: Fix bind-now audit (BZ #28857) | 2022-04-12 13:33:17 -04:00
htl         | htl: Do not expose pthread hidden proto outside libpthread | 2021-07-18 20:25:33 +00:00
hurd        | Update copyright dates with scripts/update-copyrights | 2021-01-02 12:17:34 -08:00
i386        | elf: Add _dl_audit_pltexit | 2022-04-08 14:18:12 -04:00
ia64        | elf: Issue la_symbind for bind-now (BZ #23734) | 2022-04-12 13:32:59 -04:00
ieee754     | Update math: redirect roundeven function | 2021-06-27 07:56:57 -07:00
m68k        | elf: Add _dl_audit_pltexit | 2022-04-08 14:18:12 -04:00
mach        | hurd if_index: Explicitly use AF_INET for if index discovery | 2022-02-03 16:22:04 +01:00
microblaze  | elf: Fix dynamic-link.h usage on rtld.c | 2022-04-08 14:18:11 -04:00
mips        | elf: Add _dl_audit_pltexit | 2022-04-08 14:18:12 -04:00
nios2       | elf: Fix dynamic-link.h usage on rtld.c | 2022-04-08 14:18:11 -04:00
nptl        | nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) | 2022-04-15 09:52:54 -03:00
posix       | getcwd: Set errno to ERANGE for size == 1 (CVE-2021-3999) | 2022-01-24 11:37:06 +05:30
powerpc     | elf: Issue la_symbind for bind-now (BZ #23734) | 2022-04-12 13:32:59 -04:00
pthread     | nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) | 2022-04-15 09:52:54 -03:00
riscv       | elf: Fix dynamic-link.h usage on rtld.c | 2022-04-08 14:18:11 -04:00
s390        | S390: Add new s390 platform z16. | 2022-04-14 14:21:57 +02:00
sh          | elf: Add _dl_audit_pltexit | 2022-04-08 14:18:12 -04:00
sparc       | elf: Add _dl_audit_pltexit | 2022-04-08 14:18:12 -04:00
unix        | Default to --with-default-link=no (bug 25812) | 2022-04-22 11:31:14 +02:00
wordsize-32 | Update copyright dates with scripts/update-copyrights | 2021-01-02 12:17:34 -08:00
wordsize-64 | Update copyright dates with scripts/update-copyrights | 2021-01-02 12:17:34 -08:00
x86         | x86: Modify ENTRY in sysdep.h so that p2align can be specified | 2022-04-26 18:18:16 -07:00
x86_64      | x86-64: Improve EVEX strcmp with masked load | 2022-04-26 18:18:16 -07:00