glibc/sysdeps
Noah Goldstein ffe75982cc x86: Remove memcmp-sse4.S
The code didn't actually use any SSE4 instructions since `ptest` was
removed in:

commit 2f9062d717
Author: Noah Goldstein <goldstein.w.n@gmail.com>
Date:   Wed Nov 10 16:18:56 2021 -0600

    x86: Shrink memcmp-sse4.S code size

The new memcmp-sse2 implementation is also faster.

geometric_mean(N=20) of page cross cases SSE2 / SSE4: 0.905
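
The summary figure is a geometric mean of per-case time ratios. Assuming
each of the N = 20 page-cross cases contributes the ratio
r_i = T_SSE2,i / T_SSE4,i (inferred from the "SSE2 / SSE4" label; the
individual r_i are not listed here), it is:

```latex
\mathrm{GM} = \left( \prod_{i=1}^{N} r_i \right)^{1/N}
            = \exp\!\left( \frac{1}{N} \sum_{i=1}^{N} \ln r_i \right),
\qquad r_i = \frac{T_{\mathrm{SSE2},i}}{T_{\mathrm{SSE4},i}}, \quad N = 20
```

A result of 0.905 therefore means the SSE2 version takes roughly 9.5%
less time than the SSE4 version across these cases, in geometric-mean
terms.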

Note that preferring SSE2 introduces two regressions, at Size = 1 and
Size = 65.

Size = 1:
size, align0, align1, ret, New Time/Old Time
   1,      1,      1,   0,               1.2
   1,      1,      1,   1,             1.197
   1,      1,      1,  -1,               1.2

This is intentional. Based on profiles of GCC11 and Python3, Size == 1
is significantly less hot than sizes [4, 8], which this change makes
faster.

Python3 Size = 1        -> 13.64%
Python3 Size = [4, 8]   -> 60.92%

GCC11   Size = 1        ->  1.29%
GCC11   Size = [4, 8]   -> 33.86%

size, align0, align1, ret, New Time/Old Time
   4,      4,      4,   0,             0.622
   4,      4,      4,   1,             0.797
   4,      4,      4,  -1,             0.805
   5,      5,      5,   0,             0.623
   5,      5,      5,   1,             0.777
   5,      5,      5,  -1,             0.802
   6,      6,      6,   0,             0.625
   6,      6,      6,   1,             0.813
   6,      6,      6,  -1,             0.788
   7,      7,      7,   0,             0.625
   7,      7,      7,   1,             0.799
   7,      7,      7,  -1,             0.795
   8,      8,      8,   0,             0.625
   8,      8,      8,   1,             0.848
   8,      8,      8,  -1,             0.914
   9,      9,      9,   0,             0.625

Size = 65:
size, align0, align1, ret, New Time/Old Time
  65,      0,      0,   0,             1.103
  65,      0,      0,   1,             1.216
  65,      0,      0,  -1,             1.227
  65,     65,      0,   0,             1.091
  65,      0,     65,   1,              1.19
  65,     65,     65,  -1,             1.215

This is because A) the checks in range [65, 96] are now unrolled 2x
and B) smaller values <= 16 are now given a hotter path. By contrast,
the SSE4 version has a branch for Size = 80. The unrolled version gets
better performance for returns that need both comparisons.

size, align0, align1, ret, New Time/Old Time
 128,      4,      8,   0,             0.858
 128,      4,      8,   1,             0.879
 128,      4,      8,  -1,             0.888
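
To illustrate the "unrolled 2x instead of branching at Size = 80" idea,
here is a simplified C sketch using SSE2 intrinsics. It is NOT the
glibc memcmp-sse2 code (which is hand-written assembly); the names
memcmp_65_96, first_diff16 and block_differs are hypothetical. After the
first 64 bytes, it always compares the two 16-byte blocks ending at n,
so there is no size-dependent branch for the tail.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86-64 only */
#include <stddef.h>

/* Compare 16 bytes at a and b. Return the index (0..15) of the first
   differing byte, or -1 if all 16 bytes are equal. */
static int first_diff16(const unsigned char *a, const unsigned char *b)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    /* Bit i of mask is set when byte i of a equals byte i of b. */
    unsigned mask = (unsigned)_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb));
    if (mask == 0xFFFFu)
        return -1;
    /* Lowest clear bit = first mismatch (GCC/Clang builtin). */
    return __builtin_ctz(~mask);
}

/* Hypothetical helper: check the 16-byte block at offset off. On a
   mismatch, store the byte difference in *res and return 1. */
static int block_differs(const unsigned char *a, const unsigned char *b,
                         size_t off, int *res)
{
    int i = first_diff16(a + off, b + off);
    if (i < 0)
        return 0;
    *res = a[off + i] - b[off + i];
    return 1;
}

/* Hypothetical sketch of a memcmp path for 65 <= n <= 96. */
static int memcmp_65_96(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1, *b = s2;
    int res;

    assert(n >= 65 && n <= 96);

    /* Bytes [0, 64): four 16-byte blocks. */
    if (block_differs(a, b, 0, &res) || block_differs(a, b, 16, &res) ||
        block_differs(a, b, 32, &res) || block_differs(a, b, 48, &res))
        return res;

    /* Bytes [64, n): always check the two blocks [n-32, n-16) and
       [n-16, n) rather than branching on n > 80. They may re-cover
       bytes already known to be equal, so the first mismatch found
       here is still the first mismatch overall. */
    if (block_differs(a, b, n - 32, &res) ||
        block_differs(a, b, n - 16, &res))
        return res;
    return 0;
}
```

The trade-off in this sketch: both tail loads are issued unconditionally,
re-comparing up to 31 already-equal bytes (when n = 65), in exchange for
having no tail branch that could be mispredicted.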

Also, outside of microbenchmark environments, where the branch is not
fully predictable, it will have a real cost.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

(cherry picked from commit 7cbc03d030)
2022-05-16 18:55:16 -07:00
aarch64 elf: Fix runtime linker auditing on aarch64 (BZ #26643) 2022-04-12 13:33:10 -04:00
alpha elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
arc elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
arm elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
csky elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
generic elf: Fix runtime linker auditing on aarch64 (BZ #26643) 2022-04-12 13:33:10 -04:00
gnu hurd: Fix glob lstat compatibility 2021-07-22 20:31:52 +02:00
hppa hppa: Fix bind-now audit (BZ #28857) 2022-04-12 13:33:17 -04:00
htl htl: Do not expose pthread hidden proto outside libpthread 2021-07-18 20:25:33 +00:00
hurd
i386 i386: Regenerate ulps 2022-04-27 21:20:43 -04:00
ia64 elf: Issue la_symbind for bind-now (BZ #23734) 2022-04-12 13:32:59 -04:00
ieee754 Update math: redirect roundeven function 2021-06-27 07:56:57 -07:00
m68k elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
mach hurd if_index: Explicitly use AF_INET for if index discovery 2022-02-03 16:22:04 +01:00
microblaze elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
mips elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
nios2 elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
nptl nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) 2022-04-15 09:52:54 -03:00
posix getcwd: Set errno to ERANGE for size == 1 (CVE-2021-3999) 2022-01-24 11:37:06 +05:30
powerpc elf: Issue la_symbind for bind-now (BZ #23734) 2022-04-12 13:32:59 -04:00
pthread nptl: Handle spurious EINTR when thread cancellation is disabled (BZ#29029) 2022-04-15 09:52:54 -03:00
riscv elf: Fix dynamic-link.h usage on rtld.c 2022-04-08 14:18:11 -04:00
s390 S390: Add new s390 platform z16. 2022-04-14 14:21:57 +02:00
sh elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
sparc elf: Add _dl_audit_pltexit 2022-04-08 14:18:12 -04:00
unix Add HWCAP2_AFP, HWCAP2_RPRES from Linux 5.17 to AArch64 bits/hwcap.h 2022-05-03 11:08:52 +02:00
wordsize-32
wordsize-64
x86 x86: Improve L to support L(XXX_SYMBOL (YYY, ZZZ)) 2022-05-16 18:52:19 -07:00
x86_64 x86: Remove memcmp-sse4.S 2022-05-16 18:55:16 -07:00