glibc/sysdeps
Noah Goldstein 2f9062d717 x86: Shrink memcmp-sse4.S code size
No bug.

This implementation refactors memcmp-sse4.S primarily with minimizing
code size in mind. It does this by removing the lookup table logic and
removing the unrolled check from (256, 512] bytes.

memcmp-sse4 code size reduction : -3487 bytes
wmemcmp-sse4 code size reduction: -1472 bytes
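
As an illustration only (this is not the code from memcmp-sse4.S), one
common way to avoid a size-indexed lookup table for short inputs is to
cover a whole size range with two loads that may overlap. The sketch
below assumes a length of 8 to 16 bytes; the names load_be64 and
cmp_8_to_16 are hypothetical and do not appear in glibc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bytes as a big-endian integer, so that
   unsigned integer order matches byte-wise (memcmp) order.  */
static uint64_t load_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Hypothetical sketch: compare n bytes, assuming 8 <= n <= 16.
   Instead of jumping through a table indexed by n, compare the first
   8 bytes and, if they are equal, the last 8 bytes.  The two windows
   overlap when n < 16; the overlapping bytes are already known to be
   equal, so they do not affect the result.  */
static int cmp_8_to_16(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;
    uint64_t x = load_be64(pa);
    uint64_t y = load_be64(pb);

    if (x == y) {
        x = load_be64(pa + n - 8);
        y = load_be64(pb + n - 8);
    }
    /* Returns -1, 0 or 1, with the same sign as memcmp.  */
    return (x > y) - (x < y);
}
```

A single code path then serves every length in the range, trading a
possibly redundant load for the removal of per-length entry points.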

The current memcmp-sse4.S implementation has a large code size
cost. This has serious adverse effects on the ICache / ITLB. While
the implementation appears fast in micro-benchmarks, traces of
real-world code have shown that this speed does not carry over
when the ICache/ITLB are not primed, and that the cost
of the code size has a measurable negative effect on overall
application performance.

See https://research.google/pubs/pub48320/ for more details.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-10 20:12:10 -06:00
..
aarch64 String: Add hidden defs for __memcmpeq() to enable internal usage 2021-10-26 16:51:29 -05:00
alpha elf: Fix dynamic-link.h usage on rtld.c 2021-10-14 14:52:07 -03:00
arc elf: Fix dynamic-link.h usage on rtld.c 2021-10-14 14:52:07 -03:00
arm arm: Use have-mtls-dialect-gnu2 to check for ARM TLS descriptors support 2021-11-01 16:23:15 -03:00
csky String: Add hidden defs for __memcmpeq() to enable internal usage 2021-10-26 16:51:29 -05:00
generic elf: Use the minimal malloc on tunables_strdup 2021-11-09 14:11:25 -03:00
gnu Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
hppa elf: Fix dynamic-link.h usage on rtld.c 2021-10-14 14:52:07 -03:00
htl htl: Reimplement GSCOPE 2021-09-16 01:04:17 +02:00
hurd Update copyright dates with scripts/update-copyrights 2021-01-02 12:17:34 -08:00
i386 String: Add hidden defs for __memcmpeq() to enable internal usage 2021-10-26 16:51:29 -05:00
ia64 String: Add hidden defs for __memcmpeq() to enable internal usage 2021-10-26 16:51:29 -05:00
ieee754 Fixed inaccuracy of j0f (BZ #28185) 2021-10-05 13:45:37 +02:00
m68k elf: Fix dynamic-link.h usage on rtld.c 2021-10-14 14:52:07 -03:00
mach Fix build and check failures after b05fae4d8e 2021-11-09 23:21:22 -03:00
microblaze elf: Fix dynamic-link.h usage on rtld.c 2021-10-14 14:52:07 -03:00
mips ld.so: Initialize bootstrap_map.l_ld_readonly [BZ #28340] 2021-10-19 06:40:38 -07:00
nios2 elf: Fix dynamic-link.h usage on rtld.c 2021-10-14 14:52:07 -03:00
nptl nptl: Use FUTEX_LOCK_PI2 when available 2021-10-01 08:09:13 -03:00
posix posix: Remove spawni.c 2021-09-27 12:44:25 -03:00
powerpc [powerpc] Tighten constraints for asm constant parameters 2021-11-03 09:17:28 -05:00
pthread elf: Avoid deadlock between pthread_create and ctors [BZ #28357] 2021-10-04 15:07:05 +01:00
riscv riscv: Build with -mno-relax if linker does not support R_RISCV_ALIGN 2021-11-03 09:25:06 -03:00
s390 s390: Use long branches across object boundaries (jgh instead of jh) 2021-11-10 15:21:37 +01:00
sh elf: Fix dynamic-link.h usage on rtld.c 2021-10-14 14:52:07 -03:00
sparc String: Add hidden defs for __memcmpeq() to enable internal usage 2021-10-26 16:51:29 -05:00
unix Update syscall lists for Linux 5.15 2021-11-10 15:21:19 +00:00
wordsize-32 Disable symbol hack in libc_nonshared.a 2021-09-27 07:46:25 -07:00
wordsize-64 Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
x86 x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h 2021-11-06 16:18:08 -05:00
x86_64 x86: Shrink memcmp-sse4.S code size 2021-11-10 20:12:10 -06:00