glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-21 20:40:05 +00:00

Noah Goldstein 483443d321 x86/string: Fixup alignment of main loop in str{n}cmp-evex [BZ #32212]

The loop should be aligned to 32 bytes so that it can ideally run out of the DSB. This is particularly important on Skylake-Server, where deficiencies in its DSB implementation make it prone to not being able to run loops out of the DSB.

For example, running strcmp-evex on a 200Mb string:

32-byte aligned loop:
    - 43,399,578,766 idq.dsb_uops
not 32-byte aligned loop:
    - 6,060,139,704 idq.dsb_uops

This results in a 25% performance degradation for the non-aligned version.

The fix is to just ensure the code layout is such that the loop is aligned. (This was previously the case but was accidentally dropped in `84e7c46df`.)

NB: The fix was actually 64-byte alignment. This is because 64-byte alignment generally produces more stable performance than 32-byte aligned code (cache line crosses can affect perf), so if we are going past 16-byte alignment, we might as well go to 64. 64-byte alignment also matches most other functions we over-align, so it creates a common point of optimization.

Times are reported as the ratio Time_With_Patch / Time_Without_Patch. Lower is better.

The value reported is the geometric mean of the ratio across all tests in bench-strcmp and bench-strncmp.

Note this patch is only attempting to improve the Skylake-Server strcmp for long strings. The rest of the numbers are only to test for regressions.

Tigerlake Results Strings <= 512:
    strcmp : 1.026
    strncmp: 0.949

Tigerlake Results Strings > 512:
    strcmp : 0.994
    strncmp: 0.998

Skylake-Server Results Strings <= 512:
    strcmp : 0.945
    strncmp: 0.943

Skylake-Server Results Strings > 512:
    strcmp : 0.778
    strncmp: 1.000

The 2.6% regression on TGL-strcmp is due to slowdowns caused by changes in the alignment of code handling small sizes (mostly on the page-cross logic). These should be safe to ignore because 1) we previously only 16-byte aligned the function, so this behavior is not new and was essentially up to chance before this patch, and 2) this type of alignment-related regression on small sizes really only comes up in tight micro-benchmark loops and is unlikely to have any effect on real-world performance.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-09-30 07:40:40 -07:00
..
aarch64	AArch64: Simplify rounding-multiply pattern in several AdvSIMD routines	2024-09-23 15:44:08 +01:00
alpha	math: Update alpha ulps	2024-07-14 12:44:15 +02:00
arc	arc: Cleanup arcbe	2024-09-25 15:54:07 +01:00
arm	arm: Regenerate ULPs	2024-08-07 11:02:03 -03:00
csky	elf: Remove HWCAP_IMPORTANT	2024-06-18 10:45:36 +02:00
generic	nptl: Fix Race conditions in pthread cancellation [BZ#12683]	2024-08-23 14:27:43 -03:00
gnu	sysdeps: Re-flow and sort multiline gnu/Makefile definitions	2024-08-07 11:02:03 -03:00
hppa	hppa: Update libm-test-ulps	2024-09-09 09:57:42 -04:00
htl	hurd: Fix missing pthread_ compat symbol in libc	2024-08-01 23:58:51 +02:00
hurd	hurd: Move internal functions to internal header	2024-03-23 22:43:07 +01:00
i386	i386: Update ulps	2024-09-05 22:25:55 +02:00
ieee754	Convert to autoconf 2.72 (vanilla release, no distribution patches)	2024-06-17 21:15:28 +02:00
loongarch	LoongArch: Fix macro redefined warning in tls-desc.S	2024-09-06 15:46:13 +08:00
m68k	math: Update m68k ULPs	2024-07-08 21:51:03 +02:00
mach	hurd: Avoid file_check_access () RPC for access (F_OK)	2024-09-19 14:18:39 +02:00
microblaze	Implement C23 logp1	2024-06-17 13:47:09 +00:00
mips	MIPS: Regenerate ULPs	2024-08-08 14:53:53 +02:00
nios2	Convert to autoconf 2.72 (vanilla release, no distribution patches)	2024-06-17 21:15:28 +02:00
nptl	Linux: Block signals around _Fork (bug 32215)	2024-09-28 09:44:25 +02:00
or1k	Implement C23 logp1	2024-06-17 13:47:09 +00:00
posix	Fix missing randomness in __gen_tempname (bug 32214)	2024-09-26 11:45:44 +02:00
powerpc	powerpc64le: Build new strtod tests with long double ABI flags (bug 32145)	2024-09-05 22:02:23 +02:00
pthread	nptl: Fix Race conditions in pthread cancellation [BZ#12683]	2024-08-23 14:27:43 -03:00
riscv	RISC-V: Regenerate ULPs	2024-08-08 14:53:55 +02:00
s390	s390x: Update ulps	2024-08-08 13:01:02 +02:00
sh	nptl: Fix Race conditions in pthread cancellation [BZ#12683]	2024-08-23 14:27:43 -03:00
sparc	sparc: Regenerate ULPs	2024-08-07 11:02:03 -03:00
unix	arc: Cleanup arcbe	2024-09-25 15:54:07 +01:00
wordsize-32	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
wordsize-64	Update copyright dates with scripts/update-copyrights	2024-01-01 10:53:40 -08:00
x86	x86: Enable non-temporal memset for Hygon processors	2024-08-26 10:01:58 -07:00
x86_64	x86/string: Fixup alignment of main loop in str{n}cmp-evex [BZ #32212]	2024-09-30 07:40:40 -07:00