Noah Goldstein 483443d321 x86/string: Fixup alignment of main loop in str{n}cmp-evex [BZ #32212]
The loop should be aligned to 32 bytes so that it can ideally run out
of the DSB. This is particularly important on Skylake-Server, where
deficiencies in its DSB implementation make it prone to not being
able to run loops out of the DSB.

For example, running strcmp-evex on a 200Mb string:

32-byte aligned loop:
    - 43,399,578,766      idq.dsb_uops
not 32-byte aligned loop:
    - 6,060,139,704       idq.dsb_uops

This results in a 25% performance degradation for the non-aligned
version.

The fix is to just ensure the code layout is such that the loop is
aligned. (Which was previously the case but was accidentally dropped
in 84e7c46df).

NB: The fix was actually 64-byte alignment. This is because 64-byte
alignment generally produces more stable performance than 32-byte
aligned code (cache line crosses can affect perf), so if we are going
past 16-byte alignment, we might as well go to 64. 64-byte alignment
also matches most other functions we over-align, so it creates a
common point of optimization.
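
To illustrate the alignment reasoning above, here is a minimal C sketch
of the address arithmetic involved. It assumes a 64-byte cache line, and
the helper names is_aligned and crosses_line are hypothetical; they do
not appear in glibc. It shows that a 64-byte-aligned address is always
32-byte aligned as well, while a block that is only 32-byte aligned can
still straddle a cache line boundary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of glibc.  */

/* Return true if ADDR is a multiple of ALIGN.  ALIGN must be a power
   of two.  */
static bool
is_aligned (uintptr_t addr, uintptr_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr & (align - 1)) == 0;
}

/* Return true if the LEN bytes starting at ADDR span more than one
   LINE-byte block.  LINE must be a power of two and LEN nonzero.  */
static bool
crosses_line (uintptr_t addr, size_t len, uintptr_t line)
{
  assert (len != 0 && line != 0 && (line & (line - 1)) == 0);
  return (addr / line) != ((addr + len - 1) / line);
}
```

For instance, a 48-byte block at address 0x1020 is 32-byte aligned but
crosses the assumed 64-byte line boundary at 0x1040, whereas the same
block placed at 0x1040 does not.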

Times are reported as ratio of Time_With_Patch /
Time_Without_Patch. Lower is better.

The value reported is the geometric mean of the ratio across
all tests in bench-strcmp and bench-strncmp.
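
As a rough sketch of how such a summary figure can be computed, the C
code below takes the geometric mean of a list of per-test time ratios.
The helpers nth_root and geomean are hypothetical and are not taken from
glibc's benchtests. The root is found by Newton iteration only so that
the sketch builds without libm; averaging log() values from <math.h> is
the more usual formulation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration only; not part of glibc.  */

/* Return the N-th root of X (X > 0, N >= 1) using Newton iteration on
   f(y) = y^N - X, starting from a value at or above the root.  */
static double
nth_root (double x, size_t n)
{
  assert (x > 0.0 && n >= 1);
  double y = x > 1.0 ? x : 1.0;
  for (int iter = 0; iter < 200; iter++)
    {
      /* y_pow = y^(N-1).  */
      double y_pow = 1.0;
      for (size_t i = 0; i + 1 < n; i++)
        y_pow *= y;
      y = ((double) (n - 1) * y + x / y_pow) / (double) n;
    }
  return y;
}

/* Return the geometric mean of COUNT positive ratios, where each ratio
   is Time_With_Patch / Time_Without_Patch for one test.  */
static double
geomean (const double *ratios, size_t count)
{
  assert (count > 0);
  double product = 1.0;
  for (size_t i = 0; i < count; i++)
    product *= ratios[i];
  return nth_root (product, count);
}
```

With this definition a 2x slowdown on one test (ratio 2.0) and a 2x
speedup on another (ratio 0.5) cancel to a geometric mean of 1.0, which
is why the geometric mean is a natural choice for averaging ratios.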

Note this patch is only attempting to improve the Skylake-Server
strcmp for long strings. The rest of the numbers are only to test for
regressions.

Tigerlake Results Strings <= 512:
    strcmp : 1.026
    strncmp: 0.949

Tigerlake Results Strings > 512:
    strcmp : 0.994
    strncmp: 0.998

Skylake-Server Results Strings <= 512:
    strcmp : 0.945
    strncmp: 0.943

Skylake-Server Results Strings > 512:
    strcmp : 0.778
    strncmp: 1.000

The 2.6% regression on TGL-strcmp is due to slowdowns caused by
changes in the alignment of code handling small sizes (mostly in the
page-cross logic). These should be safe to ignore because 1) we
previously only 16-byte aligned the function, so this behavior is not
new and was essentially up to chance before this patch, and 2) this
type of alignment-related regression on small sizes really only comes
up in tight micro-benchmark loops and is unlikely to have any effect
on real-world performance.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2024-09-30 07:40:40 -07:00
advisories Document CVE-2024-33599, CVE-2024-33600, CVE-2024-33601, CVE-2024-33602 2024-05-06 15:12:31 -04:00
argp Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
assert assert: Mark __assert_fail as cold 2024-07-26 20:41:00 +08:00
benchtests benchtests: Add random memset benchmark 2024-08-07 14:58:46 +01:00
bits AArch64: Add vector logp1 alias for log1p 2024-09-19 17:53:34 +01:00
catgets Fix conditionals on mtrace-based tests (bug 31892) 2024-07-01 17:20:30 +02:00
ChangeLog.old Add ChangeLog file 2024-07-21 18:33:37 +02:00
conform conform: Reformat Makefile. 2024-02-25 13:38:16 -05:00
csu Add crt1-2.0.o for glibc 2.0 compatibility tests 2024-05-06 07:49:40 -07:00
ctype ctype: Reformat Makefile. 2024-02-25 13:38:16 -05:00
debug debug: Fix read error handling in pcprofiledump 2024-09-10 12:40:27 +02:00
dirent Linux: readdir64_r should not skip d_ino == 0 entries (bug 32126) 2024-09-21 19:32:34 +02:00
dlfcn dlfcn: Reformat Makefile. 2024-02-25 13:38:16 -05:00
elf elf: Move __rtld_malloc_init_stubs call into _dl_start_final 2024-09-24 13:23:10 +02:00
gmon Define write_profiling functions only in profile library [BZ #31756] 2024-05-22 06:12:55 -07:00
gnulib Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
hesiod hesiod: Reformat Makefile. 2024-02-25 13:38:16 -05:00
htl hurd: Fix missing pthread_ compat symbol in libc 2024-08-01 23:58:51 +02:00
hurd x86_64 hurd: ensure we have a large enough buffer to receive exception_raise requests. 2024-07-30 16:59:12 +02:00
iconv iconv: Use $(run-program-prefix) for running iconv (bug 32197) 2024-09-24 12:35:40 +02:00
iconvdata iconv: Preserve iconv -c error exit on invalid inputs (bug 32046) 2024-09-20 13:51:09 +02:00
include stdlib: Do not use GLIBC_PRIVATE ABI for errno in libc_nonshared.a 2024-09-06 14:07:00 +02:00
inet inet: Avoid label at end of compound statement in tst-if_nameindex 2024-08-26 16:45:31 +02:00
intl Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
io io: Add FUSE-based test for fchmod 2024-09-09 09:51:50 +02:00
libio Add another test for fclose on an unopened file 2024-09-20 10:32:35 -04:00
locale support: Use macros for *stat wrappers 2024-08-16 16:05:20 +02:00
localedata Update to Unicode 16.0.0 [BZ #32168] 2024-09-27 14:43:38 +02:00
login login: Re-flow and sort multiline Makefile definitions 2024-08-07 11:02:03 -03:00
mach mach: Drop some unnecessary vm_param.h includes 2024-01-03 21:59:54 +01:00
malloc malloc: Link threading tests with $(shared-thread-library) 2024-08-20 16:16:25 +02:00
manual manual: Document that feof and ferror are mutually exclusive 2024-09-27 11:41:14 +02:00
math AArch64: Add vector logp1 alias for log1p 2024-09-19 17:53:34 +01:00
mathvec Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
misc misc: Link tst-mkstemp-fuse-parallel with $(shared-thread-library) 2024-09-24 13:05:52 +02:00
nis Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
nptl nptl: Prefer setresuid32 in tst-setuid2 2024-09-24 13:48:11 +02:00
nptl_db Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
nscd nscd: Use time_t for return type of addgetnetgrentX 2024-05-02 18:59:29 +02:00
nss nss: Fix incorrect switch fall-through in tst-nss-gai-actions 2024-08-07 15:00:25 +02:00
po po/*: regenerate (only line number changes) 2024-07-21 17:50:35 +02:00
posix support: Use macros for *stat wrappers 2024-08-16 16:05:20 +02:00
resolv resolv: Fix tst-resolv-short-response for older GCC (bug 32042) 2024-08-01 21:07:48 +02:00
resource Always define __USE_TIME_BITS64 when 64 bit time_t is used 2024-04-02 15:28:36 -03:00
rt debug: Fix clang mq_open fortify wrapper (BZ 31917) 2024-06-27 13:32:48 -03:00
scripts scripts: Remove arceb-linux-gnu from build-many-glibcs.py 2024-09-25 11:25:22 +02:00
setjmp Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
signal signal/Makefile: Split and sort tests 2024-07-01 13:47:27 +02:00
socket Fix name space violation in fortify wrappers (bug 32052) 2024-08-05 16:49:58 +02:00
soft-fp soft-fp: Add brain format support 2024-02-01 19:06:54 +01:00
stdio-common stdio-common: Fix memory leak in tst-freopen4* tests on UNSUPPORTED 2024-09-28 21:06:11 +02:00
stdlib Make tst-strtod-underflow type-generic 2024-09-20 23:25:32 +00:00
string string: strerror, strsignal cannot use buffer after dlmopen (bug 32026) 2024-08-19 15:48:03 +02:00
sunrpc Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
support support: Add valgrind instructions to <support/fuse.h> 2024-09-21 19:29:13 +02:00
sysdeps x86/string: Fixup alignment of main loop in str{n}cmp-evex [BZ #32212] 2024-09-30 07:40:40 -07:00
sysvipc Always define __USE_TIME_BITS64 when 64 bit time_t is used 2024-04-02 15:28:36 -03:00
termios Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
time time/Makefile: Split and sort tests 2024-07-12 17:33:28 +02:00
timezone timezone: sync to TZDB 2024b 2024-09-05 20:57:17 +00:00
wcsmbs Fix name space violation in fortify wrappers (bug 32052) 2024-08-05 16:49:58 +02:00
wctype Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
.clang-format Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
.gitattributes Assume __NR_openat is always defined 2016-03-23 23:35:08 +01:00
.gitignore Add *.pyc to .gitignore 2015-05-18 15:26:26 +05:30
abi-tags Remove the bulk of the NaCl port. 2017-05-20 08:09:10 -04:00
aclocal.m4 Convert to autoconf 2.72 (vanilla release, no distribution patches) 2024-06-17 21:15:28 +02:00
config.h.in arc: Remove HAVE_ARC_BE macro and disable big-endian port 2024-09-25 11:25:22 +02:00
config.make.in manual: add syscalls 2024-07-09 11:54:29 +02:00
configure Turn on -Wimplicit-fallthrough by default if available 2024-08-09 15:34:53 +02:00
configure.ac Turn on -Wimplicit-fallthrough by default if available 2024-08-09 15:34:53 +02:00
CONTRIBUTED-BY crypt: Remove libcrypt support 2023-10-30 13:03:59 -03:00
COPYING
COPYING.LIB
extra-lib.mk Rename cppflags-iterator.mk to libof-iterator.mk, remove extra-modules.mk. 2017-05-09 07:06:29 -04:00
gen-locales.mk locale: Handle loading a missing locale twice (Bug 14247) 2024-04-22 16:03:00 -04:00
INSTALL install.texi: bump "latest verified" versions 2024-07-21 00:27:35 +02:00
libc-abis riscv: support GNU indirect function 2021-01-10 21:25:13 -05:00
libof-iterator.mk Rename cppflags-iterator.mk to libof-iterator.mk, remove extra-modules.mk. 2017-05-09 07:06:29 -04:00
LICENSES Relicense IBM portions of resolv/base64.c resolv/res_debug.c. 2024-01-26 13:33:36 -05:00
MAINTAINERS Add MAINTAINERS 2017-05-11 13:38:30 -04:00
Makeconfig Turn on -Wimplicit-fallthrough by default if available 2024-08-09 15:34:53 +02:00
Makefile Pass -nostdlib -nostartfiles together with -r [BZ #31753] 2024-05-19 16:29:02 -07:00
Makefile.help math: Add support for auto static math tests 2024-05-21 16:53:27 -03:00
Makefile.in New make target to only build benchmark binaries 2016-04-20 10:23:28 +05:30
Makerules Support compiling .S files with additional options 2024-02-25 09:22:40 -08:00
NEWS arc: Remove HAVE_ARC_BE macro and disable big-endian port 2024-09-25 11:25:22 +02:00
o-iterator.mk
README Remove ia64-linux-gnu 2024-01-08 17:09:36 -03:00
Rules Implement run-built-tests=no for make xcheck, always build xtests 2024-09-21 00:29:55 +02:00
SECURITY.md Adapt the security policy for the security page 2023-12-05 09:15:10 -05:00
SHARED-FILES Update to Unicode 16.0.0 [BZ #32168] 2024-09-27 14:43:38 +02:00
shlib-versions crypt: Remove libcrypt support 2023-10-30 13:03:59 -03:00
test-skeleton.c Update copyright dates with scripts/update-copyrights 2024-01-01 10:53:40 -08:00
version.h Increase version number to 2.40.9000 2024-07-21 18:49:35 +02:00

This directory contains the sources of the GNU C Library.
See the file "version.h" for what release version you have.

The GNU C Library is the standard system C library for all GNU systems,
and is an important part of what makes up a GNU system.  It provides the
system API for all programs written in C and C-compatible languages such
as C++ and Objective C; the runtime facilities of other programming
languages use the C library to access the underlying operating system.

In GNU/Linux systems, the C library works with the Linux kernel to
implement the operating system behavior seen by user applications.
In GNU/Hurd systems, it works with a microkernel and Hurd servers.

The GNU C Library implements much of the POSIX.1 functionality in the
GNU/Hurd system, using configurations i[4567]86-*-gnu and x86_64-gnu.

When working with Linux kernels, this version of the GNU C Library
requires Linux kernel version 3.2 or later.

Also note that the shared version of the libgcc_s library must be
installed for the pthread library to work correctly.

The GNU C Library supports these configurations for using Linux kernels:

	aarch64*-*-linux-gnu
	alpha*-*-linux-gnu
	arc*-*-linux-gnu
	arm-*-linux-gnueabi
	csky-*-linux-gnuabiv2
	hppa-*-linux-gnu
	i[4567]86-*-linux-gnu
	x86_64-*-linux-gnu	Can build either x86_64 or x32
	loongarch64-*-linux-gnu Hardware floating point, LE only.
	m68k-*-linux-gnu
	microblaze*-*-linux-gnu
	mips-*-linux-gnu
	mips64-*-linux-gnu
	or1k-*-linux-gnu
	powerpc-*-linux-gnu	Hardware or software floating point, BE only.
	powerpc64*-*-linux-gnu	Big-endian and little-endian.
	s390-*-linux-gnu
	s390x-*-linux-gnu
	riscv32-*-linux-gnu
	riscv64-*-linux-gnu
	sh[34]-*-linux-gnu
	sparc*-*-linux-gnu
	sparc64*-*-linux-gnu

If you are interested in doing a port, please contact the glibc
maintainers; see https://www.gnu.org/software/libc/ for more
information.

See the file INSTALL to find out how to configure, build, and install
the GNU C Library.  You might also consider reading the WWW pages for
the C library at https://www.gnu.org/software/libc/.

The GNU C Library is (almost) completely documented by the Texinfo manual
found in the `manual/' subdirectory.  The manual is still being updated
and contains some known errors and omissions; we regret that we do not
have the resources to work on the manual as much as we would like.  For
corrections to the manual, please file a bug in the `manual' component,
following the bug-reporting instructions below.  Please be sure to check
the manual in the current development sources to see if your problem has
already been corrected.

Please see https://www.gnu.org/software/libc/bugs.html for bug reporting
information.  We are now using the Bugzilla system to track all bug reports.
This web page gives detailed information on how to report bugs properly.

The GNU C Library is free software.  See the file COPYING.LIB for copying
conditions, and LICENSES for notices about a few contributions that require
these additional notices to be distributed.  License copyright years may be
listed using range notation, e.g., 1996-2015, indicating that every year in
the range, inclusive, is a copyrightable year that would otherwise be listed
individually.