Noah Goldstein f049f52dfe x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions
Optimizations are:
    1. Use more overlapping stores to avoid branches.
    2. Reduce how much the aligning copies are unrolled (this is mostly
       a code-size saving; for some sizes it is a negative in terms of
       performance).
    3. Improve the loop a bit (similar to what we do in strlen with
       2x vpminu + kortest instead of 3x vpminu + kmov + test).
    4. For st{r|p}n{cat|cpy}, reorder the branches to minimize the
       number of taken branches.
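To illustrate item 1, here is a scalar C model of the general overlapping-store technique. It is not the EVEX assembly. Suppose the size n of a string, including its terminator, is known to lie in [16, 32]. Then two fixed 16-byte copies cover every such n with no branch on the exact size: one copy at the start and one ending exactly at the last byte. The helper name `copy_16_to_32` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst, where 16 <= n <= 32,
   using one 16-byte chunk at the start and one ending exactly at src + n.
   For n < 32 the chunks overlap, so no branch on the exact size is needed.
   src and dst must not overlap each other (as for strcpy).  */
void copy_16_to_32(char *dst, const char *src, size_t n)
{
    unsigned char head[16], tail[16];

    /* Load both chunks before storing, mirroring two vector loads.  */
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);

    /* The second store may rewrite bytes already written by the first;
       they receive identical values, so the overlap is harmless.  */
    memcpy(dst, head, 16);
    memcpy(dst + n - 16, tail, 16);
}
```

For a 21-byte string (20 characters plus the NUL), the two stores cover bytes 0..15 and 5..20.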

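Item 3 relies on a simple property: a byte-wise unsigned minimum preserves zero bytes. If any of several vectors contains the NUL terminator, so does their minimum, so fewer tests are needed per loop iteration. The scalar model below only illustrates that property and does not reproduce the exact instruction sequence. It uses a hypothetical function name `block_has_nul` and an assumed 32-byte vector width. Four vectors are folded into two minima, and the two zero tests are ORed together, loosely analogous to a kortest of two masks.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC 32 /* hypothetical vector width in bytes */

/* Hypothetical helper: nonzero if any of the four VEC-byte blocks holds a
   zero byte.  min(x, y) == 0 exactly when x == 0 or y == 0 (unsigned), so
   two byte-wise minima summarize all four inputs.  */
int block_has_nul(const uint8_t *a, const uint8_t *b,
                  const uint8_t *c, const uint8_t *d)
{
    uint8_t any_zero = 0;
    for (size_t i = 0; i < VEC; i++) {
        uint8_t m_ab = a[i] < b[i] ? a[i] : b[i];
        uint8_t m_cd = c[i] < d[i] ? c[i] : d[i];
        /* OR the two zero tests together, like combining two masks.  */
        any_zero |= (m_ab == 0) | (m_cd == 0);
    }
    return any_zero;
}
```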
Performance Changes:

    Times are from N = 10 runs of the benchmark suite and are
    reported as the geometric mean of all New Implementation /
    Old Implementation ratios (below 1.0 means the new
    implementation is faster).

    stpcpy-evex      -> 0.922
    strcat-evex      -> 0.985
    strcpy-evex      -> 0.880

    strncpy-evex     -> 0.831
    stpncpy-evex     -> 0.780

    strncat-evex     -> 0.958
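For reference, the geometric mean of n ratios r_1..r_n is the value g with g^n = r_1 * ... * r_n. The C sketch below finds g by bisection between the smallest and largest ratio, which needs no libm. The function names are hypothetical, and this is not the benchtests code. One reason to use a geometric mean for ratios is symmetry: the geometric mean of the Old / New ratios is exactly 1/g, so the summary does not depend on which implementation is treated as the baseline.

```c
#include <stddef.h>

/* x raised to a non-negative integer power by repeated multiplication.  */
static double pow_n(double x, size_t n)
{
    double r = 1.0;
    for (size_t i = 0; i < n; i++)
        r *= x;
    return r;
}

/* Hypothetical helper: geometric mean of n > 0 positive ratios, i.e. the g
   with g^n equal to the product of the ratios.  g always lies between the
   smallest and largest ratio, so bisection on that interval converges.
   Intended for modest n with ratios near 1 (the product must not overflow
   or underflow).  Returns 0.0 if n == 0.  */
double geometric_mean(const double *ratios, size_t n)
{
    if (n == 0)
        return 0.0;

    double product = 1.0, lo = ratios[0], hi = ratios[0];
    for (size_t i = 0; i < n; i++) {
        product *= ratios[i];
        if (ratios[i] < lo) lo = ratios[i];
        if (ratios[i] > hi) hi = ratios[i];
    }

    /* Shrink [lo, hi] around the root of g^n - product.  */
    for (int iter = 0; iter < 200; iter++) {
        double mid = lo + (hi - lo) / 2;
        if (pow_n(mid, n) < product)
            lo = mid;
        else
            hi = mid;
    }
    return lo + (hi - lo) / 2;
}
```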

Code Size Changes:
    function         -> Bytes New / Bytes Old -> Ratio

    strcat-evex      ->  819 / 1874 -> 0.437
    strcpy-evex      ->  700 / 1074 -> 0.652
    stpcpy-evex      ->  735 / 1094 -> 0.672

    strncpy-evex     -> 1397 / 2611 -> 0.535
    stpncpy-evex     -> 1489 / 2691 -> 0.553

    strncat-evex     -> 1184 / 2832 -> 0.418

Notes:
    1. Because the implementations differ significantly, they are
       split into three files.

           strcpy-evex.S    -> strcpy, stpcpy, strcat
           strncpy-evex.S   -> strncpy
           strncat-evex.S   -> strncat

       I couldn't find a way to merge them without making the
       ifdefs incredibly difficult to follow.

    2. All implementations can be made evex512 by including
       "x86-evex512-vecs.h" at the top.

    3. All implementations have an optional define:
        `USE_EVEX_MASKED_STORE`
       Setting it to 1 uses EVEX-masked stores for handling short
       strings.  This saves code size and branches.  It's disabled
       for all implementations at the moment because there are some
       serious drawbacks to masked stores in certain cases, but
       that may be fixed on future architectures.
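The idea behind `USE_EVEX_MASKED_STORE` can be modeled in scalar C. Build a mask with one bit per byte of the vector, set for every byte up to and including the NUL terminator. A single byte-masked store then writes only those bytes, instead of the code branching on the string's size class. In AVX-512, an instruction such as `vmovdqu8` with a `k` mask register provides this store. The names `VEC_SIZE`, `masked_store_model`, and `copy_short_masked` are hypothetical. The sketch does not capture the hardware drawbacks mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

#define VEC_SIZE 32 /* hypothetical: models one 32-byte vector */

/* Scalar model of a byte-masked store: byte i of src is written to dst only
   when bit i of mask is set; all other bytes of dst are left untouched.  */
static void masked_store_model(char *dst, const char *src, uint64_t mask)
{
    for (size_t i = 0; i < VEC_SIZE; i++)
        if (mask & ((uint64_t) 1 << i))
            dst[i] = src[i];
}

/* Hypothetical short-string path.  src_vec models one full VEC_SIZE-byte
   load.  If it contains a NUL, copy the string and its terminator with a
   single masked store and return 1; otherwise return 0 so the caller can
   fall back to the long-string path.  */
int copy_short_masked(char *dst, const char *src_vec)
{
    size_t len = 0;
    /* The real code locates the NUL with one vector compare.  */
    while (len < VEC_SIZE && src_vec[len] != '\0')
        len++;
    if (len == VEC_SIZE)
        return 0;

    /* Bits 0..len set: the string bytes plus the terminator.  */
    uint64_t mask = ((uint64_t) 1 << (len + 1)) - 1;
    masked_store_model(dst, src_vec, mask);
    return 1;
}
```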

Full check passes on x86-64 and build succeeds for all ISA levels w/
and w/o multiarch.
2022-11-08 19:22:33 -08:00
argp configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
assert Use atomic_exchange_release/acquire 2022-09-26 16:58:08 +01:00
benchtests benchtests: Make str{n}{cat|cpy} benchmarks output json 2022-11-08 19:22:33 -08:00
bits Expose all MAP_ constants in <sys/mman.h> unconditionally (bug 29375) 2022-10-10 09:30:24 +02:00
catgets Use '%z' instead of '%Z' on printf functions 2022-09-22 08:48:04 -03:00
ChangeLog.old Create ChangeLog.old/ChangeLog.25. 2022-07-29 18:03:09 -04:00
conform hurd: drop SA_SIGINFO availability xfail 2022-01-15 17:43:07 +01:00
crypt crypt: Remove unused variable on cert test 2022-03-31 09:00:54 -03:00
csu elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
ctype Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
debug Linux: Add ppoll fortify symbol for 64 bit time_t (BZ# 29746) 2022-11-08 13:37:06 -03:00
dirent configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
dlfcn dlfcn: Pass caller pointer to static dlopen implementation (bug 29446) 2022-08-04 17:54:48 +02:00
elf elf/tlsdeschtab.h: Add the Malloc return value check in _dl_make_tlsdesc_dynamic() 2022-11-07 13:13:07 +00:00
gmon Use '%z' instead of '%Z' on printf functions 2022-09-22 08:48:04 -03:00
gnulib Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
grp Add access function attributes to grp and shadow headers 2022-03-14 20:02:30 +05:30
gshadow Add access function attributes to grp and shadow headers 2022-03-14 20:02:30 +05:30
hesiod Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
htl Use C11 atomics instead of atomic_decrement_and_test 2022-09-23 15:59:56 +01:00
hurd Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources 2022-10-18 17:04:10 +02:00
iconv Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources 2022-10-18 17:04:10 +02:00
iconvdata gconv: Correct Big5-HKSCS conversion to preserve all state bits. [BZ #25744] 2022-07-06 09:27:13 -03:00
include Linux: Add ppoll fortify symbol for 64 bit time_t (BZ# 29746) 2022-11-08 13:37:06 -03:00
inet configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
intl intl: Fix clang -Wunused-but-set-variable on plural.c 2022-11-01 09:45:34 -03:00
io Linux: Add ppoll fortify symbol for 64 bit time_t (BZ# 29746) 2022-11-08 13:37:06 -03:00
libio configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
locale locale: prevent maybe-uninitialized errors with -Os [BZ #19444] 2022-10-05 18:04:13 -03:00
localedata Update to Unicode 15.0.0 [BZ #29604] 2022-10-06 08:58:33 +02:00
login configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
mach Use atomic_exchange_release/acquire 2022-09-26 16:58:08 +01:00
malloc malloc: Use uintptr_t for pointer alignment 2022-11-01 09:48:22 +00:00
manual manual: Add missing % in int conversion list 2022-10-25 09:12:30 +02:00
math Disable use of -fsignaling-nans if compiler does not support it 2022-11-01 09:46:08 -03:00
mathvec Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
misc configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
nis nis: Fix nis_print_directory 2022-10-20 10:54:27 -03:00
nptl elf: Rework exception handling in the dynamic loader [BZ #25486] 2022-11-03 09:39:31 +01:00
nptl_db nptl_db: disable DT_RELR on libthread_db.so 2022-06-08 11:17:47 -05:00
nscd nscd: Drop local address tuple variable [BZ #29607] 2022-10-04 18:40:25 -04:00
nss Use PTR_MANGLE and PTR_DEMANGLE unconditionally in C sources 2022-10-18 17:04:10 +02:00
po Update libc.pot for 2.36 release. 2022-07-29 16:41:57 -04:00
posix posix: Make posix_spawn extensions available by default 2022-11-04 13:29:52 +01:00
pwd Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
resolv configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
resource configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
rt rt: Initialize mq_send input on tst-mqueue{5,6} 2022-10-05 18:04:13 -03:00
scripts scripts/glibcelf.py: Properly report <elf.h> parsing failures 2022-11-03 12:24:17 +01:00
setjmp Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
shadow Add access function attributes to grp and shadow headers 2022-03-14 20:02:30 +05:30
signal Refactor internal-signals.h 2022-06-30 14:56:21 -03:00
socket configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
soft-fp soft-fp: Add fixhf[uns][di|si] and float[uns][di|si]hf 2022-08-08 11:28:40 -03:00
stdio-common configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
stdlib Apply asm redirection in gmp.h before first use 2022-11-07 10:40:21 -03:00
string string: Add len=0 to {w}memcmp{eq} tests and benchtests 2022-11-08 19:19:35 -08:00
sunrpc sunrpc: Suppress GCC -Os warning on user2netname 2022-10-05 18:04:13 -03:00
support support: Add xpthread_cond_signal wrapper 2022-10-03 11:19:36 -03:00
sysdeps x86: Optimize and shrink st{r|p}{n}{cat|cpy}-evex functions 2022-11-08 19:22:33 -08:00
sysvipc Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
termios configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
time configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
timezone timezone: Fix tst-bz28707 Makefile rule 2022-01-12 10:30:10 -03:00
wcsmbs configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
wctype configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
.clang-format Add .clang-format style file 2022-04-11 10:51:03 -05:00
.gitattributes
.gitignore
abi-tags
aclocal.m4 Correctly determine libc.so 'OUTPUT_FORMAT' when cross-compiling. 2022-10-28 17:19:02 -04:00
config.h.in LoongArch: Add LoongArch entries to config.h.in 2022-07-26 12:35:12 -03:00
config.make.in Revert "Detect ld.so and libc.so version inconsistency during startup" 2022-08-25 18:46:43 +02:00
configure Rewrite find_cxx_header config configure.ac 2022-11-07 10:40:17 -03:00
configure.ac Rewrite find_cxx_header config configure.ac 2022-11-07 10:40:17 -03:00
CONTRIBUTED-BY Remove "Contributed by" lines 2021-09-03 22:06:44 +05:30
COPYING
COPYING.LIB
extra-lib.mk
gen-locales.mk Improve gen-locales.mk and gen-locale.sh to make test files with @ options work 2018-02-27 17:01:57 +01:00
INSTALL Revert "Detect ld.so and libc.so version inconsistency during startup" 2022-08-25 18:46:43 +02:00
libc-abis riscv: support GNU indirect function 2021-01-10 21:25:13 -05:00
libof-iterator.mk
LICENSES arc4random: simplify design for better safety 2022-07-27 08:58:27 -03:00
MAINTAINERS
Makeconfig Remove lingering libSegfault Makefile entries 2022-10-26 15:55:43 -03:00
Makefile grep: egrep -> grep -E, fgrep -> grep -F 2022-06-05 12:09:02 -07:00
Makefile.help Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
Makefile.in
Makerules Makerules: fix MAKEFLAGS assignment for upcoming make-4.4 [BZ# 29564] 2022-09-13 13:45:32 -04:00
NEWS NEWS: Fix grammar 2022-10-06 13:19:33 +02:00
o-iterator.mk
README LoongArch: Update NEWS and README for the LoongArch port. 2022-07-26 12:35:12 -03:00
Rules Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
SHARED-FILES Mention today's regex merge in SHARED-FILES 2021-09-21 18:00:10 -07:00
shlib-versions nss: Do not mention NSS test modules in <gnu/lib-names.h> 2022-03-11 08:24:04 +01:00
test-skeleton.c Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
version.h Open master branch for glibc 2.37 development 2022-07-30 15:34:51 -04:00

This directory contains the sources of the GNU C Library.
See the file "version.h" for what release version you have.
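Besides reading "version.h" in the source tree, a program can check which glibc it was compiled against and which one it is running with. The sketch below uses the `__GLIBC__` and `__GLIBC_MINOR__` macros from <features.h> and `gnu_get_libc_version()` from <gnu/libc-version.h>. The wrapper names `runtime_glibc_version` and `print_glibc_versions` are hypothetical. The wrapper returns NULL when the program is not built against glibc.

```c
#include <stdio.h> /* glibc headers pull in <features.h>, defining __GLIBC__ */

#ifdef __GLIBC__
#include <gnu/libc-version.h>
#endif

/* Hypothetical helper: version string of the C library the program is
   running with (for example "2.36"), or NULL if this is not glibc.  */
const char *runtime_glibc_version(void)
{
#ifdef __GLIBC__
    return gnu_get_libc_version();
#else
    return NULL;
#endif
}

/* Print the compile-time and run-time versions; they can differ when a
   binary built against one glibc runs on a system with a newer one.  */
void print_glibc_versions(void)
{
#ifdef __GLIBC__
    printf("compiled against glibc %d.%d\n", __GLIBC__, __GLIBC_MINOR__);
#endif
    const char *v = runtime_glibc_version();
    printf("running with %s\n", v ? v : "a non-glibc C library");
}
```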

The GNU C Library is the standard system C library for all GNU systems,
and is an important part of what makes up a GNU system.  It provides the
system API for all programs written in C and C-compatible languages such
as C++ and Objective-C; the runtime facilities of other programming
languages use the C library to access the underlying operating system.

In GNU/Linux systems, the C library works with the Linux kernel to
implement the operating system behavior seen by user applications.
In GNU/Hurd systems, it works with a microkernel and Hurd servers.

The GNU C Library implements much of the POSIX.1 functionality in the
GNU/Hurd system, using configurations i[4567]86-*-gnu.

When working with Linux kernels, this version of the GNU C Library
requires Linux kernel version 3.2 or later.
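A program or installer that wants to check this requirement at run time can compare the release string reported by uname(2) against 3.2. The parser `kernel_at_least` and the wrapper `running_kernel_supported` are hypothetical helpers. They only look at the leading major.minor numbers of strings like "5.15.0-91-generic".

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Hypothetical helper: nonzero if a kernel release string such as
   "5.15.0-91-generic" is at least major.minor.  Returns 0 if the string
   does not start with two dot-separated numbers.  */
int kernel_at_least(const char *release, int major, int minor)
{
    int kmaj, kmin;
    if (sscanf(release, "%d.%d", &kmaj, &kmin) != 2)
        return 0;
    return kmaj > major || (kmaj == major && kmin >= minor);
}

/* Check the running kernel against the 3.2 minimum stated above;
   returns -1 if uname fails.  */
int running_kernel_supported(void)
{
    struct utsname u;
    if (uname(&u) != 0)
        return -1;
    return kernel_at_least(u.release, 3, 2);
}
```

The versions are compared numerically rather than as strings because "3.10" sorts before "3.2" as text.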

Also note that the shared version of the libgcc_s library must be
installed for the pthread library to work correctly.

The GNU C Library supports these configurations for using Linux kernels:

	aarch64*-*-linux-gnu
	alpha*-*-linux-gnu
	arc*-*-linux-gnu
	arm-*-linux-gnueabi
	csky-*-linux-gnuabiv2
	hppa-*-linux-gnu
	i[4567]86-*-linux-gnu
	x86_64-*-linux-gnu	Can build either x86_64 or x32
	ia64-*-linux-gnu
	loongarch64-*-linux-gnu	Hardware floating point, LE only.
	m68k-*-linux-gnu
	microblaze*-*-linux-gnu
	mips-*-linux-gnu
	mips64-*-linux-gnu
	or1k-*-linux-gnu
	powerpc-*-linux-gnu	Hardware or software floating point, BE only.
	powerpc64*-*-linux-gnu	Big-endian and little-endian.
	s390-*-linux-gnu
	s390x-*-linux-gnu
	riscv32-*-linux-gnu
	riscv64-*-linux-gnu
	sh[34]-*-linux-gnu
	sparc*-*-linux-gnu
	sparc64*-*-linux-gnu
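The entries above are glob-style patterns over configuration triplets. The sketch below assumes they can be read as ordinary shell globs. Under that assumption, POSIX fnmatch(3) can check whether a given triplet is listed. The function name `is_listed_linux_config` is hypothetical. The qualifiers in the second column (for example "BE only") are not modeled.

```c
#include <fnmatch.h>
#include <stddef.h>

/* The Linux configuration patterns listed above, treated as shell globs:
   "*" matches any run of characters, "[4567]" one listed character.  */
static const char *const linux_configs[] = {
    "aarch64*-*-linux-gnu",   "alpha*-*-linux-gnu",
    "arc*-*-linux-gnu",       "arm-*-linux-gnueabi",
    "csky-*-linux-gnuabiv2",  "hppa-*-linux-gnu",
    "i[4567]86-*-linux-gnu",  "x86_64-*-linux-gnu",
    "ia64-*-linux-gnu",       "loongarch64-*-linux-gnu",
    "m68k-*-linux-gnu",       "microblaze*-*-linux-gnu",
    "mips-*-linux-gnu",       "mips64-*-linux-gnu",
    "or1k-*-linux-gnu",       "powerpc-*-linux-gnu",
    "powerpc64*-*-linux-gnu", "s390-*-linux-gnu",
    "s390x-*-linux-gnu",      "riscv32-*-linux-gnu",
    "riscv64-*-linux-gnu",    "sh[34]-*-linux-gnu",
    "sparc*-*-linux-gnu",     "sparc64*-*-linux-gnu",
};

/* Hypothetical helper: nonzero if the triplet matches one of the
   patterns above (fnmatch returns 0 on a match).  */
int is_listed_linux_config(const char *triplet)
{
    size_t n = sizeof linux_configs / sizeof linux_configs[0];
    for (size_t i = 0; i < n; i++)
        if (fnmatch(linux_configs[i], triplet, 0) == 0)
            return 1;
    return 0;
}
```

For example, "i686-pc-linux-gnu" matches i[4567]86-*-linux-gnu, while "i386-pc-linux-gnu" matches no pattern.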

If you are interested in doing a port, please contact the glibc
maintainers; see https://www.gnu.org/software/libc/ for more
information.

See the file INSTALL to find out how to configure, build, and install
the GNU C Library.  You might also consider reading the WWW pages for
the C library at https://www.gnu.org/software/libc/.

The GNU C Library is (almost) completely documented by the Texinfo manual
found in the `manual/' subdirectory.  The manual is still being updated
and contains some known errors and omissions; we regret that we do not
have the resources to work on the manual as much as we would like.  For
corrections to the manual, please file a bug in the `manual' component,
following the bug-reporting instructions below.  Please be sure to check
the manual in the current development sources to see if your problem has
already been corrected.

Please see https://www.gnu.org/software/libc/bugs.html for bug reporting
information.  We are now using the Bugzilla system to track all bug reports.
This web page gives detailed information on how to report bugs properly.

The GNU C Library is free software.  See the file COPYING.LIB for copying
conditions, and LICENSES for notices about a few contributions that require
these additional notices to be distributed.  License copyright years may be
listed using range notation, e.g., 1996-2015, indicating that every year in
the range, inclusive, is a copyrightable year that would otherwise be listed
individually.