glibc/sysdeps
Noah Goldstein 642933158e x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions
Optimizations are:
    1. Use more overlapping stores to avoid branches.
    2. Reduce how unrolled the aligning copies are (this is more of a
       code-size save, its a negative for some sizes in terms of
       perf).
    3. For st{r|p}n{cat|cpy} re-order the branches to minimize the
       number that are taken.

Performance Changes:

    Times are from N = 10 runs of the benchmark suite and are
    reported as geometric mean of all ratios of
    New Implementation / Old Implementation.

    strcat-avx2      -> 0.998
    strcpy-avx2      -> 0.937
    stpcpy-avx2      -> 0.971

    strncpy-avx2     -> 0.793
    stpncpy-avx2     -> 0.775

    strncat-avx2     -> 0.962

Code Size Changes:
    function         -> Bytes New / Bytes Old -> Ratio

    strcat-avx2      ->  685 / 1639 -> 0.418
    strcpy-avx2      ->  560 /  903 -> 0.620
    stpcpy-avx2      ->  592 /  939 -> 0.630

    strncpy-avx2     -> 1176 / 2390 -> 0.492
    stpncpy-avx2     -> 1268 / 2438 -> 0.520

    strncat-avx2     -> 1042 / 2563 -> 0.407

Notes:
    1. Because of the significant difference between the
       implementations they are split into three files.

           strcpy-avx2.S    -> strcpy, stpcpy, strcat
           strncpy-avx2.S   -> strncpy
           strncat-avx2.S    > strncat

       I couldn't find a way to merge them without making the
       ifdefs incredibly difficult to follow.

Full check passes on x86-64 and build succeeds for all ISA levels w/
and w/o multiarch.
2022-11-08 19:22:33 -08:00
..
aarch64 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
alpha elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
arc elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
arm configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
csky elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
generic elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
gnu errlist: add missing entry for EDEADLOCK (bug 29545) 2022-09-08 11:40:24 +02:00
hppa elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
htl htl: Make pthread*_cond_timedwait register wref before releasing mutex 2022-08-22 22:27:24 +02:00
hurd hurd: Fix pthread_kill on exiting/ted thread 2022-01-15 15:11:54 +01:00
i386 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
ia64 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
ieee754 Fix build with GCC 13 _FloatN, _FloatNx built-in functions 2022-10-31 23:20:08 +00:00
loongarch elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
m68k elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
mach hurd: Add sigtimedwait and sigwaitinfo support 2022-11-07 21:16:26 +01:00
microblaze elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
mips elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
nios2 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
nptl Use atomic_exchange_release/acquire 2022-09-26 16:58:08 +01:00
or1k elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
posix get_nscd_addresses: Fix subscript typos [BZ #29605] 2022-09-28 12:47:10 -04:00
powerpc elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
pthread Do not define static_assert or thread_local in headers for C2x 2022-09-07 18:39:28 +00:00
riscv elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
s390 elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
sh elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
sparc elf: Introduce <dl-call_tls_init_tp.h> and call_tls_init_tp (bug 29249) 2022-11-03 17:28:03 +01:00
unix Linux: Add ppoll fortify symbol for 64 bit time_t (BZ# 29746) 2022-11-08 13:37:06 -03:00
wordsize-32 Update copyright dates with scripts/update-copyrights 2022-01-01 11:40:24 -08:00
wordsize-64 configure: Use -Wno-ignored-attributes if compiler warns about multiple aliases 2022-11-01 09:51:06 -03:00
x86 elf: Remove _dl_string_hwcap 2022-10-06 07:59:48 -03:00
x86_64 x86: Optimize and shrink st{r|p}{n}{cat|cpy}-avx2 functions 2022-11-08 19:22:33 -08:00