Commit Graph

23 Commits

Author SHA1 Message Date
Wilco Dijkstra
9f298bfe1f AArch64: Add SVE memcpy
Add an initial SVE memcpy implementation.  Copies up to 32 bytes use SVE
vectors which improves the random memcpy benchmark significantly.
Cleanup the memcpy and memmove ifunc selectors.
2022-06-07 16:58:03 +01:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Naohiro Tamura
4f26956d5b aarch64: Added optimized memset for A64FX
This patch optimizes the performance of memset for A64FX [1] which
implements ARMv8-A SVE and has L1 64KB cache per core and L2 8MB cache
per NUMA node.

The performance optimization makes use of Scalable Vector Register
with several techniques such as loop unrolling, memory access
alignment, cache zero fill and prefetch.

SVE assembler code for memset is implemented as Vector Length Agnostic
code so theoretically it can be run on any SOC which supports ARMv8-A
SVE standard.

We confirmed that all testcases have been passed by running 'make
check' and 'make xcheck' not only on A64FX but also on ThunderX2.

And also we confirmed that the SVE 512 bit vector register performance
is roughly 4 times better than Advanced SIMD 128 bit register and 8
times better than scalar 64 bit register by running 'make bench'.

[1] https://github.com/fujitsu/A64FX

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
2021-05-27 09:47:53 +01:00
Naohiro Tamura
fa527f345c aarch64: Added optimized memcpy and memmove for A64FX
This patch optimizes the performance of memcpy/memmove for A64FX [1]
which implements ARMv8-A SVE and has L1 64KB cache per core and L2 8MB
cache per NUMA node.

The performance optimization makes use of Scalable Vector Register
with several techniques such as loop unrolling, memory access
alignment, cache zero fill, and software pipelining.

SVE assembler code for memcpy/memmove is implemented as Vector Length
Agnostic code so theoretically it can be run on any SOC which supports
ARMv8-A SVE standard.

We confirmed that all testcases have been passed by running 'make
check' and 'make xcheck' not only on A64FX but also on ThunderX2.

And also we confirmed that the SVE 512 bit vector register performance
is roughly 4 times better than Advanced SIMD 128 bit register and 8
times better than scalar 64 bit register by running 'make bench'.

[1] https://github.com/fujitsu/A64FX

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com>
2021-05-27 09:47:53 +01:00
Szabolcs Nagy
04c6a8073d aarch64: Fix the list of tested IFUNC variants [BZ #26818]
Some IFUNC variants are not compatible with BTI and MTE so don't
set them as usable for testing and benchmarking on a BTI or MTE
enabled system.

As far as IFUNC selectors are concerned a system is BTI enabled if
the cpu supports it and glibc was built with BTI branch protection.

Most IFUNC variants are BTI compatible, but thunderx2 memcpy and
memmove use a jump table with indirect jump, without a BTI j.

Fixes bug 26818.
2021-01-25 16:15:54 +00:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Wilco Dijkstra
f46ef33ad1 AArch64: Improve strlen_asimd performance (bug 25824)
Optimize strlen using a mix of scalar and SIMD code.  On modern micro
architectures large strings are 2.6 times faster than existing
strlen_asimd and 35% faster than the new MTE version of strlen.

On a random strlen benchmark using small sizes the speedup is 7% vs
strlen_asimd and 40% vs the MTE strlen.  This fixes the main strlen
regressions on Cortex-A53 and other cores with a simple Neon unit.

Rename __strlen_generic to __strlen_mte, and select strlen_asimd when
MTE is not enabled (this is waiting on support for a HWCAP_MTE bit).

This fixes big-endian bug 25824. Passes GLIBC regression tests.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2020-07-17 15:07:23 +01:00
Wilco Dijkstra
4a733bf375 AArch64: Add optimized Q-register memcpy
Add a new memcpy using 128-bit Q registers - this is faster on modern
cores and reduces codesize.  Similar to the generic memcpy, small cases
include copies up to 32 bytes.  64-128 byte copies are split into two
cases to improve performance of 64-96 byte copies.  Large copies align
the source rather than the destination.

bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1,
so make this memcpy the default on N1 (on Centriq it is 15% faster than
memcpy_falkor).

Passes GLIBC regression tests.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2020-07-15 16:55:07 +01:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Xuelei Zhang
525de033a9 aarch64: Optimized memset for Kunpeng processor.
Due to the branch prediction issue of Kunpeng processor, we found
memset_generic has poor performance on middle sizes setting, and so
we reconstructed the logic, expanded the loop by 4 times in set_long
to solve the problem, even when setting below 1K sizes have benefit.

Another change is that DZ_ZVA seems no work when setting zero, so we
discarded it and used set_long to set zero instead. Fewer branches and
predictions also make the zero case have slightly improvement.

Checked on aarch64-linux-gnu.

Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
2019-12-19 16:31:04 -03:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Anton Youdkevitch
32e902a94e aarch64: thunderx2 memmove performance improvements
The performance improvement is about 20%-30% for
larger cases and about 1%-5% for smaller cases.

Used SIMD load/store instead of GPR for large
overlapping forward moves.

Reused existing memcpy implementation for smaller
or overlapping backward moves.

Fixed the existing memcpy implementation to allow it
to deal with the overlapping case.

Simplified loop tails in the memcpy implementation -
use branchless overlapping sequence of fixed length
load/stores instead of branching depending on the
size.

A cleanup/optimization converting str's to stp's.

Added __memmove_thunderx2 to the list of the
available implementations.
2019-05-03 11:01:34 -07:00
Feng Xue
83d1cc42d8 aarch64: Optimized memchr specific to AmpereComputing emag
This version uses general register based memory instruction to load
data, because vector register based is slightly slower in emag.

Character-matching is performed on 16-byte (both size and alignment)
memory block in parallel each iteration.

    * sysdeps/aarch64/memchr.S (__memchr): Rename to MEMCHR.
    [!MEMCHR](MEMCHR): Set to __memchr.
    * sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
    Add memchr_generic and memchr_nosimd.
    * sysdeps/aarch64/multiarch/ifunc-impl-list.c
    (__libc_ifunc_impl_list): Add memchr ifuncs.
    * sysdeps/aarch64/multiarch/memchr.c: New file.
    * sysdeps/aarch64/multiarch/memchr_generic.S: Likewise.
    * sysdeps/aarch64/multiarch/memchr_nosimd.S: Likewise.
2019-02-01 08:14:21 -05:00
Feng Xue
c7d3890ff5 aarch64: Optimized memset specific to AmpereComputing emag
This version uses general register based memory store instead of
vector register based, for the former is faster than the latter
in emag.

The fact that DC ZVA size in emag is 64-byte, is used by IFUNC
dispatch to select this memset, so that cost of runtime-check on
DC ZVA size can be saved.

    * sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
    Add memset_emag.
    * sysdeps/aarch64/multiarch/ifunc-impl-list.c
    (__libc_ifunc_impl_list): Add __memset_emag to memset ifunc.
    * sysdeps/aarch64/multiarch/memset.c (libc_ifunc):
    Add IS_EMAG check for ifunc dispatch.
    * sysdeps/aarch64/multiarch/memset_base64.S: New file.
    * sysdeps/aarch64/multiarch/memset_emag.S: New file.
2019-02-01 07:59:18 -05:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Siddhesh Poyarekar
436e4d5b96 [aarch64] Add an ASIMD variant of strlen for falkor
This variant of strlen uses vector loads and operations to reduce the
size of the code and also eliminate the non-ascii fallback.  This
works very well for falkor because of its two vector units and
efficient vector ops.  In the best case it reduces latency of cases in
bench-strlen by 48%, with gains throughout the benchmark.
strlen-walk also sees uniform gains in the 5%-15% range.

Overall the routine appears to work better than the stock one for falkor
regardless of the benchmark, length of string or cache state.

The same cannot be said of a53 and a72 though.  a53 performance was
greatly reduced and for a72 it was a bit of a mixed bag, slightly on the
negative side but I reckon it might be fast in some situations.

	* sysdeps/aarch64/strlen.S (__strlen): Rename to STRLEN.
	[!STRLEN](STRLEN): Set to __strlen.
	* sysdeps/aarch64/multiarch/strlen.c: New file.
	* sysdeps/aarch64/multiarch/strlen_generic.S: Likewise.
	* sysdeps/aarch64/multiarch/strlen_asimd.S: Likewise.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add strlen.
	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
	strlen_generic and strlen_asimd.

Reviewed-By: szabolcs.nagy@arm.com
CC: pinskia@gmail.com
2018-08-15 23:01:33 +05:30
Steve Ellcey
e9537dddc7 IFUNC for Cavium ThunderX2
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
	Add memcpy_thunderx2.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c (MAX_IFUNC):
	Increment to 4.
	(__libc_ifunc_impl_list): Add __memcpy_thunderx2.
	* sysdeps/aarch64/multiarch/memcpy.c (libc_ifunc): Add IS_THUNDERX2
	and IS_THUNDERX2PA checks.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S (USE_THUNDERX2):
	Use macro to set name appropriately.
	(memcpy): Use USE_THUNDERX2 macro to modify prefetches.
	* sysdeps/aarch64/multiarch/memcpy_thunderx2.S: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_THUNDERX2PA):
	New macro.
	(IS_THUNDERX2): New macro.
2018-02-22 08:38:47 -08:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Siddhesh Poyarekar
5a67c4fa01 aarch64: Optimized memset for falkor
The generic memset reads dczid_el0 on every memset.  This has a
significant impact on falkor for a range of sizes because reading
dczid_el0 is slow.

The DZP bit in the dczid_el0 register does not change dynamically, so
it is safe to read once during program startup.  With this patch
dczid_el0 is read once during startup and zva_size is cached.  This is
used to invoke the falkor-specific memset; the generic memset routine
remains unchanged.

The gains due to this are significant for falkor, with run time
reductions as high as 48%.  Here's a sample from the falkor tests:

Function: memset
Variant: walk
                      simple_memset	__memset_falkor	__memset_generic
=====================================================================
length=256, char=0:   139.96 (-698.28%)	   9.07 ( 48.26%)  17.53
length=257, char=0:   140.50 (-699.03%)	   9.53 ( 45.80%)  17.58
length=258, char=0:   140.96 (-703.95%)	   9.58 ( 45.36%)  17.53
length=259, char=0:   141.56 (-705.16%)	   9.53 ( 45.79%)  17.58
length=260, char=0:   142.15 (-710.76%)	   9.57 ( 45.39%)  17.53
length=261, char=0:   142.50 (-710.39%)	   9.53 ( 45.78%)  17.58
length=262, char=0:   142.97 (-715.09%)	   9.57 ( 45.42%)  17.54
length=263, char=0:   143.51 (-716.18%)	   9.53 ( 45.80%)  17.58
length=264, char=0:   143.93 (-720.55%)	   9.58 ( 45.39%)  17.54
length=265, char=0:   144.56 (-722.07%)	   9.53 ( 45.80%)  17.59
length=266, char=0:   144.98 (-726.42%)	   9.58 ( 45.42%)  17.54
length=267, char=0:   145.53 (-727.53%)	   9.53 ( 45.80%)  17.59
length=268, char=0:   146.25 (-731.81%)	   9.53 ( 45.79%)  17.58
length=269, char=0:   146.52 (-735.39%)	   9.53 ( 45.66%)  17.54
length=270, char=0:   146.97 (-735.81%)	   9.53 ( 45.80%)  17.58
length=271, char=0:   147.54 (-741.08%)	   9.58 ( 45.38%)  17.54
length=512, char=0:   268.26 (-1307.85%)  12.06 ( 36.71%)  19.05
length=513, char=0:   268.73 (-1273.89%)  13.56 ( 30.68%)  19.56
length=514, char=0:   269.31 (-1276.89%)  13.56 ( 30.68%)  19.56
length=515, char=0:   269.73 (-1279.05%)  13.56 ( 30.68%)  19.56
length=516, char=0:   270.34 (-1282.24%)  13.56 ( 30.67%)  19.56
length=517, char=0:   270.83 (-1284.71%)  13.56 ( 30.66%)  19.56
length=518, char=0:   271.20 (-1286.54%)  13.56 ( 30.67%)  19.56
length=519, char=0:   271.67 (-1288.67%)  13.65 ( 30.24%)  19.56
length=520, char=0:   272.14 (-1291.04%)  13.65 ( 30.22%)  19.56
length=521, char=0:   272.66 (-1293.69%)  13.65 ( 30.23%)  19.56
length=522, char=0:   273.14 (-1296.13%)  13.65 ( 30.20%)  19.56
length=523, char=0:   273.64 (-1298.75%)  13.65 ( 30.23%)  19.56
length=524, char=0:   274.34 (-1302.16%)  13.66 ( 30.20%)  19.57
length=525, char=0:   274.64 (-1297.78%)  13.56 ( 30.99%)  19.65
length=526, char=0:   275.20 (-1300.04%)  13.56 ( 31.01%)  19.66
length=527, char=0:   275.66 (-1302.86%)  13.56 ( 30.99%)  19.65
length=1024, char=0:  524.46 (-2169.75%)  20.12 ( 12.92%)  23.11
length=1025, char=0:  525.14 (-2124.63%)  21.62 (  8.40%)  23.61
length=1026, char=0:  525.59 (-2125.36%)  21.88 (  7.37%)  23.62
length=1027, char=0:  525.98 (-2127.14%)  21.62 (  8.46%)  23.62
length=1028, char=0:  526.68 (-2131.10%)  21.62 (  8.42%)  23.61
length=1029, char=0:  527.10 (-2131.70%)  21.79 (  7.73%)  23.62
length=1030, char=0:  527.54 (-2118.51%)  21.62 (  9.10%)  23.78
length=1031, char=0:  527.98 (-2136.37%)  21.62 (  8.43%)  23.61
length=1032, char=0:  528.70 (-2139.38%)  21.62 (  8.43%)  23.61
length=1033, char=0:  529.25 (-2124.37%)  21.62 (  9.11%)  23.79
length=1034, char=0:  529.48 (-2142.95%)  21.62 (  8.43%)  23.61
length=1035, char=0:  530.11 (-2145.13%)  21.62 (  8.44%)  23.61
length=1036, char=0:  530.76 (-2147.10%)  21.79 (  7.73%)  23.62
length=1037, char=0:  531.03 (-2149.45%)  21.62 (  8.42%)  23.61
length=1038, char=0:  531.64 (-2151.87%)  21.62 (  8.42%)  23.61
length=1039, char=0:  531.99 (-2151.63%)  21.80 (  7.75%)  23.63

	* sysdeps/aarch64/memset-reg.h: New file.
	* sysdeps/aarch64/memset.S: Use it.
	(__memset): Rename to MEMSET macro.
	[ZVA_MACRO]: Use zva_macro.
	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
	Add memset_generic and memset_falkor.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Add memset ifuncs.
	* sysdeps/aarch64/multiarch/init-arch.h (INIT_ARCH): New
	local variable zva_size.
	* sysdeps/aarch64/multiarch/memset.c: New file.
	* sysdeps/aarch64/multiarch/memset_generic.S: New file.
	* sysdeps/aarch64/multiarch/memset_falkor.S: New file.
	* sysdeps/aarch64/multiarch/rtld-memset.S: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
	(DCZID_DZP_MASK): New macro.
	(DCZID_BS_MASK): Likewise.
	(init_cpu_features): Read and set zva_size.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h
	(struct cpu_features): New member zva_size.
2017-11-20 18:25:04 +05:30
Siddhesh Poyarekar
dd5bc7f1b3 aarch64: Optimized implementation of memmove for Qualcomm Falkor
This is an optimized memmove implementation for the Qualcomm Falkor
processor core.  Due to the way the falkor memcpy needs to be written,
code cannot be easily shared between memmove and memcpy like in case
of other aarch64 memcpy implementations due to which this routine is
separate.  The underlying principle is the same as that of memcpy
where it tries to use registers with the same lower 4 bits for
fetching the same stream, thus optimizing hardware prefetcher
performance.

The memcpy copy loop copies 64 bytes at a time using the same register
pair since that's the way to train the hardware prefetcher on the
falkor core.  memmove cannot quite do that since it needs to avoid
overlaps, so it does the next best thing, i.e. has a 32 byte loop with
a 32 byte end (prefetch a loop ahead to account for overlapping
locations) with register pairs that alias so that they hit the same
prefetcher.  Due to this difference in loop size, they have to
currently be separate implementations but efforts are on to try and
get memmove to fall back into memcpy whenever it can without simply
duplicating all of the code.

Performance:

The routine fares around 20-25% better than the generic memmove for
most medium to large sizes (i.e. > 128 bytes) for the new walking
memmove benchmark (memmove-walk) with an unexplained regression
between 1K and 2K.  The minor regression is something worth looking
into for us, but the remaining gains are significant enough that we
would like this included upstream as we looking into the cause for the
regression.  Here is a snippet of the numbers as generated from the
microbenchmark by the compare_strings script.  Comparisons are against
__memmove_generic:

Function: memmove
Variant: walk
                                    __memmove_thunderx	__memmove_falkor	__memmove_generic
========================================================================================================================
<snip>
                        length=16384:  12508800.00 (  6.09%)	 11486800.00 ( 13.76%)	 13319600.00
                        length=16400:  13614200.00 ( -0.67%)	 11585000.00 ( 14.33%)	 13523600.00
                        length=16385:  13448400.00 (  0.10%)	 11732700.00 ( 12.84%)	 13461200.00
                        length=16399:  13594100.00 ( -0.22%)	 11859600.00 ( 12.57%)	 13564400.00
                        length=16386:  13211600.00 (  1.13%)	 11503800.00 ( 13.91%)	 13362400.00
                        length=16398:  13218600.00 (  2.12%)	 11573200.00 ( 14.30%)	 13504700.00
                        length=16387:  13510900.00 ( -0.37%)	 11744200.00 ( 12.76%)	 13461300.00
                        length=16397:  13603700.00 ( -0.15%)	 11878200.00 ( 12.55%)	 13583200.00
                        length=16388:  13461700.00 ( -0.13%)	 11558000.00 ( 14.03%)	 13444100.00
                        length=16396:  13517500.00 ( -0.03%)	 11561300.00 ( 14.45%)	 13513900.00
                        length=16389:  13534100.00 (  0.17%)	 11756800.00 ( 13.28%)	 13556900.00
                        length=16395:  13585600.00 (  0.11%)	 11791800.00 ( 13.30%)	 13601200.00
                        length=16390:  13480100.00 ( -0.13%)	 11685500.00 ( 13.20%)	 13462100.00
                        length=16394:  13529900.00 ( -0.23%)	 11549800.00 ( 14.43%)	 13498200.00
                        length=16391:  13595400.00 ( -0.26%)	 11768200.00 ( 13.22%)	 13560600.00
                        length=16393:  13567000.00 (  0.20%)	 11779700.00 ( 13.35%)	 13594700.00
                        length=32768:  71308800.00 ( -6.53%)	 50220800.00 ( 24.98%)	 66939200.00
                        length=32784:  72100800.00 (-11.55%)	 50114100.00 ( 22.47%)	 64636300.00
                        length=32769:  71767000.00 ( -7.10%)	 51238400.00 ( 23.54%)	 67010000.00
                        length=32783:  70113700.00 (-40.95%)	 51129000.00 ( -2.78%)	 49744400.00
                        length=32770:  71367600.00 ( -6.52%)	 50244700.00 ( 25.01%)	 67000900.00
                        length=32782:  64366700.00 (  4.71%)	 50101400.00 ( 25.83%)	 67545600.00
                        length=32771:  71440100.00 ( -6.51%)	 51263900.00 ( 23.57%)	 67074900.00
                        length=32781:  66993000.00 (  0.34%)	 51108300.00 ( 23.97%)	 67220300.00
                        length=32772:  71443900.00 (-60.50%)	 50062100.00 (-12.47%)	 44512600.00
                        length=32780:  71759100.00 ( -6.58%)	 50263200.00 ( 25.35%)	 67328600.00
                        length=32773:  71714900.00 (-33.21%)	 51076600.00 (  5.12%)	 53835400.00
                        length=32779:  71756900.00 ( -6.56%)	 51290800.00 ( 23.83%)	 67337800.00
                        length=32774:  59689300.00 (-34.55%)	 50068400.00 (-12.86%)	 44363300.00
                        length=32778:  71847500.00 (-18.20%)	 50084100.00 ( 17.61%)	 60786500.00
                        length=32775:  71599300.00 ( -6.54%)	 51278200.00 ( 23.70%)	 67204800.00
                        length=32777:  71862900.00 (-60.85%)	 51094000.00 (-14.36%)	 44677900.00
                        length=65536: 282848000.00 ( -6.60%)	199187000.00 ( 24.93%)	265325000.00
                        length=65552: 243285000.00 (-41.61%)	198512000.00 (-15.54%)	171805000.00
                        length=65537: 255415000.00 (-23.47%)	202499000.00 (  2.11%)	206858000.00
                        length=65551: 280122000.00 (-62.95%)	203349000.00 (-18.29%)	171911000.00
                        length=65538: 283676000.00 (-14.46%)	198368000.00 ( 19.96%)	247848000.00
                        length=65550: 275566000.00 (-51.76%)	198494000.00 ( -9.31%)	181581000.00
                        length=65539: 283699000.00 ( -6.58%)	203453000.00 ( 23.57%)	266195000.00
                        length=65549: 286572000.00 ( -6.65%)	202607000.00 ( 24.60%)	268712000.00
                        length=65540: 283710000.00 ( -6.59%)	199161000.00 ( 25.17%)	266160000.00
                        length=65548: 237573000.00 ( 11.48%)	198462000.00 ( 26.06%)	268395000.00
                        length=65541: 284150000.00 ( -6.58%)	203273000.00 ( 23.75%)	266600000.00
                        length=65547: 286250000.00 ( -6.70%)	202594000.00 ( 24.48%)	268263000.00
                        length=65542: 284167000.00 ( -6.60%)	199122000.00 ( 25.31%)	266584000.00
                        length=65546: 285656000.00 ( -6.59%)	198443000.00 ( 25.95%)	268002000.00
                        length=65543: 284600000.00 ( -6.58%)	203247000.00 ( 23.89%)	267030000.00
                        length=65545: 285665000.00 ( -6.40%)	202575000.00 ( 24.55%)	268472000.00
<snip>

	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
	memmove_falkor.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Likewise.
	* sysdeps/aarch64/multiarch/memmove.c: Likewise.
	* sysdeps/aarch64/multiarch/memmove_falkor.S: New file.
2017-10-05 22:20:23 +05:30
Siddhesh Poyarekar
36ada5f681 aarch64: Optimized memcpy for Qualcomm Falkor processor
This is an optimized implementation of the memcpy routine that gives a
significant gain in performance for all sizes of copies on the
Qualcomm Falkor processor.  A detailed rationale of the implementation
is written in a comment in the patch.

This implementation improves time for copies up to 128 bytes by up to
15% and for larger copies by up to 35% in the glibc
microbenchmark. The memcpy-random benchmark sees improvements in all
sizes in the range of 13%-18%.

Here are the full numbers extracted from the glibc microbenchmark
using the commands:

../benchtests/scripts/compare_strings.py benchtests/bench-memcpy.out \
		../benchtests/scripts/benchout_strings.schema.json \
		-base=__memcpy_generic length align1 align2

../benchtests/scripts/compare_strings.py benchtests/bench-memcpy-large.out \
		../benchtests/scripts/benchout_strings.schema.json \
		-base=__memcpy_generic length align1 align2

../benchtests/scripts/compare_strings.py benchtests/bench-memcpy-random.out \
		../benchtests/scripts/benchout_strings.schema.json \
		-base=__memcpy_generic max-size

Function: memcpy
__memcpy_thunderx       __memcpy_falkor __memcpy_generic
Variant: default
================================================================================
length=1,align1=0,align2=0:     33.59 (-115.00%)        15.62 (0.00%)   15.62
length=1,align1=0,align2=0:     16.41 (-10.53%) 14.06 (5.26%)   14.84
length=1,align1=0,align2=0:     14.84 (0.00%)   14.84 (0.00%)   14.84
length=1,align1=0,align2=0:     15.62 (-5.26%)  14.06 (5.26%)   14.84
length=2,align1=0,align2=0:     15.62 (-5.26%)  14.06 (5.26%)   14.84
length=2,align1=1,align2=0:     15.62 (-5.26%)  14.06 (5.26%)   14.84
length=2,align1=0,align2=1:     14.84 (0.00%)   14.06 (5.26%)   14.84
length=2,align1=1,align2=1:     14.84 (-5.56%)  14.06 (0.00%)   14.06
length=4,align1=0,align2=0:     14.06 (0.00%)   14.06 (0.00%)   14.06
length=4,align1=2,align2=0:     14.06 (-5.88%)  14.06 (-5.88%)  13.28
length=4,align1=0,align2=2:     14.06 (0.00%)   14.06 (0.00%)   14.06
length=4,align1=2,align2=2:     14.06 (-5.88%)  14.06 (-5.88%)  13.28
length=8,align1=0,align2=0:     14.84 (-5.56%)  13.28 (5.56%)   14.06
length=8,align1=3,align2=0:     14.06 (0.00%)   13.28 (5.56%)   14.06
length=8,align1=0,align2=3:     13.28 (0.00%)   13.28 (0.00%)   13.28
length=8,align1=3,align2=3:     13.28 (-6.25%)  13.28 (-6.25%)  12.50
length=16,align1=0,align2=0:    13.28 (0.00%)   13.28 (0.00%)   13.28
length=16,align1=4,align2=0:    13.28 (0.00%)   12.50 (5.88%)   13.28
length=16,align1=0,align2=4:    13.28 (0.00%)   13.28 (0.00%)   13.28
length=16,align1=4,align2=4:    13.28 (-6.25%)  12.50 (0.00%)   12.50
length=32,align1=0,align2=0:    14.06 (0.00%)   12.50 (11.11%)  14.06
length=32,align1=5,align2=0:    13.28 (0.00%)   12.50 (5.88%)   13.28
length=32,align1=0,align2=5:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=32,align1=5,align2=5:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=64,align1=0,align2=0:    14.06 (-5.88%)  13.28 (0.00%)   13.28
length=64,align1=6,align2=0:    13.28 (0.00%)   13.28 (0.00%)   13.28
length=64,align1=0,align2=6:    14.06 (5.26%)   14.06 (5.26%)   14.84
length=64,align1=6,align2=6:    14.84 (-11.77%) 14.06 (-5.88%)  13.28
length=128,align1=0,align2=0:   17.19 (-4.76%)  14.84 (9.52%)   16.41
length=128,align1=7,align2=0:   16.41 (4.55%)   15.62 (9.09%)   17.19
length=128,align1=0,align2=7:   16.41 (0.00%)   14.06 (14.29%)  16.41
length=128,align1=7,align2=7:   16.41 (4.55%)   15.62 (9.09%)   17.19
length=256,align1=0,align2=0:   21.88 (-3.70%)  21.09 (0.00%)   21.09
length=256,align1=8,align2=0:   21.09 (-3.85%)  21.09 (-3.85%)  20.31
length=256,align1=0,align2=8:   20.31 (-4.00%)  20.31 (-4.00%)  19.53
length=256,align1=8,align2=8:   21.88 (-7.69%)  20.31 (0.00%)   20.31
length=512,align1=0,align2=0:   28.91 (-2.78%)  28.91 (-2.78%)  28.12
length=512,align1=9,align2=0:   30.47 (-2.63%)  30.47 (-2.63%)  29.69
length=512,align1=0,align2=9:   29.69 (0.00%)   29.69 (0.00%)   29.69
length=512,align1=9,align2=9:   28.12 (-2.86%)  28.12 (-2.86%)  27.34
length=1024,align1=0,align2=0:  44.53 (0.00%)   44.53 (0.00%)   44.53
length=1024,align1=10,align2=0:         50.00 (0.00%)   50.00 (0.00%)   50.00
length=1024,align1=0,align2=10:         49.22 (1.56%)   50.78 (-1.56%)  50.00
length=1024,align1=10,align2=10:        44.53 (-1.79%)  43.75 (0.00%)   43.75
length=2048,align1=0,align2=0:  77.34 (-1.02%)  76.56 (0.00%)   76.56
length=2048,align1=11,align2=0:         89.84 (0.00%)   89.84 (0.00%)   89.84
length=2048,align1=0,align2=11:         89.84 (0.00%)   89.84 (0.00%)   89.84
length=2048,align1=11,align2=11:        75.78 (0.00%)   75.78 (0.00%)   75.78
length=4096,align1=0,align2=0:  141.41 (-0.56%) 140.62 (0.00%)  140.62
length=4096,align1=12,align2=0:         171.09 (-0.46%) 170.31 (0.00%)  170.31
length=4096,align1=0,align2=12:         170.31 (0.00%)  170.31 (0.00%)  170.31
length=4096,align1=12,align2=12:        140.62 (0.00%)  140.62 (0.00%)  140.62
length=8192,align1=0,align2=0:  278.91 (-0.28%) 275.78 (0.84%)  278.12
length=8192,align1=13,align2=0:         338.28 (0.23%)  335.94 (0.92%)  339.06
length=8192,align1=0,align2=13:         338.28 (0.00%)  455.47 (-34.64%)        338.28
length=8192,align1=13,align2=13:        278.12 (-0.28%) 275.78 (0.56%)  277.34
length=16384,align1=0,align2=0:         535.94 (-0.15%) 531.25 (0.73%)  535.16
length=16384,align1=14,align2=0:        659.38 (0.12%)  659.38 (0.12%)  660.16
length=16384,align1=0,align2=14:        659.38 (0.00%)  657.03 (0.36%)  659.38
length=16384,align1=14,align2=14:       535.16 (0.44%)  532.81 (0.87%)  537.50
length=32768,align1=0,align2=0:         1260.94 (10.68%)        1121.88 (20.53%)        1411.72
length=32768,align1=15,align2=0:        1368.75 (10.02%)        1376.56 (9.50%) 1521.09
length=32768,align1=0,align2=15:        1333.59 (10.91%)        1373.44 (8.25%) 1496.88
length=32768,align1=15,align2=15:       1256.25 (13.96%)        1125.78 (22.90%)        1460.16
length=65536,align1=0,align2=0:         2853.91 (30.11%)        2589.06 (36.60%)        4083.59
length=65536,align1=16,align2=0:        2850.00 (30.14%)        2589.84 (36.52%)        4079.69
length=65536,align1=0,align2=16:        2853.12 (30.60%)        2589.84 (37.00%)        4110.94
length=65536,align1=16,align2=16:       2850.78 (30.07%)        2589.06 (36.49%)        4076.56
length=0,align1=0,align2=0:     15.62 (-5.26%)  16.41 (-10.53%) 14.84
length=0,align1=0,align2=0:     14.84 (-5.56%)  14.84 (-5.56%)  14.06
length=0,align1=0,align2=0:     14.84 (0.00%)   14.84 (0.00%)   14.84
length=0,align1=0,align2=0:     16.41 (-16.67%) 14.84 (-5.56%)  14.06
length=1,align1=0,align2=0:     15.62 (4.76%)   15.62 (4.76%)   16.41
length=1,align1=1,align2=0:     15.62 (0.00%)   14.84 (5.00%)   15.62
length=1,align1=0,align2=1:     14.84 (0.00%)   14.84 (0.00%)   14.84
length=1,align1=1,align2=1:     14.84 (0.00%)   14.06 (5.26%)   14.84
length=2,align1=0,align2=0:     14.84 (0.00%)   14.06 (5.26%)   14.84
length=2,align1=2,align2=0:     14.84 (0.00%)   14.06 (5.26%)   14.84
length=2,align1=0,align2=2:     14.84 (-5.56%)  14.06 (0.00%)   14.06
length=2,align1=2,align2=2:     14.84 (0.00%)   14.06 (5.26%)   14.84
length=3,align1=0,align2=0:     14.84 (0.00%)   14.84 (0.00%)   14.84
length=3,align1=3,align2=0:     14.84 (-5.56%)  14.06 (0.00%)   14.06
length=3,align1=0,align2=3:     15.62 (-11.11%) 14.06 (0.00%)   14.06
length=3,align1=3,align2=3:     14.84 (0.00%)   14.06 (5.26%)   14.84
length=4,align1=0,align2=0:     17.97 (-27.78%) 14.06 (0.00%)   14.06
length=4,align1=4,align2=0:     13.28 (5.56%)   14.06 (0.00%)   14.06
length=4,align1=0,align2=4:     14.06 (0.00%)   13.28 (5.56%)   14.06
length=4,align1=4,align2=4:     13.28 (5.56%)   13.28 (5.56%)   14.06
length=5,align1=0,align2=0:     13.28 (5.56%)   13.28 (5.56%)   14.06
length=5,align1=5,align2=0:     14.06 (0.00%)   14.06 (0.00%)   14.06
length=5,align1=0,align2=5:     14.06 (0.00%)   13.28 (5.56%)   14.06
length=5,align1=5,align2=5:     14.06 (-5.88%)  14.06 (-5.88%)  13.28
length=6,align1=0,align2=0:     14.06 (-5.88%)  14.06 (-5.88%)  13.28
length=6,align1=6,align2=0:     14.06 (0.00%)   14.06 (0.00%)   14.06
length=6,align1=0,align2=6:     14.06 (0.00%)   13.28 (5.56%)   14.06
length=6,align1=6,align2=6:     14.06 (0.00%)   13.28 (5.56%)   14.06
length=7,align1=0,align2=0:     14.84 (-11.77%) 14.06 (-5.88%)  13.28
length=7,align1=7,align2=0:     13.28 (0.00%)   14.06 (-5.88%)  13.28
length=7,align1=0,align2=7:     14.06 (0.00%)   14.06 (0.00%)   14.06
length=7,align1=7,align2=7:     14.06 (0.00%)   14.06 (0.00%)   14.06
length=8,align1=0,align2=0:     14.06 (-5.88%)  13.28 (0.00%)   13.28
length=8,align1=8,align2=0:     14.06 (0.00%)   13.28 (5.56%)   14.06
length=8,align1=0,align2=8:     13.28 (0.00%)   13.28 (0.00%)   13.28
length=8,align1=8,align2=8:     14.06 (-5.88%)  13.28 (0.00%)   13.28
length=9,align1=0,align2=0:     13.28 (0.00%)   13.28 (0.00%)   13.28
length=9,align1=9,align2=0:     13.28 (0.00%)   13.28 (0.00%)   13.28
length=9,align1=0,align2=9:     13.28 (0.00%)   14.06 (-5.88%)  13.28
length=9,align1=9,align2=9:     14.06 (-5.88%)  13.28 (0.00%)   13.28
length=10,align1=0,align2=0:    14.06 (0.00%)   13.28 (5.56%)   14.06
length=10,align1=10,align2=0:   14.06 (-5.88%)  14.06 (-5.88%)  13.28
length=10,align1=0,align2=10:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=10,align1=10,align2=10:  14.06 (0.00%)   13.28 (5.56%)   14.06
length=11,align1=0,align2=0:    14.06 (-5.88%)  13.28 (0.00%)   13.28
length=11,align1=11,align2=0:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=11,align1=0,align2=11:   13.28 (0.00%)   13.28 (0.00%)   13.28
length=11,align1=11,align2=11:  13.28 (0.00%)   13.28 (0.00%)   13.28
length=12,align1=0,align2=0:    14.06 (-5.88%)  13.28 (0.00%)   13.28
length=12,align1=12,align2=0:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=12,align1=0,align2=12:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=12,align1=12,align2=12:  14.06 (0.00%)   13.28 (5.56%)   14.06
length=13,align1=0,align2=0:    14.06 (-5.88%)  13.28 (0.00%)   13.28
length=13,align1=13,align2=0:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=13,align1=0,align2=13:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=13,align1=13,align2=13:  13.28 (0.00%)   13.28 (0.00%)   13.28
length=14,align1=0,align2=0:    13.28 (0.00%)   13.28 (0.00%)   13.28
length=14,align1=14,align2=0:   13.28 (5.56%)   13.28 (5.56%)   14.06
length=14,align1=0,align2=14:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=14,align1=14,align2=14:  14.06 (-5.88%)  13.28 (0.00%)   13.28
length=15,align1=0,align2=0:    14.06 (-5.88%)  13.28 (0.00%)   13.28
length=15,align1=15,align2=0:   14.06 (-5.88%)  14.06 (-5.88%)  13.28
length=15,align1=0,align2=15:   13.28 (0.00%)   13.28 (0.00%)   13.28
length=15,align1=15,align2=15:  13.28 (0.00%)   14.06 (-5.88%)  13.28
length=16,align1=0,align2=0:    14.06 (-5.88%)  13.28 (0.00%)   13.28
length=16,align1=16,align2=0:   13.28 (5.56%)   14.06 (0.00%)   14.06
length=16,align1=0,align2=16:   14.84 (-11.77%) 13.28 (0.00%)   13.28
length=16,align1=16,align2=16:  13.28 (-6.25%)  12.50 (0.00%)   12.50
length=17,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=17,align1=17,align2=0:   14.84 (-11.77%) 12.50 (5.88%)   13.28
length=17,align1=0,align2=17:   14.84 (-5.56%)  12.50 (11.11%)  14.06
length=17,align1=17,align2=17:  14.84 (-11.77%) 12.50 (5.88%)   13.28
length=18,align1=0,align2=0:    14.06 (0.00%)   12.50 (11.11%)  14.06
length=18,align1=18,align2=0:   13.28 (5.56%)   12.50 (11.11%)  14.06
length=18,align1=0,align2=18:   14.06 (-5.88%)  12.50 (5.88%)   13.28
length=18,align1=18,align2=18:  14.06 (0.00%)   12.50 (11.11%)  14.06
length=19,align1=0,align2=0:    14.06 (-5.88%)  13.28 (0.00%)   13.28
length=19,align1=19,align2=0:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=19,align1=0,align2=19:   14.84 (-5.56%)  12.50 (11.11%)  14.06
length=19,align1=19,align2=19:  14.06 (-5.88%)  12.50 (5.88%)   13.28
length=20,align1=0,align2=0:    14.84 (-11.77%) 12.50 (5.88%)   13.28
length=20,align1=20,align2=0:   14.06 (0.00%)   12.50 (11.11%)  14.06
length=20,align1=0,align2=20:   14.06 (-5.88%)  12.50 (5.88%)   13.28
length=20,align1=20,align2=20:  14.06 (0.00%)   13.28 (5.56%)   14.06
length=21,align1=0,align2=0:    14.84 (-5.56%)  12.50 (11.11%)  14.06
length=21,align1=21,align2=0:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=21,align1=0,align2=21:   14.84 (-11.77%) 12.50 (5.88%)   13.28
length=21,align1=21,align2=21:  13.28 (5.56%)   13.28 (5.56%)   14.06
length=22,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=22,align1=22,align2=0:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=22,align1=0,align2=22:   14.06 (0.00%)   12.50 (11.11%)  14.06
length=22,align1=22,align2=22:  14.06 (0.00%)   12.50 (11.11%)  14.06
length=23,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=23,align1=23,align2=0:   14.06 (-5.88%)  13.28 (0.00%)   13.28
length=23,align1=0,align2=23:   14.06 (-5.88%)  12.50 (5.88%)   13.28
length=23,align1=23,align2=23:  14.06 (-5.88%)  13.28 (0.00%)   13.28
length=24,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=24,align1=24,align2=0:   14.06 (0.00%)   13.28 (5.56%)   14.06
length=24,align1=0,align2=24:   14.84 (-11.77%) 12.50 (5.88%)   13.28
length=24,align1=24,align2=24:  14.06 (-5.88%)  13.28 (0.00%)   13.28
length=25,align1=0,align2=0:    14.06 (0.00%)   12.50 (11.11%)  14.06
length=25,align1=25,align2=0:   14.06 (0.00%)   13.28 (5.56%)   14.06
length=25,align1=0,align2=25:   14.06 (0.00%)   12.50 (11.11%)  14.06
length=25,align1=25,align2=25:  13.28 (0.00%)   13.28 (0.00%)   13.28
length=26,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=26,align1=26,align2=0:   14.06 (0.00%)   13.28 (5.56%)   14.06
length=26,align1=0,align2=26:   14.06 (-5.88%)  12.50 (5.88%)   13.28
length=26,align1=26,align2=26:  14.06 (0.00%)   13.28 (5.56%)   14.06
length=27,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=27,align1=27,align2=0:   14.06 (-5.88%)  12.50 (5.88%)   13.28
length=27,align1=0,align2=27:   14.06 (-5.88%)  12.50 (5.88%)   13.28
length=27,align1=27,align2=27:  14.06 (0.00%)   12.50 (11.11%)  14.06
length=28,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=28,align1=28,align2=0:   14.06 (0.00%)   12.50 (11.11%)  14.06
length=28,align1=0,align2=28:   14.06 (0.00%)   12.50 (11.11%)  14.06
length=28,align1=28,align2=28:  14.84 (-11.77%) 13.28 (0.00%)   13.28
length=29,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=29,align1=29,align2=0:   13.28 (0.00%)   12.50 (5.88%)   13.28
length=29,align1=0,align2=29:   14.06 (0.00%)   12.50 (11.11%)  14.06
length=29,align1=29,align2=29:  13.28 (5.56%)   12.50 (11.11%)  14.06
length=30,align1=0,align2=0:    14.06 (-5.88%)  12.50 (5.88%)   13.28
length=30,align1=30,align2=0:   13.28 (5.56%)   12.50 (11.11%)  14.06
length=30,align1=0,align2=30:   14.06 (-5.88%)  12.50 (5.88%)   13.28
length=30,align1=30,align2=30:  13.28 (0.00%)   12.50 (5.88%)   13.28
length=31,align1=0,align2=0:    13.28 (0.00%)   12.50 (5.88%)   13.28
length=31,align1=31,align2=0:   14.06 (0.00%)   12.50 (11.11%)  14.06
length=31,align1=0,align2=31:   13.28 (0.00%)   12.50 (5.88%)   13.28
length=31,align1=31,align2=31:  14.06 (0.00%)   12.50 (11.11%)  14.06
length=48,align1=0,align2=0:    14.06 (0.00%)   14.06 (0.00%)   14.06
length=48,align1=3,align2=0:    14.06 (0.00%)   14.06 (0.00%)   14.06
length=48,align1=0,align2=3:    14.06 (-5.88%)  14.06 (-5.88%)  13.28
length=48,align1=3,align2=3:    13.28 (5.56%)   14.06 (0.00%)   14.06
length=80,align1=0,align2=0:    15.62 (-11.11%) 14.84 (-5.56%)  14.06
length=80,align1=5,align2=0:    15.62 (-11.11%) 16.41 (-16.67%) 14.06
length=80,align1=0,align2=5:    14.06 (0.00%)   15.62 (-11.11%) 14.06
length=80,align1=5,align2=5:    15.62 (-5.26%)  17.19 (-15.79%) 14.84
length=96,align1=0,align2=0:    14.06 (0.00%)   14.84 (-5.56%)  14.06
length=96,align1=6,align2=0:    14.84 (-5.56%)  16.41 (-16.67%) 14.06
length=96,align1=0,align2=6:    14.06 (0.00%)   14.84 (-5.56%)  14.06
length=96,align1=6,align2=6:    14.84 (-5.56%)  17.19 (-22.22%) 14.06
length=112,align1=0,align2=0:   17.19 (-4.76%)  14.06 (14.29%)  16.41
length=112,align1=7,align2=0:   17.19 (0.00%)   16.41 (4.55%)   17.19
length=112,align1=0,align2=7:   16.41 (0.00%)   14.84 (9.52%)   16.41
length=112,align1=7,align2=7:   17.19 (0.00%)   17.19 (0.00%)   17.19
length=144,align1=0,align2=0:   17.19 (-10.00%) 17.97 (-15.00%) 15.62
length=144,align1=9,align2=0:   17.19 (-4.76%)  18.75 (-14.29%) 16.41
length=144,align1=0,align2=9:   20.31 (-8.33%)  18.75 (0.00%)   18.75
length=144,align1=9,align2=9:   18.75 (-4.35%)  18.75 (-4.35%)  17.97
length=160,align1=0,align2=0:   18.75 (-4.35%)  17.97 (0.00%)   17.97
length=160,align1=10,align2=0:  18.75 (4.00%)   18.75 (4.00%)   19.53
length=160,align1=0,align2=10:  19.53 (-4.17%)  17.97 (4.17%)   18.75
length=160,align1=10,align2=10:         18.75 (-4.35%)  18.75 (-4.35%)  17.97
length=176,align1=0,align2=0:   18.75 (-4.35%)  17.19 (4.35%)   17.97
length=176,align1=11,align2=0:  19.53 (0.00%)   19.53 (0.00%)   19.53
length=176,align1=0,align2=11:  19.53 (-4.17%)  18.75 (0.00%)   18.75
length=176,align1=11,align2=11:         18.75 (0.00%)   17.97 (4.17%)   18.75
length=192,align1=0,align2=0:   18.75 (0.00%)   17.97 (4.17%)   18.75
length=192,align1=12,align2=0:  21.09 (-8.00%)  18.75 (4.00%)   19.53
length=192,align1=0,align2=12:  18.75 (0.00%)   18.75 (0.00%)   18.75
length=192,align1=12,align2=12:         18.75 (0.00%)   17.97 (4.17%)   18.75
length=208,align1=0,align2=0:   17.97 (0.00%)   20.31 (-13.04%) 17.97
length=208,align1=13,align2=0:  19.53 (7.41%)   21.09 (0.00%)   21.09
length=208,align1=0,align2=13:  23.44 (-11.11%) 21.09 (0.00%)   21.09
length=208,align1=13,align2=13:         21.09 (-3.85%)  21.09 (-3.85%)  20.31
length=224,align1=0,align2=0:   21.09 (-8.00%)  20.31 (-4.00%)  19.53
length=224,align1=14,align2=0:  23.44 (-11.11%) 20.31 (3.70%)   21.09
length=224,align1=0,align2=14:  21.09 (3.57%)   20.31 (7.14%)   21.88
length=224,align1=14,align2=14:         20.31 (0.00%)   19.53 (3.85%)   20.31
length=240,align1=0,align2=0:   20.31 (-4.00%)  19.53 (0.00%)   19.53
length=240,align1=15,align2=0:  22.66 (0.00%)   20.31 (10.34%)  22.66
length=240,align1=0,align2=15:  20.31 (-4.00%)  20.31 (-4.00%)  19.53
length=240,align1=15,align2=15:         21.88 (0.00%)   21.09 (3.57%)   21.88
length=272,align1=0,align2=0:   20.31 (0.00%)   28.12 (-38.46%) 20.31
length=272,align1=17,align2=0:  22.66 (0.00%)   27.34 (-20.69%) 22.66
length=272,align1=0,align2=17:  25.78 (-10.00%) 28.12 (-20.00%) 23.44
length=272,align1=17,align2=17:         22.66 (-3.57%)  27.34 (-25.00%) 21.88
length=288,align1=0,align2=0:   23.44 (-7.14%)  27.34 (-25.00%) 21.88
length=288,align1=18,align2=0:  22.66 (0.00%)   27.34 (-20.69%) 22.66
length=288,align1=0,align2=18:  23.44 (-3.45%)  25.00 (-10.35%) 22.66
length=288,align1=18,align2=18:         22.66 (-3.57%)  21.88 (0.00%)   21.88
length=304,align1=0,align2=0:   21.88 (0.00%)   21.88 (0.00%)   21.88
length=304,align1=19,align2=0:  23.44 (-3.45%)  22.66 (0.00%)   22.66
length=304,align1=0,align2=19:  22.66 (0.00%)   22.66 (0.00%)   22.66
length=304,align1=19,align2=19:         22.66 (-3.57%)  21.88 (0.00%)   21.88
length=320,align1=0,align2=0:   22.66 (-3.57%)  21.88 (0.00%)   21.88
length=320,align1=20,align2=0:  22.66 (0.00%)   22.66 (0.00%)   22.66
length=320,align1=0,align2=20:  22.66 (0.00%)   22.66 (0.00%)   22.66
length=320,align1=20,align2=20:         22.66 (-3.57%)  21.88 (0.00%)   21.88
length=336,align1=0,align2=0:   21.88 (0.00%)   24.22 (-10.71%) 21.88
length=336,align1=21,align2=0:  22.66 (0.00%)   25.00 (-10.35%) 22.66
length=336,align1=0,align2=21:  25.78 (0.00%)   25.00 (3.03%)   25.78
length=336,align1=21,align2=21:         25.00 (0.00%)   23.44 (6.25%)   25.00
length=352,align1=0,align2=0:   24.22 (0.00%)   24.22 (0.00%)   24.22
length=352,align1=22,align2=0:  25.00 (0.00%)   25.00 (0.00%)   25.00
length=352,align1=0,align2=22:  25.00 (-3.23%)  25.00 (-3.23%)  24.22
length=352,align1=22,align2=22:         25.00 (-3.23%)  24.22 (0.00%)   24.22
length=368,align1=0,align2=0:   25.00 (-3.23%)  23.44 (3.23%)   24.22
length=368,align1=23,align2=0:  25.00 (0.00%)   24.22 (3.12%)   25.00
length=368,align1=0,align2=23:  25.00 (-3.23%)  25.00 (-3.23%)  24.22
length=368,align1=23,align2=23:         25.00 (-6.67%)  23.44 (0.00%)   23.44
length=384,align1=0,align2=0:   24.22 (0.00%)   24.22 (0.00%)   24.22
length=384,align1=24,align2=0:  25.00 (0.00%)   24.22 (3.12%)   25.00
length=384,align1=0,align2=24:  25.00 (0.00%)   25.78 (-3.12%)  25.00
length=384,align1=24,align2=24:         24.22 (-3.33%)  23.44 (0.00%)   23.44
length=400,align1=0,align2=0:   25.00 (-3.23%)  26.56 (-9.68%)  24.22
length=400,align1=25,align2=0:  25.78 (-3.12%)  27.34 (-9.38%)  25.00
length=400,align1=0,align2=25:  27.34 (0.00%)   27.34 (0.00%)   27.34
length=400,align1=25,align2=25:         26.56 (0.00%)   25.78 (2.94%)   26.56
length=416,align1=0,align2=0:   26.56 (-3.03%)  25.78 (0.00%)   25.78
length=416,align1=26,align2=0:  28.12 (-2.86%)  27.34 (0.00%)   27.34
length=416,align1=0,align2=26:  27.34 (-2.94%)  28.12 (-5.88%)  26.56
length=416,align1=26,align2=26:         25.78 (0.00%)   26.56 (-3.03%)  25.78
length=432,align1=0,align2=0:   27.34 (-2.94%)  25.78 (2.94%)   26.56
length=432,align1=27,align2=0:  28.12 (-2.86%)  27.34 (0.00%)   27.34
length=432,align1=0,align2=27:  27.34 (0.00%)   28.12 (-2.86%)  27.34
length=432,align1=27,align2=27:         25.78 (0.00%)   25.78 (0.00%)   25.78
length=448,align1=0,align2=0:   26.56 (-3.03%)  25.78 (0.00%)   25.78
length=448,align1=28,align2=0:  27.34 (0.00%)   27.34 (0.00%)   27.34
length=448,align1=0,align2=28:  27.34 (0.00%)   28.12 (-2.86%)  27.34
length=448,align1=28,align2=28:         25.78 (0.00%)   25.78 (0.00%)   25.78
length=464,align1=0,align2=0:   25.78 (0.00%)   28.12 (-9.09%)  25.78
length=464,align1=29,align2=0:  28.12 (-2.86%)  29.69 (-8.57%)  27.34
length=464,align1=0,align2=29:  30.47 (0.00%)   30.47 (0.00%)   30.47
length=464,align1=29,align2=29:         28.12 (0.00%)   27.34 (2.78%)   28.12
length=480,align1=0,align2=0:   29.69 (-5.56%)  28.12 (0.00%)   28.12
length=480,align1=30,align2=0:  31.25 (-2.56%)  29.69 (2.56%)   30.47
length=480,align1=0,align2=30:  29.69 (0.00%)   30.47 (-2.63%)  29.69
length=480,align1=30,align2=30:         28.12 (0.00%)   28.12 (0.00%)   28.12
length=496,align1=0,align2=0:   28.12 (0.00%)   27.34 (2.78%)   28.12
length=496,align1=31,align2=0:  30.47 (-2.63%)  29.69 (0.00%)   29.69
length=496,align1=0,align2=31:  29.69 (0.00%)   30.47 (-2.63%)  29.69
length=496,align1=31,align2=31:         28.12 (-2.86%)  28.12 (-2.86%)  27.34
length=1024,align1=0,align2=0:  44.53 (0.00%)   44.53 (0.00%)   44.53
length=1024,align1=32,align2=0:         44.53 (-1.79%)  44.53 (-1.79%)  43.75
length=1024,align1=0,align2=32:         44.53 (-1.79%)  43.75 (0.00%)   43.75
length=1024,align1=32,align2=32:        43.75 (1.75%)   43.75 (1.75%)   44.53
length=1056,align1=0,align2=0:  46.88 (-1.69%)  46.88 (-1.69%)  46.09
length=1056,align1=33,align2=0:         53.12 (0.00%)   52.34 (1.47%)   53.12
length=1056,align1=0,align2=33:         52.34 (0.00%)   53.12 (-1.49%)  52.34
length=1056,align1=33,align2=33:        46.09 (0.00%)   46.88 (-1.69%)  46.09
length=1088,align1=0,align2=0:  46.88 (-1.69%)  46.09 (0.00%)   46.09
length=1088,align1=34,align2=0:         52.34 (0.00%)   52.34 (0.00%)   52.34
length=1088,align1=0,align2=34:         53.12 (-3.03%)  53.12 (-3.03%)  51.56
length=1088,align1=34,align2=34:        46.09 (0.00%)   46.88 (-1.69%)  46.09
length=1120,align1=0,align2=0:  49.22 (-1.61%)  48.44 (0.00%)   48.44
length=1120,align1=35,align2=0:         54.69 (1.41%)   55.47 (0.00%)   55.47
length=1120,align1=0,align2=35:         57.03 (0.00%)   55.47 (2.74%)   57.03
length=1120,align1=35,align2=35:        48.44 (0.00%)   49.22 (-1.61%)  48.44
length=1152,align1=0,align2=0:  47.66 (1.61%)   48.44 (0.00%)   48.44
length=1152,align1=36,align2=0:         55.47 (-1.43%)  55.47 (-1.43%)  54.69
length=1152,align1=0,align2=36:         58.59 (-1.35%)  55.47 (4.05%)   57.81
length=1152,align1=36,align2=36:        48.44 (0.00%)   49.22 (-1.61%)  48.44
length=1184,align1=0,align2=0:  53.12 (-3.03%)  50.78 (1.52%)   51.56
length=1184,align1=37,align2=0:         61.72 (-2.60%)  57.03 (5.19%)   60.16
length=1184,align1=0,align2=37:         62.50 (-1.27%)  57.03 (7.60%)   61.72
length=1184,align1=37,align2=37:        53.12 (-1.49%)  50.78 (2.99%)   52.34
length=1216,align1=0,align2=0:  53.91 (-4.55%)  50.78 (1.52%)   51.56
length=1216,align1=38,align2=0:         60.94 (0.00%)   57.03 (6.41%)   60.94
length=1216,align1=0,align2=38:         60.16 (0.00%)   57.81 (3.90%)   60.16
length=1216,align1=38,align2=38:        52.34 (-1.52%)  50.00 (3.03%)   51.56
length=1248,align1=0,align2=0:  54.69 (-2.94%)  53.12 (0.00%)   53.12
length=1248,align1=39,align2=0:         64.06 (-1.23%)  60.16 (4.94%)   63.28
length=1248,align1=0,align2=39:         60.94 (-2.63%)  60.16 (-1.32%)  59.38
length=1248,align1=39,align2=39:        53.12 (0.00%)   52.34 (1.47%)   53.12
length=1280,align1=0,align2=0:  52.34 (-1.52%)  52.34 (-1.52%)  51.56
length=1280,align1=40,align2=0:         61.72 (3.66%)   59.38 (7.32%)   64.06
length=1280,align1=0,align2=40:         60.94 (-2.63%)  60.16 (-1.32%)  59.38
length=1280,align1=40,align2=40:        52.34 (-1.52%)  52.34 (-1.52%)  51.56
length=1312,align1=0,align2=0:  54.69 (-1.45%)  55.47 (-2.90%)  53.91
length=1312,align1=41,align2=0:         63.28 (0.00%)   62.50 (1.23%)   63.28
length=1312,align1=0,align2=41:         62.50 (0.00%)   62.50 (0.00%)   62.50
length=1312,align1=41,align2=41:        53.91 (0.00%)   54.69 (-1.45%)  53.91
length=1344,align1=0,align2=0:  54.69 (0.00%)   54.69 (0.00%)   54.69
length=1344,align1=42,align2=0:         62.50 (0.00%)   62.50 (0.00%)   62.50
length=1344,align1=0,align2=42:         62.50 (-1.27%)  62.50 (-1.27%)  61.72
length=1344,align1=42,align2=42:        53.91 (0.00%)   53.91 (0.00%)   53.91
length=1376,align1=0,align2=0:  65.62 (-16.67%) 68.75 (-22.22%) 56.25
length=1376,align1=43,align2=0:         71.88 (-9.52%)  73.44 (-11.90%) 65.62
length=1376,align1=0,align2=43:         72.66 (-12.05%) 74.22 (-14.46%) 64.84
length=1376,align1=43,align2=43:        64.06 (-13.89%) 67.97 (-20.83%) 56.25
length=1408,align1=0,align2=0:  57.03 (-1.39%)  68.75 (-22.22%) 56.25
length=1408,align1=44,align2=0:         65.62 (-1.20%)  73.44 (-13.25%) 64.84
length=1408,align1=0,align2=44:         64.84 (0.00%)   74.22 (-14.46%) 64.84
length=1408,align1=44,align2=44:        56.25 (-1.41%)  68.75 (-23.94%) 55.47
length=1440,align1=0,align2=0:  67.97 (-14.47%) 64.84 (-9.21%)  59.38
length=1440,align1=45,align2=0:         74.22 (-10.47%) 68.75 (-2.33%)  67.19
length=1440,align1=0,align2=45:         72.66 (-6.90%)  69.53 (-2.30%)  67.97
length=1440,align1=45,align2=45:        65.62 (-13.51%) 58.59 (-1.35%)  57.81
length=1472,align1=0,align2=0:  66.41 (-14.86%) 58.59 (-1.35%)  57.81
length=1472,align1=46,align2=0:         73.44 (-9.30%)  67.19 (0.00%)   67.19
length=1472,align1=0,align2=46:         70.31 (-4.65%)  67.97 (-1.16%)  67.19
length=1472,align1=46,align2=46:        57.81 (0.00%)   58.59 (-1.35%)  57.81
length=1504,align1=0,align2=0:  60.94 (0.00%)   60.94 (0.00%)   60.94
length=1504,align1=47,align2=0:         71.09 (-1.11%)  70.31 (0.00%)   70.31
length=1504,align1=0,align2=47:         70.31 (-1.12%)  70.31 (-1.12%)  69.53
length=1504,align1=47,align2=47:        60.94 (-1.30%)  60.16 (0.00%)   60.16
length=1536,align1=0,align2=0:  62.50 (-3.90%)  60.16 (0.00%)   60.16
length=1536,align1=48,align2=0:         60.94 (-1.30%)  60.16 (0.00%)   60.16
length=1536,align1=0,align2=48:         61.72 (-3.95%)  60.16 (-1.32%)  59.38
length=1536,align1=48,align2=48:        60.94 (-1.30%)  60.16 (0.00%)   60.16
length=1568,align1=0,align2=0:  80.47 (-27.16%) 63.28 (0.00%)   63.28
length=1568,align1=49,align2=0:         86.72 (-18.09%) 72.66 (1.06%)   73.44
length=1568,align1=0,align2=49:         74.22 (-3.26%)  74.22 (-3.26%)  71.88
length=1568,align1=49,align2=49:        62.50 (0.00%)   61.72 (1.25%)   62.50
length=1600,align1=0,align2=0:  62.50 (-1.27%)  62.50 (-1.27%)  61.72
length=1600,align1=50,align2=0:         73.44 (0.00%)   71.88 (2.13%)   73.44
length=1600,align1=0,align2=50:         72.66 (0.00%)   73.44 (-1.08%)  72.66
length=1600,align1=50,align2=50:        62.50 (-1.27%)  62.50 (-1.27%)  61.72
length=1632,align1=0,align2=0:  64.84 (0.00%)   64.84 (0.00%)   64.84
length=1632,align1=51,align2=0:         75.78 (0.00%)   75.00 (1.03%)   75.78
length=1632,align1=0,align2=51:         78.91 (0.00%)   75.78 (3.96%)   78.91
length=1632,align1=51,align2=51:        64.84 (-2.47%)  64.84 (-2.47%)  63.28
length=1664,align1=0,align2=0:  64.84 (-1.22%)  64.84 (-1.22%)  64.06
length=1664,align1=52,align2=0:         75.78 (0.00%)   75.00 (1.03%)   75.78
length=1664,align1=0,align2=52:         80.47 (-0.98%)  75.78 (4.90%)   79.69
length=1664,align1=52,align2=52:        64.06 (-1.23%)  65.62 (-3.70%)  63.28
length=1696,align1=0,align2=0:  69.53 (-3.49%)  72.66 (-8.14%)  67.19
length=1696,align1=53,align2=0:         80.47 (-0.98%)  82.03 (-2.94%)  79.69
length=1696,align1=0,align2=53:         80.47 (0.96%)   82.03 (-0.96%)  81.25
length=1696,align1=53,align2=53:        68.75 (-2.33%)  72.66 (-8.14%)  67.19
length=1728,align1=0,align2=0:  67.97 (0.00%)   72.66 (-6.90%)  67.97
length=1728,align1=54,align2=0:         80.47 (-0.98%)  82.81 (-3.92%)  79.69
length=1728,align1=0,align2=54:         78.91 (-1.00%)  82.03 (-5.00%)  78.12
length=1728,align1=54,align2=54:        68.75 (0.00%)   72.66 (-5.68%)  68.75
length=1760,align1=0,align2=0:  77.34 (-12.50%) 68.75 (0.00%)   68.75
length=1760,align1=55,align2=0:         91.41 (-8.33%)  79.69 (5.56%)   84.38
length=1760,align1=0,align2=55:         88.28 (-10.78%) 80.47 (-0.98%)  79.69
length=1760,align1=55,align2=55:        77.34 (-11.24%) 68.75 (1.12%)   69.53
length=1792,align1=0,align2=0:  78.12 (-14.94%) 68.75 (-1.15%)  67.97
length=1792,align1=56,align2=0:         88.28 (-4.63%)  79.69 (5.56%)   84.38
length=1792,align1=0,align2=56:         88.28 (-9.71%)  80.47 (0.00%)   80.47
length=1792,align1=56,align2=56:        77.34 (-11.24%) 68.75 (1.12%)   69.53
length=1824,align1=0,align2=0:  72.66 (7.92%)   70.31 (10.89%)  78.91
length=1824,align1=57,align2=0:         85.94 (5.17%)   82.03 (9.48%)   90.62
length=1824,align1=0,align2=57:         82.03 (3.67%)   82.81 (2.75%)   85.16
length=1824,align1=57,align2=57:        70.31 (-1.12%)  70.31 (-1.12%)  69.53
length=1856,align1=0,align2=0:  70.31 (-1.12%)  70.31 (-1.12%)  69.53
length=1856,align1=58,align2=0:         83.59 (-0.94%)  82.03 (0.94%)   82.81
length=1856,align1=0,align2=58:         178.12 (-115.09%)       82.81 (0.00%)   82.81
length=1856,align1=58,align2=58:        70.31 (-1.12%)  70.31 (-1.12%)  69.53
length=1888,align1=0,align2=0:  73.44 (-1.08%)  78.91 (-8.60%)  72.66
length=1888,align1=59,align2=0:         85.94 (0.00%)   89.84 (-4.55%)  85.94
length=1888,align1=0,align2=59:         84.38 (0.00%)   89.06 (-5.56%)  84.38
length=1888,align1=59,align2=59:        72.66 (-1.09%)  78.12 (-8.70%)  71.88
length=1920,align1=0,align2=0:  72.66 (-1.09%)  78.12 (-8.70%)  71.88
length=1920,align1=60,align2=0:         85.94 (0.00%)   89.84 (-4.55%)  85.94
length=1920,align1=0,align2=60:         85.16 (0.00%)   89.06 (-4.59%)  85.16
length=1920,align1=60,align2=60:        72.66 (-1.09%)  78.91 (-9.78%)  71.88
length=1952,align1=0,align2=0:  75.00 (-1.05%)  75.00 (-1.05%)  74.22
length=1952,align1=61,align2=0:         88.28 (0.00%)   87.50 (0.88%)   88.28
length=1952,align1=0,align2=61:         87.50 (0.00%)   88.28 (-0.89%)  87.50
length=1952,align1=61,align2=61:        74.22 (0.00%)   74.22 (0.00%)   74.22
length=1984,align1=0,align2=0:  75.00 (-1.05%)  73.44 (1.05%)   74.22
length=1984,align1=62,align2=0:         89.06 (-0.89%)  87.50 (0.88%)   88.28
length=1984,align1=0,align2=62:         87.50 (0.00%)   88.28 (-0.89%)  87.50
length=1984,align1=62,align2=62:        74.22 (0.00%)   74.22 (0.00%)   74.22
length=2016,align1=0,align2=0:  77.34 (-1.02%)  76.56 (0.00%)   76.56
length=2016,align1=63,align2=0:         91.41 (-0.86%)  90.62 (0.00%)   90.62
length=2016,align1=0,align2=63:         89.84 (0.00%)   90.62 (-0.87%)  89.84
length=2016,align1=63,align2=63:        77.34 (-1.02%)  76.56 (0.00%)   76.56
length=4096,align1=0,align2=0:  141.41 (-0.56%) 146.88 (-4.44%) 140.62

Function: memcpy
__memcpy_thunderx       __memcpy_falkor __memcpy_generic
Variant: large
================================================================================
length=65543,align1=0,align2=0:         4018.75 (3.09%) 2634.38 (36.47%)        4146.88
length=65551,align1=0,align2=3:         4425.00 (-6.47%)        3134.38 (24.59%)        4156.25
length=65567,align1=3,align2=0:         2909.38 (29.95%)        3134.38 (24.53%)        4153.12
length=65599,align1=3,align2=5:         4415.62 (-6.16%)        3134.38 (24.64%)        4159.38
length=131079,align1=0,align2=0:        5765.62 (30.38%)        5240.62 (36.72%)        8281.25
length=131087,align1=0,align2=3:        8831.25 (-6.56%)        6271.88 (24.32%)        8287.50
length=131103,align1=3,align2=0:        5793.75 (29.05%)        6268.75 (23.23%)        8165.62
length=131135,align1=3,align2=5:        5806.25 (29.97%)        6259.38 (24.50%)        8290.62
length=262151,align1=0,align2=0:        11850.00 (28.91%)       10762.50 (35.43%)       16668.80
length=262159,align1=0,align2=3:        12043.80 (27.72%)       12700.00 (23.78%)       16662.50
length=262175,align1=3,align2=0:        12046.90 (27.90%)       12687.50 (24.07%)       16709.40
length=262207,align1=3,align2=5:        11984.40 (28.08%)       12678.10 (23.91%)       16662.50
length=524295,align1=0,align2=0:        24825.00 (25.00%)       24268.80 (27.34%)       33400.00
length=524303,align1=0,align2=3:        35731.20 (-6.53%)       25678.10 (23.44%)       33540.60
length=524319,align1=3,align2=0:        25893.80 (22.71%)       25725.00 (23.22%)       33503.10
length=524351,align1=3,align2=5:        25887.50 (22.86%)       25690.60 (23.45%)       33559.40
length=1048583,align1=0,align2=0:       50621.90 (0.30%)        50600.00 (0.34%)        50771.90
length=1048591,align1=0,align2=3:       53206.20 (0.54%)        51081.20 (4.51%)        53493.80
length=1048607,align1=3,align2=0:       53221.90 (0.32%)        51975.00 (2.66%)        53393.80
length=1048639,align1=3,align2=5:       53240.60 (0.36%)        51953.10 (2.77%)        53431.20
length=2097159,align1=0,align2=0:       103744.00 (-2.00%)      102447.00 (-1.00%)      102425.00
length=2097167,align1=0,align2=3:       108588.00 (-1.00%)      105159.00 (2.00%)       107606.00
length=2097183,align1=3,align2=0:       107678.00 (0.00%)       105250.00 (2.00%)       108125.00
length=2097215,align1=3,align2=5:       107906.00 (1.00%)       105841.00 (3.00%)       109475.00
length=4194311,align1=0,align2=0:       202994.00 (0.00%)       202500.00 (1.00%)       204809.00
length=4194319,align1=0,align2=3:       213350.00 (0.00%)       205997.00 (3.00%)       213384.00
length=4194335,align1=3,align2=0:       212653.00 (0.00%)       206444.00 (3.00%)       212900.00
length=4194367,align1=3,align2=5:       213044.00 (0.00%)       206084.00 (3.00%)       213847.00
length=8388615,align1=0,align2=0:       401294.00 (0.00%)       401231.00 (0.00%)       401944.00
length=8388623,align1=0,align2=3:       480872.00 (-14.00%)     406444.00 (3.00%)       422900.00
length=8388639,align1=3,align2=0:       422147.00 (0.00%)       407750.00 (3.00%)       422803.00
length=8388671,align1=3,align2=5:       442003.00 (-5.00%)      407125.00 (3.00%)       423509.00
length=16777223,align1=0,align2=0:      799809.00 (0.00%)       800000.00 (0.00%)       801756.00
length=16777231,align1=0,align2=3:      841184.00 (0.00%)       808525.00 (4.00%)       843775.00
length=16777247,align1=3,align2=0:      841166.00 (0.00%)       810147.00 (3.00%)       843147.00
length=16777279,align1=3,align2=5:      972569.00 (-16.00%)     808588.00 (4.00%)       843731.00
length=33554439,align1=0,align2=0:      1842240.00 (-0.01%)     1863590.00 (-1.17%)     1841990.00
length=33554447,align1=0,align2=3:      2103470.00 (-2.74%)     1919460.00 (6.25%)      2047440.00
length=33554463,align1=3,align2=0:      2075690.00 (-1.07%)     1930040.00 (6.02%)      2053720.00
length=33554495,align1=3,align2=5:      2110590.00 (-2.82%)     1924440.00 (6.25%)      2052650.00

Function: memcpy
__memcpy_thunderx       __memcpy_falkor __memcpy_generic
Variant: random
================================================================================
max-size=4096:  44061.90 (5.85%)        38568.20 (17.59%)       46799.90
max-size=8192:  42790.90 (5.27%)        38158.90 (15.52%)       45171.50
max-size=16384:         44912.10 (2.25%)        38710.40 (15.75%)       45945.00
max-size=32768:         43577.90 (1.23%)        37975.10 (13.93%)       44120.00
max-size=65536:         44375.50 (1.04%)        38474.20 (14.20%)       44840.60

	* manual/tunables.texi (Tunable glibc.tune.cpu): Add falkor.
	* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add
	memcpy_falkor.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c (MAX_IFUNC):
	Bump.
	(__libc_ifunc_impl_list): Add __memcpy_falkor.
	* sysdeps/aarch64/multiarch/memcpy.c: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_falkor.S: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list):
	Add falkor.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_FALKOR):
	New macro.
2017-08-09 06:32:17 +05:30
Siddhesh Poyarekar
ab85da1530 aarch64: Call all string function implementations in tests
The string function implementations implemented so far do not use any
instructions that may deviate from standard aarch64, so it is possible
for all routines to run on all armv8 hardware.  Select all
implementations in the benchmarks and tests.

	* sysdeps/aarch64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Unconditionally select thunderx
	routine for testing.
2017-06-30 22:57:12 +05:30
Steve Ellcey
6a2c695266 aarch64: Thunderx specific memcpy and memmove
* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros.
	(memmove): Use MEMMOVE for name.
	(memcpy): Use MEMCPY for name.  Change internal labels
	to external labels.
	* sysdeps/aarch64/multiarch/Makefile: New file.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise.
	* sysdeps/aarch64/multiarch/init-arch.h: Likewise.
	* sysdeps/aarch64/multiarch/memcpy.c: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise.
	* sysdeps/aarch64/multiarch/memmove.c: Likewise.
2017-05-24 16:46:48 -07:00