mirror of
https://sourceware.org/git/glibc.git
synced 2024-12-14 07:10:05 +00:00
2f1f7a5f8a
23 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Andrew Pinski
|
2f1f7a5f8a
|
Aarch64: Add new memset for Qualcomm's oryon-1 core
Qualcom's new core, oryon-1, has a different characteristics for memset than the current versions of memset. For non-zero, larger sizes, using GPRs rather than the SIMD stores is ~30% faster. For even larger sizes, using the nontemporal stores is needed not to polute the L1/L2 caches. For zero values, using `dc zva` should be used. Since we know the size will always be 64 bytes, we don't need to figure out the size there. I started with the emag memset and added back the `dc zva` code. Changes since v1: * v3: Fix comment formating Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> |
||
Andrew Pinski
|
4dc83cac78
|
Aarch64: Add memcpy for qualcomm's oryon-1 core
Qualcomm's new core (oryon-1) has a different performance characteristic than other cores. For memcpy, it is faster to use the GPRs to do the copy for large sizes (2x faster). For even larger sizes, it is better to use the nontemporal load/store instructions so we don't pollute the L1/L2 caches. For smaller sizes, the characteristic are very similar to other cores. I used the thunderx memcpy as a starting point and expanded from there. Changes since v1: * v2: Fix ordering in Makefile. * v3: Fix comment grammar about the ldnp/stnp instructions. Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com> Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> |
||
Wilco Dijkstra
|
2f5524cc53 |
AArch64: Remove Falkor memcpy
The latest implementations of memcpy are actually faster than the Falkor implementations [1], so remove the falkor/phecda ifuncs for memcpy and the now unused IS_FALKOR/IS_PHECDA defines. [1] https://sourceware.org/pipermail/libc-alpha/2022-December/144227.html Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> |
||
Wilco Dijkstra
|
3d7090f14b |
AArch64: Add memset_zva64
Add a specialized memset for the common ZVA size of 64 to avoid the overhead of reading the ZVA size. Since the code is identical to __memset_falkor, remove the latter. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> |
||
Wilco Dijkstra
|
9fd3409842 |
AArch64: Cleanup ifuncs
Cleanup ifuncs. Remove uses of libc_hidden_builtin_def, use ENTRY rather than ENTRY_ALIGN, remove unnecessary defines and conditional compilation. Rename strlen_mte to strlen_generic. Remove rtld-memset. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> |
||
Wilco Dijkstra
|
2bd0017988 |
AArch64: Add support for MOPS memcpy/memmove/memset
Add support for MOPS in cpu_features and INIT_ARCH. Add ifuncs using MOPS for memcpy, memmove and memset (use .inst for now so it works with all binutils versions without needing complex configure and conditional compilation). Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> |
||
Wilco Dijkstra
|
e6f3fe362f |
aarch64: Use memcpy_simd as the default memcpy
Since __memcpy_simd is the fastest memcpy on almost all cores, replace the generic memcpy with it. If SVE is available, a SVE memcpy will be used by default (including for Neoverse N2). |
||
Wilco Dijkstra
|
eea282d9c6 |
AArch64: Sort makefile entries
Sort makefile entries to reduce conflicts. |
||
Wilco Dijkstra
|
9f298bfe1f |
AArch64: Add SVE memcpy
Add an initial SVE memcpy implementation. Copies up to 32 bytes use SVE vectors which improves the random memcpy benchmark significantly. Cleanup the memcpy and memmove ifunc selectors. |
||
Naohiro Tamura
|
4f26956d5b |
aarch64: Added optimized memset for A64FX
This patch optimizes the performance of memset for A64FX [1] which implements ARMv8-A SVE and has L1 64KB cache per core and L2 8MB cache per NUMA node. The performance optimization makes use of Scalable Vector Register with several techniques such as loop unrolling, memory access alignment, cache zero fill and prefetch. SVE assembler code for memset is implemented as Vector Length Agnostic code so theoretically it can be run on any SOC which supports ARMv8-A SVE standard. We confirmed that all testcases have been passed by running 'make check' and 'make xcheck' not only on A64FX but also on ThunderX2. And also we confirmed that the SVE 512 bit vector register performance is roughly 4 times better than Advanced SIMD 128 bit register and 8 times better than scalar 64 bit register by running 'make bench'. [1] https://github.com/fujitsu/A64FX Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com> |
||
Naohiro Tamura
|
fa527f345c |
aarch64: Added optimized memcpy and memmove for A64FX
This patch optimizes the performance of memcpy/memmove for A64FX [1] which implements ARMv8-A SVE and has L1 64KB cache per core and L2 8MB cache per NUMA node. The performance optimization makes use of Scalable Vector Register with several techniques such as loop unrolling, memory access alignment, cache zero fill, and software pipelining. SVE assembler code for memcpy/memmove is implemented as Vector Length Agnostic code so theoretically it can be run on any SOC which supports ARMv8-A SVE standard. We confirmed that all testcases have been passed by running 'make check' and 'make xcheck' not only on A64FX but also on ThunderX2. And also we confirmed that the SVE 512 bit vector register performance is roughly 4 times better than Advanced SIMD 128 bit register and 8 times better than scalar 64 bit register by running 'make bench'. [1] https://github.com/fujitsu/A64FX Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> Reviewed-by: Szabolcs Nagy <Szabolcs.Nagy@arm.com> |
||
Wilco Dijkstra
|
f46ef33ad1 |
AArch64: Improve strlen_asimd performance (bug 25824)
Optimize strlen using a mix of scalar and SIMD code. On modern micro architectures large strings are 2.6 times faster than existing strlen_asimd and 35% faster than the new MTE version of strlen. On a random strlen benchmark using small sizes the speedup is 7% vs strlen_asimd and 40% vs the MTE strlen. This fixes the main strlen regressions on Cortex-A53 and other cores with a simple Neon unit. Rename __strlen_generic to __strlen_mte, and select strlen_asimd when MTE is not enabled (this is waiting on support for a HWCAP_MTE bit). This fixes big-endian bug 25824. Passes GLIBC regression tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> |
||
Wilco Dijkstra
|
4a733bf375 |
AArch64: Add optimized Q-register memcpy
Add a new memcpy using 128-bit Q registers - this is faster on modern cores and reduces codesize. Similar to the generic memcpy, small cases include copies up to 32 bytes. 64-128 byte copies are split into two cases to improve performance of 64-96 byte copies. Large copies align the source rather than the destination. bench-memcpy-random is ~9% faster than memcpy_falkor on Neoverse N1, so make this memcpy the default on N1 (on Centriq it is 15% faster than memcpy_falkor). Passes GLIBC regression tests. Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com> |
||
Krzysztof Koch
|
d1f75e9644 |
AArch64: Merge Falkor memcpy and memmove implementations
Falkor's memcpy and memmove share some implementation details, therefore, the two routines are moved to a single source file for code reuse. The two routines now share code for small and medium copies (up to and including 128 bytes). Large copies in memcpy do not handle overlap correctly, consequently, the loops for moving/copying more than 128 bytes stay separate for memcpy and memmove. To increase code reuse a number of small modifications were made: 1. The old implementation of memcpy copied the first 16-bytes as soon as the size of data was determined to be greater than 32 bytes. For memcpy code to also work when copying small/medium overlapping data, the first load and store was moved to the large copy case. 2. Medium memcpy case no longer assumes that 16 bytes were already copied and uses 8 registers to copy up to 128 bytes. 3. Small case for memmove was enlarged to that of memcpy, which is less than or equal to 32 bytes. 4. Medium case for memmove was enlarged to that of memcpy, which is less than or equal to 128 bytes. Other changes include: 1. Improve alignment of existing loop bodies. 2. 'Delouse' memmove and memcpy input arguments. Make sure that upper 32-bits of input registers are zeroed if unused. 3. Do one more iteration in memmove loops and reduce the number of copies made from the start/end of the buffer, depending on the direction of the memmove loop. Benchmarking: Looking at the results from bench-memcpy-random.out, we can see that now memmove_falkor is about 5% faster than memcpy_falkor_old, while memmove_falkor_old was more than 15% slower. The memcpy implementation remained largely unmodified, so there is no significant performance change. The reason for such a significant memmove performance gain is the increase of the upper bound on the small copy case to 32 bytes and the increase of the upper bound on the medium copy case to 128 bytes. Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org> |
||
Xuelei Zhang
|
525de033a9 |
aarch64: Optimized memset for Kunpeng processor.
Due to the branch prediction issue of Kunpeng processor, we found memset_generic has poor performance on middle sizes setting, and so we reconstructed the logic, expanded the loop by 4 times in set_long to solve the problem, even when setting below 1K sizes have benefit. Another change is that DZ_ZVA seems no work when setting zero, so we discarded it and used set_long to set zero instead. Fewer branches and predictions also make the zero case have slightly improvement. Checked on aarch64-linux-gnu. Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com> |
||
Feng Xue
|
83d1cc42d8 |
aarch64: Optimized memchr specific to AmpereComputing emag
This version uses general register based memory instruction to load data, because vector register based is slightly slower in emag. Character-matching is performed on 16-byte (both size and alignment) memory block in parallel each iteration. * sysdeps/aarch64/memchr.S (__memchr): Rename to MEMCHR. [!MEMCHR](MEMCHR): Set to __memchr. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memchr_generic and memchr_nosimd. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add memchr ifuncs. * sysdeps/aarch64/multiarch/memchr.c: New file. * sysdeps/aarch64/multiarch/memchr_generic.S: Likewise. * sysdeps/aarch64/multiarch/memchr_nosimd.S: Likewise. |
||
Feng Xue
|
c7d3890ff5 |
aarch64: Optimized memset specific to AmpereComputing emag
This version uses general register based memory store instead of vector register based, for the former is faster than the latter in emag. The fact that DC ZVA size in emag is 64-byte, is used by IFUNC dispatch to select this memset, so that cost of runtime-check on DC ZVA size can be saved. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memset_emag. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add __memset_emag to memset ifunc. * sysdeps/aarch64/multiarch/memset.c (libc_ifunc): Add IS_EMAG check for ifunc dispatch. * sysdeps/aarch64/multiarch/memset_base64.S: New file. * sysdeps/aarch64/multiarch/memset_emag.S: New file. |
||
Siddhesh Poyarekar
|
436e4d5b96 |
[aarch64] Add an ASIMD variant of strlen for falkor
This variant of strlen uses vector loads and operations to reduce the size of the code and also eliminate the non-ascii fallback. This works very well for falkor because of its two vector units and efficient vector ops. In the best case it reduces latency of cases in bench-strlen by 48%, with gains throughout the benchmark. strlen-walk also sees uniform gains in the 5%-15% range. Overall the routine appears to work better than the stock one for falkor regardless of the benchmark, length of string or cache state. The same cannot be said of a53 and a72 though. a53 performance was greatly reduced and for a72 it was a bit of a mixed bag, slightly on the negative side but I reckon it might be fast in some situations. * sysdeps/aarch64/strlen.S (__strlen): Rename to STRLEN. [!STRLEN](STRLEN): Set to __strlen. * sysdeps/aarch64/multiarch/strlen.c: New file. * sysdeps/aarch64/multiarch/strlen_generic.S: Likewise. * sysdeps/aarch64/multiarch/strlen_asimd.S: Likewise. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add strlen. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add strlen_generic and strlen_asimd. Reviewed-By: szabolcs.nagy@arm.com CC: pinskia@gmail.com |
||
Steve Ellcey
|
e9537dddc7 |
IFUNC for Cavium ThunderX2
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memcpy_thunderx2. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (MAX_IFUNC): Increment to 4. (__libc_ifunc_impl_list): Add __memcpy_thunderx2. * sysdeps/aarch64/multiarch/memcpy.c (libc_ifunc): Add IS_THUNDERX2 and IS_THUNDERX2PA checks. * sysdeps/aarch64/multiarch/memcpy_thunderx.S (USE_THUNDERX2): Use macro to set name appropriately. (memcpy): Use USE_THUNDERX2 macro to modify prefetches. * sysdeps/aarch64/multiarch/memcpy_thunderx2.S: New file. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_THUNDERX2PA): New macro. (IS_THUNDERX2): New macro. |
||
Siddhesh Poyarekar
|
5a67c4fa01 |
aarch64: Optimized memset for falkor
The generic memset reads dczid_el0 on every memset. This has a significant impact on falkor for a range of sizes because reading dczid_el0 is slow. The DZP bit in the dczid_el0 register does not change dynamically, so it is safe to read once during program startup. With this patch dczid_el0 is read once during startup and zva_size is cached. This is used to invoke the falkor-specific memset; the generic memset routine remains unchanged. The gains due to this are significant for falkor, with run time reductions as high as 48%. Here's a sample from the falkor tests: Function: memset Variant: walk simple_memset __memset_falkor __memset_generic ===================================================================== length=256, char=0: 139.96 (-698.28%) 9.07 ( 48.26%) 17.53 length=257, char=0: 140.50 (-699.03%) 9.53 ( 45.80%) 17.58 length=258, char=0: 140.96 (-703.95%) 9.58 ( 45.36%) 17.53 length=259, char=0: 141.56 (-705.16%) 9.53 ( 45.79%) 17.58 length=260, char=0: 142.15 (-710.76%) 9.57 ( 45.39%) 17.53 length=261, char=0: 142.50 (-710.39%) 9.53 ( 45.78%) 17.58 length=262, char=0: 142.97 (-715.09%) 9.57 ( 45.42%) 17.54 length=263, char=0: 143.51 (-716.18%) 9.53 ( 45.80%) 17.58 length=264, char=0: 143.93 (-720.55%) 9.58 ( 45.39%) 17.54 length=265, char=0: 144.56 (-722.07%) 9.53 ( 45.80%) 17.59 length=266, char=0: 144.98 (-726.42%) 9.58 ( 45.42%) 17.54 length=267, char=0: 145.53 (-727.53%) 9.53 ( 45.80%) 17.59 length=268, char=0: 146.25 (-731.81%) 9.53 ( 45.79%) 17.58 length=269, char=0: 146.52 (-735.39%) 9.53 ( 45.66%) 17.54 length=270, char=0: 146.97 (-735.81%) 9.53 ( 45.80%) 17.58 length=271, char=0: 147.54 (-741.08%) 9.58 ( 45.38%) 17.54 length=512, char=0: 268.26 (-1307.85%) 12.06 ( 36.71%) 19.05 length=513, char=0: 268.73 (-1273.89%) 13.56 ( 30.68%) 19.56 length=514, char=0: 269.31 (-1276.89%) 13.56 ( 30.68%) 19.56 length=515, char=0: 269.73 (-1279.05%) 13.56 ( 30.68%) 19.56 length=516, char=0: 270.34 (-1282.24%) 13.56 ( 30.67%) 19.56 length=517, char=0: 270.83 (-1284.71%) 13.56 ( 30.66%) 19.56 length=518, char=0: 271.20 (-1286.54%) 13.56 ( 30.67%) 19.56 length=519, char=0: 271.67 (-1288.67%) 13.65 ( 30.24%) 19.56 length=520, char=0: 272.14 (-1291.04%) 13.65 ( 30.22%) 19.56 length=521, char=0: 272.66 (-1293.69%) 13.65 ( 30.23%) 19.56 length=522, char=0: 273.14 (-1296.13%) 13.65 ( 30.20%) 19.56 length=523, char=0: 273.64 (-1298.75%) 13.65 ( 30.23%) 19.56 length=524, char=0: 274.34 (-1302.16%) 13.66 ( 30.20%) 19.57 length=525, char=0: 274.64 (-1297.78%) 13.56 ( 30.99%) 19.65 length=526, char=0: 275.20 (-1300.04%) 13.56 ( 31.01%) 19.66 length=527, char=0: 275.66 (-1302.86%) 13.56 ( 30.99%) 19.65 length=1024, char=0: 524.46 (-2169.75%) 20.12 ( 12.92%) 23.11 length=1025, char=0: 525.14 (-2124.63%) 21.62 ( 8.40%) 23.61 length=1026, char=0: 525.59 (-2125.36%) 21.88 ( 7.37%) 23.62 length=1027, char=0: 525.98 (-2127.14%) 21.62 ( 8.46%) 23.62 length=1028, char=0: 526.68 (-2131.10%) 21.62 ( 8.42%) 23.61 length=1029, char=0: 527.10 (-2131.70%) 21.79 ( 7.73%) 23.62 length=1030, char=0: 527.54 (-2118.51%) 21.62 ( 9.10%) 23.78 length=1031, char=0: 527.98 (-2136.37%) 21.62 ( 8.43%) 23.61 length=1032, char=0: 528.70 (-2139.38%) 21.62 ( 8.43%) 23.61 length=1033, char=0: 529.25 (-2124.37%) 21.62 ( 9.11%) 23.79 length=1034, char=0: 529.48 (-2142.95%) 21.62 ( 8.43%) 23.61 length=1035, char=0: 530.11 (-2145.13%) 21.62 ( 8.44%) 23.61 length=1036, char=0: 530.76 (-2147.10%) 21.79 ( 7.73%) 23.62 length=1037, char=0: 531.03 (-2149.45%) 21.62 ( 8.42%) 23.61 length=1038, char=0: 531.64 (-2151.87%) 21.62 ( 8.42%) 23.61 length=1039, char=0: 531.99 (-2151.63%) 21.80 ( 7.75%) 23.63 * sysdeps/aarch64/memset-reg.h: New file. * sysdeps/aarch64/memset.S: Use it. (__memset): Rename to MEMSET macro. [ZVA_MACRO]: Use zva_macro. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memset_generic and memset_falkor. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add memset ifuncs. * sysdeps/aarch64/multiarch/init-arch.h (INIT_ARCH): New local variable zva_size. * sysdeps/aarch64/multiarch/memset.c: New file. * sysdeps/aarch64/multiarch/memset_generic.S: New file. * sysdeps/aarch64/multiarch/memset_falkor.S: New file. * sysdeps/aarch64/multiarch/rtld-memset.S: New file. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (DCZID_DZP_MASK): New macro. (DCZID_BS_MASK): Likewise. (init_cpu_features): Read and set zva_size. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (struct cpu_features): New member zva_size. |
||
Siddhesh Poyarekar
|
dd5bc7f1b3 |
aarch64: Optimized implementation of memmove for Qualcomm Falkor
This is an optimized memmove implementation for the Qualcomm Falkor processor core. Due to the way the falkor memcpy needs to be written, code cannot be easily shared between memmove and memcpy like in case of other aarch64 memcpy implementations due to which this routine is separate. The underlying principle is the same as that of memcpy where it tries to use registers with the same lower 4 bits for fetching the same stream, thus optimizing hardware prefetcher performance. The memcpy copy loop copies 64 bytes at a time using the same register pair since that's the way to train the hardware prefetcher on the falkor core. memmove cannot quite do that since it needs to avoid overlaps, so it does the next best thing, i.e. has a 32 byte loop with a 32 byte end (prefetch a loop ahead to account for overlapping locations) with register pairs that alias so that they hit the same prefetcher. Due to this difference in loop size, they have to currently be separate implementations but efforts are on to try and get memmove to fall back into memcpy whenever it can without simply duplicating all of the code. Performance: The routine fares around 20-25% better than the generic memmove for most medium to large sizes (i.e. > 128 bytes) for the new walking memmove benchmark (memmove-walk) with an unexplained regression between 1K and 2K. The minor regression is something worth looking into for us, but the remaining gains are significant enough that we would like this included upstream as we looking into the cause for the regression. Here is a snippet of the numbers as generated from the microbenchmark by the compare_strings script. Comparisons are against __memmove_generic: Function: memmove Variant: walk __memmove_thunderx __memmove_falkor __memmove_generic ======================================================================================================================== <snip> length=16384: 12508800.00 ( 6.09%) 11486800.00 ( 13.76%) 13319600.00 length=16400: 13614200.00 ( -0.67%) 11585000.00 ( 14.33%) 13523600.00 length=16385: 13448400.00 ( 0.10%) 11732700.00 ( 12.84%) 13461200.00 length=16399: 13594100.00 ( -0.22%) 11859600.00 ( 12.57%) 13564400.00 length=16386: 13211600.00 ( 1.13%) 11503800.00 ( 13.91%) 13362400.00 length=16398: 13218600.00 ( 2.12%) 11573200.00 ( 14.30%) 13504700.00 length=16387: 13510900.00 ( -0.37%) 11744200.00 ( 12.76%) 13461300.00 length=16397: 13603700.00 ( -0.15%) 11878200.00 ( 12.55%) 13583200.00 length=16388: 13461700.00 ( -0.13%) 11558000.00 ( 14.03%) 13444100.00 length=16396: 13517500.00 ( -0.03%) 11561300.00 ( 14.45%) 13513900.00 length=16389: 13534100.00 ( 0.17%) 11756800.00 ( 13.28%) 13556900.00 length=16395: 13585600.00 ( 0.11%) 11791800.00 ( 13.30%) 13601200.00 length=16390: 13480100.00 ( -0.13%) 11685500.00 ( 13.20%) 13462100.00 length=16394: 13529900.00 ( -0.23%) 11549800.00 ( 14.43%) 13498200.00 length=16391: 13595400.00 ( -0.26%) 11768200.00 ( 13.22%) 13560600.00 length=16393: 13567000.00 ( 0.20%) 11779700.00 ( 13.35%) 13594700.00 length=32768: 71308800.00 ( -6.53%) 50220800.00 ( 24.98%) 66939200.00 length=32784: 72100800.00 (-11.55%) 50114100.00 ( 22.47%) 64636300.00 length=32769: 71767000.00 ( -7.10%) 51238400.00 ( 23.54%) 67010000.00 length=32783: 70113700.00 (-40.95%) 51129000.00 ( -2.78%) 49744400.00 length=32770: 71367600.00 ( -6.52%) 50244700.00 ( 25.01%) 67000900.00 length=32782: 64366700.00 ( 4.71%) 50101400.00 ( 25.83%) 67545600.00 length=32771: 71440100.00 ( -6.51%) 51263900.00 ( 23.57%) 67074900.00 length=32781: 66993000.00 ( 0.34%) 51108300.00 ( 23.97%) 67220300.00 length=32772: 71443900.00 (-60.50%) 50062100.00 (-12.47%) 44512600.00 length=32780: 71759100.00 ( -6.58%) 50263200.00 ( 25.35%) 67328600.00 length=32773: 71714900.00 (-33.21%) 51076600.00 ( 5.12%) 53835400.00 length=32779: 71756900.00 ( -6.56%) 51290800.00 ( 23.83%) 67337800.00 length=32774: 59689300.00 (-34.55%) 50068400.00 (-12.86%) 44363300.00 length=32778: 71847500.00 (-18.20%) 50084100.00 ( 17.61%) 60786500.00 length=32775: 71599300.00 ( -6.54%) 51278200.00 ( 23.70%) 67204800.00 length=32777: 71862900.00 (-60.85%) 51094000.00 (-14.36%) 44677900.00 length=65536: 282848000.00 ( -6.60%) 199187000.00 ( 24.93%) 265325000.00 length=65552: 243285000.00 (-41.61%) 198512000.00 (-15.54%) 171805000.00 length=65537: 255415000.00 (-23.47%) 202499000.00 ( 2.11%) 206858000.00 length=65551: 280122000.00 (-62.95%) 203349000.00 (-18.29%) 171911000.00 length=65538: 283676000.00 (-14.46%) 198368000.00 ( 19.96%) 247848000.00 length=65550: 275566000.00 (-51.76%) 198494000.00 ( -9.31%) 181581000.00 length=65539: 283699000.00 ( -6.58%) 203453000.00 ( 23.57%) 266195000.00 length=65549: 286572000.00 ( -6.65%) 202607000.00 ( 24.60%) 268712000.00 length=65540: 283710000.00 ( -6.59%) 199161000.00 ( 25.17%) 266160000.00 length=65548: 237573000.00 ( 11.48%) 198462000.00 ( 26.06%) 268395000.00 length=65541: 284150000.00 ( -6.58%) 203273000.00 ( 23.75%) 266600000.00 length=65547: 286250000.00 ( -6.70%) 202594000.00 ( 24.48%) 268263000.00 length=65542: 284167000.00 ( -6.60%) 199122000.00 ( 25.31%) 266584000.00 length=65546: 285656000.00 ( -6.59%) 198443000.00 ( 25.95%) 268002000.00 length=65543: 284600000.00 ( -6.58%) 203247000.00 ( 23.89%) 267030000.00 length=65545: 285665000.00 ( -6.40%) 202575000.00 ( 24.55%) 268472000.00 <snip> * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memmove_falkor. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Likewise. * sysdeps/aarch64/multiarch/memmove.c: Likewise. * sysdeps/aarch64/multiarch/memmove_falkor.S: New file. |
||
Siddhesh Poyarekar
|
36ada5f681 |
aarch64: Optimized memcpy for Qualcomm Falkor processor
This is an optimized implementation of the memcpy routine that gives a significant gain in performance for all sizes of copies on the Qualcomm Falkor processor. A detailed rationale of the implementation is written in a comment in the patch. This implementation improves time for copies up to 128 bytes by up to 15% and for larger copies by up to 35% in the glibc microbenchmark. The memcpy-random benchmark sees improvements in all sizes in the range of 13%-18%. Here are the full numbers extracted from the glibc microbenchmark using the commands: ../benchtests/scripts/compare_strings.py benchtests/bench-memcpy.out \ ../benchtests/scripts/benchout_strings.schema.json \ -base=__memcpy_generic length align1 align2 ../benchtests/scripts/compare_strings.py benchtests/bench-memcpy-large.out \ ../benchtests/scripts/benchout_strings.schema.json \ -base=__memcpy_generic length align1 align2 ../benchtests/scripts/compare_strings.py benchtests/bench-memcpy-random.out \ ../benchtests/scripts/benchout_strings.schema.json \ -base=__memcpy_generic max-size Function: memcpy __memcpy_thunderx __memcpy_falkor __memcpy_generic Variant: default ================================================================================ length=1,align1=0,align2=0: 33.59 (-115.00%) 15.62 (0.00%) 15.62 length=1,align1=0,align2=0: 16.41 (-10.53%) 14.06 (5.26%) 14.84 length=1,align1=0,align2=0: 14.84 (0.00%) 14.84 (0.00%) 14.84 length=1,align1=0,align2=0: 15.62 (-5.26%) 14.06 (5.26%) 14.84 length=2,align1=0,align2=0: 15.62 (-5.26%) 14.06 (5.26%) 14.84 length=2,align1=1,align2=0: 15.62 (-5.26%) 14.06 (5.26%) 14.84 length=2,align1=0,align2=1: 14.84 (0.00%) 14.06 (5.26%) 14.84 length=2,align1=1,align2=1: 14.84 (-5.56%) 14.06 (0.00%) 14.06 length=4,align1=0,align2=0: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=4,align1=2,align2=0: 14.06 (-5.88%) 14.06 (-5.88%) 13.28 length=4,align1=0,align2=2: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=4,align1=2,align2=2: 14.06 (-5.88%) 14.06 (-5.88%) 13.28 length=8,align1=0,align2=0: 14.84 (-5.56%) 13.28 (5.56%) 14.06 length=8,align1=3,align2=0: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=8,align1=0,align2=3: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=8,align1=3,align2=3: 13.28 (-6.25%) 13.28 (-6.25%) 12.50 length=16,align1=0,align2=0: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=16,align1=4,align2=0: 13.28 (0.00%) 12.50 (5.88%) 13.28 length=16,align1=0,align2=4: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=16,align1=4,align2=4: 13.28 (-6.25%) 12.50 (0.00%) 12.50 length=32,align1=0,align2=0: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=32,align1=5,align2=0: 13.28 (0.00%) 12.50 (5.88%) 13.28 length=32,align1=0,align2=5: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=32,align1=5,align2=5: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=64,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=64,align1=6,align2=0: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=64,align1=0,align2=6: 14.06 (5.26%) 14.06 (5.26%) 14.84 length=64,align1=6,align2=6: 14.84 (-11.77%) 14.06 (-5.88%) 13.28 length=128,align1=0,align2=0: 17.19 (-4.76%) 14.84 (9.52%) 16.41 length=128,align1=7,align2=0: 16.41 (4.55%) 15.62 (9.09%) 17.19 length=128,align1=0,align2=7: 16.41 (0.00%) 14.06 (14.29%) 16.41 length=128,align1=7,align2=7: 16.41 (4.55%) 15.62 (9.09%) 17.19 length=256,align1=0,align2=0: 21.88 (-3.70%) 21.09 (0.00%) 21.09 length=256,align1=8,align2=0: 21.09 (-3.85%) 21.09 (-3.85%) 20.31 length=256,align1=0,align2=8: 20.31 (-4.00%) 20.31 (-4.00%) 19.53 length=256,align1=8,align2=8: 21.88 (-7.69%) 20.31 (0.00%) 20.31 length=512,align1=0,align2=0: 28.91 (-2.78%) 28.91 (-2.78%) 28.12 length=512,align1=9,align2=0: 30.47 (-2.63%) 30.47 (-2.63%) 29.69 length=512,align1=0,align2=9: 29.69 (0.00%) 29.69 (0.00%) 29.69 length=512,align1=9,align2=9: 28.12 (-2.86%) 28.12 (-2.86%) 27.34 length=1024,align1=0,align2=0: 44.53 (0.00%) 44.53 (0.00%) 44.53 length=1024,align1=10,align2=0: 50.00 (0.00%) 50.00 (0.00%) 50.00 length=1024,align1=0,align2=10: 49.22 (1.56%) 50.78 (-1.56%) 50.00 length=1024,align1=10,align2=10: 44.53 (-1.79%) 43.75 (0.00%) 43.75 length=2048,align1=0,align2=0: 77.34 (-1.02%) 76.56 (0.00%) 76.56 length=2048,align1=11,align2=0: 89.84 (0.00%) 89.84 (0.00%) 89.84 length=2048,align1=0,align2=11: 89.84 (0.00%) 89.84 (0.00%) 89.84 length=2048,align1=11,align2=11: 75.78 (0.00%) 75.78 (0.00%) 75.78 length=4096,align1=0,align2=0: 141.41 (-0.56%) 140.62 (0.00%) 140.62 length=4096,align1=12,align2=0: 171.09 (-0.46%) 170.31 (0.00%) 170.31 length=4096,align1=0,align2=12: 170.31 (0.00%) 170.31 (0.00%) 170.31 length=4096,align1=12,align2=12: 140.62 (0.00%) 140.62 (0.00%) 140.62 length=8192,align1=0,align2=0: 278.91 (-0.28%) 275.78 (0.84%) 278.12 length=8192,align1=13,align2=0: 338.28 (0.23%) 335.94 (0.92%) 339.06 length=8192,align1=0,align2=13: 338.28 (0.00%) 455.47 (-34.64%) 338.28 length=8192,align1=13,align2=13: 278.12 (-0.28%) 275.78 (0.56%) 277.34 length=16384,align1=0,align2=0: 535.94 (-0.15%) 531.25 (0.73%) 535.16 length=16384,align1=14,align2=0: 659.38 (0.12%) 659.38 (0.12%) 660.16 length=16384,align1=0,align2=14: 659.38 (0.00%) 657.03 (0.36%) 659.38 length=16384,align1=14,align2=14: 535.16 (0.44%) 532.81 (0.87%) 537.50 length=32768,align1=0,align2=0: 1260.94 (10.68%) 1121.88 (20.53%) 1411.72 length=32768,align1=15,align2=0: 1368.75 (10.02%) 1376.56 (9.50%) 1521.09 length=32768,align1=0,align2=15: 1333.59 (10.91%) 1373.44 (8.25%) 1496.88 length=32768,align1=15,align2=15: 1256.25 (13.96%) 1125.78 (22.90%) 1460.16 length=65536,align1=0,align2=0: 2853.91 (30.11%) 2589.06 (36.60%) 4083.59 length=65536,align1=16,align2=0: 2850.00 (30.14%) 2589.84 (36.52%) 4079.69 length=65536,align1=0,align2=16: 2853.12 (30.60%) 2589.84 (37.00%) 4110.94 length=65536,align1=16,align2=16: 2850.78 (30.07%) 2589.06 (36.49%) 4076.56 length=0,align1=0,align2=0: 15.62 (-5.26%) 16.41 (-10.53%) 14.84 length=0,align1=0,align2=0: 14.84 (-5.56%) 14.84 (-5.56%) 14.06 length=0,align1=0,align2=0: 14.84 (0.00%) 14.84 (0.00%) 14.84 length=0,align1=0,align2=0: 16.41 (-16.67%) 14.84 (-5.56%) 14.06 length=1,align1=0,align2=0: 15.62 (4.76%) 15.62 (4.76%) 16.41 length=1,align1=1,align2=0: 15.62 (0.00%) 14.84 (5.00%) 15.62 length=1,align1=0,align2=1: 14.84 (0.00%) 14.84 (0.00%) 14.84 length=1,align1=1,align2=1: 14.84 (0.00%) 14.06 (5.26%) 14.84 length=2,align1=0,align2=0: 14.84 (0.00%) 14.06 (5.26%) 14.84 length=2,align1=2,align2=0: 14.84 (0.00%) 14.06 (5.26%) 14.84 length=2,align1=0,align2=2: 14.84 (-5.56%) 14.06 (0.00%) 14.06 length=2,align1=2,align2=2: 14.84 (0.00%) 14.06 (5.26%) 14.84 length=3,align1=0,align2=0: 14.84 (0.00%) 14.84 (0.00%) 14.84 length=3,align1=3,align2=0: 14.84 (-5.56%) 14.06 (0.00%) 14.06 length=3,align1=0,align2=3: 15.62 (-11.11%) 14.06 (0.00%) 14.06 length=3,align1=3,align2=3: 14.84 (0.00%) 14.06 (5.26%) 14.84 length=4,align1=0,align2=0: 17.97 (-27.78%) 14.06 (0.00%) 14.06 length=4,align1=4,align2=0: 13.28 (5.56%) 14.06 (0.00%) 14.06 length=4,align1=0,align2=4: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=4,align1=4,align2=4: 13.28 (5.56%) 13.28 (5.56%) 14.06 length=5,align1=0,align2=0: 13.28 (5.56%) 13.28 (5.56%) 14.06 length=5,align1=5,align2=0: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=5,align1=0,align2=5: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=5,align1=5,align2=5: 14.06 (-5.88%) 14.06 (-5.88%) 13.28 length=6,align1=0,align2=0: 14.06 (-5.88%) 14.06 (-5.88%) 13.28 length=6,align1=6,align2=0: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=6,align1=0,align2=6: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=6,align1=6,align2=6: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=7,align1=0,align2=0: 14.84 (-11.77%) 14.06 (-5.88%) 13.28 length=7,align1=7,align2=0: 13.28 (0.00%) 14.06 (-5.88%) 13.28 length=7,align1=0,align2=7: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=7,align1=7,align2=7: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=8,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=8,align1=8,align2=0: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=8,align1=0,align2=8: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=8,align1=8,align2=8: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=9,align1=0,align2=0: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=9,align1=9,align2=0: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=9,align1=0,align2=9: 13.28 (0.00%) 14.06 (-5.88%) 13.28 length=9,align1=9,align2=9: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=10,align1=0,align2=0: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=10,align1=10,align2=0: 14.06 (-5.88%) 14.06 (-5.88%) 13.28 length=10,align1=0,align2=10: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=10,align1=10,align2=10: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=11,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=11,align1=11,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=11,align1=0,align2=11: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=11,align1=11,align2=11: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=12,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=12,align1=12,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=12,align1=0,align2=12: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=12,align1=12,align2=12: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=13,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=13,align1=13,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=13,align1=0,align2=13: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=13,align1=13,align2=13: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=14,align1=0,align2=0: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=14,align1=14,align2=0: 13.28 (5.56%) 13.28 (5.56%) 14.06 length=14,align1=0,align2=14: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=14,align1=14,align2=14: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=15,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=15,align1=15,align2=0: 14.06 (-5.88%) 14.06 (-5.88%) 13.28 length=15,align1=0,align2=15: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=15,align1=15,align2=15: 13.28 (0.00%) 14.06 (-5.88%) 13.28 length=16,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=16,align1=16,align2=0: 13.28 (5.56%) 14.06 (0.00%) 14.06 length=16,align1=0,align2=16: 14.84 (-11.77%) 13.28 (0.00%) 13.28 length=16,align1=16,align2=16: 13.28 (-6.25%) 12.50 (0.00%) 12.50 length=17,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=17,align1=17,align2=0: 14.84 (-11.77%) 12.50 (5.88%) 13.28 length=17,align1=0,align2=17: 14.84 (-5.56%) 12.50 (11.11%) 14.06 length=17,align1=17,align2=17: 14.84 (-11.77%) 12.50 (5.88%) 13.28 length=18,align1=0,align2=0: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=18,align1=18,align2=0: 13.28 (5.56%) 12.50 (11.11%) 14.06 length=18,align1=0,align2=18: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=18,align1=18,align2=18: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=19,align1=0,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=19,align1=19,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=19,align1=0,align2=19: 14.84 (-5.56%) 12.50 (11.11%) 14.06 length=19,align1=19,align2=19: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=20,align1=0,align2=0: 14.84 (-11.77%) 12.50 (5.88%) 13.28 length=20,align1=20,align2=0: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=20,align1=0,align2=20: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=20,align1=20,align2=20: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=21,align1=0,align2=0: 14.84 (-5.56%) 12.50 (11.11%) 14.06 length=21,align1=21,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=21,align1=0,align2=21: 14.84 (-11.77%) 12.50 (5.88%) 13.28 length=21,align1=21,align2=21: 13.28 (5.56%) 13.28 (5.56%) 14.06 length=22,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=22,align1=22,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=22,align1=0,align2=22: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=22,align1=22,align2=22: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=23,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=23,align1=23,align2=0: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=23,align1=0,align2=23: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=23,align1=23,align2=23: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=24,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=24,align1=24,align2=0: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=24,align1=0,align2=24: 14.84 (-11.77%) 12.50 (5.88%) 13.28 length=24,align1=24,align2=24: 14.06 (-5.88%) 13.28 (0.00%) 13.28 length=25,align1=0,align2=0: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=25,align1=25,align2=0: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=25,align1=0,align2=25: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=25,align1=25,align2=25: 13.28 (0.00%) 13.28 (0.00%) 13.28 length=26,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=26,align1=26,align2=0: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=26,align1=0,align2=26: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=26,align1=26,align2=26: 14.06 (0.00%) 13.28 (5.56%) 14.06 length=27,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=27,align1=27,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=27,align1=0,align2=27: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=27,align1=27,align2=27: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=28,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=28,align1=28,align2=0: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=28,align1=0,align2=28: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=28,align1=28,align2=28: 14.84 (-11.77%) 13.28 (0.00%) 13.28 length=29,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=29,align1=29,align2=0: 13.28 (0.00%) 12.50 (5.88%) 13.28 length=29,align1=0,align2=29: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=29,align1=29,align2=29: 13.28 (5.56%) 12.50 (11.11%) 14.06 length=30,align1=0,align2=0: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=30,align1=30,align2=0: 13.28 (5.56%) 12.50 (11.11%) 14.06 length=30,align1=0,align2=30: 14.06 (-5.88%) 12.50 (5.88%) 13.28 length=30,align1=30,align2=30: 13.28 (0.00%) 12.50 (5.88%) 13.28 length=31,align1=0,align2=0: 13.28 (0.00%) 12.50 (5.88%) 13.28 length=31,align1=31,align2=0: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=31,align1=0,align2=31: 13.28 (0.00%) 12.50 (5.88%) 13.28 length=31,align1=31,align2=31: 14.06 (0.00%) 12.50 (11.11%) 14.06 length=48,align1=0,align2=0: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=48,align1=3,align2=0: 14.06 (0.00%) 14.06 (0.00%) 14.06 length=48,align1=0,align2=3: 14.06 (-5.88%) 14.06 (-5.88%) 13.28 length=48,align1=3,align2=3: 13.28 (5.56%) 14.06 (0.00%) 14.06 length=80,align1=0,align2=0: 15.62 (-11.11%) 14.84 (-5.56%) 14.06 length=80,align1=5,align2=0: 15.62 (-11.11%) 16.41 (-16.67%) 14.06 length=80,align1=0,align2=5: 14.06 (0.00%) 15.62 (-11.11%) 14.06 length=80,align1=5,align2=5: 15.62 (-5.26%) 17.19 (-15.79%) 14.84 length=96,align1=0,align2=0: 14.06 (0.00%) 14.84 (-5.56%) 14.06 length=96,align1=6,align2=0: 14.84 (-5.56%) 16.41 (-16.67%) 14.06 length=96,align1=0,align2=6: 14.06 (0.00%) 14.84 (-5.56%) 14.06 length=96,align1=6,align2=6: 14.84 (-5.56%) 17.19 (-22.22%) 14.06 length=112,align1=0,align2=0: 17.19 (-4.76%) 14.06 (14.29%) 16.41 length=112,align1=7,align2=0: 17.19 (0.00%) 16.41 (4.55%) 17.19 length=112,align1=0,align2=7: 16.41 (0.00%) 14.84 (9.52%) 16.41 length=112,align1=7,align2=7: 17.19 (0.00%) 17.19 (0.00%) 17.19 length=144,align1=0,align2=0: 17.19 (-10.00%) 17.97 (-15.00%) 15.62 length=144,align1=9,align2=0: 17.19 (-4.76%) 18.75 (-14.29%) 16.41 length=144,align1=0,align2=9: 20.31 (-8.33%) 18.75 (0.00%) 18.75 length=144,align1=9,align2=9: 18.75 (-4.35%) 18.75 (-4.35%) 17.97 length=160,align1=0,align2=0: 18.75 (-4.35%) 17.97 (0.00%) 17.97 length=160,align1=10,align2=0: 18.75 (4.00%) 18.75 (4.00%) 19.53 length=160,align1=0,align2=10: 19.53 (-4.17%) 17.97 (4.17%) 18.75 length=160,align1=10,align2=10: 18.75 (-4.35%) 18.75 (-4.35%) 17.97 length=176,align1=0,align2=0: 18.75 (-4.35%) 17.19 (4.35%) 17.97 length=176,align1=11,align2=0: 19.53 (0.00%) 19.53 (0.00%) 19.53 length=176,align1=0,align2=11: 19.53 (-4.17%) 18.75 (0.00%) 18.75 length=176,align1=11,align2=11: 18.75 (0.00%) 17.97 (4.17%) 18.75 length=192,align1=0,align2=0: 18.75 (0.00%) 17.97 (4.17%) 18.75 length=192,align1=12,align2=0: 21.09 (-8.00%) 18.75 (4.00%) 19.53 length=192,align1=0,align2=12: 18.75 (0.00%) 18.75 (0.00%) 18.75 length=192,align1=12,align2=12: 18.75 (0.00%) 17.97 (4.17%) 18.75 length=208,align1=0,align2=0: 17.97 (0.00%) 20.31 (-13.04%) 17.97 length=208,align1=13,align2=0: 19.53 (7.41%) 21.09 (0.00%) 21.09 length=208,align1=0,align2=13: 23.44 (-11.11%) 21.09 (0.00%) 21.09 length=208,align1=13,align2=13: 21.09 (-3.85%) 21.09 (-3.85%) 20.31 length=224,align1=0,align2=0: 21.09 (-8.00%) 20.31 (-4.00%) 19.53 length=224,align1=14,align2=0: 23.44 (-11.11%) 20.31 (3.70%) 21.09 length=224,align1=0,align2=14: 21.09 (3.57%) 20.31 (7.14%) 21.88 length=224,align1=14,align2=14: 20.31 (0.00%) 19.53 (3.85%) 20.31 length=240,align1=0,align2=0: 20.31 (-4.00%) 19.53 (0.00%) 19.53 length=240,align1=15,align2=0: 22.66 (0.00%) 20.31 (10.34%) 22.66 length=240,align1=0,align2=15: 20.31 (-4.00%) 20.31 (-4.00%) 19.53 length=240,align1=15,align2=15: 21.88 (0.00%) 21.09 (3.57%) 21.88 length=272,align1=0,align2=0: 20.31 (0.00%) 28.12 (-38.46%) 20.31 length=272,align1=17,align2=0: 22.66 (0.00%) 27.34 (-20.69%) 22.66 length=272,align1=0,align2=17: 25.78 (-10.00%) 28.12 (-20.00%) 23.44 length=272,align1=17,align2=17: 22.66 (-3.57%) 27.34 (-25.00%) 21.88 length=288,align1=0,align2=0: 23.44 (-7.14%) 27.34 (-25.00%) 21.88 length=288,align1=18,align2=0: 22.66 (0.00%) 27.34 (-20.69%) 22.66 length=288,align1=0,align2=18: 23.44 (-3.45%) 25.00 (-10.35%) 22.66 length=288,align1=18,align2=18: 22.66 (-3.57%) 21.88 (0.00%) 21.88 length=304,align1=0,align2=0: 21.88 (0.00%) 21.88 (0.00%) 21.88 length=304,align1=19,align2=0: 23.44 (-3.45%) 22.66 (0.00%) 22.66 length=304,align1=0,align2=19: 22.66 (0.00%) 22.66 (0.00%) 22.66 length=304,align1=19,align2=19: 22.66 (-3.57%) 21.88 (0.00%) 21.88 length=320,align1=0,align2=0: 22.66 (-3.57%) 21.88 (0.00%) 21.88 length=320,align1=20,align2=0: 22.66 (0.00%) 22.66 (0.00%) 22.66 length=320,align1=0,align2=20: 22.66 (0.00%) 22.66 (0.00%) 22.66 length=320,align1=20,align2=20: 22.66 (-3.57%) 21.88 (0.00%) 21.88 length=336,align1=0,align2=0: 21.88 (0.00%) 24.22 (-10.71%) 21.88 length=336,align1=21,align2=0: 22.66 (0.00%) 25.00 (-10.35%) 22.66 length=336,align1=0,align2=21: 25.78 (0.00%) 25.00 (3.03%) 25.78 length=336,align1=21,align2=21: 25.00 (0.00%) 23.44 (6.25%) 25.00 length=352,align1=0,align2=0: 24.22 (0.00%) 24.22 (0.00%) 24.22 length=352,align1=22,align2=0: 25.00 (0.00%) 25.00 (0.00%) 25.00 length=352,align1=0,align2=22: 25.00 (-3.23%) 25.00 (-3.23%) 24.22 length=352,align1=22,align2=22: 25.00 (-3.23%) 24.22 (0.00%) 24.22 length=368,align1=0,align2=0: 25.00 (-3.23%) 23.44 (3.23%) 24.22 length=368,align1=23,align2=0: 25.00 (0.00%) 24.22 (3.12%) 25.00 length=368,align1=0,align2=23: 25.00 (-3.23%) 25.00 (-3.23%) 24.22 length=368,align1=23,align2=23: 25.00 (-6.67%) 23.44 (0.00%) 23.44 length=384,align1=0,align2=0: 24.22 (0.00%) 24.22 (0.00%) 24.22 length=384,align1=24,align2=0: 25.00 (0.00%) 24.22 (3.12%) 25.00 length=384,align1=0,align2=24: 25.00 (0.00%) 25.78 (-3.12%) 25.00 length=384,align1=24,align2=24: 24.22 (-3.33%) 23.44 (0.00%) 23.44 length=400,align1=0,align2=0: 25.00 (-3.23%) 26.56 (-9.68%) 24.22 length=400,align1=25,align2=0: 25.78 (-3.12%) 27.34 (-9.38%) 25.00 length=400,align1=0,align2=25: 27.34 (0.00%) 27.34 (0.00%) 27.34 length=400,align1=25,align2=25: 26.56 (0.00%) 25.78 (2.94%) 26.56 length=416,align1=0,align2=0: 26.56 (-3.03%) 25.78 (0.00%) 25.78 length=416,align1=26,align2=0: 28.12 (-2.86%) 27.34 (0.00%) 27.34 length=416,align1=0,align2=26: 27.34 (-2.94%) 28.12 (-5.88%) 26.56 length=416,align1=26,align2=26: 25.78 (0.00%) 26.56 (-3.03%) 25.78 length=432,align1=0,align2=0: 27.34 (-2.94%) 25.78 (2.94%) 26.56 length=432,align1=27,align2=0: 28.12 (-2.86%) 27.34 (0.00%) 27.34 length=432,align1=0,align2=27: 27.34 (0.00%) 28.12 (-2.86%) 27.34 length=432,align1=27,align2=27: 25.78 (0.00%) 25.78 (0.00%) 25.78 length=448,align1=0,align2=0: 26.56 (-3.03%) 25.78 (0.00%) 25.78 length=448,align1=28,align2=0: 27.34 (0.00%) 27.34 (0.00%) 27.34 length=448,align1=0,align2=28: 27.34 (0.00%) 28.12 (-2.86%) 27.34 length=448,align1=28,align2=28: 25.78 (0.00%) 25.78 (0.00%) 25.78 length=464,align1=0,align2=0: 25.78 (0.00%) 28.12 (-9.09%) 25.78 length=464,align1=29,align2=0: 28.12 (-2.86%) 29.69 (-8.57%) 27.34 length=464,align1=0,align2=29: 30.47 (0.00%) 30.47 (0.00%) 30.47 length=464,align1=29,align2=29: 28.12 (0.00%) 27.34 (2.78%) 28.12 length=480,align1=0,align2=0: 29.69 (-5.56%) 28.12 (0.00%) 28.12 length=480,align1=30,align2=0: 31.25 (-2.56%) 29.69 (2.56%) 30.47 length=480,align1=0,align2=30: 29.69 (0.00%) 30.47 (-2.63%) 29.69 length=480,align1=30,align2=30: 28.12 (0.00%) 28.12 (0.00%) 28.12 length=496,align1=0,align2=0: 28.12 (0.00%) 27.34 (2.78%) 28.12 length=496,align1=31,align2=0: 30.47 (-2.63%) 29.69 (0.00%) 29.69 length=496,align1=0,align2=31: 29.69 (0.00%) 30.47 (-2.63%) 29.69 length=496,align1=31,align2=31: 28.12 (-2.86%) 28.12 (-2.86%) 27.34 length=1024,align1=0,align2=0: 44.53 (0.00%) 44.53 (0.00%) 44.53 length=1024,align1=32,align2=0: 44.53 (-1.79%) 44.53 (-1.79%) 43.75 length=1024,align1=0,align2=32: 44.53 (-1.79%) 43.75 (0.00%) 43.75 length=1024,align1=32,align2=32: 43.75 (1.75%) 43.75 (1.75%) 44.53 length=1056,align1=0,align2=0: 46.88 (-1.69%) 46.88 (-1.69%) 46.09 length=1056,align1=33,align2=0: 53.12 (0.00%) 52.34 (1.47%) 53.12 length=1056,align1=0,align2=33: 52.34 (0.00%) 53.12 (-1.49%) 52.34 length=1056,align1=33,align2=33: 46.09 (0.00%) 46.88 (-1.69%) 46.09 length=1088,align1=0,align2=0: 46.88 (-1.69%) 46.09 (0.00%) 46.09 length=1088,align1=34,align2=0: 52.34 (0.00%) 52.34 (0.00%) 52.34 length=1088,align1=0,align2=34: 53.12 (-3.03%) 53.12 (-3.03%) 51.56 length=1088,align1=34,align2=34: 46.09 (0.00%) 46.88 (-1.69%) 46.09 length=1120,align1=0,align2=0: 49.22 (-1.61%) 48.44 (0.00%) 48.44 length=1120,align1=35,align2=0: 54.69 (1.41%) 55.47 (0.00%) 55.47 length=1120,align1=0,align2=35: 57.03 (0.00%) 55.47 (2.74%) 57.03 length=1120,align1=35,align2=35: 48.44 (0.00%) 49.22 (-1.61%) 48.44 length=1152,align1=0,align2=0: 47.66 (1.61%) 48.44 (0.00%) 48.44 length=1152,align1=36,align2=0: 55.47 (-1.43%) 55.47 (-1.43%) 54.69 length=1152,align1=0,align2=36: 58.59 (-1.35%) 55.47 (4.05%) 57.81 length=1152,align1=36,align2=36: 48.44 (0.00%) 49.22 (-1.61%) 48.44 length=1184,align1=0,align2=0: 53.12 (-3.03%) 50.78 (1.52%) 51.56 length=1184,align1=37,align2=0: 61.72 (-2.60%) 57.03 (5.19%) 60.16 length=1184,align1=0,align2=37: 62.50 (-1.27%) 57.03 (7.60%) 61.72 length=1184,align1=37,align2=37: 53.12 (-1.49%) 50.78 (2.99%) 52.34 length=1216,align1=0,align2=0: 53.91 (-4.55%) 50.78 (1.52%) 51.56 length=1216,align1=38,align2=0: 60.94 (0.00%) 57.03 (6.41%) 60.94 length=1216,align1=0,align2=38: 60.16 (0.00%) 57.81 (3.90%) 60.16 length=1216,align1=38,align2=38: 52.34 (-1.52%) 50.00 (3.03%) 51.56 length=1248,align1=0,align2=0: 54.69 (-2.94%) 53.12 (0.00%) 53.12 length=1248,align1=39,align2=0: 64.06 (-1.23%) 60.16 (4.94%) 63.28 length=1248,align1=0,align2=39: 60.94 (-2.63%) 60.16 (-1.32%) 59.38 length=1248,align1=39,align2=39: 53.12 (0.00%) 52.34 (1.47%) 53.12 length=1280,align1=0,align2=0: 52.34 (-1.52%) 52.34 (-1.52%) 51.56 length=1280,align1=40,align2=0: 61.72 (3.66%) 59.38 (7.32%) 64.06 length=1280,align1=0,align2=40: 60.94 (-2.63%) 60.16 (-1.32%) 59.38 length=1280,align1=40,align2=40: 52.34 (-1.52%) 52.34 (-1.52%) 51.56 length=1312,align1=0,align2=0: 54.69 (-1.45%) 55.47 (-2.90%) 53.91 length=1312,align1=41,align2=0: 63.28 (0.00%) 62.50 (1.23%) 63.28 length=1312,align1=0,align2=41: 62.50 (0.00%) 62.50 (0.00%) 62.50 length=1312,align1=41,align2=41: 53.91 (0.00%) 54.69 (-1.45%) 53.91 length=1344,align1=0,align2=0: 54.69 (0.00%) 54.69 (0.00%) 54.69 length=1344,align1=42,align2=0: 62.50 (0.00%) 62.50 (0.00%) 62.50 length=1344,align1=0,align2=42: 62.50 (-1.27%) 62.50 (-1.27%) 61.72 length=1344,align1=42,align2=42: 53.91 (0.00%) 53.91 (0.00%) 53.91 length=1376,align1=0,align2=0: 65.62 (-16.67%) 68.75 (-22.22%) 56.25 length=1376,align1=43,align2=0: 71.88 (-9.52%) 73.44 (-11.90%) 65.62 length=1376,align1=0,align2=43: 72.66 (-12.05%) 74.22 (-14.46%) 64.84 length=1376,align1=43,align2=43: 64.06 (-13.89%) 67.97 (-20.83%) 56.25 length=1408,align1=0,align2=0: 57.03 (-1.39%) 68.75 (-22.22%) 56.25 length=1408,align1=44,align2=0: 65.62 (-1.20%) 73.44 (-13.25%) 64.84 length=1408,align1=0,align2=44: 64.84 (0.00%) 74.22 (-14.46%) 64.84 length=1408,align1=44,align2=44: 56.25 (-1.41%) 68.75 (-23.94%) 55.47 length=1440,align1=0,align2=0: 67.97 (-14.47%) 64.84 (-9.21%) 59.38 length=1440,align1=45,align2=0: 74.22 (-10.47%) 68.75 (-2.33%) 67.19 length=1440,align1=0,align2=45: 72.66 (-6.90%) 69.53 (-2.30%) 67.97 length=1440,align1=45,align2=45: 65.62 (-13.51%) 58.59 (-1.35%) 57.81 length=1472,align1=0,align2=0: 66.41 (-14.86%) 58.59 (-1.35%) 57.81 length=1472,align1=46,align2=0: 73.44 (-9.30%) 67.19 (0.00%) 67.19 length=1472,align1=0,align2=46: 70.31 (-4.65%) 67.97 (-1.16%) 67.19 length=1472,align1=46,align2=46: 57.81 (0.00%) 58.59 (-1.35%) 57.81 length=1504,align1=0,align2=0: 60.94 (0.00%) 60.94 (0.00%) 60.94 length=1504,align1=47,align2=0: 71.09 (-1.11%) 70.31 (0.00%) 70.31 length=1504,align1=0,align2=47: 70.31 (-1.12%) 70.31 (-1.12%) 69.53 length=1504,align1=47,align2=47: 60.94 (-1.30%) 60.16 (0.00%) 60.16 length=1536,align1=0,align2=0: 62.50 (-3.90%) 60.16 (0.00%) 60.16 length=1536,align1=48,align2=0: 60.94 (-1.30%) 60.16 (0.00%) 60.16 length=1536,align1=0,align2=48: 61.72 (-3.95%) 60.16 (-1.32%) 59.38 length=1536,align1=48,align2=48: 60.94 (-1.30%) 60.16 (0.00%) 60.16 length=1568,align1=0,align2=0: 80.47 (-27.16%) 63.28 (0.00%) 63.28 length=1568,align1=49,align2=0: 86.72 (-18.09%) 72.66 (1.06%) 73.44 length=1568,align1=0,align2=49: 74.22 (-3.26%) 74.22 (-3.26%) 71.88 length=1568,align1=49,align2=49: 62.50 (0.00%) 61.72 (1.25%) 62.50 length=1600,align1=0,align2=0: 62.50 (-1.27%) 62.50 (-1.27%) 61.72 length=1600,align1=50,align2=0: 73.44 (0.00%) 71.88 (2.13%) 73.44 length=1600,align1=0,align2=50: 72.66 (0.00%) 73.44 (-1.08%) 72.66 length=1600,align1=50,align2=50: 62.50 (-1.27%) 62.50 (-1.27%) 61.72 length=1632,align1=0,align2=0: 64.84 (0.00%) 64.84 (0.00%) 64.84 length=1632,align1=51,align2=0: 75.78 (0.00%) 75.00 (1.03%) 75.78 length=1632,align1=0,align2=51: 78.91 (0.00%) 75.78 (3.96%) 78.91 length=1632,align1=51,align2=51: 64.84 (-2.47%) 64.84 (-2.47%) 63.28 length=1664,align1=0,align2=0: 64.84 (-1.22%) 64.84 (-1.22%) 64.06 length=1664,align1=52,align2=0: 75.78 (0.00%) 75.00 (1.03%) 75.78 length=1664,align1=0,align2=52: 80.47 (-0.98%) 75.78 (4.90%) 79.69 length=1664,align1=52,align2=52: 64.06 (-1.23%) 65.62 (-3.70%) 63.28 length=1696,align1=0,align2=0: 69.53 (-3.49%) 72.66 (-8.14%) 67.19 length=1696,align1=53,align2=0: 80.47 (-0.98%) 82.03 (-2.94%) 79.69 length=1696,align1=0,align2=53: 80.47 (0.96%) 82.03 (-0.96%) 81.25 length=1696,align1=53,align2=53: 68.75 (-2.33%) 72.66 (-8.14%) 67.19 length=1728,align1=0,align2=0: 67.97 (0.00%) 72.66 (-6.90%) 67.97 length=1728,align1=54,align2=0: 80.47 (-0.98%) 82.81 (-3.92%) 79.69 length=1728,align1=0,align2=54: 78.91 (-1.00%) 82.03 (-5.00%) 78.12 length=1728,align1=54,align2=54: 68.75 (0.00%) 72.66 (-5.68%) 68.75 length=1760,align1=0,align2=0: 77.34 (-12.50%) 68.75 (0.00%) 68.75 length=1760,align1=55,align2=0: 91.41 (-8.33%) 79.69 (5.56%) 84.38 length=1760,align1=0,align2=55: 88.28 (-10.78%) 80.47 (-0.98%) 79.69 length=1760,align1=55,align2=55: 77.34 (-11.24%) 68.75 (1.12%) 69.53 length=1792,align1=0,align2=0: 78.12 (-14.94%) 68.75 (-1.15%) 67.97 length=1792,align1=56,align2=0: 88.28 (-4.63%) 79.69 (5.56%) 84.38 length=1792,align1=0,align2=56: 88.28 (-9.71%) 80.47 (0.00%) 80.47 length=1792,align1=56,align2=56: 77.34 (-11.24%) 68.75 (1.12%) 69.53 length=1824,align1=0,align2=0: 72.66 (7.92%) 70.31 (10.89%) 78.91 length=1824,align1=57,align2=0: 85.94 (5.17%) 82.03 (9.48%) 90.62 length=1824,align1=0,align2=57: 82.03 (3.67%) 82.81 (2.75%) 85.16 length=1824,align1=57,align2=57: 70.31 (-1.12%) 70.31 (-1.12%) 69.53 length=1856,align1=0,align2=0: 70.31 (-1.12%) 70.31 (-1.12%) 69.53 length=1856,align1=58,align2=0: 83.59 (-0.94%) 82.03 (0.94%) 82.81 length=1856,align1=0,align2=58: 178.12 (-115.09%) 82.81 (0.00%) 82.81 length=1856,align1=58,align2=58: 70.31 (-1.12%) 70.31 (-1.12%) 69.53 length=1888,align1=0,align2=0: 73.44 (-1.08%) 78.91 (-8.60%) 72.66 length=1888,align1=59,align2=0: 85.94 (0.00%) 89.84 (-4.55%) 85.94 length=1888,align1=0,align2=59: 84.38 (0.00%) 89.06 (-5.56%) 84.38 length=1888,align1=59,align2=59: 72.66 (-1.09%) 78.12 (-8.70%) 71.88 length=1920,align1=0,align2=0: 72.66 (-1.09%) 78.12 (-8.70%) 71.88 length=1920,align1=60,align2=0: 85.94 (0.00%) 89.84 (-4.55%) 85.94 length=1920,align1=0,align2=60: 85.16 (0.00%) 89.06 (-4.59%) 85.16 length=1920,align1=60,align2=60: 72.66 (-1.09%) 78.91 (-9.78%) 71.88 length=1952,align1=0,align2=0: 75.00 (-1.05%) 75.00 (-1.05%) 74.22 length=1952,align1=61,align2=0: 88.28 (0.00%) 87.50 (0.88%) 88.28 length=1952,align1=0,align2=61: 87.50 (0.00%) 88.28 (-0.89%) 87.50 length=1952,align1=61,align2=61: 74.22 (0.00%) 74.22 (0.00%) 74.22 length=1984,align1=0,align2=0: 75.00 (-1.05%) 73.44 (1.05%) 74.22 length=1984,align1=62,align2=0: 89.06 (-0.89%) 87.50 (0.88%) 88.28 length=1984,align1=0,align2=62: 87.50 (0.00%) 88.28 (-0.89%) 87.50 length=1984,align1=62,align2=62: 74.22 (0.00%) 74.22 (0.00%) 74.22 length=2016,align1=0,align2=0: 77.34 (-1.02%) 76.56 (0.00%) 76.56 length=2016,align1=63,align2=0: 91.41 (-0.86%) 90.62 (0.00%) 90.62 length=2016,align1=0,align2=63: 89.84 (0.00%) 90.62 (-0.87%) 89.84 length=2016,align1=63,align2=63: 77.34 (-1.02%) 76.56 (0.00%) 76.56 length=4096,align1=0,align2=0: 141.41 (-0.56%) 146.88 (-4.44%) 140.62 Function: memcpy __memcpy_thunderx __memcpy_falkor __memcpy_generic Variant: large ================================================================================ length=65543,align1=0,align2=0: 4018.75 (3.09%) 2634.38 (36.47%) 4146.88 length=65551,align1=0,align2=3: 4425.00 (-6.47%) 3134.38 (24.59%) 4156.25 length=65567,align1=3,align2=0: 2909.38 (29.95%) 3134.38 (24.53%) 4153.12 length=65599,align1=3,align2=5: 4415.62 (-6.16%) 3134.38 (24.64%) 4159.38 length=131079,align1=0,align2=0: 5765.62 (30.38%) 5240.62 (36.72%) 8281.25 length=131087,align1=0,align2=3: 8831.25 (-6.56%) 6271.88 (24.32%) 8287.50 length=131103,align1=3,align2=0: 5793.75 (29.05%) 6268.75 (23.23%) 8165.62 length=131135,align1=3,align2=5: 5806.25 (29.97%) 6259.38 (24.50%) 8290.62 length=262151,align1=0,align2=0: 11850.00 (28.91%) 10762.50 (35.43%) 16668.80 length=262159,align1=0,align2=3: 12043.80 (27.72%) 12700.00 (23.78%) 16662.50 length=262175,align1=3,align2=0: 12046.90 (27.90%) 12687.50 (24.07%) 16709.40 length=262207,align1=3,align2=5: 11984.40 (28.08%) 12678.10 (23.91%) 16662.50 length=524295,align1=0,align2=0: 24825.00 (25.00%) 24268.80 (27.34%) 33400.00 length=524303,align1=0,align2=3: 35731.20 (-6.53%) 25678.10 (23.44%) 33540.60 length=524319,align1=3,align2=0: 25893.80 (22.71%) 25725.00 (23.22%) 33503.10 length=524351,align1=3,align2=5: 25887.50 (22.86%) 25690.60 (23.45%) 33559.40 length=1048583,align1=0,align2=0: 50621.90 (0.30%) 50600.00 (0.34%) 50771.90 length=1048591,align1=0,align2=3: 53206.20 (0.54%) 51081.20 (4.51%) 53493.80 length=1048607,align1=3,align2=0: 53221.90 (0.32%) 51975.00 (2.66%) 53393.80 length=1048639,align1=3,align2=5: 53240.60 (0.36%) 51953.10 (2.77%) 53431.20 length=2097159,align1=0,align2=0: 103744.00 (-2.00%) 102447.00 (-1.00%) 102425.00 length=2097167,align1=0,align2=3: 108588.00 (-1.00%) 105159.00 (2.00%) 107606.00 length=2097183,align1=3,align2=0: 107678.00 (0.00%) 105250.00 (2.00%) 108125.00 length=2097215,align1=3,align2=5: 107906.00 (1.00%) 105841.00 (3.00%) 109475.00 length=4194311,align1=0,align2=0: 202994.00 (0.00%) 202500.00 (1.00%) 204809.00 length=4194319,align1=0,align2=3: 213350.00 (0.00%) 205997.00 (3.00%) 213384.00 length=4194335,align1=3,align2=0: 212653.00 (0.00%) 206444.00 (3.00%) 212900.00 length=4194367,align1=3,align2=5: 213044.00 (0.00%) 206084.00 (3.00%) 213847.00 length=8388615,align1=0,align2=0: 401294.00 (0.00%) 401231.00 (0.00%) 401944.00 length=8388623,align1=0,align2=3: 480872.00 (-14.00%) 406444.00 (3.00%) 422900.00 length=8388639,align1=3,align2=0: 422147.00 (0.00%) 407750.00 (3.00%) 422803.00 length=8388671,align1=3,align2=5: 442003.00 (-5.00%) 407125.00 (3.00%) 423509.00 length=16777223,align1=0,align2=0: 799809.00 (0.00%) 800000.00 (0.00%) 801756.00 length=16777231,align1=0,align2=3: 841184.00 (0.00%) 808525.00 (4.00%) 843775.00 length=16777247,align1=3,align2=0: 841166.00 (0.00%) 810147.00 (3.00%) 843147.00 length=16777279,align1=3,align2=5: 972569.00 (-16.00%) 808588.00 (4.00%) 843731.00 length=33554439,align1=0,align2=0: 1842240.00 (-0.01%) 1863590.00 (-1.17%) 1841990.00 length=33554447,align1=0,align2=3: 2103470.00 (-2.74%) 1919460.00 (6.25%) 2047440.00 length=33554463,align1=3,align2=0: 2075690.00 (-1.07%) 1930040.00 (6.02%) 2053720.00 length=33554495,align1=3,align2=5: 2110590.00 (-2.82%) 1924440.00 (6.25%) 2052650.00 Function: memcpy __memcpy_thunderx __memcpy_falkor __memcpy_generic Variant: random ================================================================================ max-size=4096: 44061.90 (5.85%) 38568.20 (17.59%) 46799.90 max-size=8192: 42790.90 (5.27%) 38158.90 (15.52%) 45171.50 max-size=16384: 44912.10 (2.25%) 38710.40 (15.75%) 45945.00 max-size=32768: 43577.90 (1.23%) 37975.10 (13.93%) 44120.00 max-size=65536: 44375.50 (1.04%) 38474.20 (14.20%) 44840.60 * manual/tunables.texi (Tunable glibc.tune.cpu): Add falkor. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memcpy_falkor. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (MAX_IFUNC): Bump. (__libc_ifunc_impl_list): Add __memcpy_falkor. * sysdeps/aarch64/multiarch/memcpy.c: Likewise. * sysdeps/aarch64/multiarch/memcpy_falkor.S: New file. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (cpu_list): Add falkor. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_FALKOR): New macro. |
||
Steve Ellcey
|
6a2c695266 |
aarch64: Thunderx specific memcpy and memmove
* sysdeps/aarch64/memcpy.S (MEMMOVE, MEMCPY): New macros. (memmove): Use MEMMOVE for name. (memcpy): Use MEMCPY for name. Change internal labels to external labels. * sysdeps/aarch64/multiarch/Makefile: New file. * sysdeps/aarch64/multiarch/ifunc-impl-list.c: Likewise. * sysdeps/aarch64/multiarch/init-arch.h: Likewise. * sysdeps/aarch64/multiarch/memcpy.c: Likewise. * sysdeps/aarch64/multiarch/memcpy_generic.S: Likewise. * sysdeps/aarch64/multiarch/memcpy_thunderx.S: Likewise. * sysdeps/aarch64/multiarch/memmove.c: Likewise. |