glibc/sysdeps
Krzysztof Koch d1f75e9644 AArch64: Merge Falkor memcpy and memmove implementations
Falkor's memcpy and memmove share some implementation details;
therefore, the two routines are moved to a single source file
for code reuse.

The two routines now share code for small and medium copies
(up to and including 128 bytes). Large copies in memcpy do not
handle overlap correctly; consequently, the loops for
moving/copying more than 128 bytes stay separate for memcpy
and memmove.
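To see why the large memcpy loop cannot simply be reused for memmove, here is a minimal C model of a forward-only loop that copies 16-byte chunks. It is hypothetical: `forward_chunk_copy` is not a glibc function, and the 16-byte `memcpy` calls only stand in for vector register loads and stores. When the destination overlaps the source from above, a later chunk reads bytes that an earlier store has already overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a forward-only large-copy loop (not glibc code).
   Each 16-byte chunk is loaded into a temporary (standing in for a SIMD
   register) and then stored.  This is correct for disjoint buffers and
   for dst < src, but when dst > src and the buffers overlap, a store can
   clobber source bytes that a later iteration has not read yet.
   Assumes n is a multiple of 16 to keep the model short. */
static void forward_chunk_copy(unsigned char *dst, const unsigned char *src,
                               size_t n)
{
    assert(n % 16 == 0);
    for (size_t i = 0; i < n; i += 16) {
        unsigned char reg[16];
        memcpy(reg, src + i, 16);  /* load one 16-byte chunk */
        memcpy(dst + i, reg, 16);  /* store it */
    }
}
```

With a buffer where `a[k] == k`, calling `forward_chunk_copy(a + 4, a, 160)` leaves `a[20] == 12`, while `memmove` would put 16 there: the first store overwrote bytes 16..19 of the source before the second chunk read them.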

To increase code reuse, a number of small modifications were made:

1. The old implementation of memcpy copied the first 16 bytes as
   soon as the size of data was determined to be greater than 32 bytes.
   For the memcpy code to also work when copying small/medium overlapping
   data, the first load and store were moved to the large copy case.
2. The medium memcpy case no longer assumes that 16 bytes were already
   copied and uses 8 registers to copy up to 128 bytes.
3. The small case for memmove was enlarged to match that of memcpy,
   which is less than or equal to 32 bytes.
4. The medium case for memmove was enlarged to match that of memcpy,
   which is less than or equal to 128 bytes.
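The shared small/medium path works for both routines because every byte is loaded before anything is stored, so overlap between source and destination does not matter. The following is a minimal C sketch of that idea for 16 to 128 bytes. The type `q_reg` and the helpers are hypothetical, 16-byte chunks are modelled with `memcpy`, and the split into 16/32/64-byte halves taken from each end of the buffer is an illustration, not a transcription of the Falkor assembly. Sizes below 16 bytes would need narrower loads and are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one 16-byte SIMD register. */
typedef struct { unsigned char b[16]; } q_reg;

static q_reg load16(const unsigned char *p)
{
    q_reg r;
    memcpy(r.b, p, 16);
    return r;
}

static void store16(unsigned char *p, q_reg r)
{
    memcpy(p, r.b, 16);
}

/* Copy 16 <= n <= 128 bytes.  Load `half` bytes from the start and `half`
   bytes from the end (the two ranges may overlap), then store them all.
   Because every load happens before the first store, the result equals
   memmove even when dst and src overlap.  At most 4 + 4 = 8 registers
   are live, covering up to 128 bytes. */
static void copy_16_to_128(unsigned char *dst, const unsigned char *src,
                           size_t n)
{
    assert(n >= 16 && n <= 128);
    size_t half = n <= 32 ? 16 : n <= 64 ? 32 : 64;
    size_t k = half / 16;            /* registers per end: 1, 2 or 4 */
    q_reg head[4], tail[4];

    for (size_t i = 0; i < k; i++) { /* all loads first */
        head[i] = load16(src + 16 * i);
        tail[i] = load16(src + n - half + 16 * i);
    }
    for (size_t i = 0; i < k; i++) { /* then all stores */
        store16(dst + 16 * i, head[i]);
        store16(dst + n - half + 16 * i, tail[i]);
    }
}
```

Taking whole chunks from both ends avoids a per-byte tail loop: for, say, n = 40, bytes 8..31 are simply written twice with the same values.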

Other changes include:

1. Improve alignment of existing loop bodies.
2. 'Delouse' memmove and memcpy input arguments: make sure that the
   upper 32 bits of input registers are zeroed if unused.
3. Do one more iteration in memmove loops and reduce the number of
   copies made from the start/end of the buffer, depending on
   the direction of the memmove loop.
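For copies above 128 bytes, memmove has to choose a loop direction. A common way to express that choice, shown here as a hedged C sketch rather than the glibc source, is a single unsigned comparison: if `dst - src`, computed on unsigned addresses, is at least `n`, copying forward is safe; otherwise `dst` lies inside `[src, src + n)` and the copy must run backward. The helpers `can_copy_forward` and `memmove_large_sketch` are hypothetical, the leftover bytes are handled with a byte loop for brevity, and the "one more iteration" tuning from item 3 is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Forward copying is safe unless dst lies inside [src, src + n).
   If dst < src, the unsigned difference wraps to a huge value, so the
   test also passes.  Relies on a flat address space, as on AArch64. */
static int can_copy_forward(const void *dst, const void *src, size_t n)
{
    return (uintptr_t)dst - (uintptr_t)src >= n;
}

/* Hypothetical large-copy memmove model: 16-byte chunks, each loaded into
   a temporary before it is stored, walking away from the overlap. */
static void memmove_large_sketch(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;
    unsigned char reg[16];

    if (can_copy_forward(dst, src, n)) {
        size_t i = 0;
        for (; n - i >= 16; i += 16) {
            memcpy(reg, src + i, 16);
            memcpy(dst + i, reg, 16);
        }
        for (; i < n; i++)          /* leftover bytes, still forward */
            dst[i] = src[i];
    } else {
        size_t i = n;
        for (; i >= 16; i -= 16) {  /* chunks from the end of the buffer */
            memcpy(reg, src + i - 16, 16);
            memcpy(dst + i - 16, reg, 16);
        }
        while (i > 0) {             /* leftover bytes at the start, backward */
            i--;
            dst[i] = src[i];
        }
    }
}
```

Copying forward when `dst < src` never overwrites an unread source byte, because each store lands at or below the address of the chunk just loaded; the backward loop gives the mirror-image guarantee when `dst > src`.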

Benchmarking:

Looking at the results from bench-memcpy-random.out, we can see that
memmove_falkor is now about 5% faster than memcpy_falkor_old, while
memmove_falkor_old was more than 15% slower than memcpy_falkor_old.
The memcpy implementation remained largely unmodified, so its
performance did not change significantly.

The reason for such a significant memmove performance gain is the
increase of the upper bound on the small copy case to 32 bytes and
the increase of the upper bound on the medium copy case to 128 bytes.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-06-08 14:13:05 +01:00
aarch64 AArch64: Merge Falkor memcpy and memmove implementations 2020-06-08 14:13:05 +01:00
alpha Update alpha libm-test-ulps 2020-04-08 13:52:45 -03:00
arm arm: XFAIL string/tst-memmove-overflow due to bug 25620 2020-05-13 16:48:26 +02:00
csky semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
generic Update HP_TIMING_NOW for _ISOMAC in sysdeps/generic/hp-timing.h 2020-06-05 09:44:06 -07:00
gnu Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
hppa dl-runtime: reloc_{offset,index} now functions arch overide'able 2020-06-05 13:45:46 -07:00
htl htl: Fix registration of atfork handlers in modules 2020-06-07 23:36:42 +00:00
hurd Hurd: Move <hurd/sigpreempt.h> internals into wrapper header 2020-05-28 11:40:13 +02:00
i386 hurd: fix clearing SS_ONSTACK when longjmp-ing from sighandler 2020-06-06 20:24:30 +02:00
ia64 semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
ieee754 ieee754: provide gcc builtins based generic fma functions 2020-06-03 10:23:28 -07:00
m68k math: Remove inline math tests 2020-03-19 11:45:44 -03:00
mach hurd: document that gcc&gdb look at the trampoline code 2020-06-08 14:41:57 +02:00
microblaze semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
mips Rename __LONG_DOUBLE_USES_FLOAT128 to __LDOUBLE_REDIRECTS_TO_FLOAT128_ABI 2020-04-30 08:52:08 -05:00
nios2 semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
nptl nptl: Add pthread_attr_setsigmask_np, pthread_attr_getsigmask_np 2020-06-02 11:59:18 +02:00
posix linux: Use internal DIR locks when accessing filepos on telldir 2020-05-27 11:55:00 -03:00
powerpc powerpc64le: add optimized strlen for P9 2020-06-05 15:30:00 -05:00
pthread pthread: Move back linking rules to nptl and htl 2020-06-08 14:34:22 +02:00
riscv semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
s390 ieee754: provide gcc builtins based generic fma functions 2020-06-03 10:23:28 -07:00
sh semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
sparc semaphore: consolidate arch headers into a generic one 2020-05-06 13:07:12 -07:00
unix Linux: Use __pthread_attr_setsigmask_internal for timer helper thread 2020-06-02 11:59:26 +02:00
wordsize-32 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
wordsize-64 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
x86 x86: Update Intel Atom processor family optimization 2020-05-21 13:36:54 -07:00
x86_64 dl-runtime: reloc_{offset,index} now functions arch overide'able 2020-06-05 13:45:46 -07:00