I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
DELOUSE was added to asm code to make them compatible with non-LP64
ABIs, but it is an unfortunate name and the code was not compatible
with ABIs where pointer and size_t are different. Glibc currently
only supports the LP64 ABI so these macros are not really needed or
tested, but for now the name is changed to be more meaningful instead
of removing them completely.
Some DELOUSE macros were dropped: clone, strlen and strnlen used it
unnecessarily.
The out of tree ILP32 patches are currently not maintained and will
likely need a rework to rebase them on top of the time64 changes.
The loop body is expanded from a 16-byte comparison to a 64-byte
comparison, and the usage of ldp is replaced by the Post-index
mode to the Base plus offset mode. Hence, compare can faster 18%
around > 128 bytes in all.
Checked on aarch64-linux-gnu.
Reviewed-by: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
I goofed up when changing the loop8 name to loop16 and missed on out
the branch instance. Fixed and actually build tested this time.
* sysdeps/aarch64/memcmp.S (more16): Fix branch target loop16.
This improved memcmp provides a fast path for compares up to 16 bytes
and then compares 16 bytes at a time, thus optimizing loads from both
sources. The glibc memcmp microbenchmark retains performance (with an
error of ~1ns) for smaller compare sizes and reduces up to 31% of
execution time for compares up to 4K on the APM Mustang. On Qualcomm
Falkor this improves to almost 48%, i.e. it is almost 2x improvement
for sizes of 2K and above.
* sysdeps/aarch64/memcmp.S: Widen comparison to 16 bytes at a
time.
This is an optimized memcmp for AArch64. This is a complete rewrite
using a different algorithm. The previous version split into cases
where both inputs were aligned, the inputs were mutually aligned and
unaligned using a byte loop. The new version combines all these cases,
while small inputs of less than 8 bytes are handled separately.
This allows the main code to be sped up using unaligned loads since
there are now at least 8 bytes to be compared. After the first 8 bytes,
align the first input. This ensures each iteration does at most one
unaligned access and mutually aligned inputs behave as aligned.
After the main loop, process the last 8 bytes using unaligned accesses.
This improves performance of (mutually) aligned cases by 25% and
unaligned by >500% (yes >6 times faster) on large inputs.
* sysdeps/aarch64/memcmp.S (memcmp):
Rewrite of optimized memcmp.
This patch moves the AArch64 port to the main sysdeps hierarchy. The
move is essentially:
git mv ports/sysdeps/aarch64 sysdeps/aarch64
git mv ports/sysdeps/unix/sysv/linux/aarch64 sysdeps/unix/sysv/linux/aarch64
The README is updated and I've updated ChangeLog.aarch64 along the
lines of the ARM move. The AArch64 build has been tested to confirm
that there were no changes in objdump -dr output or the shared
objects.