Improve string benchtest timing. Many tests run for 0.01s which is way too
short to give accurate results. Other tests take over 40 seconds which is
way too long. Significantly increase the iterations of the short running
tests. Reduce number of alignment variations in the long running memcpy walk
tests so they take less than 5 seconds.
As a result most tests take at least 0.1s and all finish within 5 seconds.
* benchtests/bench-memcpy-random.c (do_one_test): Use medium iterations.
* benchtests/bench-memcpy-walk.c (test_main): Reduce alignment tests.
* benchtests/bench-memmem.c (do_one_test): Use small iterations.
* benchtests/bench-memmove-walk.c (test_main): Reduce alignment tests.
* benchtests/bench-memset-walk.c (test_main): Reduce alignment tests.
* benchtests/bench-strcasestr.c (do_one_test): Use small iterations.
* benchtests/bench-string.h (INNER_LOOP_ITERS): Increase iterations.
(INNER_LOOP_ITERS_MEDIUM): New define.
(INNER_LOOP_ITERS_SMALL): New define.
* benchtests/bench-strpbrk.c (do_one_test): Use medium iterations.
* benchtests/bench-strsep.c (do_one_test): Use small iterations.
* benchtests/bench-strspn.c (do_one_test): Use medium iterations.
* benchtests/bench-strstr.c (do_one_test): Use small iterations.
* benchtests/bench-strtok.c (do_one_test): Use small iterations.
This is a minor style change to move the definition of I to its usage
scope instead of at the top of the function. This is consistent with
glibc style guidelines and more importantly it was getting in the way
of my testing.
* benchtests/bench-memcpy-walk.c (do_test): Move declaration
of I into loop header.
* benchtests/bench-memmove-walk.c (do_test): Likewise.
Numbers for very small sizes (< 128B) are much noisier for non-cached
benchmarks like the walk benchmarks, so don't include them.
* benchtests/bench-memcpy-walk.c (START_SIZE): Set to 128.
* benchtests/bench-memmove-walk.c (START_SIZE): Likewise.
* benchtests/bench-memset-walk.c (START_SIZE): Likewise.
Make the walking benchmarks walk only backwards since copying both
ways is biased in favour of implementations that use non-temporal
stores for larger sizes; falkor is one of them. This also fixes up
bugs in computation of the result which ended up multiplying the
length with the timing result unnecessarily.
* benchtests/bench-memcpy-walk.c (do_one_test): Copy only
backwards. Fix timing computation.
* benchtests/bench-memmove-walk.c (do_one_test): Likewise.
* benchtests/bench-memset-walk.c (do_one_test): Walk backwards
on memset by N at a time. Fix timing computation.
This benchmark is an attempt to eliminate cache effects from string
benchmarks. The benchmark walks both ways through a large memory area
and copies different sizes of memory and alignments one at a time
instead of looping around in the same memory area. This is a good
metric to have alongside the other memcpy benchmarks, especially for
larger sizes where the likelihood of the call being done only once is
pretty high.
* benchtests/bench-memcpy-walk.c: New file.
* benchtests/Makefile (string-benchset): Add it.