Results for very small sizes (< 128 bytes) are much noisier for non-cached
benchmarks such as the walk benchmarks, so do not include them.
* benchtests/bench-memcpy-walk.c (START_SIZE): Set to 128.
* benchtests/bench-memmove-walk.c (START_SIZE): Likewise.
* benchtests/bench-memset-walk.c (START_SIZE): Likewise.
Make the walking benchmarks walk only backwards, since copying in both
directions is biased in favour of implementations that use non-temporal
stores for larger sizes; falkor is one of them.  This also fixes a bug
in the result computation, which unnecessarily multiplied the timing
result by the length.
* benchtests/bench-memcpy-walk.c (do_one_test): Copy only
backwards. Fix timing computation.
* benchtests/bench-memmove-walk.c (do_one_test): Likewise.
* benchtests/bench-memset-walk.c (do_one_test): Walk backwards
N bytes at a time.  Fix timing computation.
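A minimal sketch of a backward-only copy walk and of the corrected per-call timing follows. It uses plain `memcpy` and `clock_gettime` in place of the benchtests' own timing macros. The helper names `walk_memcpy_backwards` and `time_walk` are hypothetical and do not come from bench-memcpy-walk.c.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy N bytes at a time from SRC to DST, starting at
   the end of the LEN-byte regions and walking backwards toward the start,
   so each call touches memory that was not used recently.  Returns the
   number of memcpy calls made (LEN / N, rounded down).  */
static size_t
walk_memcpy_backwards (char *dst, const char *src, size_t len, size_t n)
{
  assert (n > 0 && n <= len);
  size_t calls = 0;
  for (size_t off = len - n; ; off -= n)
    {
      memcpy (dst + off, src + off, n);
      calls++;
      if (off < n)		/* Next step would run past the start.  */
        break;
    }
  return calls;
}

/* Hypothetical helper: time one backward walk and return the average cost
   of a single memcpy call in nanoseconds.  The elapsed time is divided by
   the number of calls only; it is not multiplied by N.  */
static double
time_walk (char *dst, const char *src, size_t len, size_t n)
{
  struct timespec start, end;
  clock_gettime (CLOCK_MONOTONIC, &start);
  size_t calls = walk_memcpy_backwards (dst, src, len, n);
  clock_gettime (CLOCK_MONOTONIC, &end);
  double ns = (double) (end.tv_sec - start.tv_sec) * 1e9
              + (double) (end.tv_nsec - start.tv_nsec);
  return ns / (double) calls;
}
```

For the walk to defeat caching, LEN would have to be much larger than the last-level cache; the exact buffer size used by the benchtests is not stated here, so any value you pick is an assumption.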
This benchmark is an attempt to eliminate cache effects from string
benchmarks.  It walks backwards through a large memory area and sets
blocks of different sizes and alignments one call at a time, instead of
looping over the same memory area.  This is a useful metric to have
alongside the simple memset benchmark (which is only really useful for
smaller sizes), especially for larger sizes, where a given buffer is
likely to be set only once.
* benchtests/bench-memset-walk.c: New file.
* benchtests/Makefile (string-benchset): Add it.
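A minimal sketch of the memset walk across the benchmarked sizes follows. It assumes power-of-two sizes from START_SIZE (128, as set above) up to a hypothetical MAX_SIZE, a fixed fill byte, and an alignment expressed as a starting offset into the buffer. `walk_memset_backwards`, `run_sizes` and MAX_SIZE are illustrative names, not the code in bench-memset-walk.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define START_SIZE 128		/* Smallest size, as in the walk benchmarks.  */
#define MAX_SIZE (1 << 20)	/* Hypothetical upper bound for this sketch.  */

/* Hypothetical helper: set N bytes at a time to C, walking backwards from
   the highest N-byte chunk that fits in [ALIGN, LEN) down to BUF + ALIGN.
   Returns the number of memset calls made.  */
static size_t
walk_memset_backwards (char *buf, size_t len, size_t align, int c, size_t n)
{
  assert (n > 0 && align + n <= len);
  size_t calls = 0;
  for (size_t off = align + (len - align) / n * n - n; ; off -= n)
    {
      memset (buf + off, c, n);
      calls++;
      if (off < align + n)	/* This chunk started at BUF + ALIGN.  */
        break;
    }
  return calls;
}

/* Hypothetical driver: run one walk for every power-of-two size from
   START_SIZE up to MAX_SIZE that fits in the buffer at this alignment,
   and return the total number of memset calls.  */
static size_t
run_sizes (char *buf, size_t len, size_t align)
{
  size_t total = 0;
  for (size_t n = START_SIZE; n <= MAX_SIZE && align + n <= len; n <<= 1)
    total += walk_memset_backwards (buf, len, align, 0x5a, n);
  return total;
}
```

Each size gets a single backward pass, so most calls land on memory that the previous call did not touch. That is the property the walk benchmark relies on to keep cache effects out of the numbers.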