The previous method produced too many time() invocations,
which became a significant fraction of the measured workload.
The new strategy is to call time() only once per batch,
and to dynamically resize the batch so that each round lasts
approximately 1 second (see the sketch below).
This mostly matters for small inputs: measurements for large files
(such as silesia.tar) are much less impacted, though decoding is
fast enough that even medium-size files see an improvement.
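
A minimal sketch of the batching strategy, assuming a POSIX
environment (older glibc may need -lrt); now_sec(), bench_round(),
the 1.1 growth margin, and the dummy workload are illustrative,
not the actual benchmark code:

    #include <stddef.h>
    #include <stdio.h>
    #include <time.h>

    /* Monotonic timer: elapsed real time in seconds, ns resolution. */
    static double now_sec(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
    }

    /* Run fn() in batches, reading the timer only at batch boundaries,
     * and grow the batch until one round lasts about 1 second.
     * Returns the mean time per invocation. */
    static double bench_round(void (*fn)(void*), void* payload)
    {
        size_t batchSize = 1;
        for (;;) {
            double const start = now_sec();
            for (size_t i = 0; i < batchSize; i++) fn(payload);
            double const elapsed = now_sec() - start;
            if (elapsed >= 1.0)
                return elapsed / (double)batchSize;
            /* Scale so the next round lands near 1 s; the 1.1 margin
             * is an arbitrary cushion against undershooting. */
            batchSize = (elapsed > 0.0)
                      ? (size_t)((double)batchSize * 1.1 / elapsed) + 1
                      : batchSize * 2;
        }
    }

    static void dummy(void* p) { (void)p; }   /* stand-in workload */

    int main(void)
    {
        printf("%.3g s per call\n", bench_round(dummy, NULL));
        return 0;
    }
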
The previous timer was only accurate to 0.01 seconds; the new one
is accurate to 1 ns. It is a monotonic timer measuring elapsed
real time, not CPU time.
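
To illustrate the difference: clock() stands in for the old coarse,
CPU-time-based timer (an assumption, not confirmed here), while
clock_gettime(CLOCK_MONOTONIC) is the standard POSIX way to read a
monotonic real-time clock with the properties described above:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        /* clock() counts CPU time, and on some platforms its
         * effective granularity is as coarse as 0.01 s. */
        clock_t const cpu = clock();
        printf("clock(): %.2f s of CPU time\n",
               (double)cpu / CLOCKS_PER_SEC);

        /* CLOCK_MONOTONIC reports elapsed real time with nanosecond
         * resolution and never jumps backwards, even if the system
         * clock is adjusted. */
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        printf("monotonic: %lld.%09ld s since an arbitrary epoch\n",
               (long long)ts.tv_sec, ts.tv_nsec);
        return 0;
    }
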
Benchmark code copied from 6ab4d5e904.