This patch further improves math function benchmarking by adding a latency
test in addition to throughput. This enables more accurate comparisons of the
math functions. The latency test works by creating a dependency on the previous
iteration: func_res = F (func_res * zero + input[i]). The multiply by zero
avoids changing the input.
It reports reciprocal throughput and latency in nanoseconds (depending on the
timing header used) and max/min throughput in iterations per second:
"workload-spec2006.wrf": {
"reciprocal-throughput": 100,
"latency": 200,
"max-throughput": 1.0e+07,
"min-throughput": 5.0e+06
}
* benchtests/bench-skeleton.c (main): Add support for
latency benchmarking.
* benchtests/scripts/bench.py: Add support for latency benchmarking.
Prevent function calls that don't return anything from being optimized
out by the compiler by marking its input variables as used.
This prevents the sincos function call from being optimized out in the
benchmark.
Add a new 'init' directive that specifies the name of the function to
call to do function-specific initialization. This is useful for
benchmarks that need to do a one-time initialization before the
functions are executed.
This patch adds an option to get detailed benchmark output for
functions. Invoking the benchmark with 'make DETAILED=1 bench' causes
each benchmark program to store a mean execution time for each input
it works on. This is useful to give a more comprehensive picture of
performance of functions compared to just the single mean figure.
This patch changes the output format of the main benchmark output file
(bench.out) to an extensible format. I chose JSON over XML because in
addition to being extensible, it is also not too verbose.
Additionally it has good support in python.
The significant change I have made in terms of functionality is to put
timing information as an attribute in JSON instead of a string and to
do that, there is a separate program that prints out a JSON snippet
mentioning the type of timing (hp_timing or clock_gettime). The mean
timing has now changed from iterations per unit to actual timing per
iteration.