Improve math benchmark infrastructure

Improve support for math function benchmarking.  This patch adds
a feature that allows accurate benchmarking of traces extracted
from real workloads.  This is done by iterating over all samples
rather than repeating each sample many times (which completely
ignores branch prediction and cache effects).  A trace can be
added to existing math function inputs via
"## name: workload-<name>", followed by the trace.

        * benchtests/README: Describe workload feature.
        * benchtests/bench-skeleton.c (main): Add support for
        benchmarking traces from workloads.
Author: Wilco Dijkstra <wdijkstr@arm.com>
Date:   2017-06-20 16:26:26 +01:00
parent c0b23001a8
commit beb52f502f
3 changed files with 53 additions and 20 deletions

diff --git a/ChangeLog b/ChangeLog
@@ -1,3 +1,9 @@
+2017-06-20  Wilco Dijkstra  <wdijkstr@arm.com>
+
+	* benchtests/README: Describe workload feature.
+	* benchtests/bench-skeleton.c (main): Add support for
+	benchmarking traces from workloads.
+
 2017-06-20  Zack Weinberg  <zackw@panix.com>
 
 	* string/string.h (__mempcpy_inline): Delete.

diff --git a/benchtests/README b/benchtests/README
@@ -102,6 +102,12 @@ the same file by using the `name' directive that looks something like this:
 
 See the pow-inputs file for an example of what such a partitioned input file
 would look like.
 
+It is also possible to measure throughput of a (partial) trace extracted from
+a real workload.  In this case the whole trace is iterated over multiple times
+rather than repeating every input multiple times.  This can be done via:
+
+##name: workload-<name>
+
 Benchmark Sets:
 ==============

diff --git a/benchtests/bench-skeleton.c b/benchtests/bench-skeleton.c
@@ -68,14 +68,30 @@ main (int argc, char **argv)
       clock_gettime (CLOCK_MONOTONIC_RAW, &runtime);
       runtime.tv_sec += DURATION;
 
+      bool is_bench = strncmp (VARIANT (v), "workload-", 9) == 0;
       double d_total_i = 0;
       timing_t total = 0, max = 0, min = 0x7fffffffffffffff;
       int64_t c = 0;
+      uint64_t cur;
       while (1)
 	{
+	  if (is_bench)
+	    {
+	      /* Benchmark a real trace of calls - all samples are iterated
+		 over once before repeating.  This models actual use more
+		 accurately than repeating the same sample many times.  */
+	      TIMING_NOW (start);
+	      for (k = 0; k < iters; k++)
+		for (i = 0; i < NUM_SAMPLES (v); i++)
+		  BENCH_FUNC (v, i);
+	      TIMING_NOW (end);
+	      TIMING_DIFF (cur, start, end);
+	      TIMING_ACCUM (total, cur);
+	      d_total_i += iters * NUM_SAMPLES (v);
+	    }
+	  else
 	    for (i = 0; i < NUM_SAMPLES (v); i++)
 	      {
-		uint64_t cur;
 		TIMING_NOW (start);
 		for (k = 0; k < iters; k++)
 		  BENCH_FUNC (v, i);
@@ -117,11 +133,16 @@ main (int argc, char **argv)
 
       json_attr_double (&json_ctx, "duration", d_total_s);
       json_attr_double (&json_ctx, "iterations", d_total_i);
-      json_attr_double (&json_ctx, "max", max / d_iters);
-      json_attr_double (&json_ctx, "min", min / d_iters);
-      json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
+      if (is_bench)
+	json_attr_double (&json_ctx, "throughput", d_total_s / d_total_i);
+      else
+	{
+	  json_attr_double (&json_ctx, "max", max / d_iters);
+	  json_attr_double (&json_ctx, "min", min / d_iters);
+	  json_attr_double (&json_ctx, "mean", d_total_s / d_total_i);
+	}
 
-      if (detailed)
+      if (detailed && !is_bench)
 	{
 	  json_array_begin (&json_ctx, "timings");