Using the glibc microbenchmark suite
====================================

The glibc microbenchmark suite automatically generates code for specified
functions, then builds and calls them repeatedly for given inputs to report
some basic performance properties of each function.

Running the benchmark:
=====================

The benchmark needs Python 2.7 or later in addition to the
dependencies required to build the GNU C Library. One may run the
benchmark by invoking make as follows:

  $ make bench

This runs each function for 10 seconds and appends its output to
benchtests/bench.out. To ensure that the tests are rebuilt, one could run:

  $ make bench-clean

The duration of each test can be configured by setting the BENCH_DURATION
variable in the call to make. One should run `make bench-clean' before
changing BENCH_DURATION.

  $ make BENCH_DURATION=1 bench

The benchmark suite does function call measurements using architecture-specific
high precision timing instructions whenever available. When such support is
not available, it uses clock_gettime (CLOCK_PROCESS_CPUTIME_ID). One can force
the benchmark to use clock_gettime by invoking make as follows:

  $ make USE_CLOCK_GETTIME=1 bench

Again, one must run `make bench-clean' before changing the measurement method.

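The invocations in this section can also be driven from a script. The
following is a minimal Python sketch and not part of glibc: the helper names
bench_commands and run_bench are hypothetical. It only assembles the make
command lines described above, and puts `make bench-clean' first whenever
BENCH_DURATION or USE_CLOCK_GETTIME is set, on the assumption that the setting
differs from the previous run.

```python
import subprocess


def bench_commands(duration=None, use_clock_gettime=False):
    """Return the make command lines for one benchmark run.

    Hypothetical helper: it only assembles the invocations described in
    this section and does not inspect the build tree.
    """
    overrides = []
    if duration is not None:
        overrides.append("BENCH_DURATION=%d" % duration)
    if use_clock_gettime:
        overrides.append("USE_CLOCK_GETTIME=1")

    commands = []
    if overrides:
        # Changing the duration or the timing method needs a clean first.
        commands.append(["make", "bench-clean"])
    commands.append(["make"] + overrides + ["bench"])
    return commands


def run_bench(build_dir, **kwargs):
    """Run the commands in the glibc build directory, stopping on failure."""
    for command in bench_commands(**kwargs):
        subprocess.check_call(command, cwd=build_dir)
```

For instance, bench_commands(duration=1) yields `make bench-clean' followed by
`make BENCH_DURATION=1 bench', while bench_commands() yields only `make bench'.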

Running benchmarks on another target:
====================================

If the target where you want to run benchmarks is not capable of building the
code or you're cross-building, you could build and execute the benchmark in
separate steps. On the build system run:

  $ make bench-build

and then copy the source and build directories to the target and run the
benchmarks from the build directory as usual:

  $ make bench

Make sure the copy preserves timestamps, for example by using rsync -a or
scp -p; otherwise the above command may try to build the benchmark again.
Benchmarks that require generated code to be executed during the build are
skipped when cross-building.

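The copy step can be sketched in Python as well. This is an illustration under
stated assumptions, not a glibc tool: the helper names and the destination
string are hypothetical, and rsync is assumed to be installed on both
machines. The -a (archive) option of rsync preserves modification times, which
is the property this section asks for.

```python
import subprocess


def copy_commands(source_dir, build_dir, destination):
    """Return rsync command lines copying both trees to `destination`.

    `destination` is a hypothetical rsync target such as "user@host:/work/".
    A trailing slash is stripped from each tree so that rsync copies the
    directory itself rather than only its contents.
    """
    return [["rsync", "-a", tree.rstrip("/"), destination]
            for tree in (source_dir, build_dir)]


def copy_to_target(source_dir, build_dir, destination):
    """Run the copies, stopping at the first failure."""
    for command in copy_commands(source_dir, build_dir, destination):
        subprocess.check_call(command)
```

After the copy, `make bench' is run from the build directory on the target as
described above.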

Adding a function to benchtests:
===============================

If the name of the function is `foo', then the following procedure should allow
one to add `foo' to the bench tests:

- Append the function name to the bench variable in the Makefile.

- Make a file called `foo-inputs' to provide the definition and input for the
  function. The file should have some directives telling the parser script
  about the function and then one input per line. Directives are lines that
  have a special meaning for the parser and they begin with two hashes '##'.
  The following directives are recognized:

  - args: This should be assigned a colon-separated list of types of the input
    arguments. This directive may be skipped if the function does not take any
    inputs. One may identify output arguments by nesting them in <>. The
    generator will create variables to receive the outputs of the function
    being called.
  - ret: This should be assigned the type that the function returns. This
    directive may be skipped if the function does not return a value.
  - includes: This should be assigned a comma-separated list of headers that
    need to be included to provide declarations for the function and types it
    may need (specifically, each is included using `#include <header>').
  - include-sources: This should be assigned a comma-separated list of source
    files that need to be included to provide definitions of global variables
    and functions (specifically, each is included using `#include "source"').
    See pthread_once-inputs and pthread_once-source.c for an example of how
    to use this to benchmark a function that needs state across several calls.
  - init: Name of an initializer function to call to initialize the benchtest.
  - name: See following section for instructions on how to use this directive.

  Lines beginning with a single hash '#' are treated as comments. See
  pow-inputs for an example of an input file.

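To make the input file format concrete, here is a minimal Python sketch of how
such a file could be read and turned into calls. It is an illustration only
and not the parser script that glibc uses: the function names parse_inputs and
generate_calls, the sample directives, and the out1/out2 variable names are
all hypothetical. Input values map only to the input arguments, and each
output argument nested in <> is replaced by the address of a generated
variable.

```python
def parse_inputs(text):
    """Parse the text of a `foo-inputs' style file.

    Returns (directives, inputs): a dict of directive values and the list
    of input lines.  Illustration only, not the glibc parser.
    """
    directives = {}
    inputs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("##"):
            # Directive of the form "## key: value".
            key, _, value = line[2:].partition(":")
            directives[key.strip()] = value.strip()
        elif line.startswith("#"):
            continue  # A single hash marks a comment.
        else:
            inputs.append(line)
    return directives, inputs


def generate_calls(func, directives, inputs):
    """Build one C call expression per input line."""
    arg_types = [t.strip() for t in directives.get("args", "").split(":")
                 if t.strip()]
    calls = []
    for line in inputs:
        # Naive split: assumes the input values contain no commas themselves.
        values = iter(v.strip() for v in line.split(","))
        args = []
        out_count = 0
        for arg_type in arg_types:
            if arg_type.startswith("<") and arg_type.endswith(">"):
                # Output argument: pass the address of a generated variable.
                out_count += 1
                args.append("&out%d" % out_count)
            else:
                args.append(next(values))
        calls.append("%s (%s);" % (func, ", ".join(args)))
    return calls


# Hypothetical input file for a function foo with two output arguments.
SAMPLE = """\
## args: int:<int *>:int:<int *>
## ret: int
# A comment line
1, 2
3, 4
"""

directives, inputs = parse_inputs(SAMPLE)
calls = generate_calls("foo", directives, inputs)
```

With the sample above, calls holds `foo (1, &out1, 2, &out2);' and
`foo (3, &out1, 4, &out2);'.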

Multiple execution units per function:
=====================================

Some functions have distinct performance characteristics for different input
domains and it may be necessary to measure those separately. For example, some
math functions perform computations at different levels of precision (64-bit vs
240-bit vs 768-bit) and mixing them does not give a very useful picture of the
performance of these functions. One could separate inputs for these domains in
the same file by using the `name' directive that looks something like this:

  ##name: 240bit

See the pow-inputs file for an example of what such a partitioned input file
would look like.

It is also possible to measure throughput of a (partial) trace extracted from
a real workload. In this case the whole trace is iterated over multiple times
rather than repeating every input multiple times. This can be done via:

  ##name: workload-<name>

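The effect of the `name' directive can be sketched in Python. This is an
illustration under stated assumptions, not the glibc implementation: the
function names, the sample file, and its input values are hypothetical
placeholders. The sketch groups inputs by the most recent `name' directive and
shows the difference in call order between an ordinary variant, where every
input is repeated, and a workload variant, where the whole trace is replayed.

```python
def split_variants(text):
    """Group the input lines of an inputs file by `name' directive.

    Returns a list of (name, inputs) pairs in file order.  Inputs that
    appear before any name directive are grouped under the empty name.
    """
    variants = [("", [])]
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("##"):
            key, _, value = line[2:].partition(":")
            if key.strip() == "name":
                variants.append((value.strip(), []))
        elif line and not line.startswith("#"):
            variants[-1][1].append(line)
    # Drop groups that ended up with no inputs.
    return [(name, inputs) for name, inputs in variants if inputs]


def is_workload(name):
    """True if the variant should be replayed as a whole trace."""
    return name.startswith("workload-")


def call_order(inputs, repeats, workload):
    """Return the order in which inputs would be passed to the function."""
    if workload:
        # The whole trace is iterated over `repeats' times.
        return list(inputs) * repeats
    # Every input is repeated `repeats' times before moving on.
    return [value for value in inputs for _ in range(repeats)]


# Hypothetical inputs file; the values are placeholders only.
SAMPLE = """\
## args: double:double
## ret: double
## name: 64bit
1.5, 2.0
0.5, 3.0
## name: workload-example
2.0, 10.0
3.0, 0.5
"""

variants = split_variants(SAMPLE)
orders = {name: call_order(inputs, 2, is_workload(name))
          for name, inputs in variants}
```

Here the 64bit variant runs each input twice in a row, while the
workload-example variant runs its two inputs in sequence and then repeats the
sequence.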

Benchmark Sets:
==============

In addition to standard benchmarking of functions, one may also generate
custom outputs for a set of functions. This is currently used by string
function benchmarks where the aim is to compare performance between
implementations at various alignments and for various sizes.

To add a benchset for `foo':

- Add `foo' to the benchset variable.
- Write your bench-foo.c that prints out the measurements to stdout.
- On execution, a bench-foo.out is created in $(objpfx) with the contents of
  stdout.

Reading String Benchmark Results:
================================

Some of the string benchmark results are now in JSON to make it easier to read
in scripts. Use the benchtests/scripts/compare_strings.py script to show the
results in a tabular format, generate graphs and more. Run

  benchtests/scripts/compare_strings.py -h

for usage information.