mirror of
https://sourceware.org/git/glibc.git
synced 2025-01-15 05:20:05 +00:00
6a5d6ea128
We have a single thread that runs a no-op initialization once and then repeatedly runs checks of the initialization (i.e., an acquire load and conditional jump) in a tight loop. This gives us, on average, the best-case latency of pthread_once (the initialization is the exactly-once slow path, and we're not looking at initialization-related synchronization overheads in this case).
99 lines
4.2 KiB
Plaintext
99 lines
4.2 KiB
Plaintext
Using the glibc microbenchmark suite
|
|
====================================
|
|
|
|
The glibc microbenchmark suite automatically generates code for specified
|
|
functions, builds and calls them repeatedly for given inputs to give some
|
|
basic performance properties of the function.
|
|
|
|
Running the benchmark:
|
|
=====================
|
|
|
|
The benchmark needs python 2.7 or later in addition to the
|
|
dependencies required to build the GNU C Library. One may run the
|
|
benchmark by invoking make as follows:
|
|
|
|
$ make bench
|
|
|
|
This runs each function for 10 seconds and appends its output to
|
|
benchtests/bench.out. To ensure that the tests are rebuilt, one could run:
|
|
|
|
$ make bench-clean
|
|
|
|
The duration of each test can be configured setting the BENCH_DURATION variable
|
|
in the call to make. One should run `make bench-clean' before changing
|
|
BENCH_DURATION.
|
|
|
|
$ make BENCH_DURATION=1 bench
|
|
|
|
The benchmark suite does function call measurements using architecture-specific
|
|
high precision timing instructions whenever available. When such support is
|
|
not available, it uses clock_gettime (CLOCK_PROCESS_CPUTIME_ID). One can force
|
|
the benchmark to use clock_gettime by invoking make as follows:
|
|
|
|
$ make USE_CLOCK_GETTIME=1 bench
|
|
|
|
Again, one must run `make bench-clean' before changing the measurement method.
|
|
|
|
Adding a function to benchtests:
|
|
===============================
|
|
|
|
If the name of the function is `foo', then the following procedure should allow
|
|
one to add `foo' to the bench tests:
|
|
|
|
- Append the function name to the bench variable in the Makefile.
|
|
|
|
- Make a file called `foo-inputs` to provide the definition and input for the
|
|
function. The file should have some directives telling the parser script
|
|
about the function and then one input per line. Directives are lines that
|
|
have a special meaning for the parser and they begin with two hashes '##'.
|
|
The following directives are recognized:
|
|
|
|
- args: This should be assigned a colon separated list of types of the input
|
|
arguments. This directive may be skipped if the function does not take any
|
|
inputs. One may identify output arguments by nesting them in <>. The
|
|
generator will create variables to get outputs from the calling function.
|
|
- ret: This should be assigned the type that the function returns. This
|
|
directive may be skipped if the function does not return a value.
|
|
- includes: This should be assigned a comma-separated list of headers that
|
|
need to be included to provide declarations for the function and types it
|
|
may need (specifically, this includes using "#include <header>").
|
|
- include-sources: This should be assigned a comma-separated list of source
|
|
files that need to be included to provide definitions of global variables
|
|
and functions (specifically, this includes using "#include "source").
|
|
See pthread_once-inputs and pthreads_once-source.c for an example of how
|
|
to use this to benchmark a function that needs state across several calls.
|
|
- name: See following section for instructions on how to use this directive.
|
|
|
|
Lines beginning with a single hash '#' are treated as comments. See
|
|
pow-inputs for an example of an input file.
|
|
|
|
Multiple execution units per function:
|
|
=====================================
|
|
|
|
Some functions have distinct performance characteristics for different input
|
|
domains and it may be necessary to measure those separately. For example, some
|
|
math functions perform computations at different levels of precision (64-bit vs
|
|
240-bit vs 768-bit) and mixing them does not give a very useful picture of the
|
|
performance of these functions. One could separate inputs for these domains in
|
|
the same file by using the `name' directive that looks something like this:
|
|
|
|
##name: 240bit
|
|
|
|
See the pow-inputs file for an example of what such a partitioned input file
|
|
would look like.
|
|
|
|
Benchmark Sets:
|
|
==============
|
|
|
|
In addition to standard benchmarking of functions, one may also generate
|
|
custom outputs for a set of functions. This is currently used by string
|
|
function benchmarks where the aim is to compare performance between
|
|
implementations at various alignments and for various sizes.
|
|
|
|
To add a benchset for `foo':
|
|
|
|
- Add `foo' to the benchset variable.
|
|
- Write your bench-foo.c that prints out the measurements to stdout.
|
|
- On execution, a bench-foo.out is created in $(objpfx) with the contents of
|
|
stdout.
|