This benchmark is an attempt to eliminate cache effects from string
benchmarks. The benchmark walks both ways through a large memory area
and copies different sizes of memory and alignments one at a time
instead of looping around in the same memory area. This is a good
metric to have alongside the simple memmove benchmark (which is only
really useful for smaller sizes) especially for larger sizes where the
likelihood of the call being done only once is pretty high.
This benchmark is different from memcpy in that it also tests
overlapping copies.
* benchtests/bench-memmove-walk.c: New file.
* benchtests/Makefile (string-benchset): Add it.
This benchmark is an attempt to eliminate cache effects from string
benchmarks. The benchmark walks backward through a large memory area
and sets different sizes of memory and alignments one at a time
instead of looping around in the same memory area. This is a good
metric to have alongside the simple memset benchmark (which is only
really useful for smaller sizes) especially for larger sizes where the
likelihood of the call being done only once is pretty high.
* benchtests/bench-memset-walk.c: New file.
* benchtests/Makefile (string-benchset): Add it.
This benchmark is an attempt to eliminate cache effects from string
benchmarks. The benchmark walks both ways through a large memory area
and copies different sizes of memory and alignments one at a time
instead of looping around in the same memory area. This is a good
metric to have alongside the other memcpy benchmarks, especially for
larger sizes where the likelihood of the call being done only once is
pretty high.
* benchtests/bench-memcpy-walk.c: New file.
* benchtests/Makefile (string-benchset): Add it.
exp2f and log2f benchmark traces are just copies of the existing
expf and logf traces from wrf_r.
* benchtests/Makefile: Add exp2f and log2f benchmarks.
* benchtests/exp2f-inputs: Copy of expf-inputs.
* benchtests/log2f-inputs: Copy of logf-inputs.
Add a trace for logf. This is a reduced trace based on 2.8 billion
samples extracted from wrf_r.
* benchtests/Makefile: Add logf benchmark.
* benchtests/logf-inputs: Add reduced trace from wrf_r.
Add a trace for expf. This is a reduced trace based on 2.4 billion
samples extracted from wrf_r.
* benchtests/Makefile: Add expf benchmark.
* benchtests/expf-inputs: Add reduced trace from wrf_r.
This patch adds benchtests for the trunc and truncf functions. The
inputs listed are fairly arbitrary; I do not assert they are
representative of any particular application.
* benchtests/Makefile (bench-math): Add trunc and truncf.
(CFLAGS-bench-trunc.c): New variable.
(CFLAGS-bench-truncf.c): Likewise.
* benchtests/trunc-inputs: New file.
* benchtests/truncf-inputs: Likewise.
Add powf() bench test with input which covers these cases:
- positive base to positive exponent
- exponent 0
- negative base to even exponent
- exponent 1
- exponent -1
- squared
- squareroot
- 1 to negative exponent
- -1 to negative exponent
- base 0
- -1 to even exponent
- small base
- small exponent
* benchtests/Makefile (bench-math): Add powf.
* benchtests/powf-inputs: New file.
Current allocate_stack logic for create stacks is to first mmap all
the required memory with the desirable memory and then mprotect the
guard area with PROT_NONE if required. Although it works as expected,
it pessimizes the allocation because it requires the kernel to actually
increase commit charge (it counts against the available physical/swap
memory available for the system).
The only issue is to actually check this change since side-effects are
really Linux specific and to actually account them it would require a
kernel specific tests to parse the system wide information. On the kernel
I checked /proc/self/statm does not show any meaningful difference for
vmm and/or rss before and after thread creation. I could only see
really meaningful information checking on system wide /proc/meminfo
between thread creation: MemFree, MemAvailable, and Committed_AS shows
large difference without the patch. I think trying to use these
kind of information on a testcase is fragile.
The BZ#18988 reports shows that the commit pages are easily seen with
mlockall (MCL_FUTURE) (with lock all pages that become mapped in the
process) however a more straighfoward testcase shows that pthread_create
could be faster using this patch:
--
static const int inner_count = 256;
static const int outer_count = 128;
static
void *thread1(void *arg)
{
return NULL;
}
static
void *sleeper(void *arg)
{
pthread_t ts[inner_count];
for (int i = 0; i < inner_count; i++)
pthread_create (&ts[i], &a, thread1, NULL);
for (int i = 0; i < inner_count; i++)
pthread_join (ts[i], NULL);
return NULL;
}
int main(void)
{
pthread_attr_init(&a);
pthread_attr_setguardsize(&a, 1<<20);
pthread_attr_setstacksize(&a, 1134592);
pthread_t ts[outer_count];
for (int i = 0; i < outer_count; i++)
pthread_create(&ts[i], &a, sleeper, NULL);
for (int i = 0; i < outer_count; i++)
pthread_join(ts[i], NULL);
assert(r == 0);
}
return 0;
}
--
On x86_64 (4.4.0-45-generic, gcc 5.4.0) running the small benchtests
I see:
$ time ./test
real 0m3.647s
user 0m0.080s
sys 0m11.836s
While with the patch I see:
$ time ./test
real 0m0.696s
user 0m0.040s
sys 0m1.152s
So I added a pthread_create benchtest (thread_create) which check
the thread creation latency. As for the simple benchtests, I saw
improvements in thread creation on all architectures I tested the
change.
Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
arm-linux-gnueabihf, powerpc64le-linux-gnu, sparc64-linux-gnu,
and sparcv9-linux-gnu.
[BZ #18988]
* benchtests/thread_create-inputs: New file.
* benchtests/thread_create-source.c: Likewise.
* support/xpthread_attr_setguardsize.c: Likewise.
* support/Makefile (libsupport-routines): Add
xpthread_attr_setguardsize object.
* support/xthread.h: Add xpthread_attr_setguardsize prototype.
* benchtests/Makefile (bench-pthread): Add thread_create.
* nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
then mprotect the required area.
cppflags-iterator.mk no longer has anything to do with CPPFLAGS; all
it does is set libof-$(foo) for a list of files. extra-modules.mk
does the same thing, but with a different input variable, and doesn't
let the caller control the module. Therefore, this patch gives
cppflags-iterator.mk a better name, removes extra-modules.mk, and
updates all uses of both.
* extra-modules.mk: Delete file.
* cppflags-iterator.mk: Rename to ...
* libof-iterator.mk: ...this. Adjust comments.
* Makerules, extra-lib.mk, benchtests/Makefile, elf/Makefile
* elf/rtld-Rules, iconv/Makefile, locale/Makefile, malloc/Makefile
* nscd/Makefile, sunrpc/Makefile, sysdeps/s390/Makefile:
Use libof-iterator.mk instead of cppflags-iterator.mk or
extra-modules.mk.
* benchtests/strcoll-inputs/filelist#en_US.UTF-8: Remove
extra-modules.mk and cppflags-iterator.mk, add libof-iterator.mk.
of the size and alignment is based on a trace of SPEC2006. Instead of
repeating the same copy over and over again like the existing tests, it times
several thousand different copies to more accurately estimate the overhead of
branch prediction.
* benchtests/Makefile (string-benchset): Add memcpy-random.
* benchtests/bench-memcpy-random.c: New file.
Add a configure check that looks for python3 and python in that order
since we had agreed in the past to prefer python3 over python in all
our code. The patch also adjusts invocations through the various
Makefiles to use the set variable.
* configure.ac: Check for python3 or python.
* configure: Regenerated.
* config.make.in (PYTHON): New variable.
* benchtests/Makefile: Don't define PYTHON.
(bench): Define target only if PYTHON was defined.
* Rules: Don't define PYTHON.
Define pretty printer targets only if PYTHON was defined.
(tests-printers): Add to tests-unsupported if PYTHON is not
found.
(python-flags, python-invoke): Remove.
(tests-printers-out): Use PYTHON instead of python-invoke.
This patch adds fmaxf and fminf benchtests. It is based on
math/s_fmax_template.c implementation which checks for basically four
different classes:
1. if x is greater or equal than y.
2. if x is less than y.
3. if x or y is signaling.
4. if y is nan.
Cases 1 and 2 are used for default input number (by mixing normal double
numbers and infinity), while case 3 and 4 are used each for on for a
benchmark class.
Checked on x86_64-linux-gnu and powerpc64-linux-gnu.
* benchtests/Makefile (bench-math): Add fminf and fmaxf.
(CFLAGS-bench-fmaxf.c): New rule.
(CFLAGS-bench-fminf.c): Likewise.
* benchtests/fmaxf-inputs: New file.
* benchtests/fminf-inputs: Likewise.
This patch adds fmax and fmin benchtests. It is based math/s_fmax_template.c
implementation which checks for basically four different classes:
1. if x is greater or equal than y.
2. if x is less than y.
3. if x or y is signaling.
4. if y is nan.
Cases 1 and 2 are used for default input number (by mixing normal double
numbers and infinity), while case 3 and 4 are used each for on for a
benchmark class.
Checked on x86_64-linux-gnu and powerpc64-linux-gnu.
* benchtests/Makefile (bench-math): Add fmin and fmax.
(CFLAGS-bench-fmax.c): New rule.
(CFLAGS-bench-fmin.c): New rule.
* benchtests/fmax-inputs: New file.
* benchtests/fmin-inputs: Likewise.
Benchsets in benchtests use test-skeleton, so they too need to be
linked against the new libsupport DSO.
* benchtests/Makefile (binaries-benchset): Depend on libsupport
DSO.
This patch makes the sqrt benchmark use -fno-builtin, as already done
for benchmarks of ffs and ffsll, so that it actually benchmarks the
glibc function as (presumably) intended even in the presence of the
compiler inlining sqrt.
Tested for x86_64 and also used for benchmarking my ARM sqrt patch.
* benchtests/Makefile (CFLAGS-bench-sqrt.c): New variable.
This patch adds full support for cross-building benchmarks. Some
benchmarks like those that need locales to be generated cannot be
built and are hence skipped for cross builds.
Tested by cross building for aarch64 on x86_64 and then running the
generated benchmark on aarch64.
* benchtests/Makefile (wcsmbs-benchset): Include only for
native builds and runs.
(LOCALES): Likewise.
(bench-build): Build timing-type here instead of the bench
target. Generate locale only for native builds.
* benchtests/README: Add note for cross-building.
For situations where we are cross-building or where we want to avoid
building on the target system, we want a way to only build benchmarks
and then copy them over to the target system to run them. I have also
added a simple enhancement for the 'bench' target where all benchmark
binaries are built and then the benchmarks executed.
Tested on arm.
Makefile.in (bench-build): New target.
Rules (PHONY): Add bench-build target.
benchtests/Makefile (bench): Depend on bench-build.
(bench-build): New target.
From the bug:
Obsolete locale. The ISO-639 code for Hebrew was changed from 'iw'
to 'he' in 1989, according to Bruno Haible on libc-alpha 2003-09-01.
Reported-by: Chris Leonard <cjlhomeaddress@gmail.com>
benchtests should use $(test-via-rtld-prefix) and $(+link-tests) like
other glibc tests.
[BZ #19783]
* benchtests/Makefile (run-bench): Replace $(rtld-prefix) with
$(test-via-rtld-prefix).
($(binaries-bench)): Replace $(+link) with $(+link-tests).
The ffs and ffsll functions were listed as math functions when they
are actually defined in strings.h and string.h respectively. Shuffle
around the Makefile variables a bit and make a separate space for ffs
and ffsll.
ChangeLog:
2015-09-18 Wilco Dijkstra <wdijkstr@arm.com>
* benchtests/Makefile: Add bench-math-inlines, link with libm.
* benchtests/bench-math-inlines.c: New benchmark.
* benchtests/bench-util.h: New file.
* benchtests/bench-util.c: New file.
* benchtests/bench-skeleton.c: Add include of bench-util.c/h.
This patch provides optimized versions of strcmp and wcscmp with the z13
vector instructions.
The architecture specific string.h had a typo, which leads to ommiting the
inline version in this file if __USE_STRING_INLINES is defined.
Tested this inline version by tweaking test-strcmp.c.
ChangeLog:
* sysdeps/s390/multiarch/strcmp-vx.S: New File.
* sysdeps/s390/multiarch/strcmp.c: Likewise.
* sysdeps/s390/multiarch/wcscmp-c.c: Likewise.
* sysdeps/s390/multiarch/wcscmp-vx.S: Likewise.
* sysdeps/s390/multiarch/wcscmp.c: Likewise.
* sysdeps/s390/s390-32/multiarch/strcmp.c: Likewise.
* sysdeps/s390/s390-64/multiarch/strcmp.c: Likewise.
* sysdeps/s390/multiarch/Makefile (sysdep_routines): Add strcmp and
wcscmp functions.
* sysdeps/s390/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add ifunc test for strcmp, wcscmp.
* string/strcmp.c (STRCMP): Define and use macro.
* benchtests/bench-wcscmp.c: New File.
* benchtests/Makefile (wcsmbs-bench): Add wcscmp.
* sysdeps/s390/bits/string.h: Fix typo: _HAVE_STRING_ARCH_strcmp
instead of _HAVE_STRING_ARCH_memchr.
This patch provides optimized versions of strlen and wcslen with the z13 vector
instructions.
The helper macro IFUNC_VX_IMPL is introduced and is used to register all
__<func>_c() and __<func>_vx() functions within __libc_ifunc_impl_list()
to the ifunc test framework.
ChangeLog:
* sysdeps/s390/multiarch/Makefile: New File.
* sysdeps/s390/multiarch/strlen-c.c: Likewise.
* sysdeps/s390/multiarch/strlen-vx.S: Likewise.
* sysdeps/s390/multiarch/strlen.c: Likewise.
* sysdeps/s390/multiarch/wcslen-c.c: Likewise.
* sysdeps/s390/multiarch/wcslen-vx.S: Likewise.
* sysdeps/s390/multiarch/wcslen.c: Likewise.
* string/strlen.c (STRLEN): Define and use macro.
* sysdeps/s390/multiarch/ifunc-impl-list.c
(IFUNC_VX_IMPL): New macro function.
(__libc_ifunc_impl_list): Add ifunc test for strlen, wcslen.
* benchtests/Makefile (wcsmbs-bench): New variable.
(string-bench-all): Added wcsmbs-bench.
* benchtests/bench-wcslen.c: New File.
Add a microbenchmark for measuring malloc and free performance with
varying numbers of threads. The benchmark allocates and frees buffers
of random sizes in a random order and measures the overall execution
time and RSS. Variants of the benchmark are run with 1, 8, 16 and
32 threads.
The random block sizes used follow an inverse square distribution
which is intended to mimic the behaviour of real applications which
tend to allocate many more small blocks than large ones.
ChangeLog:
2014-11-05 Will Newton <will.newton@linaro.org>
* benchtests/Makefile: (bench-malloc): Add malloc thread
scalability benchmark.
* benchtests/bench-malloc-threads.c: New file.
One wart in the original support for test wrappers for cross testing,
as noted in
<https://sourceware.org/ml/libc-alpha/2012-10/msg00722.html>, is the
requirement for test wrappers to pass a poorly-defined set of
environment variables from the build system to the system running the
glibc under test. Although some variables are passed explicitly via
$(test-wrapper-env), including LD_* variables that simply can't be
passed implicitly because of the side effects they'd have on the build
system's dynamic linker, others are passed implicitly, including
variables such as GCONV_PATH and LOCPATH that could potentially affect
the build system's libc (so effectively relying on any such effects
not breaking the wrappers). In addition, the code in
cross-test-ssh.sh for preserving environment variables is fragile (it
depends on how bash formats a list of exported variables, and could
well break for multi-line variable definitions where the contents
contain things looking like other variable definitions).
This patch moves to explicitly passing environment variables via
$(test-wrapper-env). Makefile variables that previously used
$(test-wrapper) are split up into -before-env and -after-env parts
that can be passed separately to the various .sh files used in
testing, so those files can then insert environment settings between
the two parts.
The common default environment settings in make-test-out are made into
a separate makefile variable that can also be passed to scripts,
rather than many scripts duplicating those settings (for testing an
installed glibc, it is desirable to have the GCONV_PATH setting on
just one place, so just that one place needs to support it pointing to
an installed sysroot instead of the build tree). The default settings
are included in the variables such as $(test-program-prefix), so that
if tests do not need any non-default settings they can continue to use
single variables rather than the split-up variables.
Although this patch cleans up LC_ALL=C settings (that being part of
the common defaults), various LANG=C and LANGUAGE=C settings remain.
Those are generally unnecessary and I propose a subsequent cleanup to
remove them. LC_ALL takes precedence over LANG, and while LANGUAGE
takes precedence over LC_ALL, it only does so for settings other than
LC_ALL=C. So LC_ALL=C on its own is sufficient to ensure the C
locale, and anything that gets LC_ALL=C does not need the other
settings.
While preparing this patch I noticed some tests with .sh files that
appeared to do nothing beyond what the generic makefile support for
tests can do (localedata/tst-wctype.sh - the makefiles support -ENV
variables and .input files - and localedata/tst-mbswcs.sh - just runs
five tests that could be run individually from the makefile). So I
propose another subsequent cleanup to move those to using the generic
support instead of special .sh files.
Tested x86_64 (native) and powerpc32 (cross).
* Makeconfig (run-program-env): New variable.
(run-program-prefix-before-env): Likewise.
(run-program-prefix-after-env): Likewise.
(run-program-prefix): Define in terms of new variables.
(built-program-cmd-before-env): New variable.
(built-program-cmd-after-env): Likewise.
(built-program-cmd): Define in terms of new variables.
(test-program-prefix-before-env): New variable.
(test-program-prefix-after-env): Likewise.
(test-program-prefix): Define in terms of new variables.
(test-program-cmd-before-env): New variable.
(test-program-cmd-after-env): Likewise.
(test-program-cmd): Define in terms of new variables.
* Rules (make-test-out): Use $(run-program-env).
* scripts/cross-test-ssh.sh (env_blacklist): Remove variable.
(help): Do not mention environment variables. Mention
--timeoutfactor option.
(timeoutfactor): New variable.
(blacklist_exports): Remove function.
(exports): Remove variable.
(command): Do not include ${exports}.
* manual/install.texi (Configuring and compiling): Do not mention
test wrappers preserving environment variables. Mention that last
assignment to a variable must take precedence.
* INSTALL: Regenerated.
* benchtests/Makefile (run-bench): Use $(run-program-env).
* catgets/Makefile ($(objpfx)test1.cat): Use
$(built-program-cmd-before-env), $(run-program-env) and
$(built-program-cmd-after-env).
($(objpfx)test2.cat): Do not specify environment variables
explicitly.
($(objpfx)de/libc.cat): Use $(built-program-cmd-before-env),
$(run-program-env) and $(built-program-cmd-after-env).
($(objpfx)test-gencat.out): Use $(test-program-cmd-before-env),
$(run-program-env) and $(test-program-cmd-after-env).
($(objpfx)sample.SJIS.cat): Do not specify environment variables
explicitly.
* catgets/test-gencat.sh: Use test_program_cmd_before_env,
run_program_env and test_program_cmd_after_env arguments.
* elf/Makefile ($(objpfx)tst-pathopt.out): Use $(run-program-env).
* elf/tst-pathopt.sh: Use run_program_env argument.
* iconvdata/Makefile ($(objpfx)iconv-test.out): Use
$(test-wrapper-env) and $(run-program-env).
* iconvdata/run-iconv-test.sh: Use test_wrapper_env and
run_program_env arguments.
* iconvdata/tst-table.sh: Do not set GCONV_PATH explicitly.
* intl/Makefile ($(objpfx)tst-gettext.out): Use
$(test-program-prefix-before-env), $(run-program-env) and
$(test-program-prefix-after-env).
($(objpfx)tst-gettext2.out): Likewise.
* intl/tst-gettext.sh: Use test_program_prefix_before_env,
run_program_env and test_program_prefix_after_env arguments.
* intl/tst-gettext2.sh: Likewise.
* intl/tst-gettext4.sh: Do not set environment variables
explicitly.
* intl/tst-gettext6.sh: Likewise.
* intl/tst-translit.sh: Likewise.
* malloc/Makefile ($(objpfx)tst-mtrace.out): Use
$(test-program-prefix-before-env), $(run-program-env) and
$(test-program-prefix-after-env).
* malloc/tst-mtrace.sh: Use test_program_prefix_before_env,
run_program_env and test_program_prefix_after_env arguments.
* math/Makefile (run-regen-ulps): Use $(run-program-env).
* nptl/Makefile ($(objpfx)tst-tls6.out): Use $(run-program-env).
* nptl/tst-tls6.sh: Use run_program_env argument. Set LANG=C
explicitly with each use of ${test_wrapper_env}.
* posix/Makefile ($(objpfx)wordexp-tst.out): Use
$(test-program-prefix-before-env), $(run-program-env) and
$(test-program-prefix-after-env).
* posix/tst-getconf.sh: Do not set environment variables
explicitly.
* posix/wordexp-tst.sh: Use test_program_prefix_before_env,
run_program_env and test_program_prefix_after_env arguments.
* stdio-common/tst-printf.sh: Do not set environment variables
explicitly.
* stdlib/Makefile ($(objpfx)tst-fmtmsg.out): Use
$(test-program-prefix-before-env), $(run-program-env) and
$(test-program-prefix-after-env).
* stdlib/tst-fmtmsg.sh: Use test_program_prefix_before_env,
run_program_env and test_program_prefix_after_env arguments.
Split $test calls into $test_pre and $test.
* timezone/Makefile (build-testdata): Use
$(built-program-cmd-before-env), $(run-program-env) and
$(built-program-cmd-after-env).
localedata/ChangeLog:
* Makefile ($(addprefix $(objpfx),$(CTYPE_FILES))): Use
$(built-program-cmd-before-env), $(run-program-env) and
$(built-program-cmd-after-env).
($(objpfx)sort-test.out): Use $(test-program-prefix-before-env),
$(run-program-env) and $(test-program-prefix-after-env).
($(objpfx)tst-fmon.out): Use $(run-program-prefix-before-env),
$(run-program-env) and $(run-program-prefix-after-env).
($(objpfx)tst-locale.out): Use $(built-program-cmd-before-env),
$(run-program-env) and $(built-program-cmd-after-env).
($(objpfx)tst-trans.out): Use $(run-program-prefix-before-env),
$(run-program-env), $(run-program-prefix-after-env),
$(test-program-prefix-before-env) and
$(test-program-prefix-after-env).
($(objpfx)tst-ctype.out): Use $(test-program-cmd-before-env),
$(run-program-env) and $(test-program-cmd-after-env).
($(objpfx)tst-wctype.out): Likewise.
($(objpfx)tst-langinfo.out): Likewise.
($(objpfx)tst-langinfo-static.out): Likewise.
* gen-locale.sh: Use localedef_before_env, run_program_env and
localedef_after_env arguments.
* sort-test.sh: Use test_program_prefix_before_env,
run_program_env and test_program_prefix_after_env arguments.
* tst-ctype.sh: Use tst_ctype_before_env, run_program_env and
tst_ctype_after_env arguments.
* tst-fmon.sh: Use run_program_prefix_before_env, run_program_env
and run_program_prefix_after_env arguments.
* tst-langinfo.sh: Use tst_langinfo_before_env, run_program_env
and tst_langinfo_after_env arguments.
* tst-locale.sh: Use localedef_before_env, run_program_env and
localedef_after_env arguments.
* tst-mbswcs.sh: Do not set environment variables explicitly.
* tst-numeric.sh: Likewise.
* tst-rpmatch.sh: Likewise.
* tst-trans.sh: Use run_program_prefix_before_env,
run_program_env, run_program_prefix_after_env,
test_program_prefix_before_env and test_program_prefix_after_env
arguments.
* tst-wctype.sh: Use tst_wctype_before_env, run_program_env and
tst_wctype_after_env arguments.
glibc's Makeconfig defines some variables such as $(libm) and $(libdl)
for linking with libraries built by glibc, and nptl/Makeconfig
(included by the toplevel Makeconfig) defines others such as
$(shared-thread-library).
In some places glibc's Makefiles use those variables when linking
against the relevant libraries, but in other places they hardcode the
location of the libraries in the build tree. This patch cleans up
various places to use the variables that already exist (in the case of
libm, replacing several duplicate definitions of a $(link-libm)
variable in subdirectory Makefiles). (It's not necessarily exactly
equivalent to what the existing code does - in particular,
$(shared-thread-library) includes libpthread_nonshared, but is
replacing places that just referred to libpthread.so. But I think
that change is desirable on the general principle of linking things as
close as possible to the way in which they would be linked with an
installed library, unless there is a clear reason not to do so.)
To support running tests with an installed copy of glibc without
needing the full build tree from when that copy was built, I think it
will be useful to use such variables more generally and systematically
- every time the rules for building a test refer to some file from the
build tree that's also installed by glibc, use a makefile variable so
that the installed-testing case can point those variables to installed
copies of the files. This patch just deals with straightforward cases
where such variables already exist.
It's quite possible some uses of $(shared-thread-library) should
actually be a new $(thread-library) variable that's set appropriately
in the --disable-shared case, if those uses would in fact work without
shared libraries. I didn't change the status quo that those cases
hardcode use of a shared library whether or not it's actually needed
(but other uses such as $(libm) and $(libdl) would now get the static
library if the shared library isn't built, when some previously
hardcoded use of the shared library - if they actually need shared
libraries, the test itself needs an enable-shared conditional anyway).
Tested x86_64.
* benchtests/Makefile
($(addprefix $(objpfx)bench-,$(bench-math))): Depend on $(libm),
not $(common-objpfx)math/libm.so.
($(addprefix $(objpfx)bench-,$(bench-pthread))): Depend on
$(shared-thread-library), not $(common-objpfx)nptl/libpthread.so.
* elf/Makefile ($(objpfx)noload): Depend on $(libdl), not
$(common-objpfx)dlfcn/libdl.so.
($(objpfx)tst-audit8): Depend on $(libm), not
$(common-objpfx)math/libm.so.
* malloc/Makefile ($(objpfx)libmemusage.so): Depend on $(libdl),
not $(common-objpfx)dlfcn/libdl.so.
* math/Makefile
($(addprefix $(objpfx),$(filter-out $(tests-static),$(tests)))):
Depend on $(libm), not $(objpfx)libm.so. Do not condition on
[$(build-shared) = yes].
($(objpfx)test-fenv-tls): Depend on $(shared-thread-library), not
$(common-objpfx)nptl/libpthread.so.
* misc/Makefile ($(objpfx)tst-tsearch): Depend on $(libm), not
$(common-objpfx)math/libm.so$(libm.so-version) or
$(common-objpfx)math/libm.a depending on [$(build-shared) = yes].
* nptl/Makefile ($(objpfx)tst-unload): Depend on $(libdl), not
$(common-objpfx)dlfcn/libdl.so.
* setjmp/Makefile (link-libm): Remove variable.
($(objpfx)tst-setjmp-fp): Depend on $(libm), not $(link-libm).
* stdio-common/Makefile (link-libm): Remove variable.
($(objpfx)tst-printf-round): Depend on $(libm), not $(link-libm).
* stdlib/Makefile (link-libm): Remove variable.
($(objpfx)bug-getcontext): Depend on $(libm), not $(link-libm).
($(objpfx)tst-strtod-round): Likewise.
($(objpfx)tst-tininess): Likewise.
($(objpfx)tst-strtod-underflow): Likewise.
($(objpfx)tst-strtod6): Likewise.
($(objpfx)tst-tls-atexit): Depend on $(shared-thread-library) and
$(libdl), not $(common-objpfx)nptl/libpthread.so and
$(common-objpfx)dlfcn/libdl.so.
Using -lm and -lpthread results in the shared objects in the system
being used to link against. This happened to work for libm because
there haven't been any changes to the libm ABI recently that could
break the existing benchmarks. This doesn't always work for the
pthread benchmarks. The correct way to build against libraries in the
build directory is to have the binaries explicitly depend on them so
that $(+link) can pick them up.
Add a small library to print JSON values and use it to improve the
readability of the benchmark output and the readability of the
benchmark code.
ChangeLog:
2014-04-11 Will Newton <will.newton@linaro.org>
* benchtests/Makefile (extra-objs): Add json-lib.o.
(bench-func): Tidy up JSON output.
* benchtests/bench-skeleton.c: Include json-lib.h.
(main): Use JSON library functions to do output of
benchmark results.
* benchtests/bench-timing-type.c (main): Output the
timing type simply, leaving formatting to the user.
* benchtests/json-lib.c: New file.
* benchtests/json-lib.h: Likewise.
We have a single thread that runs a no-op initialization once and then
repeatedly runs checks of the initialization (i.e., an acquire load and
conditional jump) in a tight loop. This gives us, on average, the
best-case latency of pthread_once (the initialization is the
exactly-once slow path, and we're not looking at initialization-related
synchronization overheads in this case).
Without this flag it is possible that the compiler will optimize
away the calls to ffs/ffsll.
ChangeLog:
2014-04-01 Will Newton <will.newton@linaro.org>
* benchtests/Makefile (CFLAGS-bench-ffs.c): Add
-fno-builtin. (CFLAGS-bench-ffsll.c): Likewise.
Add benchtests for ffs and ffsll. There is no benchtest for ffsl as
it is identical to one of the other functions.
2014-03-31 Will Newton <will.newton@linaro.org>
* benchtests/Makefile (bench): Add ffs and ffsll to list
of tests.
* benchtests/ffs-inputs: New file.
* benchtests/ffsll-inputs: Likewise.
This patch adds an option to get detailed benchmark output for
functions. Invoking the benchmark with 'make DETAILED=1 bench' causes
each benchmark program to store a mean execution time for each input
it works on. This is useful to give a more comprehensive picture of
performance of functions compared to just the single mean figure.
This patch changes the output format of the main benchmark output file
(bench.out) to an extensible format. I chose JSON over XML because in
addition to being extensible, it is also not too verbose.
Additionally it has good support in python.
The significant change I have made in terms of functionality is to put
timing information as an attribute in JSON instead of a string and to
do that, there is a separate program that prints out a JSON snippet
mentioning the type of timing (hp_timing or clock_gettime). The mean
timing has now changed from iterations per unit to actual timing per
iteration.
In <https://sourceware.org/ml/libc-alpha/2014-01/msg00196.html> I
noted it was necessary to add includes of Makeconfig early in various
subdirectory makefiles for the tests-special variable settings added
by that patch to be conditional on configuration information. No-one
commented on the general question there of whether Makeconfig should
always be included immediately after the definition of subdir.
This patch implements that early inclusion of Makeconfig in each
directory (which is a lot easier than consistent placement of includes
of Rules). Includes are added if needed, or moved up if already
present. Subdirectory "all:" targets are removed, since Makeconfig
provides one.
There is potential for further cleanups I haven't done. Rules and
Makerules have code such as
ifneq "$(findstring env,$(origin headers))" ""
headers :=
endif
to override to empty any value of various variables that came from the
environment. I think there is a case for Makeconfig setting all the
subdirectory variables (other than subdir) to empty to ensure no
outside value is going to take effect if a subdirectory fails to
define a variable. (A list of such variables, possibly out of date
and incomplete, is in manual/maint.texi.) Rules and Makerules would
give errors if Makeconfig hadn't already been included, instead of
including it themselves. The special code to override values coming
from the environment would then be obsolete and could be removed.
Tested x86_64, including that installed binaries are identical before
and after the patch.
* argp/Makefile: Include Makeconfig immediately after defining
subdir.
* assert/Makefile: Likewise.
* benchtests/Makefile: Likewise.
* catgets/Makefile: Likewise.
* conform/Makefile: Likewise.
* crypt/Makefile: Likewise.
* csu/Makefile: Likewise.
(all): Remove target.
* ctype/Makefile: Include Makeconfig immediately after defining
subdir.
* debug/Makefile: Likewise.
* dirent/Makefile: Likewise.
* dlfcn/Makefile: Likewise.
* gmon/Makefile: Likewise.
* gnulib/Makefile: Likewise.
* grp/Makefile: Likewise.
* gshadow/Makefile: Likewise.
* hesiod/Makefile: Likewise.
* hurd/Makefile: Likewise.
(all): Remove target.
* iconvdata/Makefile: Include Makeconfig immediately after
defining subdir.
* inet/Makefile: Likewise.
* intl/Makefile: Likewise.
* io/Makefile: Likewise.
* libio/Makefile: Likewise.
(all): Remove target.
* locale/Makefile: Include Makeconfig immediately after defining
subdir.
* login/Makefile: Likewise.
* mach/Makefile: Likewise.
(all): Remove target.
* malloc/Makefile: Include Makeconfig immediately after defining
subdir.
(all): Remove target.
* manual/Makefile: Include Makeconfig immediately after defining
subdir.
* math/Makefile: Likewise.
* misc/Makefile: Likewise.
* nis/Makefile: Likewise.
* nss/Makefile: Likewise.
* po/Makefile: Likewise.
(all): Remove target.
* posix/Makefile: Include Makeconfig immediately after defining
subdir.
* pwd/Makefile: Likewise.
* resolv/Makefile: Likewise.
* resource/Makefile: Likewise.
* rt/Makefile: Likewise.
* setjmp/Makefile: Likewise.
* shadow/Makefile: Likewise.
* signal/Makefile: Likewise.
* socket/Makefile: Likewise.
* soft-fp/Makefile: Likewise.
* stdio-common/Makefile: Likewise.
* stdlib/Makefile: Likewise.
* streams/Makefile: Likewise.
* string/Makefile: Likewise.
* sunrpc/Makefile: Likewise.
(all): Remove target.
* sysvipc/Makefile: Include Makeconfig immediately after defining
subdir.
* termios/Makefile: Likewise.
* time/Makefile: Likewise.
* timezone/Makefile: Likewise.
(all): Remove target.
* wcsmbs/Makefile: Include Makeconfig immediately after defining
subdir.
* wctype/Makefile: Likewise.
libidn/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
localedata/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
(all): Remove target.
nptl/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
nptl_db/ChangeLog:
* Makefile: Include Makeconfig immediately after defining subdir.
This patch adds some more directives to the benchmark inputs file,
moving functionality from the Makefile and making the code generation
script a bit cleaner. The function argument and return types that
were earlier added as variables in the makefile and passed to the
script via command line arguments are now the 'args' and 'ret'
directive respectively. 'args' should be a colon separated list of
argument types (skipped if the function doesn't accept any arguments)
and 'ret' should be the return type.
Additionally, an 'includes' directive may have a comma separated list
of headers to include in the source. For example, the pow input file
now looks like this:
42.0, 42.0
1.0000000000000020, 1.5
I did this to unclutter the benchtests Makefile a bit and eventually
eliminate dependency of the tests on the Makefile and have tests
depend on their respective include files only.
The benchmark for memcpy got disabled accidentally. Re-enable it.
ChangeLog:
2013-09-06 Will Newton <will.newton@linaro.org>
* benchtests/Makefile (string-bench): Add memcpy.
LDFLAGS puts the library too early in the command line if --as-needed
is being used. Use LDLIBS instead.
ChangeLog:
2013-09-04 Will Newton <will.newton@linaro.org>
* benchtests/Makefile: Use LDLIBS instead of LDFLAGS.
This is the initial support for string function performance tests,
along with copying tests for memcpy and memcpy-ifunc as proof of
concept. The string function benchmarks perform operations at
different alignments and for different sizes and compare performance
between plain operations and the optimized string operations. Due to
this their output is incompatible with the function benchmarks where
we're interested in fastest time, throughput, etc.
In future, the correctness checks in the benchmark tests can be
removed. Same goes for the performance measurements in the
string/test-*.
When setting BENCH_DURATION in CPPFLAGS-nonlib, append to the variable
instead of assigning to it, to avoid overwriting earlier set flags,
notably the -DNOT_IN_libc=1 flag.
HP_TIMING uses native timestamping instructions if available, thus
greatly reducing the overhead of recording start and end times for
function calls. For architectures that don't have HP_TIMING
available, we fall back to the clock_gettime bits. One may also
override this by invoking the benchmark as follows:
make USE_CLOCK_GETTIME=1 bench
and get the benchmark results using clock_gettime. One has to do
`make bench-clean` to ensure that the benchmark programs are rebuilt.
Some math functions have distinct performance characteristics in
specific domains of inputs, where some inputs return via a fast path
while other inputs require multiple precision calculations, that too
at different precision levels. The way to implement different domains
was to have a separate source file and benchmark definition, resulting
in separate programs.
This clutters up the benchmark, so this change allows these domains to
be consolidated into the same input file. To do this, the input file
format is now enhanced to allow comments with a preceding # and
directives with two # at the begining of a line. A directive that
looks like:
tells the benchmark generation script that what follows is a different
domain of inputs. The value of the 'name' directive (in this case,
foo) is used in the output. The two input domains are then executed
sequentially and their results collated separately. with the above
directive, there would be two lines in the result that look like:
func(): ....
func(foo): ...
The idea to run benchmarks for a constant number of iterations is
problematic. While the benchmarks may run for 10 seconds on x86_64,
they could run for about 30 seconds on powerpc and worse, over 3
minutes on arm. Besides that, adding a new benchmark is cumbersome
since one needs to find out the number of iterations needed for a
sufficient runtime.
A better idea would be to run each benchmark for a specific amount of
time. This patch does just that. The run time defaults to 10 seconds
and it is configurable at command line:
make BENCH_DURATION=5 bench
Appending benchmark program output on every run could result in a case
where the benchmark run was cancelled, resulting in a partially
written file. This file gets used again on the next run, resulting in
results being appended to old results.
It could have been possible to remove the file before every benchmark
run, but it is easier to just write the output to bench.out-tmp only
once.
Benchmark programs are generated using parameters from the Makefile,
so it is necessary to rebuild them whenever the parameters in the
Makefile are updated. Hence, added a dependency for the generated C
source on the Makefile so that it gets regenerated when the Makefile
is updated.
Separate benchmarks for the fast and slow implementations of pow and
exp since measuring both together doesn't make sense. Adjust the
iterations for pow and exp accordingly so that they run long enough
for the measurements to be meaningful.
The branch prediction hints is actually hurts performance in this case.
The assembly implementation make two assumptions: 1. 'fabs (x) < 2^52'
is unlikely and 2. 'x > 0.0' is unlike (if 1. is true). Since it a
general floating point function, expected input is not bounded and then
it is better to let the hardware handle the branches.