This patch is an updated version of
<https://sourceware.org/ml/libc-alpha/2014-01/msg00198.html> and
<https://sourceware.org/ml/libc-alpha/2014-03/msg00180.html>.
Normal practice for software testsuites is that rather than
terminating immediately when a test fails, they continue running and
report at the end on how many tests passed or failed.
The principle behind the glibc testsuite stopping on failure was
probably that the expected state is no failures and so any failure
indicates a problem such as miscompilation. In practice, while this
is fairly close to true for native testing on x86_64 and x86 (kernel
bugs and race conditions can still cause intermittent failures), it's
less likely to be the case on other platforms, and so people testing
glibc run the testsuite with "make -k" and then examine the logs to
determine whether the failures are what they expect to fail on that
platform, possibly with some automation for the comparison.
This patch switches the glibc testsuite to the normal convention of
not stopping on failure - unless you use stop-on-test-failure=y, in
which case it behaves essentially as it did before (and does not
generate overall test summaries on failure). Instead, the summary
tests.sum may contain tests that FAILed. At the end of the test run,
any FAIL or ERROR lines from tests.sum are printed, and then it exits
with error status if there were any such lines. In addition, build
failures will still cause the test run to stop - the justification is
that those *do* indicate serious problems that should be promptly
fixed and aren't generally hard to fix (and besides, avoiding the
build stopping on those failures would be harder to implement).
Note that unlike the previous patches in this series, this *does*
require people with automation around testing glibc to change their
processes - either to start using tests.sum / xtests.sum to track
failures and compare them with expectations (with or without also
using "make -k" and examining "make" logs to identify build failures),
or else to use stop-on-test-failure=y and ignore the new tests.sum /
xtests.sum mechanism. (If all you check is the exit status from "make
check", no changes are needed unless you want to avoid test runs
continuing after the first failure.)
Tested x86_64.
* scripts/evaluate-test.sh: Handle fourth argument to determine
whether test run should stop on failure.
* Makeconfig (stop-on-test-failure): New variable.
(evaluate-test): Pass fourth argument to evaluate-test.sh based on
$(stop-on-test-failure).
* Makefile (tests): Give a summary of results from testing and
exit with failure status if they include an ERROR or FAIL.
(xtests): Likewise.
* manual/install.texi (Configuring and compiling): Mention
stop-on-test-failure=y.
* INSTALL: Regenerated.
This patch, an updated version of
<https://sourceware.org/ml/libc-alpha/2014-01/msg00197.html>, makes
testsuite runs generate an overall summary of test results.
A new script merge-test-results.sh deals both with collecting results
within a directory to a file with all the results from that directory,
and collecting the results from subdirectories into a single overall
file (there's not much in common between the two modes of operation of
the script, but it seemed silly to have two separate scripts for
this). Within a directory, missing results produce UNRESOLVED lines;
at top level, missing results for a whole directory produce an ERROR
line (since the top level can't identify which specific tests are
missing in this case).
Note that this does not change the rules for when "make" considers
there has been an error, or terminates, so unexpected failures will
still cause make to terminate, or, with -k, mean the commands for
"tests" don't get run because of failure of a dependency.
Tested x86_64, including that the summary does in fact reflect all the
tests with .test-result files.
* scripts/merge-test-results.sh: New file.
* Makefile (tests-special-notdir): New variable.
(tests): Run merge-test-results.sh.
(xtests): Likewise.
* Rules (tests-special-notdir): New variable.
(xtests-special-notdir): Likewise.
(tests): Run merge-test-results.sh.
(xtests): Likewise.
This patch, an updated version of
<https://sourceware.org/ml/libc-alpha/2014-01/msg00195.html>, makes it
possible for .test-result files for individual tests to contain XPASS
and XFAIL rather than PASS and FAIL in cases where failure is
expected. This replaces the previous marking of two individual tests
with "-", which made make ignore their failure at the makefile level;
evaluate-test.sh now ensures it exits with status 0 for an expected
failure.
Tested x86_64.
* scripts/evaluate-test.sh: Take new argument indicating whether
failure is expected.
* Makeconfig (evaluate-test): Pass argument to evaluate-test.sh
indicating whether failure is expected.
* conform/Makefile (test-xfail-run-conformtest): New variable.
($(objpfx)run-conformtest.out): Don't expect to fail at makefile
level.
* posix/Makefile (test-xfail-annexc): New variable.
($(objpfx)annexc.out): Don't expect to fail at makefile level.
This patch, an updated version of
<https://sourceware.org/ml/libc-alpha/2014-01/msg00193.html>, starts
the process of generating explicit PASS or FAIL status for individual
glibc tests. It's based on Tomas Dohnalek's patch
<https://sourceware.org/ml/libc-alpha/2012-10/msg00278.html>, but is
deliberately more minimal: it doesn't try to cover any tests outside
of $(tests) / $(xtests) (that's for a later patch), nor does it put
the results together in an overall summary file (again, a later
patch): it just generates the .test-result files.
Thus, this patch keeps the overall logic for when a testsuite run
finishes completely unchanged: a test failing will terminate the run.
I think we *should* move to a more conventional approach where plain
"make check" does not terminate for an individual test failure, unless
e.g. you say "make stop-on-test-failure=y check", but that sort of
policy change is best done as a separate patch once the infrastructure
is in place to generate summary files for completed test runs (which
will entirely consist of PASS and XFAIL lines if the testsuite run
reaches the point of generating them, until such a policy change is
made).
Tested x86_64.
2014-02-14 Tomas Dohnalek <tdohnale@redhat.com>
Joseph Myers <joseph@codesourcery.com>
* Makeconfig (test-name): New variable.
(evaluate-test): Likewise.
* Makerules (do-test-clean): Remove .test-result files.
(common-mostlyclean): Likewise.
* Rules ($(objpfx)%.out): Use $(evaluate-test) in both rules.
* scripts/evaluate-test.sh: New file.
This patch updates various miscellaneous files we take from upstream
GNU sources (texinfo.tex, config.guess, config.sub - various others
haven't changed upstream since we last updated them) to their current
upstream versions.
Tested x86_64.
* manual/texinfo.tex: Update to version 2013-11-26.10 with
trailing whitespace removed.
* scripts/config.guess: Update to version 2013-11-29.
* scripts/config.sub: Update to version 2013-10-01.
'volatile int' means the same as 'int volatile', but that's not the
case for 'volatile char *' and 'char * volatile'. We won't need a
'char volatile *' or other complicated semantics for now.
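As a quick standalone illustration of these qualifier placement rules
(not code from the patch itself):

/* Identical declarations: for a plain type, the qualifier may appear
   on either side.  */
volatile int a;
int volatile b;

/* Not identical: position relative to '*' matters for pointers.  */
volatile char *p;  /* pointer to volatile char; *p is volatile */
char *volatile q;  /* volatile pointer to char; q itself is volatile */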
This patch adds the ability to accept output arguments to functions
being benchmarked, by nesting the argument type in <> in the args
directive. It includes the sincos implementation as an example, where
the function would have the following args directive:
## args: double:<double *>:<double *>
This simply adds a definition for a static variable whose pointer gets
passed into the function, so it's not yet possible to pass something
more complicated like a pre-allocated string or array. That would be
a good feature to add if a function needs it.
The values in the input file will map only to the input arguments. So
if I had a directive like this for a function foo:
## args: int:<int *>:int:<int *>
and I have a value list like this:
1, 2
3, 4
5, 6
then the function calls generated would be:
foo (1, &out1, 2, &out2);
foo (3, &out1, 4, &out2);
foo (5, &out1, 6, &out2);
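For illustration, the generated code for the sincos directive above
would look roughly like the following sketch (the variable names here
are made up, not necessarily what bench.pl actually emits):

/* Static variables provide the storage for the output arguments;
   their addresses are passed on each call.  */
static double out1, out2;

/* Each input value then produces a call of the form:  */
sincos (0.5, &out1, &out2);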
This adds the "include-sources" directive to scripts/bench.pl. This
allows source code to be included (as opposed to headers, which may be
found via a different search path) after the inclusion of any headers.
This patch adds some more directives to the benchmark inputs file,
moving functionality from the Makefile and making the code generation
script a bit cleaner. The function argument and return types that
were earlier added as variables in the makefile and passed to the
script via command line arguments are now the 'args' and 'ret'
directives respectively. 'args' should be a colon-separated list of
argument types (omitted if the function doesn't accept any arguments)
and 'ret' should be the return type.
Additionally, an 'includes' directive may have a comma-separated list
of headers to include in the source. For example, the pow input file
now looks like this:
## args: double:double
## ret: double
## includes: math.h
42.0, 42.0
1.0000000000000020, 1.5
I did this to unclutter the benchtests Makefile a bit and to
eventually eliminate the dependency of the tests on the Makefile, so
that tests depend only on their respective input files.
Resolves: #15424
The compiler would optimize the benchmark function call out of the
loop and call it only once, resulting in blazingly fast times for some
benchmarks (notably atan, sin and cos). Mark the inputs as volatile
so that the code is forced to read again from the input for each
iteration.
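A minimal standalone example of the problem (not the generated
benchmark code itself):

#include <math.h>
#include <stdio.h>

int
main (void)
{
  /* Without the volatile qualifier, the compiler may evaluate
     atan (in) once and hoist it out of the loop, so the loop body
     times almost nothing.  volatile forces a fresh load of the input
     on every iteration, keeping the call inside the loop.  */
  volatile double in = 0.5;
  double res = 0.0;

  for (int i = 0; i < 10000000; i++)
    res += atan (in);

  /* Use the result so the loop itself isn't eliminated as dead code.  */
  printf ("%g\n", res);
  return 0;
}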
Some math functions have distinct performance characteristics in
specific domains of inputs, where some inputs return via a fast path
while others require multiple precision calculations at varying
precision levels. The way to implement different domains
was to have a separate source file and benchmark definition, resulting
in separate programs.
This clutters up the benchmark, so this change allows these domains to
be consolidated into the same input file. To do this, the input file
format is now enhanced to allow comments with a preceding # and
directives with two # at the beginning of a line. A directive that
looks like:
##name: foo
tells the benchmark generation script that what follows is a different
domain of inputs. The value of the 'name' directive (in this case,
foo) is used in the output. The two input domains are then executed
sequentially and their results collated separately. With the above
directive, there would be two lines in the result that look like:
func(): ....
func(foo): ...
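A hypothetical input file for a single-argument function would then
look like this (the values are illustrative only):

# values for the default domain, reported as func():
0.5
0.9
##name: foo
# values for the second domain, reported as func(foo):
2.25
3.5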
Running benchmarks for a constant number of iterations is
problematic. While the benchmarks may run for 10 seconds on x86_64,
they could run for about 30 seconds on powerpc and worse, over 3
minutes on arm. Besides that, adding a new benchmark is cumbersome
since one needs to find out the number of iterations needed for a
sufficient runtime.
A better idea would be to run each benchmark for a specific amount of
time. This patch does just that. The run time defaults to 10 seconds
and it is configurable at command line:
make BENCH_DURATION=5 bench
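Conceptually, the timing loop changes from a fixed iteration count to
something like the following sketch (hypothetical names; the actual
logic lives in the benchtests harness):

#include <math.h>
#include <time.h>

/* Repeat the benchmarked call until DURATION seconds have elapsed,
   counting iterations instead of fixing them in advance.  */
static unsigned long
run_for_duration (double duration)
{
  struct timespec start, now;
  unsigned long iters = 0;
  volatile double in = 0.5;
  volatile double res;

  clock_gettime (CLOCK_MONOTONIC, &start);
  do
    {
      res = atan (in);
      iters++;
      clock_gettime (CLOCK_MONOTONIC, &now);
    }
  while ((now.tv_sec - start.tv_sec)
         + (now.tv_nsec - start.tv_nsec) / 1e9 < duration);

  (void) res;
  return iters;
}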
The tile architecture's Linux port installs headers in an
<arch> directory; these headers are in part shared with glibc.
Ignore these headers for check-local-headers like we ignore
all the other Linux headers.