Commit Graph

140 Commits

Author SHA1 Message Date
H.J. Lu
dc98eeeb95 benchtests: Add benches for bzero
Add bench-bzero-large.c, bench-bzero-walk.c and bench-bzero.c.
2022-02-08 14:41:58 -08:00
H.J. Lu
03c9c4fce4 benchtests: Sort benches in Makefile
Put one bench per line and sort them.
2022-02-07 07:09:38 -08:00
Paul Eggert
581c785bf3 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 7061 files FOO.

I then removed trailing white space from math/tgmath.h,
support/tst-support-open-dev-null-range.c, and
sysdeps/x86_64/multiarch/strlen-vec.S, to work around the following
obscure pre-commit check failure diagnostics from Savannah.  I don't
know why I run into these diagnostics whereas others evidently do not.

remote: *** 912-#endif
remote: *** 913:
remote: *** 914-
remote: *** error: lines with trailing whitespace found
...
remote: *** error: sysdeps/unix/sysv/linux/statx_cp.c: trailing lines
2022-01-01 11:40:24 -08:00
Sunil K Pandey
2856829ee7 Revert "benchtests: Add acosf function to bench-math"
This reverts commit 79d0fc6539.
2021-11-05 16:13:12 -07:00
Adhemerval Zanella
b8a6ee43bb benchtests: Add hypotf
Based on random input arguments.  About 85% tuples have exponents
of the two arguments close together (+-1 range).

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-01 16:23:39 -03:00
Sunil K Pandey
79d0fc6539 benchtests: Add acosf function to bench-math
Add acosf function to bench-math and copy acosf-inputs to benchtests.
Motivation for this patch is to prepare for upcoming libmvec new
functions.  Float and double version of libmvec functions stays
together.

acosf-inputs file generated from acos-inputs file using following
scaling formula:

f = d * (FLT_MAX/DBL_MAX)

Where d is input(double) and f is output(float).  If scaled float value
is duplicate in new input file, nextafterf() function used to find next
float value, ensuring no duplicates.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-29 08:52:30 -07:00
Noah Goldstein
cf3acd774f Benchtests: Add benchtests for __memcmpeq
No bug. This commit adds __memcmpeq benchmarks. The benchmarks just
use the existing ones in memcmp. This will be useful for testing
implementations of __memcmpeq that do not just alias memcmp.
2021-10-27 13:03:46 -05:00
H.J. Lu
d8e7d06381 bench-math: Sort and put each bench per line
Sort and put each math bench per line to prepare for new math benches.

Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-10-23 05:20:25 -07:00
H.J. Lu
de0a7c5a0b benchtests: Building benchmarks as static executables
Building benchmarks as static executables:
=========================================

To build benchmarks as static executables, on the build system, run:

  $ make STATIC-BENCHTESTS=yes bench-build

You can copy benchmark executables to another machine and run them
without copying the source nor build directories.
2021-10-04 10:09:13 -07:00
Paul Zimmermann
8d0985b055 add workload traces for cbrtl
These workload traces cover the whole "long double" range.
This patch was prepared with the help of Adhemerval Zanella.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-05-10 18:45:34 +02:00
Paul Zimmermann
934d88d862 add workload traces for missing functions (double format)
This patch adds workload traces for all double format functions where such
files are missing.  For each function, a set of 1000 random values is
generated at random using SageMath, such that the output values are
meaningful (for example avoiding too large inputs for exp10 where the
output would be +Inf).  More details about the generated values are
given at the beginning of each file.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
2021-03-29 16:23:19 +02:00
Raphael Moreira Zinsly
6cf1911122 benchtests: Add ilogb* tests
Add a benchtest to ilogb, ilogbf and ilogbf128 based on the logb* benchtests.
2021-03-16 12:19:09 -03:00
Arjun Shankar
3725ee39db benchtests: Do not build bench-timing-type with MODULE_NAME=libc
Since commit 2682695e5c, `make bench-build' with `--enable-static-pie'
fails due to bench-timing-type being incorrectly built with MODULE_NAME
set to `libc'.  This commit sets MODULE_NAME to nonlib, thus fixing the
build failure.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-01-26 18:14:19 +01:00
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
DJ Delorie
4be44c3208 New benchtest: pthread locks
Performance benchmarks for various posix locks: mutex, rwlock,
spinlock, condvar, and semaphore.  Each test is performed with
an empty loop body or with a computationally "interesting" (i.e.
difficult to optimize away, and used just to allow lock code to
be "hidden" in the filler's CPU cycles).
2020-10-21 11:03:52 -04:00
Arjun Shankar
03e26098b1 benchtests: Run _Float128 tests only on architectures that support it
__float128 is a non-standard name and is not available on some architectures
(like aarch64 or s390x) even though they may support the standard _Float128
type.  Other architectures (like armv7) don't support quad-precision
floating-point operations at all.

This commit replaces benchtests references to __float128 with _Float128 and
runs the corresponding tests only on architectures that support it.
2020-09-23 16:11:57 +02:00
Paul Zimmermann
26fbd74059 benchtests: Add "workload" traces for sinf128
This patch adds workload traces for sinf128 in binary32.  The trace is
made of 1000 random numbers, generated with SageMath.
2020-09-10 15:25:22 -03:00
Paul Zimmermann
e24b248dcb benchtests: Add "workload" traces for powf128
This patch adds workload traces for pow in binary128.  The trace is
made of 1000 random numbers, generated with SageMath.
2020-09-10 15:25:22 -03:00
Paul Zimmermann
abc9732aee benchtests: Add "workload" traces for expf128
This patch adds workload traces for exp in binary128.  The trace is
made of 1000 random numbers, generated with SageMath.
2020-09-10 15:25:22 -03:00
Adhemerval Zanella
2004063fb4 benchtests: Add exp10f benchmark
It is based on expf one by converting each line with the formula:

  new_val = (float) log10 (exp ((double) old_val))
2020-06-19 10:48:15 -03:00
H.J. Lu
e52434a2e4 benchtests: Restore the clock_gettime option
commit 7621e38bf3
Author: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
Date:   Tue Jan 29 17:43:45 2019 +0000

    Add generic hp-timing support

removed the clock_gettime option.  Restore the clock_gettime option for
some x86 CPUs on which value from RDTSC may not be incremented at a fixed
rate.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-06-05 09:48:07 -07:00
Shen-Ta Hsieh
642d5abaf1 Add benchtests for roundeven and roundevenf.
This patch adds benchtests for the roundeven and roundevenf functions.
The inputs are copied from trunc-inputs.
2020-03-27 23:24:02 +00:00
Joseph Myers
92ce43eef7 Run bench-timing-type with newly built libc.
benchtests/timing-type is built with the newly built libc, so should
be run with it like actual tests and benchmarks.
2020-01-20 11:29:41 +00:00
Joseph Myers
d614a75396 Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
Paul Eggert
5a82c74822 Prefer https to http for gnu.org and fsf.org URLs
Also, change sources.redhat.com to sourceware.org.
This patch was automatically generated by running the following shell
script, which uses GNU sed, and which avoids modifying files imported
from upstream:

sed -ri '
  s,(http|ftp)(://(.*\.)?(gnu|fsf|sourceware)\.org($|[^.]|\.[^a-z])),https\2,g
  s,(http|ftp)(://(.*\.)?)sources\.redhat\.com($|[^.]|\.[^a-z]),https\2sourceware.org\4,g
' \
  $(find $(git ls-files) -prune -type f \
      ! -name '*.po' \
      ! -name 'ChangeLog*' \
      ! -path COPYING ! -path COPYING.LIB \
      ! -path manual/fdl-1.3.texi ! -path manual/lgpl-2.1.texi \
      ! -path manual/texinfo.tex ! -path scripts/config.guess \
      ! -path scripts/config.sub ! -path scripts/install-sh \
      ! -path scripts/mkinstalldirs ! -path scripts/move-if-change \
      ! -path INSTALL ! -path  locale/programs/charmap-kw.h \
      ! -path po/libc.pot ! -path sysdeps/gnu/errlist.c \
      ! '(' -name configure \
            -execdir test -f configure.ac -o -f configure.in ';' ')' \
      ! '(' -name preconfigure \
            -execdir test -f preconfigure.ac ';' ')' \
      -print)

and then by running 'make dist-prepare' to regenerate files built
from the altered files, and then executing the following to cleanup:

  chmod a+x sysdeps/unix/sysv/linux/riscv/configure
  # Omit irrelevant whitespace and comment-only changes,
  # perhaps from a slightly-different Autoconf version.
  git checkout -f \
    sysdeps/csky/configure \
    sysdeps/hppa/configure \
    sysdeps/riscv/configure \
    sysdeps/unix/sysv/linux/csky/configure
  # Omit changes that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/powerpc/powerpc64/ppc-mcount.S: trailing lines
  git checkout -f \
    sysdeps/powerpc/powerpc64/ppc-mcount.S \
    sysdeps/unix/sysv/linux/s390/s390-64/syscall.S
  # Omit change that caused a pre-commit check to fail like this:
  # remote: *** error: sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S: last line does not end in newline
  git checkout -f sysdeps/sparc/sparc64/multiarch/memcpy-ultra3.S
2019-09-07 02:43:31 -07:00
Adhemerval Zanella
0cccd37f70 benchtests: Add logb{f} benchmark
* benchtests/Makefile (bench-math): Add logb.
	* benchtests/logb-inputs: New file.
	* benchtests/logbf-inputs: New file.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-07-08 17:22:22 -03:00
Adhemerval Zanella
f215dbbdf1 benchtests: hypot benchmark
Inputs are based on argument reductions from generic and powerpc
implementation.

	* benchtests/Makefile (bench-math): Add hypot.
	* benchtests/hypot-inputs: New file.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-07-08 17:14:04 -03:00
Adhemerval Zanella
2731a326b1 benchtests: Add isnan/isinf/isfinite benchmark
* benchtests/Makefile (bench-math): Add isnan, isinf, and isfinite.
	(CFLAGS-bench-isnan.c, CFLAGS-bench-isinf.c,
	CFLAGS-bench-isfinite.c): New rule.
	* benchtests/isnan-input: New file.
	* benchtests/isinf-input: New file.
	* benchtests/isfinite-input: New file.

Reviewed-by: Gabriel F. T. Gomes <gabrielftg@linux.ibm.com>
2019-06-12 11:46:30 -03:00
Florian Weimer
b5ffdc48c2 benchtests: Enable BIND_NOW if configured with --enable-bind-now
Benchmarks should reflect distribution build policies, so it makes
sense to honor the BIND_NOW configuration for them.

This commit keeps using $(+link-tests), so that the benchmarks are
linked according to the --enable-hardcoded-path-in-tests configure
option.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2019-04-25 10:41:52 +02:00
Wilco Dijkstra
fe92a91f1e Reduce benchtests time
Reduce the total time taken by benchtests.  The malloc thread test takes 4
minutes to run which is significantly more than most other tests. Reduce
this to a more reasonable 40 seconds.  The math tests take 10 seconds each,
eventhough all they do is loop on the same input.  Anything more than 1
second runtime is way overkill, so set the limit to 1 second.

	* benchtests/Makefile (BENCH_DURATION): Set to 1 second.
	* benchtests/bench-malloc-thread.c (BENCH_DURATION): Set to 10 seconds.
2019-04-24 15:38:49 +01:00
Wilco Dijkstra
7621e38bf3 Add generic hp-timing support
Add missing generic hp_timing support.  It uses clock_gettime (CLOCK_MONOTONIC)
which has unspecified starting time, nano-second accuracy, and should faster on
architectures that implementes the symbol as vDSO.

Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. I also
checked the builds for all afected ABIs.

	* benchtests/Makefile (USE_CLOCK_GETTIME) Remove.
	* benchtests/README: Update description.
	* benchtests/bench-timing.h: Default to hp-timing.
	* sysdeps/generic/hp-timing.h (HP_TIMING_DIFF, HP_TIMING_ACCUM_NT,
	HP_TIMING_PRINT): Remove.
	(HP_TIMING_NOW): Add generic implementation.
	(hp_timing_t): Change to uint64_t.
2019-03-22 17:30:44 -03:00
Wilco Dijkstra
3904fd85d3 Add malloc micro benchmark
Add a malloc micro benchmark to enable accurate testing of the
various paths in malloc and free.  The benchmark does a varying
number of allocations of a given block size, then frees them again.

It tests 3 different scenarios: single-threaded using main arena,
multi-threaded using thread-arena, main arena with SINGLE_THREAD_P
false.

	* benchtests/Makefile: Add malloc-simple benchmark.
	* benchtests/bench-malloc-simple.c: New benchmark.
2019-02-14 16:37:11 +00:00
Wilco Dijkstra
16f87cfd63 String benchtest cleanup
Continue cleanup of the string benchtests.  Remove simplistic
byte-oriented versions with faster generic implementations.
Remove bcopy/bzero benchmarks (bcopy/bzero are obsolete and never
emitted by compilers).  Remove builtin versions of memcpy, memset
and strlen.  Remove all remaining "stupid" implementations given
they are always slower than the "simple" variants and thus don't
add anything useful.

	* benchtests/bench-strcasecmp.c (stupid_strcasecmp): Remove.
	* benchtests/bench-strcasestr.c (stupid_strcasestr): Remove.
	* benchtests/bench-strchr.c (stupid_strchr): Remove.
	* benchtests/bench-strcmp.c (stupid_strcmp): Remove.
	* benchtests/bench-strcspn.c (stupid_strcspn): Remove.
	* benchtests/bench-strlen.c (builtin_strlen): Remove.
	* benchtests/bench-strncasecmp.c (stupid_strncasecmp): Remove.
	* benchtests/bench-strncmp.c (stupid_strncmp): Remove.
	* benchtests/bench-strpbrk.c (stupid_strpbrk): Remove.
	* benchtests/bench-strspn.c (stupid_strspn): Remove.
	* benchtests/Makefile: Remove bench-bcopy.c and bench-bzero.c.
	* benchtests/bench-bcopy.c: Delete file.
	* benchtests/bench-bzero.c: Likewise.
	* benchtests/bench-memccpy.c (stupid_memccpy): Remove.
	(simple_memccpy): Remove.
	(generic_memccpy): Add function.
	* benchtests/bench-memcpy.c: (builtin_memcpy): Remove.
	* benchtests/bench-memmove.c (simple_bcopy): Remove.
	* benchtests/bench-mempcpy.c (simple_mempcpy): Remove.
	(generic_mempcpy): Add new function.
	* benchtests/bench-memset.c (simple_bzero): Remove.
	(builtin_bzero): Remove.
	(builtin_memset): Remove.
	* benchtests/bench-rawmemchr.c (simple_rawmemchr): Remove.
	(generic_rawmemchr): Add new function.
2019-02-12 17:19:51 +00:00
Joseph Myers
04277e02d7 Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2019-01-01 00:11:28 +00:00
Joseph Myers
c6982f7efc Patch to require Python 3.4 or later to build glibc.
This patch makes Python 3.4 or later a required tool for building
glibc, so allowing changes of awk, perl etc. code used in the build
and test to Python code without any such changes needing makefile
conditionals or to handle older Python versions.

This patch makes the configure test for Python check the version and
give an error if Python is missing or too old, and removes makefile
conditionals that are no longer needed.  It does not itself convert
any code from another language to Python, and does not remove any
compatibility with older Python versions from existing scripts.

Tested for x86_64.

	* configure.ac (PYTHON_PROG): Use AC_CHECK_PROG_VER.  Set
	critic_missing for versions before 3.4.
	* configure: Regenerated.
	* manual/install.texi (Tools for Compilation): Document
	requirement for Python to build glibc.
	* INSTALL: Regenerated.
	* Rules [PYTHON]: Make code unconditional.
	* benchtests/Makefile [PYTHON]: Likewise.
	* conform/Makefile [PYTHON]: Likewise.
	* manual/Makefile [PYTHON]: Likewise.
	* math/Makefile [PYTHON]: Likewise.
2018-10-29 15:28:05 +00:00
H.J. Lu
7cc65773f0 x86: Support RDTSCP for benchtests
RDTSCP waits until all previous instructions have executed and all
previous loads are globally visible before reading the counter.  RDTSC
doesn't wait until all previous instructions have been executed before
reading the counter.  All x86 processors since 2010 support RDTSCP
instruction.  This patch adds RDTSCP support to benchtests.

	* benchtests/Makefile (CPPFLAGS-nonlib): Add -DUSE_RDTSCP if
	USE_RDTSCP is defined.
	* sysdeps/x86/hp-timing.h (HP_TIMING_NOW): Use RDTSCP if
	USE_RDTSCP is defined.
2018-10-24 02:19:34 -07:00
Wilco Dijkstra
f1c8185d34 Use correct includes in benchtests
Currently the benchtests are run with internal GLIBC headers, which is incorrect.
Defining _ISOMAC in the makefile ensures the internal headers are bypassed.
Fix all tests which were relying on internal defines or includes.

	* benchtests/Makefile: Define _ISOMAC.
	* benchtests/bench-strcoll.c: Add missing sys/stat.h include.
	* benchtests/bench-string.h: Define inhibit_loop_to_libcall macro.
	* benchtests/bench-strstr.c: Define empty libc_hidden_builtin_def.
	* benchtests/bench-strtok.c (oldstrtok): Use rawmemchr.
	* benchtests/bench-timing.h: Define attribute_hidden.
2018-03-15 15:44:58 +00:00
Joseph Myers
688903eb3e Update copyright dates with scripts/update-copyrights.
* All files with FSF copyright notices: Update copyright dates
	using scripts/update-copyrights.
	* locale/programs/charmap-kw.h: Regenerated.
	* locale/programs/locfile-kw.h: Likewise.
2018-01-01 00:32:25 +00:00
Victor Rodriguez
0422ed1e84 benchtests: Enable BENCHSET to run subset of tests
This patch adds BENCHSET variable to benchtests/Makefile in order to
provide the capability to run a list of subsets of benchmark tests, ie;

    make bench BENCHSET="bench-pthread bench-math malloc-thread"

This helps users to benchmark specific glibc area

ChangeLog:

        * benchtests/Makefile:Add BENCHSET to allow subsets of
        benchmarks to be run.
        * benchtests/README: Add documentation for: Running subsets of
        benchmarks.

Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com>
Signed-off-by: Icarus Sparry <icarus.w.sparry@intel.com>
Reviewed-By: Siddhesh Poyarekar <siddhesh@sourceware.org>
2017-11-28 19:57:46 +05:30
Rajalakshmi Srinivasaraghavan
077ee12978 Benchtests for sinf, cosf and sincosf
Numbers used from cos and sin inputs.
2017-10-13 14:19:45 +05:30
Siddhesh Poyarekar
5bfb04042d benchtests: Memory walking benchmark for memmove
This benchmark is an attempt to eliminate cache effects from string
benchmarks.  The benchmark walks both ways through a large memory area
and copies different sizes of memory and alignments one at a time
instead of looping around in the same memory area.  This is a good
metric to have alongside the simple memmove benchmark (which is only
really useful for smaller sizes) especially for larger sizes where the
likelihood of the call being done only once is pretty high.

This benchmark is different from memcpy in that it also tests
overlapping copies.

	* benchtests/bench-memmove-walk.c: New file.
	* benchtests/Makefile (string-benchset): Add it.
2017-10-05 22:20:23 +05:30
Siddhesh Poyarekar
36bb8edf51 benchtests: Memory walking benchmark for memset
This benchmark is an attempt to eliminate cache effects from string
benchmarks.  The benchmark walks backward through a large memory area
and sets different sizes of memory and alignments one at a time
instead of looping around in the same memory area.  This is a good
metric to have alongside the simple memset benchmark (which is only
really useful for smaller sizes) especially for larger sizes where the
likelihood of the call being done only once is pretty high.

	* benchtests/bench-memset-walk.c: New file.
	* benchtests/Makefile (string-benchset): Add it.
2017-10-05 22:20:23 +05:30
Siddhesh Poyarekar
9ec87fd2b1 benchtests: Memory walking benchmark for memcpy
This benchmark is an attempt to eliminate cache effects from string
benchmarks.  The benchmark walks both ways through a large memory area
and copies different sizes of memory and alignments one at a time
instead of looping around in the same memory area.  This is a good
metric to have alongside the other memcpy benchmarks, especially for
larger sizes where the likelihood of the call being done only once is
pretty high.

	* benchtests/bench-memcpy-walk.c: New file.
	* benchtests/Makefile (string-benchset): Add it.
2017-10-05 22:20:23 +05:30
Szabolcs Nagy
0525ce4850 Add exp2f and log2f benchmark trace
exp2f and log2f benchmark traces are just copies of the existing
expf and logf traces from wrf_r.

	* benchtests/Makefile: Add exp2f and log2f benchmarks.
	* benchtests/exp2f-inputs: Copy of expf-inputs.
	* benchtests/log2f-inputs: Copy of logf-inputs.
2017-09-20 10:04:12 +01:00
Wilco Dijkstra
a5dcc87e77 Add logf trace
Add a trace for logf.  This is a reduced trace based on 2.8 billion
samples extracted from wrf_r.

	* benchtests/Makefile: Add logf benchmark.
	* benchtests/logf-inputs: Add reduced trace from wrf_r.
2017-09-19 15:14:46 +01:00
Wilco Dijkstra
7024d5446d Add expf trace
Add a trace for expf.  This is a reduced trace based on 2.4 billion
samples extracted from wrf_r.

	* benchtests/Makefile: Add expf benchmark.
	* benchtests/expf-inputs: Add reduced trace from wrf_r.
2017-09-19 15:14:18 +01:00
Joseph Myers
eb375def3d Add benchtests for trunc and truncf.
This patch adds benchtests for the trunc and truncf functions.  The
inputs listed are fairly arbitrary; I do not assert they are
representative of any particular application.

	* benchtests/Makefile (bench-math): Add trunc and truncf.
	(CFLAGS-bench-trunc.c): New variable.
	(CFLAGS-bench-truncf.c): Likewise.
	* benchtests/trunc-inputs: New file.
	* benchtests/truncf-inputs: Likewise.
2017-09-19 12:59:01 +00:00
Florian Weimer
4504783c0f benchtests: Do not compile benchmark objects as libc modules [BZ #21864]
Otherwise, this will lead to link failures due to hidden symbol
references.
2017-08-21 19:28:54 +02:00
Paul Clarke
4cedcaea8d Add powf bench tests
Add powf() bench test with input which covers these cases:
- positive base to positive exponent
- exponent 0
- negative base to even exponent
- exponent 1
- exponent -1
- squared
- squareroot
- 1 to negative exponent
- -1 to negative exponent
- base 0
- -1 to even exponent
- small base
- small exponent

	* benchtests/Makefile (bench-math): Add powf.
	* benchtests/powf-inputs: New file.
2017-06-20 10:14:42 -03:00
Adhemerval Zanella
0edbf12301 nptl: Invert the mmap/mprotect logic on allocated stacks (BZ#18988)
Current allocate_stack logic for create stacks is to first mmap all
the required memory with the desirable memory and then mprotect the
guard area with PROT_NONE if required.  Although it works as expected,
it pessimizes the allocation because it requires the kernel to actually
increase commit charge (it counts against the available physical/swap
memory available for the system).

The only issue is to actually check this change since side-effects are
really Linux specific and to actually account them it would require a
kernel specific tests to parse the system wide information.  On the kernel
I checked /proc/self/statm does not show any meaningful difference for
vmm and/or rss before and after thread creation.  I could only see
really meaningful information checking on system wide /proc/meminfo
between thread creation: MemFree, MemAvailable, and Committed_AS shows
large difference without the patch.  I think trying to use these
kind of information on a testcase is fragile.

The BZ#18988 reports shows that the commit pages are easily seen with
mlockall (MCL_FUTURE) (with lock all pages that become mapped in the
process) however a more straighfoward testcase shows that pthread_create
could be faster using this patch:

--
static const int inner_count = 256;
static const int outer_count = 128;

static
void *thread1(void *arg)
{
  return NULL;
}

static
void *sleeper(void *arg)
{
  pthread_t ts[inner_count];
  for (int i = 0; i < inner_count; i++)
    pthread_create (&ts[i], &a, thread1, NULL);
  for (int i = 0; i < inner_count; i++)
    pthread_join (ts[i], NULL);

  return NULL;
}

int main(void)
{
  pthread_attr_init(&a);
  pthread_attr_setguardsize(&a, 1<<20);
  pthread_attr_setstacksize(&a, 1134592);

  pthread_t ts[outer_count];
  for (int i = 0; i < outer_count; i++)
    pthread_create(&ts[i], &a, sleeper, NULL);
  for (int i = 0; i < outer_count; i++)
    pthread_join(ts[i], NULL);
    assert(r == 0);
  }
  return 0;
}

--

On x86_64 (4.4.0-45-generic, gcc 5.4.0) running the small benchtests
I see:

$ time ./test

real	0m3.647s
user	0m0.080s
sys	0m11.836s

While with the patch I see:

$ time ./test

real	0m0.696s
user	0m0.040s
sys	0m1.152s

So I added a pthread_create benchtest (thread_create) which check
the thread creation latency.  As for the simple benchtests, I saw
improvements in thread creation on all architectures I tested the
change.

Checked on x86_64-linux-gnu, i686-linux-gnu, aarch64-linux-gnu,
arm-linux-gnueabihf, powerpc64le-linux-gnu, sparc64-linux-gnu,
and sparcv9-linux-gnu.

	[BZ #18988]
	* benchtests/thread_create-inputs: New file.
	* benchtests/thread_create-source.c: Likewise.
	* support/xpthread_attr_setguardsize.c: Likewise.
	* support/Makefile (libsupport-routines): Add
	xpthread_attr_setguardsize object.
	* support/xthread.h: Add xpthread_attr_setguardsize prototype.
	* benchtests/Makefile (bench-pthread): Add thread_create.
	* nptl/allocatestack.c (allocate_stack): Call mmap with PROT_NONE and
	then mprotect the required area.
2017-06-14 17:22:35 -03:00