glibc/manual
Patrick McGehearty 8d730cb25a Reversing calculation of __x86_shared_non_temporal_threshold
The __x86_shared_non_temporal_threshold determines when memcpy on x86
uses non_temporal stores to avoid pushing other data out of the last
level cache.

This patch proposes to revert the calculation change made by H.J. Lu's
patch of June 2, 2017.

H.J. Lu's patch selected a threshold suitable for a single thread
getting maximum performance. It was tuned using the single threaded
large memcpy micro benchmark on an 8 core processor. The last change
changes the threshold from using 3/4 of one thread's share of the
cache to using 3/4 of the entire cache of a multi-threaded system
before switching to non-temporal stores. Multi-threaded systems with
more than a few threads are server-class and typically have many
active threads. If one thread consumes 3/4 of the available cache for
all threads, it will cause other active threads to have data removed
from the cache. Two examples show the range of the effect. John
McCalpin's widely parallel Stream benchmark, which runs in parallel
and fetches data sequentially, saw a 20% slowdown with this patch on
an internal system test of 128 threads. This regression was discovered
when comparing OL8 performance to OL7.  An example that compares
normal stores to non-temporal stores may be found at
https://vgatherps.github.io/2018-09-02-nontemporal/.  A simple test
shows performance loss of 400 to 500% due to a failure to use
nontemporal stores. These performance losses are most likely to occur
when the system load is heaviest and good performance is critical.

The tunable x86_non_temporal_threshold can be used to override the
default for the knowledgeable user who really wants maximum cache
allocation to a single thread in a multi-threaded system.
The manual entry for the tunable has been expanded to provide
more information about its purpose.

	modified: sysdeps/x86/cacheinfo.c
	modified: manual/tunables.texi

(cherry picked from commit d3c5702747)
(Conflicts in sysdeps/x86/cacheinfo.c due to missing
rep_movsb_threshold, x86_rep_stosb_threshold tunables.)
2020-10-30 13:03:00 +01:00
examples Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
argp.texi manual: Complete @standards in argp.texi. 2017-06-16 01:19:30 -07:00
arith.texi Make totalorder and totalordermag functions take pointer arguments. 2019-08-15 15:18:34 +00:00
charset.texi Fix missing @ before texinfo command 2018-04-06 08:56:24 +02:00
check-safety.sh Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
conf.texi [manual] Job control is no longer optional. 2018-10-17 14:10:51 -04:00
contrib.texi Add more contributors to the manual 2020-02-01 17:16:54 +05:30
creature.texi Add feature test macro _ISOC2X_SOURCE. 2019-08-13 11:26:00 +00:00
crypt.texi manual: Revise crypt.texi. 2018-06-29 16:53:37 +02:00
ctype.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
debug.texi Add manual documentation for threads.h 2018-07-24 14:07:31 -03:00
dir
errno.texi hurd: Fix errno* generation 2018-10-31 10:32:39 +01:00
fdl-1.3.texi
filesys.texi Revise the documentation of simple calendar time. 2019-10-30 17:11:10 -03:00
freemanuals.texi Prefer https to http for gnu.org and fsf.org URLs 2019-09-07 02:43:31 -07:00
getopt.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
header.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
install-plain.texi
install.texi Update newest tested versions of dependencies in install.texi 2020-02-01 17:16:54 +05:30
intro.texi manual: Revise crypt.texi. 2018-06-29 16:53:37 +02:00
io.texi
ipc.texi
job.texi manual: Use @code{errno} instead of @var{errno} [BZ #24063] 2019-01-07 11:42:04 +01:00
lang.texi manual: Rewrite the section on widths of integer types. 2017-08-10 20:28:28 -07:00
lgpl-2.1.texi
libc-texinfo.sh Remove add-ons mechanism. 2017-10-05 15:58:13 +00:00
libc.texinfo Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
libcbook.texi
libdl.texi
llio.texi Revise the documentation of simple calendar time. 2019-10-30 17:11:10 -03:00
locale.texi Use STRFMON_LDBL_IS_DBL instead of __ldbl_is_dbl. 2018-11-16 09:21:14 -02:00
macros.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
maint.texi Fix the manual for old texinfo 2019-01-04 11:45:13 +00:00
Makefile Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
math.texi manual: Use @code{errno} instead of @var{errno} [BZ #24063] 2019-01-07 11:42:04 +01:00
memory.texi Fix tst-pkey.c pkey_alloc return checks and manual 2020-01-17 09:05:03 -03:00
message.texi manual: Use @code{errno} instead of @var{errno} [BZ #24063] 2019-01-07 11:42:04 +01:00
nss.texi nss: Make nsswitch.conf more distribution friendly. 2019-08-19 09:56:59 -04:00
nsswitch.texi
pattern.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
pipe.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
platform.texi manual: Fix a syntax error. 2018-02-16 08:21:47 -08:00
probes.texi malloc: tcache double free check 2018-11-20 13:24:09 -05:00
process.texi Linux: Add gettid system call wrapper [BZ #6399] 2019-02-08 11:27:55 +01:00
README.pretty-printers Use gen-as-const.py to process .pysym files. 2018-12-10 22:56:59 +00:00
README.tunables Rename the glibc.tune namespace to glibc.cpu 2018-08-02 23:49:19 +05:30
resource.texi manual: Document lack of conformance of sched_* functions [BZ #14829] 2019-02-02 14:15:27 +01:00
search.texi manual: Adjust twalk_r documentation. 2019-05-14 15:56:56 -04:00
setjmp.texi manual: Use @code{errno} instead of @var{errno} [BZ #24063] 2019-01-07 11:42:04 +01:00
signal.texi Linux: Add the tgkill function 2019-05-14 22:55:51 +02:00
socket.texi manual: Update struct sockaddr_in, struct sockaddr_sin6 description 2019-02-01 14:15:50 +01:00
startup.texi manual: Remove warning in the documentation of the abort function 2019-10-11 20:15:24 +02:00
stdio-fp.c
stdio.texi manual: clarify fopen with the x flag 2019-12-11 09:40:02 -08:00
string.texi Disallow use of DES encryption functions in new programs. 2018-06-29 16:53:18 +02:00
summary.pl Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
sysinfo.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
syslog.texi manual: Replace summary.awk with summary.pl. 2017-06-15 21:26:20 -07:00
terminal.texi Remove obsolete, never-implemented XSI STREAMS declarations 2019-03-14 15:44:15 +01:00
texinfo.tex Update miscellaneous files from upstream sources. 2019-01-01 00:52:59 +00:00
texis.awk
threads.texi nptl: Fix niggles with pthread_clockjoin_np 2019-11-04 16:44:49 -03:00
time.texi Revise the documentation of simple calendar time. 2019-10-30 17:11:10 -03:00
tsort.awk Update copyright dates with scripts/update-copyrights. 2020-01-01 00:14:33 +00:00
tunables.texi Reversing calculation of __x86_shared_non_temporal_threshold 2020-10-30 13:03:00 +01:00
users.texi login: Remove utmp backend jump tables [BZ #23518] 2019-08-05 15:55:05 +02:00
xtract-typefun.awk

			TUNABLE FRAMEWORK
			=================

Tunables is a feature in the GNU C Library that allows application authors and
distribution maintainers to alter the runtime library behaviour to match their
workload.

The tunable framework allows modules within glibc to register variables that
may be tweaked through an environment variable.  It aims to enforce a strict
namespace rule to bring consistency to naming of these tunable environment
variables across the project.  This document is a guide for glibc developers to
add tunables to the framework.
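
As a rough illustration of the namespace rule, the sketch below looks up one
fully qualified tunable name in a colon-separated list of name=value pairs,
which is the layout the manual documents for the GLIBC_TUNABLES environment
variable.  The helper demo_lookup_tunable is hypothetical and is not how glibc
parses the variable internally; it only shows why a consistent
top.namespace.name scheme makes lookups unambiguous.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not glibc code.  Search LIST, a colon-separated
   sequence of name=value pairs, for NAME.  On success copy the value
   into OUT (truncated to OUTLEN - 1 characters) and return 1; return 0
   if NAME does not appear.  */
static int
demo_lookup_tunable (const char *list, const char *name,
                     char *out, size_t outlen)
{
  assert (list != NULL && name != NULL && out != NULL && outlen > 0);

  size_t namelen = strlen (name);
  const char *p = list;

  while (p != NULL && *p != '\0')
    {
      /* Find the end of the current pair.  */
      const char *end = strchr (p, ':');
      size_t len = end != NULL ? (size_t) (end - p) : strlen (p);

      /* The pair matches only if it starts with NAME followed by '='.  */
      if (len > namelen && strncmp (p, name, namelen) == 0
          && p[namelen] == '=')
        {
          size_t vallen = len - namelen - 1;
          if (vallen >= outlen)
            vallen = outlen - 1;
          memcpy (out, p + namelen + 1, vallen);
          out[vallen] = '\0';
          return 1;
        }

      /* Advance to the next pair, if any.  */
      p = end != NULL ? end + 1 : NULL;
    }
  return 0;
}
```

Looking up 'glibc.cpu.hwcap_mask' in the string
'glibc.malloc.check=3:glibc.cpu.hwcap_mask=0x7' with this helper stores '0x7'
in the output buffer, while a partial name such as 'glibc.malloc.chec' is not
matched.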

ADDING A NEW TUNABLE
--------------------

The TOP_NAMESPACE macro is defined by default as 'glibc'.  If distributions
intend to add their own tunables, they should do so in a different top
namespace by overriding the TOP_NAMESPACE macro for that tunable.  Downstream
implementations are discouraged from using the 'glibc' top namespace for
tunables they don't already have consensus to push upstream.

There are three steps to adding a tunable:

1. Add a tunable to the list and fully specify its properties:

For each tunable you want to add, make an entry in elf/dl-tunables.list.  The
format of the file is as follows:

TOP_NAMESPACE {
  NAMESPACE1 {
    TUNABLE1 {
      # tunable attributes, one per line
    }
    # A tunable with default attributes, i.e. string variable.
    TUNABLE2
    TUNABLE3 {
      # its attributes
    }
  }
  NAMESPACE2 {
    ...
  }
}

The allowed attributes are:

- type:			Data type.  Defaults to STRING.  Allowed types are:
			INT_32, UINT_64, SIZE_T and STRING.  Numeric values may
			also be written in octal or hexadecimal.

- minval:		Optional minimum acceptable value.  For a string type
			this is the minimum length of the value.

- maxval:		Optional maximum acceptable value.  For a string type
			this is the maximum length of the value.

- default:		Specify an optional default value for the tunable.

- env_alias:		An alias environment variable.

- security_level:	Specify security level of the tunable.  Valid values:

			SXID_ERASE: (default) Don't read for AT_SECURE binaries, and
				    remove it so that child processes can't read it.
			SXID_IGNORE: Don't read for AT_SECURE binaries, but retain it
				     for non-AT_SECURE subprocesses.
			NONE: Read all the time.

2. Use TUNABLE_GET/TUNABLE_SET to get and set tunables.

3. OPTIONAL: If tunables in a namespace are being used multiple times within a
   specific module, set the TUNABLE_NAMESPACE macro to reduce the amount of
   typing.
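
To illustrate step 1, an entry in elf/dl-tunables.list might look like the
sketch below.  The namespace 'example' and the tunable 'demo_threshold' are
hypothetical names chosen for illustration, and the values are arbitrary; only
the attribute names come from the list above.

```
glibc {
  example {
    demo_threshold {
      type: SIZE_T
      minval: 0
      maxval: 1048576
      default: 4096
      security_level: SXID_IGNORE
    }
  }
}
```

With SXID_IGNORE, such a tunable would be ignored by AT_SECURE binaries but
would stay in the environment for non-AT_SECURE subprocesses.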

GETTING AND SETTING TUNABLES
----------------------------

When the TUNABLE_NAMESPACE macro is defined, one may get tunables in that
module using the TUNABLE_GET macro as follows:

  val = TUNABLE_GET (check, int32_t, TUNABLE_CALLBACK (check_callback))

where 'check' is the tunable name, 'int32_t' is the C type of the tunable and
'check_callback' is the function to call if the tunable got initialized to a
non-default value.  The macro returns the value as type 'int32_t'.

The callback function should be defined as follows:

  void
  TUNABLE_CALLBACK (check_callback) (int32_t *valp)
  {
  ...
  }

where it can expect the tunable value to be passed in VALP.
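
The callback behaviour described above can be modelled outside glibc with the
minimal sketch below.  Everything prefixed with demo_ is a hypothetical
stand-in: the real macros operate on an internal table generated from
dl-tunables.list, which this example replaces with a small struct.  The sketch
only shows the contract, namely that the callback runs when the tunable was
initialized to a non-default value and receives a pointer to that value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one tunable's record.  */
struct demo_tunable
{
  int32_t value;     /* Current value of the tunable.  */
  int initialized;   /* Nonzero if set to a non-default value.  */
};

typedef void (*demo_callback_t) (int32_t *valp);

/* Simplified model of the read path: call CALLBACK only when the
   tunable was initialized to a non-default value, then return the
   current value.  */
static int32_t
demo_tunable_get (struct demo_tunable *t, demo_callback_t callback)
{
  assert (t != NULL);
  if (t->initialized && callback != NULL)
    callback (&t->value);
  return t->value;
}

/* Module state that the callback updates.  */
static int32_t demo_check_level;

/* Example callback: record the value passed in VALP.  */
static void
demo_check_callback (int32_t *valp)
{
  demo_check_level = *valp;
}
```

A module would typically use such a callback to copy the value into its own
state once, rather than querying the tunable on every use.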

Tunables in the module can be updated using:

  TUNABLE_SET (check, int32_t, val)

where 'check' is the tunable name, 'int32_t' is the C type of the tunable and
'val' is a value of the same type.

To get and set tunables in a different namespace from that module, use the full
form of the macros as follows:

  val = TUNABLE_GET_FULL (glibc, cpu, hwcap_mask, uint64_t, NULL)

  TUNABLE_SET_FULL (glibc, cpu, hwcap_mask, uint64_t, val)

where 'glibc' is the top namespace, 'cpu' is the tunable namespace and the
remaining arguments are the same as the short form macros.

When TUNABLE_NAMESPACE is not defined in a module, TUNABLE_GET is equivalent to
TUNABLE_GET_FULL, so you will need to provide full namespace information for
both macros.  Likewise for TUNABLE_SET and TUNABLE_SET_FULL.
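
The relationship between the short and full forms can be sketched with plain
token pasting, as below.  The DEMO_ macros and demo_ variables are simplified,
hypothetical stand-ins and are not glibc's actual definitions; in particular
they omit the callback argument and store each tunable in an ordinary
variable.  The point is only that the short form fills in the top namespace
and module namespace before delegating to the full form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical storage: one variable per fully qualified tunable.  */
static int32_t demo_glibc_malloc_check;
static uint64_t demo_glibc_cpu_hwcap_mask;

/* Full forms: the caller spells out top namespace and namespace.  */
#define DEMO_GET_FULL(top, ns, name, type) \
  ((type) demo_##top##_##ns##_##name)
#define DEMO_SET_FULL(top, ns, name, type, val) \
  (demo_##top##_##ns##_##name = (type) (val))

/* Namespaces assumed by the short forms in this module.  */
#define DEMO_TOP_NAMESPACE glibc
#define DEMO_NAMESPACE malloc

/* The extra level of expansion makes the preprocessor replace the
   namespace macros with their values before the names are pasted.  */
#define DEMO_GET_1(top, ns, name, type) \
  DEMO_GET_FULL (top, ns, name, type)
#define DEMO_SET_1(top, ns, name, type, val) \
  DEMO_SET_FULL (top, ns, name, type, val)

/* Short forms: only the tunable name and type are needed.  */
#define DEMO_GET(name, type) \
  DEMO_GET_1 (DEMO_TOP_NAMESPACE, DEMO_NAMESPACE, name, type)
#define DEMO_SET(name, type, val) \
  DEMO_SET_1 (DEMO_TOP_NAMESPACE, DEMO_NAMESPACE, name, type, val)

/* Set one tunable through the short form and one in another namespace
   through the full form.  */
static void
demo_configure (void)
{
  DEMO_SET (check, int32_t, 2);
  DEMO_SET_FULL (glibc, cpu, hwcap_mask, uint64_t, 0x7);
  assert (DEMO_GET (check, int32_t) == 2);
}
```

Here DEMO_GET (check, int32_t) and DEMO_GET_FULL (glibc, malloc, check,
int32_t) refer to the same variable, which mirrors how the short form saves
typing once a module has fixed its namespace.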

** IMPORTANT NOTE **

The tunable list is set as read-only after the dynamic linker relocates itself,
so setting tunable values must be limited to tunables within the dynamic
linker, and must happen before relocation.

FUTURE WORK
-----------

The framework currently only allows a one-time initialization of variables
through environment variables and in some cases, modification of variables via
an API call.  Future goals for this project include:

- Setting system-wide and user-wide defaults for tunables through some
  mechanism like a configuration file.

- Allowing some tunables to be tweaked at runtime.