mirror of
https://sourceware.org/git/glibc.git
synced 2025-01-11 20:00:07 +00:00
6310e6be9b
This patch does not have any functionality change, we only provide a spin count tunes for pthread adaptive spin mutex. The tunable glibc.pthread.mutex_spin_count tunes can be used by system administrator to squeeze system performance according to different hardware capabilities and workload characteristics. The maximum value of spin count is limited to 32767 to avoid the overflow of mutex->__data.__spins variable with the possible type of short in pthread_mutex_lock (). The default value of spin count is set to 100 with the reference to the previous number of times of spinning via trylock. This value would be architecture-specific and can be tuned with kinds of benchmarks to fit most cases in future. I would extend my appreciation sincerely to H.J.Lu for his help to refine this patch series. * manual/tunables.texi (POSIX Thread Tunables): New node. * nptl/Makefile (libpthread-routines): Add pthread_mutex_conf. * nptl/nptl-init.c: Include pthread_mutex_conf.h (__pthread_initialize_minimal_internal) [HAVE_TUNABLES]: Call __pthread_tunables_init. * nptl/pthreadP.h (MAX_ADAPTIVE_COUNT): Remove. (max_adaptive_count): Define. * nptl/pthread_mutex_conf.c: New file. * nptl/pthread_mutex_conf.h: New file. * sysdeps/generic/adaptive_spin_count.h: New file. * sysdeps/nptl/dl-tunables.list: New file. * nptl/pthread_mutex_lock.c (__pthread_mutex_lock): Use max_adaptive_count () not MAX_ADAPTIVE_COUNT. * nptl/pthread_mutex_timedlock.c (__pthrad_mutex_timedlock): Likewise. Suggested-by: Andi Kleen <andi.kleen@intel.com> Reviewed-by: Carlos O'Donell <carlos@redhat.com> Signed-off-by: Kemi.wang <kemi.wang@intel.com>
414 lines
17 KiB
Plaintext
414 lines
17 KiB
Plaintext
@node Tunables
|
|
@c @node Tunables, , Internal Probes, Top
|
|
@c %MENU% Tunable switches to alter libc internal behavior
|
|
@chapter Tunables
|
|
@cindex tunables
|
|
|
|
@dfn{Tunables} are a feature in @theglibc{} that allows application authors and
|
|
distribution maintainers to alter the runtime library behavior to match
|
|
their workload. These are implemented as a set of switches that may be
|
|
modified in different ways. The current default method to do this is via
|
|
the @env{GLIBC_TUNABLES} environment variable by setting it to a string
|
|
of colon-separated @var{name}=@var{value} pairs. For example, the following
|
|
example enables malloc checking and sets the malloc trim threshold to 128
|
|
bytes:
|
|
|
|
@example
|
|
GLIBC_TUNABLES=glibc.malloc.trim_threshold=128:glibc.malloc.check=3
|
|
export GLIBC_TUNABLES
|
|
@end example
|
|
|
|
Tunables are not part of the @glibcadj{} stable ABI, and they are
|
|
subject to change or removal across releases. Additionally, the method to
|
|
modify tunable values may change between releases and across distributions.
|
|
It is possible to implement multiple `frontends' for the tunables allowing
|
|
distributions to choose their preferred method at build time.
|
|
|
|
Finally, the set of tunables available may vary between distributions as
|
|
the tunables feature allows distributions to add their own tunables under
|
|
their own namespace.
|
|
|
|
@menu
|
|
* Tunable names:: The structure of a tunable name
|
|
* Memory Allocation Tunables:: Tunables in the memory allocation subsystem
|
|
* Elision Tunables:: Tunables in elision subsystem
|
|
* POSIX Thread Tunables:: Tunables in the POSIX thread subsystem
|
|
* Hardware Capability Tunables:: Tunables that modify the hardware
|
|
capabilities seen by @theglibc{}
|
|
@end menu
|
|
|
|
@node Tunable names
|
|
@section Tunable names
|
|
@cindex Tunable names
|
|
@cindex Tunable namespaces
|
|
|
|
A tunable name is split into three components, a top namespace, a tunable
|
|
namespace and the tunable name. The top namespace for tunables implemented in
|
|
@theglibc{} is @code{glibc}. Distributions that choose to add custom tunables
|
|
in their maintained versions of @theglibc{} may choose to do so under their own
|
|
top namespace.
|
|
|
|
The tunable namespace is a logical grouping of tunables in a single
|
|
module. This currently holds no special significance, although that may
|
|
change in the future.
|
|
|
|
The tunable name is the actual name of the tunable. It is possible that
|
|
different tunable namespaces may have tunables within them that have the
|
|
same name, likewise for top namespaces. Hence, we only support
|
|
identification of tunables by their full name, i.e. with the top
|
|
namespace, tunable namespace and tunable name, separated by periods.
|
|
|
|
@node Memory Allocation Tunables
|
|
@section Memory Allocation Tunables
|
|
@cindex memory allocation tunables
|
|
@cindex malloc tunables
|
|
@cindex tunables, malloc
|
|
|
|
@deftp {Tunable namespace} glibc.malloc
|
|
Memory allocation behavior can be modified by setting any of the
|
|
following tunables in the @code{malloc} namespace:
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.check
|
|
This tunable supersedes the @env{MALLOC_CHECK_} environment variable and is
|
|
identical in features.
|
|
|
|
Setting this tunable to a non-zero value enables a special (less
|
|
efficient) memory allocator for the malloc family of functions that is
|
|
designed to be tolerant against simple errors such as double calls of
|
|
free with the same argument, or overruns of a single byte (off-by-one
|
|
bugs). Not all such errors can be protected against, however, and memory
|
|
leaks can result. Any detected heap corruption results in immediate
|
|
termination of the process.
|
|
|
|
Like @env{MALLOC_CHECK_}, @code{glibc.malloc.check} has a problem in that it
|
|
diverges from normal program behavior by writing to @code{stderr}, which could
|
|
by exploited in SUID and SGID binaries. Therefore, @code{glibc.malloc.check}
|
|
is disabled by default for SUID and SGID binaries. This can be enabled again
|
|
by the system administrator by adding a file @file{/etc/suid-debug}; the
|
|
content of the file could be anything or even empty.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.top_pad
|
|
This tunable supersedes the @env{MALLOC_TOP_PAD_} environment variable and is
|
|
identical in features.
|
|
|
|
This tunable determines the amount of extra memory in bytes to obtain from the
|
|
system when any of the arenas need to be extended. It also specifies the
|
|
number of bytes to retain when shrinking any of the arenas. This provides the
|
|
necessary hysteresis in heap size such that excessive amounts of system calls
|
|
can be avoided.
|
|
|
|
The default value of this tunable is @samp{0}.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.perturb
|
|
This tunable supersedes the @env{MALLOC_PERTURB_} environment variable and is
|
|
identical in features.
|
|
|
|
If set to a non-zero value, memory blocks are initialized with values depending
|
|
on some low order bits of this tunable when they are allocated (except when
|
|
allocated by calloc) and freed. This can be used to debug the use of
|
|
uninitialized or freed heap memory. Note that this option does not guarantee
|
|
that the freed block will have any specific values. It only guarantees that the
|
|
content the block had before it was freed will be overwritten.
|
|
|
|
The default value of this tunable is @samp{0}.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.mmap_threshold
|
|
This tunable supersedes the @env{MALLOC_MMAP_THRESHOLD_} environment variable
|
|
and is identical in features.
|
|
|
|
When this tunable is set, all chunks larger than this value in bytes are
|
|
allocated outside the normal heap, using the @code{mmap} system call. This way
|
|
it is guaranteed that the memory for these chunks can be returned to the system
|
|
on @code{free}. Note that requests smaller than this threshold might still be
|
|
allocated via @code{mmap}.
|
|
|
|
If this tunable is not set, the default value is set to @samp{131072} bytes and
|
|
the threshold is adjusted dynamically to suit the allocation patterns of the
|
|
program. If the tunable is set, the dynamic adjustment is disabled and the
|
|
value is set as static.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.trim_threshold
|
|
This tunable supersedes the @env{MALLOC_TRIM_THRESHOLD_} environment variable
|
|
and is identical in features.
|
|
|
|
The value of this tunable is the minimum size (in bytes) of the top-most,
|
|
releasable chunk in an arena that will trigger a system call in order to return
|
|
memory to the system from that arena.
|
|
|
|
If this tunable is not set, the default value is set as 128 KB and the
|
|
threshold is adjusted dynamically to suit the allocation patterns of the
|
|
program. If the tunable is set, the dynamic adjustment is disabled and the
|
|
value is set as static.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.mmap_max
|
|
This tunable supersedes the @env{MALLOC_MMAP_MAX_} environment variable and is
|
|
identical in features.
|
|
|
|
The value of this tunable is maximum number of chunks to allocate with
|
|
@code{mmap}. Setting this to zero disables all use of @code{mmap}.
|
|
|
|
The default value of this tunable is @samp{65536}.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.arena_test
|
|
This tunable supersedes the @env{MALLOC_ARENA_TEST} environment variable and is
|
|
identical in features.
|
|
|
|
The @code{glibc.malloc.arena_test} tunable specifies the number of arenas that
|
|
can be created before the test on the limit to the number of arenas is
|
|
conducted. The value is ignored if @code{glibc.malloc.arena_max} is set.
|
|
|
|
The default value of this tunable is 2 for 32-bit systems and 8 for 64-bit
|
|
systems.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.arena_max
|
|
This tunable supersedes the @env{MALLOC_ARENA_MAX} environment variable and is
|
|
identical in features.
|
|
|
|
This tunable sets the number of arenas to use in a process regardless of the
|
|
number of cores in the system.
|
|
|
|
The default value of this tunable is @code{0}, meaning that the limit on the
|
|
number of arenas is determined by the number of CPU cores online. For 32-bit
|
|
systems the limit is twice the number of cores online and on 64-bit systems, it
|
|
is 8 times the number of cores online.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.tcache_max
|
|
The maximum size of a request (in bytes) which may be met via the
|
|
per-thread cache. The default (and maximum) value is 1032 bytes on
|
|
64-bit systems and 516 bytes on 32-bit systems.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.tcache_count
|
|
The maximum number of chunks of each size to cache. The default is 7.
|
|
There is no upper limit, other than available system memory. If set
|
|
to zero, the per-thread cache is effectively disabled.
|
|
|
|
The approximate maximum overhead of the per-thread cache is thus equal
|
|
to the number of bins times the chunk count in each bin times the size
|
|
of each chunk. With defaults, the approximate maximum overhead of the
|
|
per-thread cache is approximately 236 KB on 64-bit systems and 118 KB
|
|
on 32-bit systems.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.malloc.tcache_unsorted_limit
|
|
When the user requests memory and the request cannot be met via the
|
|
per-thread cache, the arenas are used to meet the request. At this
|
|
time, additional chunks will be moved from existing arena lists to
|
|
pre-fill the corresponding cache. While copies from the fastbins,
|
|
smallbins, and regular bins are bounded and predictable due to the bin
|
|
sizes, copies from the unsorted bin are not bounded, and incur
|
|
additional time penalties as they need to be sorted as they're
|
|
scanned. To make scanning the unsorted list more predictable and
|
|
bounded, the user may set this tunable to limit the number of chunks
|
|
that are scanned from the unsorted list while searching for chunks to
|
|
pre-fill the per-thread cache with. The default, or when set to zero,
|
|
is no limit.
|
|
@end deftp
|
|
|
|
@node Elision Tunables
|
|
@section Elision Tunables
|
|
@cindex elision tunables
|
|
@cindex tunables, elision
|
|
|
|
@deftp {Tunable namespace} glibc.elision
|
|
Contended locks are usually slow and can lead to performance and scalability
|
|
issues in multithread code. Lock elision will use memory transactions to under
|
|
certain conditions, to elide locks and improve performance.
|
|
Elision behavior can be modified by setting the following tunables in
|
|
the @code{elision} namespace:
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.elision.enable
|
|
The @code{glibc.elision.enable} tunable enables lock elision if the feature is
|
|
supported by the hardware. If elision is not supported by the hardware this
|
|
tunable has no effect.
|
|
|
|
Elision tunables are supported for 64-bit Intel, IBM POWER, and z System
|
|
architectures.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.elision.skip_lock_busy
|
|
The @code{glibc.elision.skip_lock_busy} tunable sets how many times to use a
|
|
non-transactional lock after a transactional failure has occurred because the
|
|
lock is already acquired. Expressed in number of lock acquisition attempts.
|
|
|
|
The default value of this tunable is @samp{3}.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.elision.skip_lock_internal_abort
|
|
The @code{glibc.elision.skip_lock_internal_abort} tunable sets how many times
|
|
the thread should avoid using elision if a transaction aborted for any reason
|
|
other than a different thread's memory accesses. Expressed in number of lock
|
|
acquisition attempts.
|
|
|
|
The default value of this tunable is @samp{3}.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.elision.skip_lock_after_retries
|
|
The @code{glibc.elision.skip_lock_after_retries} tunable sets how many times
|
|
to try to elide a lock with transactions, that only failed due to a different
|
|
thread's memory accesses, before falling back to regular lock.
|
|
Expressed in number of lock elision attempts.
|
|
|
|
This tunable is supported only on IBM POWER, and z System architectures.
|
|
|
|
The default value of this tunable is @samp{3}.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.elision.tries
|
|
The @code{glibc.elision.tries} sets how many times to retry elision if there is
|
|
chance for the transaction to finish execution e.g., it wasn't
|
|
aborted due to the lock being already acquired. If elision is not supported
|
|
by the hardware this tunable is set to @samp{0} to avoid retries.
|
|
|
|
The default value of this tunable is @samp{3}.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.elision.skip_trylock_internal_abort
|
|
The @code{glibc.elision.skip_trylock_internal_abort} tunable sets how many
|
|
times the thread should avoid trying the lock if a transaction aborted due to
|
|
reasons other than a different thread's memory accesses. Expressed in number
|
|
of try lock attempts.
|
|
|
|
The default value of this tunable is @samp{3}.
|
|
@end deftp
|
|
|
|
@node POSIX Thread Tunables
|
|
@section POSIX Thread Tunables
|
|
@cindex pthread mutex tunables
|
|
@cindex thread mutex tunables
|
|
@cindex mutex tunables
|
|
@cindex tunables thread mutex
|
|
|
|
@deftp {Tunable namespace} glibc.pthread
|
|
The behavior of POSIX threads can be tuned to gain performance improvements
|
|
according to specific hardware capabilities and workload characteristics by
|
|
setting the following tunables in the @code{pthread} namespace:
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.pthread.mutex_spin_count
|
|
The @code{glibc.pthread.mutex_spin_count} tunable sets the maximum number of times
|
|
a thread should spin on the lock before calling into the kernel to block.
|
|
Adaptive spin is used for mutexes initialized with the
|
|
@code{PTHREAD_MUTEX_ADAPTIVE_NP} GNU extension. It affects both
|
|
@code{pthread_mutex_lock} and @code{pthread_mutex_timedlock}.
|
|
|
|
The thread spins until either the maximum spin count is reached or the lock
|
|
is acquired.
|
|
|
|
The default value of this tunable is @samp{100}.
|
|
@end deftp
|
|
|
|
@node Hardware Capability Tunables
|
|
@section Hardware Capability Tunables
|
|
@cindex hardware capability tunables
|
|
@cindex hwcap tunables
|
|
@cindex tunables, hwcap
|
|
@cindex hwcaps tunables
|
|
@cindex tunables, hwcaps
|
|
@cindex data_cache_size tunables
|
|
@cindex tunables, data_cache_size
|
|
@cindex shared_cache_size tunables
|
|
@cindex tunables, shared_cache_size
|
|
@cindex non_temporal_threshold tunables
|
|
@cindex tunables, non_temporal_threshold
|
|
|
|
@deftp {Tunable namespace} glibc.cpu
|
|
Behavior of @theglibc{} can be tuned to assume specific hardware capabilities
|
|
by setting the following tunables in the @code{cpu} namespace:
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.hwcap_mask
|
|
This tunable supersedes the @env{LD_HWCAP_MASK} environment variable and is
|
|
identical in features.
|
|
|
|
The @code{AT_HWCAP} key in the Auxiliary Vector specifies instruction set
|
|
extensions available in the processor at runtime for some architectures. The
|
|
@code{glibc.cpu.hwcap_mask} tunable allows the user to mask out those
|
|
capabilities at runtime, thus disabling use of those extensions.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.hwcaps
|
|
The @code{glibc.cpu.hwcaps=-xxx,yyy,-zzz...} tunable allows the user to
|
|
enable CPU/ARCH feature @code{yyy}, disable CPU/ARCH feature @code{xxx}
|
|
and @code{zzz} where the feature name is case-sensitive and has to match
|
|
the ones in @code{sysdeps/x86/cpu-features.h}.
|
|
|
|
This tunable is specific to i386 and x86-64.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.cached_memopt
|
|
The @code{glibc.cpu.cached_memopt=[0|1]} tunable allows the user to
|
|
enable optimizations recommended for cacheable memory. If set to
|
|
@code{1}, @theglibc{} assumes that the process memory image consists
|
|
of cacheable (non-device) memory only. The default, @code{0},
|
|
indicates that the process may use device memory.
|
|
|
|
This tunable is specific to powerpc, powerpc64 and powerpc64le.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.name
|
|
The @code{glibc.cpu.name=xxx} tunable allows the user to tell @theglibc{} to
|
|
assume that the CPU is @code{xxx} where xxx may have one of these values:
|
|
@code{generic}, @code{falkor}, @code{thunderxt88}, @code{thunderx2t99},
|
|
@code{thunderx2t99p1}.
|
|
|
|
This tunable is specific to aarch64.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.x86_data_cache_size
|
|
The @code{glibc.cpu.x86_data_cache_size} tunable allows the user to set
|
|
data cache size in bytes for use in memory and string routines.
|
|
|
|
This tunable is specific to i386 and x86-64.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.x86_shared_cache_size
|
|
The @code{glibc.cpu.x86_shared_cache_size} tunable allows the user to
|
|
set shared cache size in bytes for use in memory and string routines.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.x86_non_temporal_threshold
|
|
The @code{glibc.cpu.x86_non_temporal_threshold} tunable allows the user
|
|
to set threshold in bytes for non temporal store.
|
|
|
|
This tunable is specific to i386 and x86-64.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.x86_ibt
|
|
The @code{glibc.cpu.x86_ibt} tunable allows the user to control how
|
|
indirect branch tracking (IBT) should be enabled. Accepted values are
|
|
@code{on}, @code{off}, and @code{permissive}. @code{on} always turns
|
|
on IBT regardless of whether IBT is enabled in the executable and its
|
|
dependent shared libraries. @code{off} always turns off IBT regardless
|
|
of whether IBT is enabled in the executable and its dependent shared
|
|
libraries. @code{permissive} is the same as the default which disables
|
|
IBT on non-CET executables and shared libraries.
|
|
|
|
This tunable is specific to i386 and x86-64.
|
|
@end deftp
|
|
|
|
@deftp Tunable glibc.cpu.x86_shstk
|
|
The @code{glibc.cpu.x86_shstk} tunable allows the user to control how
|
|
the shadow stack (SHSTK) should be enabled. Accepted values are
|
|
@code{on}, @code{off}, and @code{permissive}. @code{on} always turns on
|
|
SHSTK regardless of whether SHSTK is enabled in the executable and its
|
|
dependent shared libraries. @code{off} always turns off SHSTK regardless
|
|
of whether SHSTK is enabled in the executable and its dependent shared
|
|
libraries. @code{permissive} changes how dlopen works on non-CET shared
|
|
libraries. By default, when SHSTK is enabled, dlopening a non-CET shared
|
|
library returns an error. With @code{permissive}, it turns off SHSTK
|
|
instead.
|
|
|
|
This tunable is specific to i386 and x86-64.
|
|
@end deftp
|