Add per-thread cache to malloc

* config.make.in: Enable experimental malloc option.
* configure.ac: Likewise.
* configure: Regenerate.
* manual/install.texi: Document it.
* INSTALL: Regenerate.
* malloc/Makefile: Likewise.
* malloc/malloc.c: Add per-thread cache (tcache).
(tcache_put): New.
(tcache_get): New.
(tcache_thread_freeres): New.
(tcache_init): New.
(__libc_malloc): Use cached chunks if available.
(__libc_free): Initialize tcache if needed.
(__libc_realloc): Likewise.
(__libc_calloc): Likewise.
(_int_malloc): Prefill tcache when appropriate.
(_int_free): Likewise.
(do_set_tcache_max): New.
(do_set_tcache_count): New.
(do_set_tcache_unsorted_limit): New.
* manual/probes.texi: Document new probes.
* malloc/arena.c: Add new tcache tunables.
* elf/dl-tunables.list: Likewise.
* manual/tunables.texi: Document them.
* NEWS: Mention the per-thread cache.
Author: DJ Delorie <dj@delorie.com>
Date:   2017-07-06 13:37:30 -04:00
Commit: d5c3fafc43 (parent 3cefdd7310)
13 changed files with 490 additions and 9 deletions

ChangeLog

@@ -1,3 +1,31 @@
2017-07-06 DJ Delorie <dj@delorie.com>
* config.make.in: Enable experimental malloc option.
* configure.ac: Likewise.
* configure: Regenerate.
* manual/install.texi: Document it.
* INSTALL: Regenerate.
* malloc/Makefile: Likewise.
* malloc/malloc.c: Add per-thread cache (tcache).
(tcache_put): New.
(tcache_get): New.
(tcache_thread_freeres): New.
(tcache_init): New.
(__libc_malloc): Use cached chunks if available.
(__libc_free): Initialize tcache if needed.
(__libc_realloc): Likewise.
(__libc_calloc): Likewise.
(_int_malloc): Prefill tcache when appropriate.
(_int_free): Likewise.
(do_set_tcache_max): New.
(do_set_tcache_count): New.
(do_set_tcache_unsorted_limit): New.
* manual/probes.texi: Document new probes.
* malloc/arena.c: Add new tcache tunables.
* elf/dl-tunables.list: Likewise.
* manual/tunables.texi: Document them.
* NEWS: Mention the per-thread cache.
2017-07-06 Joseph Myers <joseph@codesourcery.com>
* iconvdata/tst-loading.c (TIMEOUT): Define to 30.

INSTALL

@@ -200,6 +200,12 @@ will be used, and CFLAGS sets optimization options for the compiler.
libnss_nisplus are not built at all. Use this option to enable
libnsl with all depending NSS modules and header files.
'--disable-experimental-malloc'
By default, a per-thread cache is enabled in 'malloc'. While this
cache can be disabled on a per-application basis using tunables
(set glibc.malloc.tcache_count to zero), this option can be used to
remove it from the build completely.
'--build=BUILD-SYSTEM'
'--host=HOST-SYSTEM'
These options are for cross-compiling. If you specify both options

NEWS

@@ -9,6 +9,14 @@ Version 2.26
Major new features:
* A per-thread cache has been added to malloc. Access to the cache requires
no locks and therefore significantly accelerates the fast path to allocate
and free small amounts of memory. Refilling an empty cache requires locking
the underlying arena. Performance measurements show significant gains in a
wide variety of user workloads. Workloads were captured using a special
instrumented malloc and analyzed with a malloc simulator. Contributed by
DJ Delorie with the help of Florian Weimer, and Carlos O'Donell.
* Unicode 10.0.0 Support: Character encoding, character type info, and
transliteration tables are all updated to Unicode 10.0.0, using
generator scripts contributed by Mike FABIAN (Red Hat).

config.make.in

@@ -78,6 +78,8 @@ multi-arch = @multi_arch@
mach-interface-list = @mach_interface_list@
experimental-malloc = @experimental_malloc@
nss-crypt = @libc_cv_nss_crypt@
static-nss-crypt = @libc_cv_static_nss_crypt@

configure

@@ -674,6 +674,7 @@ build_obsolete_nsl
link_obsolete_rpc
libc_cv_static_nss_crypt
libc_cv_nss_crypt
experimental_malloc
enable_werror
all_warnings
force_install
@@ -779,6 +780,7 @@ enable_kernel
enable_all_warnings
enable_werror
enable_multi_arch
enable_experimental_malloc
enable_nss_crypt
enable_obsolete_rpc
enable_obsolete_nsl
@@ -1450,6 +1452,8 @@ Optional Features:
--disable-werror do not build with -Werror
--enable-multi-arch enable single DSO with optimizations for multiple
architectures
--disable-experimental-malloc
disable experimental malloc features
--enable-nss-crypt enable libcrypt to use nss
--enable-obsolete-rpc build and install the obsolete RPC code for
link-time usage
@@ -3522,6 +3526,15 @@ else
fi
# Check whether --enable-experimental-malloc was given.
if test "${enable_experimental_malloc+set}" = set; then :
enableval=$enable_experimental_malloc; experimental_malloc=$enableval
else
experimental_malloc=yes
fi
# Check whether --enable-nss-crypt was given.
if test "${enable_nss_crypt+set}" = set; then :
enableval=$enable_nss_crypt; nss_crypt=$enableval

configure.ac

@@ -313,6 +313,13 @@ AC_ARG_ENABLE([multi-arch],
[multi_arch=$enableval],
[multi_arch=default])
AC_ARG_ENABLE([experimental-malloc],
AC_HELP_STRING([--disable-experimental-malloc],
[disable experimental malloc features]),
[experimental_malloc=$enableval],
[experimental_malloc=yes])
AC_SUBST(experimental_malloc)
AC_ARG_ENABLE([nss-crypt],
AC_HELP_STRING([--enable-nss-crypt],
[enable libcrypt to use nss]),

elf/dl-tunables.list

@@ -76,6 +76,18 @@ glibc {
minval: 1
security_level: SXID_IGNORE
}
tcache_max {
type: SIZE_T
security_level: SXID_ERASE
}
tcache_count {
type: SIZE_T
security_level: SXID_ERASE
}
tcache_unsorted_limit {
type: SIZE_T
security_level: SXID_ERASE
}
}
tune {
hwcap_mask {

malloc/Makefile

@@ -189,6 +189,11 @@ tst-malloc-usable-static-ENV = $(tst-malloc-usable-ENV)
tst-malloc-usable-tunables-ENV = GLIBC_TUNABLES=glibc.malloc.check=3
tst-malloc-usable-static-tunables-ENV = $(tst-malloc-usable-tunables-ENV)
ifeq ($(experimental-malloc),yes)
CPPFLAGS-malloc.c += -DUSE_TCACHE=1
else
CPPFLAGS-malloc.c += -DUSE_TCACHE=0
endif
# Uncomment this for test releases. For public releases it is too expensive.
#CPPFLAGS-malloc.o += -DMALLOC_DEBUG=1

malloc/arena.c

@@ -236,6 +236,11 @@ TUNABLE_CALLBACK_FNDECL (set_perturb_byte, int32_t)
TUNABLE_CALLBACK_FNDECL (set_trim_threshold, size_t)
TUNABLE_CALLBACK_FNDECL (set_arena_max, size_t)
TUNABLE_CALLBACK_FNDECL (set_arena_test, size_t)
#if USE_TCACHE
TUNABLE_CALLBACK_FNDECL (set_tcache_max, size_t)
TUNABLE_CALLBACK_FNDECL (set_tcache_count, size_t)
TUNABLE_CALLBACK_FNDECL (set_tcache_unsorted_limit, size_t)
#endif
#else
/* Initialization routine. */
#include <string.h>
@@ -322,6 +327,12 @@ ptmalloc_init (void)
TUNABLE_GET (mmap_max, int32_t, TUNABLE_CALLBACK (set_mmaps_max));
TUNABLE_GET (arena_max, size_t, TUNABLE_CALLBACK (set_arena_max));
TUNABLE_GET (arena_test, size_t, TUNABLE_CALLBACK (set_arena_test));
#if USE_TCACHE
TUNABLE_GET (tcache_max, size_t, TUNABLE_CALLBACK (set_tcache_max));
TUNABLE_GET (tcache_count, size_t, TUNABLE_CALLBACK (set_tcache_count));
TUNABLE_GET (tcache_unsorted_limit, size_t,
TUNABLE_CALLBACK (set_tcache_unsorted_limit));
#endif
__libc_lock_unlock (main_arena.mutex);
#else
const char *s = NULL;

malloc/malloc.c

@@ -238,6 +238,9 @@
/* For ALIGN_UP et. al. */
#include <libc-pointer-arith.h>
/* For DIAG_PUSH/POP_NEEDS_COMMENT et al. */
#include <libc-diag.h>
#include <malloc/malloc-internal.h>
/*
@@ -296,6 +299,30 @@ __malloc_assert (const char *assertion, const char *file, unsigned int line,
}
#endif
#if USE_TCACHE
/* We want 64 entries. This is an arbitrary limit, which tunables can reduce. */
# define TCACHE_MAX_BINS 64
# define MAX_TCACHE_SIZE tidx2usize (TCACHE_MAX_BINS-1)
/* Only used to pre-fill the tunables. */
# define tidx2usize(idx) (((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)
/* When "x" is from chunksize(). */
# define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)
/* When "x" is a user-provided size. */
# define usize2tidx(x) csize2tidx (request2size (x))
/* With rounding and alignment, the bins are...
idx 0 bytes 0..24 (64-bit) or 0..12 (32-bit)
idx 1 bytes 25..40 or 13..20
idx 2 bytes 41..56 or 21..28
etc. */
/* This is another arbitrary limit, which tunables can change. Each
tcache bin will hold at most this number of chunks. */
# define TCACHE_FILL_COUNT 7
#endif
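
The bin geometry implied by these macros is easy to check with a standalone sketch. Everything below is illustrative: the 64-bit constants (SIZE_SZ 8, MALLOC_ALIGNMENT 16, MINSIZE 32) and the re-derived request2size are assumptions about the usual layout, not definitions taken from this patch.

/* Standalone model of the tcache bin arithmetic above.  The constants
   are typical 64-bit values and are assumed here.  */
#include <stdio.h>
#include <stddef.h>

#define SIZE_SZ           8
#define MALLOC_ALIGNMENT  16
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)
#define MINSIZE           32
#define TCACHE_MAX_BINS   64

/* Smallest chunk size that can hold a request of size REQ.  */
static size_t
request2size (size_t req)
{
  size_t sz = req + SIZE_SZ + MALLOC_ALIGN_MASK;
  return sz < MINSIZE ? MINSIZE : (sz & ~(size_t) MALLOC_ALIGN_MASK);
}

/* Same formulas as tidx2usize/csize2tidx in the patch.  */
static size_t tidx2usize (size_t idx) { return idx * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ; }
static size_t csize2tidx (size_t csz) { return (csz - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT; }

int
main (void)
{
  /* Largest user request the cache can ever serve: 1032 bytes here.  */
  printf ("MAX_TCACHE_SIZE = %zu\n", tidx2usize (TCACHE_MAX_BINS - 1));

  /* Reproduce the "idx 0 bytes 0..24, idx 1 bytes 25..40, ..." table.  */
  for (size_t idx = 0; idx < 3; idx++)
    printf ("idx %zu holds user sizes up to %zu bytes\n", idx, tidx2usize (idx));

  /* A user request is binned by rounding it up to a chunk size first.  */
  for (size_t req = 0; req <= 48; req += 24)
    printf ("request %2zu -> chunk %2zu -> tcache bin %zu\n",
            req, request2size (req), csize2tidx (request2size (req)));
  return 0;
}

Running it reproduces the idx 0/1/2 ranges from the comment above and the 1032-byte ceiling documented for glibc.malloc.tcache_max later in the patch.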
/*
REALLOC_ZERO_BYTES_FREES should be set if a call to
@@ -1712,6 +1739,17 @@ struct malloc_par
/* First address handed out by MORECORE/sbrk. */
char *sbrk_base;
#if USE_TCACHE
/* Maximum number of buckets to use. */
size_t tcache_bins;
size_t tcache_max_bytes;
/* Maximum number of chunks in each bucket. */
size_t tcache_count;
/* Maximum number of chunks to remove from the unsorted list, which
aren't used to prefill the cache. */
size_t tcache_unsorted_limit;
#endif
};
/* There are several instances of this struct ("arenas") in this
@@ -1750,6 +1788,13 @@ static struct malloc_par mp_ =
.trim_threshold = DEFAULT_TRIM_THRESHOLD,
#define NARENAS_FROM_NCORES(n) ((n) * (sizeof (long) == 4 ? 2 : 8))
.arena_test = NARENAS_FROM_NCORES (1)
#if USE_TCACHE
,
.tcache_count = TCACHE_FILL_COUNT,
.tcache_bins = TCACHE_MAX_BINS,
.tcache_max_bytes = tidx2usize (TCACHE_MAX_BINS-1),
.tcache_unsorted_limit = 0 /* No limit. */
#endif
};
/* Maximum size of memory handled in fastbins. */
@@ -2875,6 +2920,124 @@ mremap_chunk (mchunkptr p, size_t new_size)
/*------------------------ Public wrappers. --------------------------------*/
#if USE_TCACHE
/* We overlay this structure on the user-data portion of a chunk when
the chunk is stored in the per-thread cache. */
typedef struct tcache_entry
{
struct tcache_entry *next;
} tcache_entry;
/* There is one of these for each thread, which contains the
per-thread cache (hence "tcache_perthread_struct"). Keeping
overall size low is mildly important. Note that COUNTS and ENTRIES
are redundant (we could have just counted the linked list each
time), this is for performance reasons. */
typedef struct tcache_perthread_struct
{
char counts[TCACHE_MAX_BINS];
tcache_entry *entries[TCACHE_MAX_BINS];
} tcache_perthread_struct;
static __thread char tcache_shutting_down = 0;
static __thread tcache_perthread_struct *tcache = NULL;
/* Caller must ensure that we know tc_idx is valid and there's room
for more chunks. */
static void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);
assert (tc_idx < TCACHE_MAX_BINS);
e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}
/* Caller must ensure that we know tc_idx is valid and there's
available chunks to remove. */
static void *
tcache_get (size_t tc_idx)
{
tcache_entry *e = tcache->entries[tc_idx];
assert (tc_idx < TCACHE_MAX_BINS);
assert (tcache->entries[tc_idx] > 0);
tcache->entries[tc_idx] = e->next;
--(tcache->counts[tc_idx]);
return (void *) e;
}
static void __attribute__ ((section ("__libc_thread_freeres_fn")))
tcache_thread_freeres (void)
{
int i;
tcache_perthread_struct *tcache_tmp = tcache;
if (!tcache)
return;
tcache = NULL;
for (i = 0; i < TCACHE_MAX_BINS; ++i)
{
while (tcache_tmp->entries[i])
{
tcache_entry *e = tcache_tmp->entries[i];
tcache_tmp->entries[i] = e->next;
__libc_free (e);
}
}
__libc_free (tcache_tmp);
tcache_shutting_down = 1;
}
text_set_element (__libc_thread_subfreeres, tcache_thread_freeres);
static void
tcache_init(void)
{
mstate ar_ptr;
void *victim = 0;
const size_t bytes = sizeof (tcache_perthread_struct);
if (tcache_shutting_down)
return;
arena_get (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
if (!victim && ar_ptr != NULL)
{
ar_ptr = arena_get_retry (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
}
if (ar_ptr != NULL)
__libc_lock_unlock (ar_ptr->mutex);
/* In a low memory situation, we may not be able to allocate memory
- in which case, we just keep trying later. However, we
typically do this very early, so either there is sufficient
memory, or there isn't enough memory to do non-trivial
allocations anyway. */
if (victim)
{
tcache = (tcache_perthread_struct *) victim;
memset (tcache, 0, sizeof (tcache_perthread_struct));
}
}
#define MAYBE_INIT_TCACHE() \
if (__glibc_unlikely (tcache == NULL)) \
tcache_init();
#else
#define MAYBE_INIT_TCACHE()
#endif
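
Conceptually the per-thread cache is just an array of LIFO singly linked lists, one per size class, each capped at a fill count, with the list links living in the user area of the cached chunks. The toy model below shows that shape with ordinary heap blocks; the names (cache_put, cache_get) are hypothetical and none of this is glibc's code.

/* Toy model of the per-thread cache: an array of LIFO singly linked
   lists with a per-bin count, capped at a fill count.  Uses ordinary
   malloc'd nodes instead of overlaying freed chunks.  */
#include <stdio.h>
#include <stdlib.h>

#define NBINS      64
#define FILL_COUNT 7

struct entry { struct entry *next; };

static __thread struct entry *entries[NBINS];
static __thread char counts[NBINS];

/* Push one block onto bin IDX if there is room; returns 1 on success.  */
static int
cache_put (void *block, size_t idx)
{
  if (idx >= NBINS || counts[idx] >= FILL_COUNT)
    return 0;
  struct entry *e = block;       /* the block's first bytes hold the link */
  e->next = entries[idx];
  entries[idx] = e;
  ++counts[idx];
  return 1;
}

/* Pop the most recently cached block from bin IDX, or NULL.  */
static void *
cache_get (size_t idx)
{
  struct entry *e = entries[idx];
  if (e == NULL)
    return NULL;
  entries[idx] = e->next;
  --counts[idx];
  return e;
}

int
main (void)
{
  void *a = malloc (64), *b = malloc (64);
  cache_put (a, 2);
  cache_put (b, 2);              /* bin 2 now holds b -> a */
  printf ("LIFO: %s\n", cache_get (2) == b ? "yes" : "no");
  printf ("then: %s\n", cache_get (2) == a ? "yes" : "no");
  free (a);
  free (b);
  return 0;
}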
void *
__libc_malloc (size_t bytes)
{
@@ -2885,6 +3048,23 @@ __libc_malloc (size_t bytes)
= atomic_forced_read (__malloc_hook);
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0));
#if USE_TCACHE
/* int_free also calls request2size, be careful to not pad twice. */
size_t tbytes = request2size (bytes);
size_t tc_idx = csize2tidx (tbytes);
MAYBE_INIT_TCACHE ();
DIAG_PUSH_NEEDS_COMMENT;
if (tc_idx < mp_.tcache_bins
/*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
&& tcache
&& tcache->entries[tc_idx] != NULL)
{
return tcache_get (tc_idx);
}
DIAG_POP_NEEDS_COMMENT;
#endif
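
Seen from application code, this fast path means a small block that was just freed is usually handed straight back from the cache without taking the arena lock. A minimal demo follows, with the caveat that address reuse is not guaranteed by any interface and can also come from the fastbins.

/* Demo: free a small block and immediately request the same size.
   On the fast path above the second call is typically served from the
   per-thread cache and returns the same address; this is likely, not
   guaranteed, behaviour.  */
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>

int
main (void)
{
  void *p = malloc (100);
  uintptr_t before = (uintptr_t) p;   /* remember the address, then free */
  free (p);

  void *q = malloc (100);             /* same size class as the freed block */
  printf ("first:  %#zx\nsecond: %#zx -> %s\n",
          (size_t) before, (size_t) (uintptr_t) q,
          before == (uintptr_t) q ? "address reused" : "not reused");
  free (q);
  return 0;
}

Running it with GLIBC_TUNABLES=glibc.malloc.tcache_count=0 disables the cache on a tcache-enabled build; the address may still be reused via a fastbin, so this is an illustration of the fast path, not a test for it.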
arena_get (ar_ptr, bytes);
@@ -2944,6 +3124,8 @@ __libc_free (void *mem)
return;
}
MAYBE_INIT_TCACHE ();
ar_ptr = arena_for_chunk (p);
_int_free (ar_ptr, p, 0);
}
@@ -2981,7 +3163,10 @@ __libc_realloc (void *oldmem, size_t bytes)
if (chunk_is_mmapped (oldp))
ar_ptr = NULL;
else
{
MAYBE_INIT_TCACHE ();
ar_ptr = arena_for_chunk (oldp);
}
/* Little security check which won't hurt performance: the allocator
never wrapps around at the end of the address space. Therefore
@@ -3206,6 +3391,8 @@ __libc_calloc (size_t n, size_t elem_size)
sz = bytes;
MAYBE_INIT_TCACHE ();
arena_get (av, sz);
if (av)
{
@@ -3336,6 +3523,10 @@ _int_malloc (mstate av, size_t bytes)
mchunkptr fwd; /* misc temp for linking */
mchunkptr bck; /* misc temp for linking */
#if USE_TCACHE
size_t tcache_unsorted_count; /* count of unsorted chunks processed */
#endif
const char *errstr = NULL;
/*
@@ -3365,19 +3556,22 @@ _int_malloc (mstate av, size_t bytes)
can try it without checking, which saves some time on this fast path.
*/
#define REMOVE_FB(fb, victim, pp) \
do \
{ \
victim = pp; \
if (victim == NULL) \
break; \
} \
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
!= victim); \
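
REMOVE_FB is the classic lock-free stack pop: read the head, then compare-and-swap it to head->fd, retrying if another thread changed the list in between. A rough equivalent with C11 <stdatomic.h> and a generic node type (hypothetical names, none of glibc's catomic_* machinery):

/* Sketch of the compare-and-swap pop that REMOVE_FB performs on a
   fastbin, written with C11 atomics and a generic node type.  */
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

struct node { struct node *fd; };

/* Pop the head of the list at *HEAD, or return NULL if it is empty.
   Retries while other threads push or pop concurrently.  */
static struct node *
lockfree_pop (_Atomic (struct node *) *head)
{
  struct node *victim = atomic_load (head);
  while (victim != NULL
         && !atomic_compare_exchange_weak (head, &victim, victim->fd))
    /* VICTIM was reloaded by the failed CAS; just retry.  */;
  return victim;
}

int
main (void)
{
  struct node b = { NULL }, a = { &b };
  _Atomic (struct node *) head = &a;

  printf ("%s\n", lockfree_pop (&head) == &a ? "popped a" : "?");
  printf ("%s\n", lockfree_pop (&head) == &b ? "popped b" : "?");
  printf ("%s\n", lockfree_pop (&head) == NULL ? "empty" : "?");
  return 0;
}

A production allocator also has to worry about ABA and reclamation on such a list; the sketch ignores both.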
if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
{
idx = fastbin_index (nb);
mfastbinptr *fb = &fastbin (av, idx);
mchunkptr pp = *fb;
- do
- {
- victim = pp;
- if (victim == NULL)
- break;
- }
- while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
- != victim);
+ REMOVE_FB (fb, victim, pp);
if (victim != 0)
{
if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0))
@@ -3388,6 +3582,26 @@ _int_malloc (mstate av, size_t bytes)
return NULL;
}
check_remalloced_chunk (av, victim, nb);
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;
/* While bin not empty and tcache not full, copy chunks over. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (pp = *fb) != NULL)
{
REMOVE_FB (fb, tc_victim, pp);
if (tc_victim != 0)
{
tcache_put (tc_victim, tc_idx);
}
}
}
#endif
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
@@ -3426,6 +3640,32 @@ _int_malloc (mstate av, size_t bytes)
if (av != &main_arena)
set_non_main_arena (victim);
check_malloced_chunk (av, victim, nb);
#if USE_TCACHE
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;
/* While bin not empty and tcache not full, copy chunks over. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = last (bin)) != bin)
{
if (tc_victim != 0)
{
bck = tc_victim->bk;
set_inuse_bit_at_offset (tc_victim, nb);
if (av != &main_arena)
set_non_main_arena (tc_victim);
bin->bk = bck;
bck->fd = bin;
tcache_put (tc_victim, tc_idx);
}
}
}
#endif
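
A small bin, unlike a fastbin, is a circular doubly linked list with the bin header acting as sentinel, and chunks are taken from its tail (bin->bk). The refill loop above repeats that unlink for every extra chunk it moves into the cache; a minimal model of the unlink itself (plain nodes, hypothetical names):

/* Model of taking the last element of a circular doubly linked list
   with a sentinel head, as the small-bin refill above does with
   "bck = tc_victim->bk; bin->bk = bck; bck->fd = bin;".  */
#include <stdio.h>

struct link { struct link *fd, *bk; };

/* Unlink and return the tail (head->bk), or NULL if the list is empty.  */
static struct link *
take_last (struct link *head)
{
  struct link *victim = head->bk;
  if (victim == head)           /* empty: sentinel points at itself */
    return NULL;
  struct link *bck = victim->bk;
  head->bk = bck;
  bck->fd = head;
  return victim;
}

int
main (void)
{
  struct link head, a, b;
  /* Build head <-> a <-> b <-> head; the tail (head->bk) is b.  */
  head.fd = &a; a.bk = &head;
  a.fd = &b;    b.bk = &a;
  b.fd = &head; head.bk = &b;

  printf ("first take: %s\n", take_last (&head) == &b ? "b" : "?");
  printf ("second take: %s\n", take_last (&head) == &a ? "a" : "?");
  printf ("then empty: %s\n", take_last (&head) == NULL ? "yes" : "no");
  return 0;
}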
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
@@ -3464,6 +3704,16 @@ _int_malloc (mstate av, size_t bytes)
otherwise need to expand memory to service a "small" request.
*/
#if USE_TCACHE
INTERNAL_SIZE_T tcache_nb = 0;
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
tcache_nb = nb;
int return_cached = 0;
tcache_unsorted_count = 0;
#endif
for (;; )
{
int iters = 0;
@@ -3524,10 +3774,26 @@ _int_malloc (mstate av, size_t bytes)
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
set_non_main_arena (victim);
#if USE_TCACHE
/* Fill cache first, return to user only if cache fills.
We may return one of these chunks later. */
if (tcache_nb
&& tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (victim, tc_idx);
return_cached = 1;
continue;
}
else
{
#endif
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
#if USE_TCACHE
}
#endif
}
/* place chunk in bin */
@@ -3594,11 +3860,31 @@ _int_malloc (mstate av, size_t bytes)
fwd->bk = victim;
bck->fd = victim;
#if USE_TCACHE
/* If we've processed as many chunks as we're allowed while
filling the cache, return one of the cached ones. */
++tcache_unsorted_count;
if (return_cached
&& mp_.tcache_unsorted_limit > 0
&& tcache_unsorted_count > mp_.tcache_unsorted_limit)
{
return tcache_get (tc_idx);
}
#endif
#define MAX_ITERS 10000
if (++iters >= MAX_ITERS)
break;
}
#if USE_TCACHE
/* If all the small chunks we found ended up cached, return one now. */
if (return_cached)
{
return tcache_get (tc_idx);
}
#endif
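
The unsorted-bin logic inverts the usual control flow: exact fits are stashed in the cache while it has room (return_cached); a chunk is returned directly only once the cache is full, or from the cache when the scan finishes or exceeds tcache_unsorted_limit. A simplified model of that fill-first, bounded-scan pattern (sizes stand in for chunks, names are illustrative):

/* Simplified model of the fill-first, bounded unsorted scan above.  */
#include <stdio.h>
#include <stddef.h>

#define CACHE_CAP 7

static size_t cache[CACHE_CAP];
static size_t cache_len;

/* Scan UNSORTED for WANT.  Exact fits fill the cache first; a chunk is
   returned directly only once the cache is full.  Otherwise one of the
   cached chunks is returned when the scan ends, or early once more
   than LIMIT other chunks have been processed (0 means no limit).  */
static size_t
take_from_unsorted (const size_t *unsorted, size_t n, size_t want, size_t limit)
{
  size_t processed = 0;
  int return_cached = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (unsorted[i] == want)
        {
          if (cache_len < CACHE_CAP)
            {
              cache[cache_len++] = unsorted[i];
              return_cached = 1;
              continue;
            }
          return unsorted[i];                 /* cache already full */
        }
      /* A non-matching chunk would be sorted into a regular bin.  */
      ++processed;
      if (return_cached && limit > 0 && processed > limit)
        return cache[--cache_len];            /* bounded scan */
    }
  return return_cached ? cache[--cache_len] : 0;
}

int
main (void)
{
  size_t unsorted[] = { 48, 32, 48, 128, 256, 48, 512 };
  size_t got = take_from_unsorted (unsorted, 7, 48, 2);
  printf ("returned %zu, %zu left in the cache\n", got, cache_len);
  return 0;
}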
/*
If a large request, scan through the chunks of current bin in
sorted order to find smallest that fits. Use the skip list for this.
@@ -3884,6 +4170,20 @@ _int_free (mstate av, mchunkptr p, int have_lock)
check_inuse_chunk(av, p);
#if USE_TCACHE
{
size_t tc_idx = csize2tidx (size);
if (tcache
&& tc_idx < mp_.tcache_bins
&& tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (p, tc_idx);
return;
}
}
#endif
/*
If eligible, place chunk on a fastbin so it can be found
and used quickly in malloc.
@@ -4845,6 +5145,38 @@ do_set_arena_max (size_t value)
return 1;
}
#if USE_TCACHE
static inline int
__always_inline
do_set_tcache_max (size_t value)
{
if (value >= 0 && value <= MAX_TCACHE_SIZE)
{
LIBC_PROBE (memory_tunable_tcache_max_bytes, 2, value, mp_.tcache_max_bytes);
mp_.tcache_max_bytes = value;
mp_.tcache_bins = csize2tidx (request2size(value)) + 1;
}
return 1;
}
static inline int
__always_inline
do_set_tcache_count (size_t value)
{
LIBC_PROBE (memory_tunable_tcache_count, 2, value, mp_.tcache_count);
mp_.tcache_count = value;
return 1;
}
static inline int
__always_inline
do_set_tcache_unsorted_limit (size_t value)
{
LIBC_PROBE (memory_tunable_tcache_unsorted_limit, 2, value, mp_.tcache_unsorted_limit);
mp_.tcache_unsorted_limit = value;
return 1;
}
#endif
int
__libc_mallopt (int param_number, int value)

manual/install.texi

@@ -232,6 +232,12 @@ libnss_nisplus are not built at all.
Use this option to enable libnsl with all depending NSS modules and
header files.
@item --disable-experimental-malloc
By default, a per-thread cache is enabled in @code{malloc}. While
this cache can be disabled on a per-application basis using tunables
(set glibc.malloc.tcache_count to zero), this option can be used to
remove it from the build completely.
@item --build=@var{build-system}
@itemx --host=@var{host-system}
These options are for cross-compiling. If you specify both options and

manual/probes.texi

@@ -231,6 +231,25 @@ dynamic brk/mmap thresholds. Argument @var{$arg1} and @var{$arg2} are
the adjusted mmap and trim thresholds, respectively.
@end deftp
@deftp Probe memory_tunable_tcache_max_bytes (int @var{$arg1}, int @var{$arg2})
This probe is triggered when the @code{glibc.malloc.tcache_max}
tunable is set. Argument @var{$arg1} is the requested value, and
@var{$arg2} is the previous value of this tunable.
@end deftp
@deftp Probe memory_tunable_tcache_count (int @var{$arg1}, int @var{$arg2})
This probe is triggered when the @code{glibc.malloc.tcache_count}
tunable is set. Argument @var{$arg1} is the requested value, and
@var{$arg2} is the previous value of this tunable.
@end deftp
@deftp Probe memory_tunable_tcache_unsorted_limit (int @var{$arg1}, int @var{$arg2})
This probe is triggered when the
@code{glibc.malloc.tcache_unsorted_limit} tunable is set. Argument
@var{$arg1} is the requested value, and @var{$arg2} is the previous
value of this tunable.
@end deftp
@node Mathematical Function Probes
@section Mathematical Function Probes

manual/tunables.texi

@@ -193,6 +193,38 @@ systems the limit is twice the number of cores online and on 64-bit systems, it
is 8 times the number of cores online.
@end deftp
@deftp Tunable glibc.malloc.tcache_max
The maximum size of a request (in bytes) which may be met via the
per-thread cache. The default (and maximum) value is 1032 bytes on
64-bit systems and 516 bytes on 32-bit systems.
@end deftp
@deftp Tunable glibc.malloc.tcache_count
The maximum number of chunks of each size to cache. The default is 7.
There is no upper limit, other than available system memory. If set
to zero, the per-thread cache is effectively disabled.
The approximate maximum overhead of the per-thread cache is thus equal
to the number of bins times the chunk count in each bin times the size
of each chunk. With defaults, the approximate maximum overhead of the
per-thread cache is approximately 236 KB on 64-bit systems and 118 KB
on 32-bit systems.
@end deftp
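
The 236 KB and 118 KB estimates follow directly from the bin geometry: seven cached blocks of the largest user size in each of the 64 bins. The check below assumes the usual layouts (16-byte alignment and a 24-byte smallest cached user size on 64-bit, 8 and 12 bytes on 32-bit) and decimal kilobytes; those constants are not part of this patch.

/* Reproduce the documented worst-case tcache overhead: count * sum of
   the largest user size in each of the 64 bins.  */
#include <stdio.h>

static unsigned long
overhead (unsigned long align, unsigned long min_usize,
          unsigned long bins, unsigned long count)
{
  unsigned long total = 0;
  for (unsigned long i = 0; i < bins; i++)
    total += min_usize + i * align;     /* largest user size in bin i */
  return total * count;
}

int
main (void)
{
  /* 64-bit: 7 * sum(24 + 16*i, i = 0..63) = 236544 bytes, ~236 KB.  */
  printf ("64-bit: %lu bytes\n", overhead (16, 24, 64, 7));
  /* 32-bit: 7 * sum(12 +  8*i, i = 0..63) = 118272 bytes, ~118 KB.  */
  printf ("32-bit: %lu bytes\n", overhead (8, 12, 64, 7));
  return 0;
}

Both results match the figures quoted above.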
@deftp Tunable glibc.malloc.tcache_unsorted_limit
When the user requests memory and the request cannot be met via the
per-thread cache, the arenas are used to meet the request. At this
time, additional chunks will be moved from existing arena lists to
pre-fill the corresponding cache. While copies from the fastbins,
smallbins, and regular bins are bounded and predictable due to the bin
sizes, copies from the unsorted bin are not bounded, and incur
additional time penalties as they need to be sorted as they're
scanned. To make scanning the unsorted list more predictable and
bounded, the user may set this tunable to limit the number of chunks
that are scanned from the unsorted list while searching for chunks to
pre-fill the per-thread cache with. The default, or when set to zero,
is no limit.
@end deftp
@node Hardware Capability Tunables
@section Hardware Capability Tunables
@cindex hardware capability tunables