Change the tcache->counts[] entries to uint16_t - this removes
the limit set by char and allows a larger tcache. Remove a few
redundant asserts.
bench-malloc-thread with 4 threads is ~15% faster on Cortex-A72.
Reviewed-by: DJ Delorie <dj@redhat.com>
* malloc/malloc.c (MAX_TCACHE_COUNT): Increase to UINT16_MAX.
(tcache_put): Remove redundant assert.
(tcache_get): Remove redundant asserts.
(__libc_malloc): Check tcache count is not zero.
* manual/tunables.texi (glibc.malloc.tcache_count): Update maximum.
The tcache counts[] array is a char, which has a very small range and thus
may overflow. When setting tcache_count tunable, there is no overflow check.
However the tunable must not be larger than the maximum value of the tcache
counts[] array, otherwise it can overflow when filling the tcache.
[BZ #24531]
* malloc/malloc.c (MAX_TCACHE_COUNT): New define.
(do_set_tcache_count): Only update if count is small enough.
* manual/tunables.texi (glibc.malloc.tcache_count): Document max value.
As discussed previously on libc-alpha [1], this patch follows up the idea
and add both the __attribute_alloc_size__ on malloc functions (malloc,
calloc, realloc, reallocarray, valloc, pvalloc, and memalign) and limit
maximum requested allocation size to up PTRDIFF_MAX (taking into
consideration internal padding and alignment).
This aligns glibc with gcc expected size defined by default warning
-Walloc-size-larger-than value which warns for allocation larger than
PTRDIFF_MAX. It also aligns with gcc expectation regarding libc and
expected size, such as described in PR#67999 [2] and previously discussed
ISO C11 issues [3] on libc-alpha.
From the RFC thread [4] and previous discussion, it seems that consensus
is only to limit such requested size for malloc functions, not the system
allocation one (mmap, sbrk, etc.).
The implementation changes checked_request2size to check for both overflow
and maximum object size up to PTRDIFF_MAX. No additional checks are done
on sysmalloc, so it can still issue mmap with values larger than
PTRDIFF_T depending on the requested size.
The __attribute_alloc_size__ is for functions that return a pointer only,
which means it cannot be applied to posix_memalign (see remarks in GCC
PR#87683 [5]). The runtimes checks to limit maximum requested allocation
size does applies to posix_memalign.
Checked on x86_64-linux-gnu and i686-linux-gnu.
[1] https://sourceware.org/ml/libc-alpha/2018-11/msg00223.html
[2] https://gcc.gnu.org/bugzilla//show_bug.cgi?id=67999
[3] https://sourceware.org/ml/libc-alpha/2011-12/msg00066.html
[4] https://sourceware.org/ml/libc-alpha/2018-11/msg00224.html
[5] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87683
[BZ #23741]
* malloc/hooks.c (malloc_check, realloc_check): Use
__builtin_add_overflow on overflow check and adapt to
checked_request2size change.
* malloc/malloc.c (__libc_malloc, __libc_realloc, _mid_memalign,
__libc_pvalloc, __libc_calloc, _int_memalign): Limit maximum
allocation size to PTRDIFF_MAX.
(REQUEST_OUT_OF_RANGE): Remove macro.
(checked_request2size): Change to inline function and limit maximum
requested size to PTRDIFF_MAX.
(__libc_malloc, __libc_realloc, _int_malloc, _int_memalign): Limit
maximum allocation size to PTRDIFF_MAX.
(_mid_memalign): Use _int_memalign call for overflow check.
(__libc_pvalloc): Use __builtin_add_overflow on overflow check.
(__libc_calloc): Use __builtin_mul_overflow for overflow check and
limit maximum requested size to PTRDIFF_MAX.
* malloc/malloc.h (malloc, calloc, realloc, reallocarray, memalign,
valloc, pvalloc): Add __attribute_alloc_size__.
* stdlib/stdlib.h (malloc, realloc, reallocarray, valloc): Likewise.
* malloc/tst-malloc-too-large.c (do_test): Add check for allocation
larger than PTRDIFF_MAX.
* malloc/tst-memalign.c (do_test): Disable -Walloc-size-larger-than=
around tests of malloc with negative sizes.
* malloc/tst-posix_memalign.c (do_test): Likewise.
* malloc/tst-pvalloc.c (do_test): Likewise.
* malloc/tst-valloc.c (do_test): Likewise.
* malloc/tst-reallocarray.c (do_test): Replace call to reallocarray
with resulting size allocation larger than PTRDIFF_MAX with
reallocarray_nowarn.
(reallocarray_nowarn): New function.
* NEWS: Mention the malloc function semantic change.
Fixes bug 24216. This patch adds security checks for bk and bk_nextsize pointers
of chunks in large bin when inserting chunk from unsorted bin. It was possible
to write the pointer to victim (newly inserted chunk) to arbitrary memory
locations if bk or bk_nextsize pointers of the next large bin chunk
got corrupted.
One group of warnings seen with -Wextra is warnings for static or
inline not at the start of a declaration (-Wold-style-declaration).
This patch fixes various such cases for inline, ensuring it comes at
the start of the declaration (after any static). A common case of the
fix is "static inline <type> __always_inline"; the definition of
__always_inline starts with __inline, so the natural change is to
"static __always_inline <type>". Other cases of the warning may be
harder to fix (one pattern is a function definition that gets
rewritten to be static by an including file, "#define funcname static
wrapped_funcname" or similar), but it seems worth fixing these cases
with inline anyway.
Tested for x86_64.
* elf/dl-load.h (_dl_postprocess_loadcmd): Use __always_inline
before return type, without separate inline.
* elf/dl-tunables.c (maybe_enable_malloc_check): Likewise.
* elf/dl-tunables.h (tunable_is_name): Likewise.
* malloc/malloc.c (do_set_trim_threshold): Likewise.
(do_set_top_pad): Likewise.
(do_set_mmap_threshold): Likewise.
(do_set_mmaps_max): Likewise.
(do_set_mallopt_check): Likewise.
(do_set_perturb_byte): Likewise.
(do_set_arena_test): Likewise.
(do_set_arena_max): Likewise.
(do_set_tcache_max): Likewise.
(do_set_tcache_count): Likewise.
(do_set_tcache_unsorted_limit): Likewise.
* nis/nis_subr.c (count_dots): Likewise.
* nptl/allocatestack.c (advise_stack_range): Likewise.
* sysdeps/ieee754/dbl-64/s_sin.c (do_cos): Likewise.
(do_sin): Likewise.
(reduce_sincos): Likewise.
(do_sincos): Likewise.
* sysdeps/unix/sysv/linux/x86/elision-conf.c
(do_set_elision_enable): Likewise.
(TUNABLE_CALLBACK_FNDECL): Likewise.
One of the warnings that appears with -Wextra is "ordered comparison
of pointer with integer zero" in malloc.c:tcache_get, for the
assertion:
assert (tcache->entries[tc_idx] > 0);
Indeed, a "> 0" comparison does not make sense for
tcache->entries[tc_idx], which is a pointer. My guess is that
tcache->counts[tc_idx] is what's intended here, and this patch changes
the assertion accordingly.
Tested for x86_64.
* malloc/malloc.c (tcache_get): Compare tcache->counts[tc_idx]
with 0, not tcache->entries[tc_idx].
Commit 6923f6db1e ("malloc: Use current
(C11-style) atomics for fastbin access") caused a substantial
performance regression on POWER and Aarch64, and the old atomics,
while hard to prove correct, seem to work in practice.
This commit removes the custom memcpy implementation from _int_realloc
for small chunk sizes. The ncopies variable has the wrong type, and
an integer wraparound could cause the existing code to copy too few
elements (leaving the new memory region mostly uninitialized).
Therefore, removing this code fixes bug 24027.
The previous check could read beyond the end of the tcache entry
array. If the e->key == tcache cookie check happened to pass, this
would result in crashes.
This commit is in preparation of turning the macro into a proper
function. The output arguments of the macro were in fact unused.
Also clean up uses of __builtin_expect.
On Thu, Jan 11, 2018 at 3:50 PM, Florian Weimer <fweimer@redhat.com> wrote:
> On 11/07/2017 04:27 PM, Istvan Kurucsai wrote:
>>
>> + next = chunk_at_offset (victim, size);
>
>
> For new code, we prefer declarations with initializers.
Noted.
>> + if (__glibc_unlikely (chunksize_nomask (victim) <= 2 * SIZE_SZ)
>> + || __glibc_unlikely (chunksize_nomask (victim) >
>> av->system_mem))
>> + malloc_printerr("malloc(): invalid size (unsorted)");
>> + if (__glibc_unlikely (chunksize_nomask (next) < 2 * SIZE_SZ)
>> + || __glibc_unlikely (chunksize_nomask (next) >
>> av->system_mem))
>> + malloc_printerr("malloc(): invalid next size (unsorted)");
>> + if (__glibc_unlikely ((prev_size (next) & ~(SIZE_BITS)) !=
>> size))
>> + malloc_printerr("malloc(): mismatching next->prev_size
>> (unsorted)");
>
>
> I think this check is redundant because prev_size (next) and chunksize
> (victim) are loaded from the same memory location.
I'm fairly certain that it compares mchunk_size of victim against
mchunk_prev_size of the next chunk, i.e. the size of victim in its
header and footer.
>> + if (__glibc_unlikely (bck->fd != victim)
>> + || __glibc_unlikely (victim->fd != unsorted_chunks (av)))
>> + malloc_printerr("malloc(): unsorted double linked list
>> corrupted");
>> + if (__glibc_unlikely (prev_inuse(next)))
>> + malloc_printerr("malloc(): invalid next->prev_inuse
>> (unsorted)");
>
>
> There's a missing space after malloc_printerr.
Noted.
> Why do you keep using chunksize_nomask? We never investigated why the
> original code uses it. It may have been an accident.
You are right, I don't think it makes a difference in these checks. So
the size local can be reused for the checks against victim. For next,
leaving it as such avoids the masking operation.
> Again, for non-main arenas, the checks against av->system_mem could be made
> tighter (against the heap size). Maybe you could put the condition into a
> separate inline function?
We could also do a chunk boundary check similar to what I proposed in
the thread for the first patch in the series to be even more strict.
I'll gladly try to implement either but believe that refining these
checks would bring less benefits than in the case of the top chunk.
Intra-arena or intra-heap overlaps would still be doable here with
unsorted chunks and I don't see any way to counter that besides more
generic measures like randomizing allocations and your metadata
encoding patches.
I've attached a revised version with the above comments incorporated
but without the refined checks.
Thanks,
Istvan
From a12d5d40fd7aed5fa10fc444dcb819947b72b315 Mon Sep 17 00:00:00 2001
From: Istvan Kurucsai <pistukem@gmail.com>
Date: Tue, 16 Jan 2018 14:48:16 +0100
Subject: [PATCH v2 1/1] malloc: Additional checks for unsorted bin integrity
I.
Ensure the following properties of chunks encountered during binning:
- victim chunk has reasonable size
- next chunk has reasonable size
- next->prev_size == victim->size
- valid double linked list
- PREV_INUSE of next chunk is unset
* malloc/malloc.c (_int_malloc): Additional binning code checks.
The House of Force is a well-known technique to exploit heap
overflow. In essence, this exploit takes three steps:
1. Overwrite the size of top chunk with very large value (e.g. -1).
2. Request x bytes from top chunk. As the size of top chunk
is corrupted, x can be arbitrarily large and top chunk will
still be offset by x.
3. The next allocation from top chunk will thus be controllable.
If we verify the size of top chunk at step 2, we can stop such attack.
This patch mechanically removes all remaining uses, and the
definitions, of the following libio name aliases:
name replaced with
---- -------------
_IO_FILE FILE
_IO_fpos_t __fpos_t
_IO_fpos64_t __fpos64_t
_IO_size_t size_t
_IO_ssize_t ssize_t or __ssize_t
_IO_off_t off_t
_IO_off64_t off64_t
_IO_pid_t pid_t
_IO_uid_t uid_t
_IO_wint_t wint_t
_IO_va_list va_list or __gnuc_va_list
_IO_BUFSIZ BUFSIZ
_IO_cookie_io_functions_t cookie_io_functions_t
__io_read_fn cookie_read_function_t
__io_write_fn cookie_write_function_t
__io_seek_fn cookie_seek_function_t
__io_close_fn cookie_close_function_t
I used __fpos_t and __fpos64_t instead of fpos_t and fpos64_t because
the definitions of fpos_t and fpos64_t depend on the largefile mode.
I used __ssize_t and __gnuc_va_list in a handful of headers where
namespace cleanliness might be relevant even though they're
internal-use-only. In all other cases, I used the public-namespace
name.
There are a tiny handful of places where I left a use of 'struct _IO_FILE'
alone, because it was being used together with 'struct _IO_FILE_plus'
or 'struct _IO_FILE_complete' in the same arithmetic expression.
Because this patch was almost entirely done with search and replace, I
may have introduced indentation botches. I did proofread the diff,
but I may have missed something.
The ChangeLog below calls out all of the places where this was not a
pure search-and-replace change.
Installed stripped libraries and executables are unchanged by this patch,
except that some assertions in vfscanf.c change line numbers.
* libio/libio.h (_IO_FILE): Delete; all uses changed to FILE.
(_IO_fpos_t): Delete; all uses changed to __fpos_t.
(_IO_fpos64_t): Delete; all uses changed to __fpos64_t.
(_IO_size_t): Delete; all uses changed to size_t.
(_IO_ssize_t): Delete; all uses changed to ssize_t or __ssize_t.
(_IO_off_t): Delete; all uses changed to off_t.
(_IO_off64_t): Delete; all uses changed to off64_t.
(_IO_pid_t): Delete; all uses changed to pid_t.
(_IO_uid_t): Delete; all uses changed to uid_t.
(_IO_wint_t): Delete; all uses changed to wint_t.
(_IO_va_list): Delete; all uses changed to va_list or __gnuc_va_list.
(_IO_BUFSIZ): Delete; all uses changed to BUFSIZ.
(_IO_cookie_io_functions_t): Delete; all uses changed to
cookie_io_functions_t.
(__io_read_fn): Delete; all uses changed to cookie_read_function_t.
(__io_write_fn): Delete; all uses changed to cookie_write_function_t.
(__io_seek_fn): Delete; all uses changed to cookie_seek_function_t.
(__io_close_fn): Delete: all uses changed to cookie_close_function_t.
* libio/iofopncook.c: Remove unnecessary forward declarations.
* libio/iolibio.h: Correct outdated commentary.
* malloc/malloc.c (__malloc_stats): Remove unnecessary casts.
* stdio-common/fxprintf.c (__fxprintf_nocancel):
Remove unnecessary casts.
* stdio-common/getline.c: Use _IO_getdelim directly.
Don't redefine ssize_t.
* stdio-common/printf_fp.c, stdio_common/printf_fphex.c
* stdio-common/printf_size.c: Don't redefine size_t or FILE.
Remove outdated comments.
* stdio-common/vfscanf.c: Don't redefine va_list.
malloc_stats means to disable cancellation for writes to stderr while
it runs, but it restores stderr->_flags2 with |= instead of =, so what
it actually does is disable cancellation on stderr permanently.
[BZ #22830]
* malloc/malloc.c (__malloc_stats): Restore stderr->_flags2
correctly.
* malloc/tst-malloc-stats-cancellation.c: New test case.
* malloc/Makefile: Add new test case.
This avoids assert definition conflicts if some of the headers used by
malloc.c happens to include assert.h. Malloc still needs a malloc-avoiding
implementation, which we get by redirecting __assert_fail to malloc's
__malloc_assert.
* malloc/malloc.c: Include <assert.h>.
(assert): Do not define.
[!defined NDEBUG] (__assert_fail): Define to __malloc_assert.
When posix_memalign is called with an alignment less than MALLOC_ALIGNMENT
and a requested size close to SIZE_MAX, it falls back to malloc code
(because the alignment of a block returned by malloc is sufficient to
satisfy the call). In this case, an integer overflow in _int_malloc leads
to posix_memalign incorrectly returning successfully.
Upon fixing this and writing a somewhat thorough regression test, it was
discovered that when posix_memalign is called with an alignment larger than
MALLOC_ALIGNMENT (so it uses _int_memalign instead) and a requested size
close to SIZE_MAX, a different integer overflow in _int_memalign leads to
posix_memalign incorrectly returning successfully.
Both integer overflows affect other memory allocation functions that use
_int_malloc (one affected malloc in x86) or _int_memalign as well.
This commit fixes both integer overflows. In addition to this, it adds a
regression test to guard against false successful allocations by the
following memory allocation functions when called with too-large allocation
sizes and, where relevant, various valid alignments:
malloc, realloc, calloc, reallocarray, memalign, posix_memalign,
aligned_alloc, valloc, and pvalloc.
When the per-thread cache is enabled, __libc_malloc uses request2size (which
does not perform an overflow check) to calculate the chunk size from the
requested allocation size. This leads to an integer overflow causing malloc
to incorrectly return the last successfully allocated block when called with
a very large size argument (close to SIZE_MAX).
This commit uses checked_request2size instead, removing the overflow.
It does not make sense to register separate cleanup functions for arena
and tcache since they're always going to be called together. Call the
tcache cleanup function from within arena_thread_freeres since it at
least makes the order of those cleanups clear in the code.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This commit adds a "subheaps" field to the malloc_info output that
shows the number of heaps that were allocated to extend a non-main
arena.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This patch adds a single-threaded fast path to malloc, realloc,
calloc and memalloc. When we're single-threaded, we can bypass
arena_get (which always locks the arena it returns) and just use
the main arena. Also avoid retrying a different arena since
there is just the main arena.
* malloc/malloc.c (__libc_malloc): Add SINGLE_THREAD_P path.
(__libc_realloc): Likewise.
(_mid_memalign): Likewise.
(__libc_calloc): Likewise.
This patch adds single-threaded fast paths to _int_free.
Bypass the explicit locking for larger allocations.
* malloc/malloc.c (_int_free): Add SINGLE_THREAD_P fast paths.
This patch fixes a deadlock in the fastbin consistency check.
If we fail the fast check due to concurrent modifications to
the next chunk or system_mem, we should not lock if we already
have the arena lock. Simplify the check to make it obviously
correct.
* malloc/malloc.c (_int_free): Fix deadlock bug in consistency check.
The current malloc initialization is quite convoluted. Instead of
sometimes calling malloc_consolidate from ptmalloc_init, call
malloc_init_state early so that the main_arena is always initialized.
The special initialization can now be removed from malloc_consolidate.
This also fixes BZ #22159.
Check all calls to malloc_consolidate and remove calls that are
redundant initialization after ptmalloc_init, like in int_mallinfo
and __libc_mallopt (but keep the latter as consolidation is required for
set_max_fast). Update comments to improve clarity.
Remove impossible initialization check from _int_malloc, fix assert
in do_check_malloc_state to ensure arena->top != 0. Fix the obvious bugs
in do_check_free_chunk and do_check_remalloced_chunk to enable single
threaded malloc debugging (do_check_malloc_state is not thread safe!).
[BZ #22159]
* malloc/arena.c (ptmalloc_init): Call malloc_init_state.
* malloc/malloc.c (do_check_free_chunk): Fix build bug.
(do_check_remalloced_chunk): Fix build bug.
(do_check_malloc_state): Add assert that checks arena->top.
(malloc_consolidate): Remove initialization.
(int_mallinfo): Remove call to malloc_consolidate.
(__libc_mallopt): Clarify why malloc_consolidate is needed.
Currently free typically uses 2 atomic operations per call. The have_fastchunks
flag indicates whether there are recently freed blocks in the fastbins. This
is purely an optimization to avoid calling malloc_consolidate too often and
avoiding the overhead of walking all fast bins even if all are empty during a
sequence of allocations. However using catomic_or to update the flag is
completely unnecessary since it can be changed into a simple boolean and
accessed using relaxed atomics. There is no change in multi-threaded behaviour
given the flag is already approximate (it may be set when there are no blocks in
any fast bins, or it may be clear when there are free blocks that could be
consolidated).
Performance of malloc/free improves by 27% on a simple benchmark on AArch64
(both single and multithreaded). The number of load/store exclusive instructions
is reduced by 33%. Bench-malloc-thread speeds up by ~3% in all cases.
* malloc/malloc.c (FASTCHUNKS_BIT): Remove.
(have_fastchunks): Remove.
(clear_fastchunks): Remove.
(set_fastchunks): Remove.
(malloc_state): Add have_fastchunks.
(malloc_init_state): Use have_fastchunks.
(do_check_malloc_state): Remove incorrect invariant checks.
(_int_malloc): Use have_fastchunks.
(_int_free): Likewise.
(malloc_consolidate): Likewise.
The functions tcache_get and tcache_put show up in profiles as they
are a critical part of the tcache code. Inline them to give tcache
a 16% performance gain. Since this improves multi-threaded cases
as well, it helps offset any potential performance loss due to adding
single-threaded fast paths.
* malloc/malloc.c (tcache_put): Inline.
(tcache_get): Inline.
The malloc tcache added in 2.26 will leak all of the elements remaining
in the cache and the cache structure itself when a thread exits. The
defect is that we do not set tcache_shutting_down early enough, and the
thread simply recreates the tcache and places the elements back onto a
new tcache which is subsequently lost as the thread exits (unfreed
memory). The fix is relatively simple, move the setting of
tcache_shutting_down earlier in tcache_thread_freeres. We add a test
case which uses mallinfo and some heuristics to look for unaccounted for
memory usage between the start and end of a thread start/join loop. It
is very reliable at detecting that there is a leak given the number of
iterations. Without the fix the test will consume 122MiB of leaked
memory.
Clean up calls to malloc_printerr and trim its argument list.
This also removes a few bits of work done before calling
malloc_printerr (such as unlocking operations).
The tunable/environment variable still enables the lightweight
additional malloc checking, but mallopt (M_CHECK_ACTION)
no longer has any effect.
__stack_chk_fail is called on corrupted stack. Stack backtrace is very
unreliable against corrupted stack. __libc_message is changed to accept
enum __libc_message_action and call BEFORE_ABORT only if action includes
do_backtrace. __fortify_fail_abort is added to avoid backtrace from
__stack_chk_fail.
[BZ #12189]
* debug/Makefile (CFLAGS-tst-ssp-1.c): New.
(tests): Add tst-ssp-1 if -fstack-protector works.
* debug/fortify_fail.c: Include <stdbool.h>.
(_fortify_fail_abort): New function.
(__fortify_fail): Call _fortify_fail_abort.
(__fortify_fail_abort): Add a hidden definition.
* debug/stack_chk_fail.c: Include <stdbool.h>.
(__stack_chk_fail): Call __fortify_fail_abort, instead of
__fortify_fail.
* debug/tst-ssp-1.c: New file.
* include/stdio.h (__libc_message_action): New enum.
(__libc_message): Replace int with enum __libc_message_action.
(__fortify_fail_abort): New hidden prototype.
* malloc/malloc.c (malloc_printerr): Update __libc_message calls.
* sysdeps/posix/libc_fatal.c (__libc_message): Replace int
with enum __libc_message_action. Call BEFORE_ABORT only if
action includes do_backtrace.
(__libc_fatal): Update __libc_message call.
MMap'd memory isn't shrunk without MREMAP, but IIRC this is intentional for
performance reasons. Regardless, this patch tweaks the existing comment to
be more accurate wrt the existing code.
[BZ #21411]
* malloc/malloc.c: Tweak realloc/MREMAP comment to be more accurate.
Fixes a typo introduced in commit
be7991c070. This caused
mallopt(M_ARENA_MAX) as well as the environment variable
MALLOC_ARENA_MAX to not work as intended because it set the
wrong internal parameter.
[BZ #21338]
* malloc/malloc.c: Call do_set_arena_max for M_ARENA_MAX
instead of incorrect do_set_arena_test
Additional check for chunk_size == next->prev->chunk_size in unlink()
2017-03-17 Chris Evans <scarybeasts@gmail.com>
* malloc/malloc.c (unlink): Add consistency check between size and
next->prev->size, to further harden against 1-byte overflows.
posix/wordexp-test.c used libc-internal.h for PTR_ALIGN_DOWN; similar
to what was done with libc-diag.h, I have split the definitions of
cast_to_integer, ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and PTR_ALIGN_DOWN
to a new header, libc-pointer-arith.h.
It then occurred to me that the remaining declarations in libc-internal.h
are mostly to do with early initialization, and probably most of the
files including it, even in the core code, don't need it anymore. Indeed,
only 19 files actually need what remains of libc-internal.h. 23 others
need libc-diag.h instead, and 12 need libc-pointer-arith.h instead.
No file needs more than one of them, and 16 don't need any of them!
So, with this patch, libc-internal.h stops including libc-diag.h as
well as losing the pointer arithmetic macros, and all including files
are adjusted.
* include/libc-pointer-arith.h: New file. Define
cast_to_integer, ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and
PTR_ALIGN_DOWN here.
* include/libc-internal.h: Definitions of above macros
moved from here. Don't include libc-diag.h anymore either.
* posix/wordexp-test.c: Include stdint.h and libc-pointer-arith.h.
Don't include libc-internal.h.
* debug/pcprofile.c, elf/dl-tunables.c, elf/soinit.c, io/openat.c
* io/openat64.c, misc/ptrace.c, nptl/pthread_clock_gettime.c
* nptl/pthread_clock_settime.c, nptl/pthread_cond_common.c
* string/strcoll_l.c, sysdeps/nacl/brk.c
* sysdeps/unix/clock_settime.c
* sysdeps/unix/sysv/linux/i386/get_clockfreq.c
* sysdeps/unix/sysv/linux/ia64/get_clockfreq.c
* sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c
* sysdeps/unix/sysv/linux/sparc/sparc64/get_clockfreq.c:
Don't include libc-internal.h.
* elf/get-dynamic-info.h, iconv/loop.c
* iconvdata/iso-2022-cn-ext.c, locale/weight.h, locale/weightwc.h
* misc/reboot.c, nis/nis_table.c, nptl_db/thread_dbP.h
* nscd/connections.c, resolv/res_send.c, soft-fp/fmadf4.c
* soft-fp/fmasf4.c, soft-fp/fmatf4.c, stdio-common/vfscanf.c
* sysdeps/ieee754/dbl-64/e_lgamma_r.c
* sysdeps/ieee754/dbl-64/k_rem_pio2.c
* sysdeps/ieee754/flt-32/e_lgammaf_r.c
* sysdeps/ieee754/flt-32/k_rem_pio2f.c
* sysdeps/ieee754/ldbl-128/k_tanl.c
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c
* sysdeps/ieee754/ldbl-96/k_tanl.c, sysdeps/nptl/futex-internal.h:
Include libc-diag.h instead of libc-internal.h.
* elf/dl-load.c, elf/dl-reloc.c, locale/programs/locarchive.c
* nptl/nptl-init.c, string/strcspn.c, string/strspn.c
* malloc/malloc.c, sysdeps/i386/nptl/tls.h
* sysdeps/nacl/dl-map-segments.h, sysdeps/x86_64/atomic-machine.h
* sysdeps/unix/sysv/linux/spawni.c
* sysdeps/x86_64/nptl/tls.h:
Include libc-pointer-arith.h instead of libc-internal.h.
* elf/get-dynamic-info.h, sysdeps/nacl/dl-map-segments.h
* sysdeps/x86_64/atomic-machine.h:
Add multiple include guard.
Make mallopt helper functions for each mallopt parameter so that it
can be called consistently in other areas, like setting tunables.
* malloc/malloc.c (do_set_mallopt_check): New function.
(do_set_mmap_threshold): Likewise.
(do_set_mmaps_max): Likewise.
(do_set_top_pad): Likewise.
(do_set_perturb_byte): Likewise.
(do_set_trim_threshold): Likewise.
(do_set_arena_max): Likewise.
(do_set_arena_test): Likewise.
(__libc_mallopt): Use them.
After the removal of __malloc_initialize_hook, newly compiled
Emacs binaries are no longer able to use these interfaces.
malloc_get_state is only used during the Emacs build process,
so we provide a stub implementation only. Existing Emacs binaries
will not call this stub function, but still reference the symbol.
The rewritten tst-mallocstate test constructs a dumped heap
which should approximates what existing Emacs binaries pass
to glibc malloc.
The M_ARENA_MAX and M_ARENA_TEST macros are defined in malloc.c as
well as malloc.h, and the former is unnecessary. This patch removes
the duplicate. Tested on x86_64 to verify that the generated code
remains unchanged barring changed line numbers to __malloc_assert.
* malloc/malloc.c (M_ARENA_TEST, M_ARENA_MAX): Remove.
The M_ARENA_* mallopt parameters are in wide use in production to
control the number of arenas that a long lived process creates and
hence there is no point in stating that this interface is non-public.
Document this interface and remove the obsolete comment.
* manual/memory.texi (M_ARENA_TEST): Add documentation.
(M_ARENA_MAX): Likewise.
* malloc/malloc.c: Remove obsolete comment.
The dynamic linker currently uses __libc_memalign for TLS-related
allocations. The goal is to switch to malloc instead. If the minimal
malloc follows the ABI fundamental alignment, we can assume that malloc
provides this alignment, and thus skip explicit alignment in a few
cases as an optimization.
It was requested on libc-alpha that MALLOC_ALIGNMENT should be used,
although this results in wasted space if MALLOC_ALIGNMENT is larger
than the fundamental alignment. (The dynamic linker cannot assume
that the non-minimal malloc will provide an alignment of
MALLOC_ALIGNMENT; the ABI provides _Alignof (max_align_t) only.)
__malloc_initialize_hook is interposed by application code, so
the usual approach to define a compatibility symbol does not work.
This commit adds a new mechanism based on #pragma GCC poison in
<stdc-predef.h>.
For regular mmapped chunks there are two size fields (hence a reduction
by 2 * SIZE_SZ bytes), but for fake chunks, we only have one size field,
so we need to subtract SIZE_SZ bytes.
This was initially reported as Emacs bug 23726.
After the heap rewriting added in commit
4cf6c72fd2 (malloc: Rewrite dumped heap
for compatibility in __malloc_set_state), we can change malloc alignment
for new allocations because the alignment of old allocations no longer
matters.
We need to increase the malloc state version number, so that binaries
containing dumped heaps of the new layout will not try to run on
previous versions of glibc, resulting in obscure crashes.
This commit addresses a failure of tst-malloc-thread-fail on the
affected architectures (32-bit ppc and mips) because the test checks
pointer alignment.
This will allow us to change many aspects of the malloc implementation
while preserving compatibility with existing Emacs binaries.
As a result, existing Emacs binaries will have a larger RSS, and Emacs
needs a few more milliseconds to start. This overhead is specific
to Emacs (and will go away once Emacs switches to its internal malloc).
The new checks to make free and realloc compatible with the dumped heap
are confined to the mmap paths, which are already quite slow due to the
munmap overhead.
This commit weakens some security checks, but only for heap pointers
in the dumped main arena. By default, this area is empty, so those
checks are as effective as before.
The fork handler now runs so late that there is no risk anymore that
other fork handlers in the same thread use malloc, so it is no
longer necessary to install malloc hooks which made a subset
of malloc functionality available to the thread that called fork.
Previously, a thread M invoking fork would acquire locks in this order:
(M1) malloc arena locks (in the registered fork handler)
(M2) libio list lock
A thread F invoking flush (NULL) would acquire locks in this order:
(F1) libio list lock
(F2) individual _IO_FILE locks
A thread G running getdelim would use this order:
(G1) _IO_FILE lock
(G2) malloc arena lock
After executing (M1), (F1), (G1), none of the threads can make progress.
This commit changes the fork lock order to:
(M'1) libio list lock
(M'2) malloc arena locks
It explicitly encodes the lock order in the implementations of fork,
and does not rely on the registration order, thus avoiding the deadlock.
* malloc/arena.c (list_lock): Document lock ordering requirements.
(free_list_lock): New lock.
(ptmalloc_lock_all): Comment on free_list_lock.
(ptmalloc_unlock_all2): Reinitialize free_list_lock.
(detach_arena): Update comment. free_list_lock is now needed.
(_int_new_arena): Use free_list_lock around detach_arena call.
Acquire arena lock after list_lock. Add comment, including FIXME
about incorrect synchronization.
(get_free_list): Switch to free_list_lock.
(reused_arena): Acquire free_list_lock around detach_arena call
and attached threads counter update. Add two FIXMEs about
incorrect synchronization.
(arena_thread_freeres): Switch to free_list_lock.
* malloc/malloc.c (struct malloc_state): Update comments to
mention free_list_lock.
This mostly automatically-generated patch converts 113 function
definitions in glibc from old-style K&R to prototype-style. Following
my other recent such patches, this one deals with the case of function
definitions in files that either contain assertions or where grep
suggested they might contain assertions - and thus where it isn't
possible to use a simple object code comparison as a sanity check on
the correctness of the patch, because line numbers are changed.
A few such automatically-generated changes needed to be supplemented
by manual changes for the result to compile. openat64 had a prototype
declaration with "..." but an old-style definition in
sysdeps/unix/sysv/linux/dl-openat64.c, and "..." needed adding to the
generated prototype in the definition (I've filed
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68024> for diagnosing
such cases in GCC; the old state was undefined behavior not requiring
a diagnostic, but one seems a good idea). In addition, as Florian has
noted regparm attribute mismatches between declaration and definition
are only diagnosed for prototype definitions, and five functions
needed internal_function added to their definitions (in the case of
__pthread_mutex_cond_lock, via the macro definition of
__pthread_mutex_lock) to compile on i386.
After this patch is in, remaining old-style definitions are probably
most readily fixed manually before we can turn on
-Wold-style-definition for all builds.
Tested for x86_64 and x86 (testsuite).
* crypt/md5-crypt.c (__md5_crypt_r): Convert to prototype-style
function definition.
* crypt/sha256-crypt.c (__sha256_crypt_r): Likewise.
* crypt/sha512-crypt.c (__sha512_crypt_r): Likewise.
* debug/backtracesyms.c (__backtrace_symbols): Likewise.
* elf/dl-minimal.c (_itoa): Likewise.
* hurd/hurdmalloc.c (malloc): Likewise.
(free): Likewise.
(realloc): Likewise.
* inet/inet6_option.c (inet6_option_space): Likewise.
(inet6_option_init): Likewise.
(inet6_option_append): Likewise.
(inet6_option_alloc): Likewise.
(inet6_option_next): Likewise.
(inet6_option_find): Likewise.
* io/ftw.c (FTW_NAME): Likewise.
(NFTW_NAME): Likewise.
(NFTW_NEW_NAME): Likewise.
(NFTW_OLD_NAME): Likewise.
* libio/iofwide.c (_IO_fwide): Likewise.
* libio/strops.c (_IO_str_init_static_internal): Likewise.
(_IO_str_init_static): Likewise.
(_IO_str_init_readonly): Likewise.
(_IO_str_overflow): Likewise.
(_IO_str_underflow): Likewise.
(_IO_str_count): Likewise.
(_IO_str_seekoff): Likewise.
(_IO_str_pbackfail): Likewise.
(_IO_str_finish): Likewise.
* libio/wstrops.c (_IO_wstr_init_static): Likewise.
(_IO_wstr_overflow): Likewise.
(_IO_wstr_underflow): Likewise.
(_IO_wstr_count): Likewise.
(_IO_wstr_seekoff): Likewise.
(_IO_wstr_pbackfail): Likewise.
(_IO_wstr_finish): Likewise.
* locale/programs/localedef.c (normalize_codeset): Likewise.
* locale/programs/locarchive.c (add_locale_to_archive): Likewise.
(add_locales_to_archive): Likewise.
(delete_locales_from_archive): Likewise.
* malloc/malloc.c (__libc_mallinfo): Likewise.
* math/gen-auto-libm-tests.c (init_fp_formats): Likewise.
* misc/tsearch.c (__tfind): Likewise.
* nptl/pthread_attr_destroy.c (__pthread_attr_destroy): Likewise.
* nptl/pthread_attr_getdetachstate.c
(__pthread_attr_getdetachstate): Likewise.
* nptl/pthread_attr_getguardsize.c (pthread_attr_getguardsize):
Likewise.
* nptl/pthread_attr_getinheritsched.c
(__pthread_attr_getinheritsched): Likewise.
* nptl/pthread_attr_getschedparam.c
(__pthread_attr_getschedparam): Likewise.
* nptl/pthread_attr_getschedpolicy.c
(__pthread_attr_getschedpolicy): Likewise.
* nptl/pthread_attr_getscope.c (__pthread_attr_getscope):
Likewise.
* nptl/pthread_attr_getstack.c (__pthread_attr_getstack):
Likewise.
* nptl/pthread_attr_getstackaddr.c (__pthread_attr_getstackaddr):
Likewise.
* nptl/pthread_attr_getstacksize.c (__pthread_attr_getstacksize):
Likewise.
* nptl/pthread_attr_init.c (__pthread_attr_init_2_1): Likewise.
(__pthread_attr_init_2_0): Likewise.
* nptl/pthread_attr_setdetachstate.c
(__pthread_attr_setdetachstate): Likewise.
* nptl/pthread_attr_setguardsize.c (pthread_attr_setguardsize):
Likewise.
* nptl/pthread_attr_setinheritsched.c
(__pthread_attr_setinheritsched): Likewise.
* nptl/pthread_attr_setschedparam.c
(__pthread_attr_setschedparam): Likewise.
* nptl/pthread_attr_setschedpolicy.c
(__pthread_attr_setschedpolicy): Likewise.
* nptl/pthread_attr_setscope.c (__pthread_attr_setscope):
Likewise.
* nptl/pthread_attr_setstack.c (__pthread_attr_setstack):
Likewise.
* nptl/pthread_attr_setstackaddr.c (__pthread_attr_setstackaddr):
Likewise.
* nptl/pthread_attr_setstacksize.c (__pthread_attr_setstacksize):
Likewise.
* nptl/pthread_condattr_setclock.c (pthread_condattr_setclock):
Likewise.
* nptl/pthread_create.c (__find_in_stack_list): Likewise.
* nptl/pthread_getattr_np.c (pthread_getattr_np): Likewise.
* nptl/pthread_mutex_cond_lock.c (__pthread_mutex_lock): Define to
use internal_function.
* nptl/pthread_mutex_init.c (__pthread_mutex_init): Convert to
prototype-style function definition.
* nptl/pthread_mutex_lock.c (__pthread_mutex_lock): Likewise.
(__pthread_mutex_cond_lock_adjust): Likewise. Use
internal_function.
* nptl/pthread_mutex_timedlock.c (pthread_mutex_timedlock):
Convert to prototype-style function definition.
* nptl/pthread_mutex_trylock.c (__pthread_mutex_trylock):
Likewise.
* nptl/pthread_mutex_unlock.c (__pthread_mutex_unlock_usercnt):
Likewise.
(__pthread_mutex_unlock): Likewise.
* nptl_db/td_ta_clear_event.c (td_ta_clear_event): Likewise.
* nptl_db/td_ta_set_event.c (td_ta_set_event): Likewise.
* nptl_db/td_thr_clear_event.c (td_thr_clear_event): Likewise.
* nptl_db/td_thr_event_enable.c (td_thr_event_enable): Likewise.
* nptl_db/td_thr_set_event.c (td_thr_set_event): Likewise.
* nss/makedb.c (process_input): Likewise.
* posix/fnmatch.c (__strchrnul): Likewise.
(__wcschrnul): Likewise.
(fnmatch): Likewise.
* posix/fnmatch_loop.c (FCT): Likewise.
* posix/glob.c (globfree): Likewise.
(__glob_pattern_type): Likewise.
(__glob_pattern_p): Likewise.
* posix/regcomp.c (re_compile_pattern): Likewise.
(re_set_syntax): Likewise.
(re_compile_fastmap): Likewise.
(regcomp): Likewise.
(regerror): Likewise.
(regfree): Likewise.
* posix/regexec.c (regexec): Likewise.
(re_match): Likewise.
(re_search): Likewise.
(re_match_2): Likewise.
(re_search_2): Likewise.
(re_search_stub): Likewise. Use internal_function
(re_copy_regs): Likewise.
(re_set_registers): Convert to prototype-style function
definition.
(prune_impossible_nodes): Likewise. Use internal_function.
* resolv/inet_net_pton.c (inet_net_pton): Convert to
prototype-style function definition.
(inet_net_pton_ipv4): Likewise.
* stdlib/strtod_l.c (____STRTOF_INTERNAL): Likewise.
* sysdeps/pthread/aio_cancel.c (aio_cancel): Likewise.
* sysdeps/pthread/aio_suspend.c (aio_suspend): Likewise.
* sysdeps/pthread/timer_delete.c (timer_delete): Likewise.
* sysdeps/unix/sysv/linux/dl-openat64.c (openat64): Likewise.
Make variadic.
* time/strptime_l.c (localtime_r): Convert to prototype-style
function definition.
* wcsmbs/mbsnrtowcs.c (__mbsnrtowcs): Likewise.
* wcsmbs/mbsrtowcs_l.c (__mbsrtowcs_l): Likewise.
* wcsmbs/wcsnrtombs.c (__wcsnrtombs): Likewise.
* wcsmbs/wcsrtombs.c (__wcsrtombs): Likewise.
While doing code review I converted another bespoke round down, and
corrected a comment.
The comment spoke about keeping at least one page allocated even
during systrim, which is not correct. The code does nothing to keep
a page allocated. The code does attempt to keep PAD padding as
documented in comments and MINSIZE as required by design.
Historically in 2002 when Ulrich wrote the code (fa8d436c) the math
was inlined into one statement which did reserve an extra page:
extra = ((top_size - pad - MINSIZE + (pagesz-1)) / pagesz - 1) * pagesz;
There is no reason given for this extra page.
In 2010 Anton Branchard's change (b9b42ee0) from division
to shifts removed the extra page by dropping the "+ (pagesiz-1), which
mean we might have attempted to return -0 via MORECORE. The fix by Will
Newton in 2014 added a check for extra being zero (51a7380b).
From first principles I see no reason why we should keep an extra
page of memory from being trimmed back to the OS. The only sensible
interface is to honour PAD padding as the function is documented,
with the caveat the MINSIZE is maintained for the top chunk.
Given that we've been using this code for 5+ years with no extra
page allocated is sufficient evidence that the comment should be changed
to match the code that I'm touching.
Tested on x86_64 and i686, no regressions.
When the malloc subsystem detects some kind of memory corruption,
depending on the configuration it prints the error, a backtrace, a
memory map and then aborts the process. In this process, the
backtrace() call may result in a call to malloc, resulting in
various kinds of problematic behavior.
In one case, the malloc it calls may detect a corruption and call
backtrace again, and a stack overflow may result due to the infinite
recursion. In another case, the malloc it calls may deadlock on an
arena lock with the malloc (or free, realloc, etc.) that detected the
corruption. In yet another case, if the program is linked with
pthreads, backtrace may do a pthread_once initialization, which
deadlocks on itself.
In all these cases, the program exit is not as intended. This is
avoidable by marking the arena that malloc detected a corruption on,
as unusable. The following patch does that. Features of this patch
are as follows:
- A flag is added to the mstate struct of the arena to indicate if the
arena is corrupt.
- The flag is checked whenever malloc functions try to get a lock on
an arena. If the arena is unusable, a NULL is returned, causing the
malloc to use mmap or try the next arena.
- malloc_printerr sets the corrupt flag on the arena when it detects a
corruption
- free does not concern itself with the flag at all. It is not
important since the backtrace workflow does not need free. A free
in a parallel thread may cause another corruption, but that's not
new
- The flag check and set are not atomic and may race. This is fine
since we don't care about contention during the flag check. We want
to make sure that the malloc call in the backtrace does not trip on
itself and all that action happens in the same thread and not across
threads.
I verified that the test case does not show any regressions due to
this patch. I also ran the malloc benchmarks and found an
insignificant difference in timings (< 2%).
* malloc/Makefile (tests): New test case tst-malloc-backtrace.
* malloc/arena.c (arena_lock): Check if arena is corrupt.
(reused_arena): Find a non-corrupt arena.
(heap_trim): Pass arena to unlink.
* malloc/hooks.c (malloc_check_get_size): Pass arena to
malloc_printerr.
(top_check): Likewise.
(free_check): Likewise.
(realloc_check): Likewise.
* malloc/malloc.c (malloc_printerr): Add arena argument.
(unlink): Likewise.
(munmap_chunk): Adjust.
(ARENA_CORRUPTION_BIT): New macro.
(arena_is_corrupt): Likewise.
(set_arena_corrupt): Likewise.
(sysmalloc): Use mmap if there are no usable arenas.
(_int_malloc): Likewise.
(__libc_malloc): Don't fail if arena_get returns NULL.
(_mid_memalign): Likewise.
(__libc_calloc): Likewise.
(__libc_realloc): Adjust for additional argument to
malloc_printerr.
(_int_free): Likewise.
(malloc_consolidate): Likewise.
(_int_realloc): Likewise.
(_int_memalign): Don't touch corrupt arenas.
* malloc/tst-malloc-backtrace.c: New test case.
This seems to have been left behind as an artifact of some old changes
and can now be merged. Verified that the only generated code change
on x86_64 is that of line numbers in asserts, like so:
@@ -27253,7 +27253,7 @@ Disassembly of section .text:
416f09: 48 89 42 20 mov %rax,0x20(%rdx)
416f0d: e9 7e f6 ff ff jmpq 416590 <_int_free+0x230>
416f12: b9 3f 9f 4a 00 mov $0x4a9f3f,%ecx
- 416f17: ba d5 0f 00 00 mov $0xfd5,%edx
+ 416f17: ba d6 0f 00 00 mov $0xfd6,%edx
416f1c: be a8 9b 4a 00 mov $0x4a9ba8,%esi
416f21: bf 6a 9c 4a 00 mov $0x4a9c6a,%edi
416f26: e8 45 e8 ff ff callq 415770 <__malloc_assert>
We are replacing all of the bespoke alignment code with
ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and PTR_ALIGN_DOWN.
This cleans up malloc/malloc.c, malloc/arena.c, and
elf/dl-reloc.c. It also makes all the code consistently
use pagesize, and powerof2 as required.
Code size is reduced with the removal of precomputed
pagemask, and use of pagesize instead. No measurable
difference in performance.
No regressions on x86_64.
malloc_info is defined in the same file as malloc and free, but is not
an ISO C function, so should be a weak symbol. This patch makes it
so.
Tested for x86_64 (testsuite, and that disassembly of installed shared
libraries is unchanged by the patch).
[BZ #17570]
* malloc/malloc.c (malloc_info): Rename to __malloc_info and
define as weak alias of __malloc_info.
Due to my bad review suggestion for the fix for BZ #15089 a check
was removed from systrim to prevent sbrk being called with a zero
argument. Add the check back to avoid this useless work.
ChangeLog:
2014-06-19 Will Newton <will.newton@linaro.org>
* malloc/malloc.c (systrim): If extra is zero then return
early.
The current malloc_info xml output only has information about
allocations on the heap. Display information about number of mappings
and total mmapped size to this to complete the picture.