About a decade ago, I accidentally wrote the GPLv3 license text on the
test case when the rest of glibc source is LGPL v2.1 or later. As
original author of the test (and there are no other legally
significant changes to the test) I propose to update the license text
to be consistent with the project.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Some math functions (such as __isnan*) are built into both libm and
libc because they are needed in libc. The symbol gets exported from
libc.so and not libm.so, because of which dynamic linking works fine;
the symbols are always resolved from libc.so and libm.so uses its
internal copy of the same function if needed.
When linking statically though, the libm variants get used throughout
because the symbols are exported in both archives and libm.a is
searched first.
This patch removes these duplicate objects from the libm.a archive so
that programs always link to libc in both, the static and dynamic
case. The difference this will cause is that libm uses of these
functions will start using the libc versions in the !SHARED case.
This is harmless at the moment because the objects are identical
except for their names.
Some of these duplicates could be removed from libm.so too, but I
avoided that in the interest of retaining an internal reference if at
all those functions get used within libm in future.
Reviewed-by: Paul Zimmermann <Paul.Zimmermann@inria.fr>
All of the isnan functions are in libc.so due to printf_fp, so move
__isnanf128 there too for consistency.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
UNREGISTER_ATFORK is now defined for all ports in register-atfork.h, so most
previous includes of fork.h actually only need register-atfork.h now, and
cxa_finalize.c does not need an ifdef UNREGISTER_ATFORK any more.
The nptl-specific fork generation counters can then go to pthreadP.h, and
fork.h be removed.
Checked on x86_64-linux-gnu and i686-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort. When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation. Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with
xtest
jz 1f
vzeroall
ret
1:
vzeroupper
ret
at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.
For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
This patch adds workload traces for all double format functions where such
files are missing. For each function, a set of 1000 random values is
generated at random using SageMath, such that the output values are
meaningful (for example avoiding too large inputs for exp10 where the
output would be +Inf). More details about the generated values are
given at the beginning of each file.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
The tests are refactored to use a common skeleton that handles whether
the underlying filesystem supports 64 bit time, skips 64 bit time
tests when the TU only supports 32 bit, and also skip 64 bit time
tests larger than 32 unsigned int (y2106) if the system does not
support it (MIPSn64 on kernels without statx support).
Checked on x86_64-linux-gnu and i686-linux-gnu. I also checked
on a mips64el-linux-gnu with 4.1.4 and 5.10.0-4-5kc-malta kernel
to verify if the y2106 are indeed skipped.
MIPSn64 kernel ABI for legacy stat uses unsigned 32 bit for second
timestamp, which limits the maximum value to y2106. This patch
make mips64 use statx as for 32-bit architectures.
Thie __cp_stat64_t64_statx is open coded, its usage is solely on
fstatat64 and it avoid the need to redefine the name for mips64
(which will call __cp_stat64_statx since its does not use
__stat64_t64 internally).
If the minimum kernel supports statx there is no need to call the
fallback stat legacy syscalls.
The statx is also called on compat xstat syscall, but different
than the fstatat it calls no fallback and it is assumed to be
always present.
Checked on powerpc-linux-gnu (with and without --enable-kernel=4.11)
and on powerpc64-linux-gnu.
It makes fstatat use __NR_statx, which fix the s390 issue with
missing nanoxsecond support on compat stat syscalls (at least
on recent kernels) and limits the statx call to only one function
(which simplifies the __ASSUME_STATX support).
Checked on i686-linux-gnu and on powerpc-linux-gnu.
1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE.
2. Disable all features which depend on XSAVE:
a. If OSXSAVE is disabled by glibc tunables. Or
b. If both XSAVE and XSAVEC aren't usable.
The Linux version already target the current thread by using tgkill
along with getpid and gettid.
For arm, libpthread does not do a intra PLT since it will call the
raise from libc.
Checked on x86_64-linux-gnu.
The libc version is identical and built with same flags. The libc
version is set as the default version.
The libpthread compat symbol requires to mask it when building the
loader object otherwise ld might complain about a missing
versioned symbol (as for alpha).
Checked on x86_64-linux-gnu.
The libc version is identical and built with same flags. Both aarch64
and nios2 also requires to export __send and tt was done previously with
the HAVE_INTERNAL_SEND_SYMBOL (which forced the symbol creation).
All __send callers are internal to libc and the original issue that
required the symbol export was due a missing libc_hidden_def. So
a compat symbol is added for __send and the libc_hidden_def is
defined regardless.
Checked on x86_64-linux-gnu and i686-linux-gnu.
Instead of polling the stderr, create two pipes and fork to check
if child timeout as expected similar to tst-pselect.c. Also lower
the timeout value.
Checked on x86_64-linux-gnu.
This is a workaround (hack) for a gcc optimization issue (PR 99551).
Without this the generated code may evaluate the expression in the
cold path which causes performance regression for small allocations
in the memory tagging disabled (common) case.
Reviewed-by: DJ Delorie <dj@redhat.com>
The internal _mid_memalign already returns newly tagged memory.
(__libc_memalign and posix_memalign already relied on this, this
patch fixes the other call sites.)
Reviewed-by: DJ Delorie <dj@redhat.com>
The previous patch ensured that all chunk to mem computations use
chunk2rawmem, so now we can rename it to chunk2mem, and in the few
cases where the tag of mem is relevant chunk2mem_tag can be used.
Replaced tag_at (chunk2rawmem (x)) with chunk2mem_tag (x).
Renamed chunk2rawmem to chunk2mem.
Reviewed-by: DJ Delorie <dj@redhat.com>
The difference between chunk2mem and chunk2rawmem is that the latter
does not get the memory tag for the returned pointer. It turns out
chunk2rawmem almost always works:
The input of chunk2mem is a chunk pointer that is untagged so it can
access the chunk header. All memory that is not user allocated heap
memory is untagged, which in the current implementation means that it
has the 0 tag, but this patch does not rely on the tag value. The
patch relies on that chunk operations are either done on untagged
chunks or without doing memory access to the user owned part.
Internal interface contracts:
sysmalloc: Returns untagged memory.
_int_malloc: Returns untagged memory.
_int_free: Takes untagged memory.
_int_memalign: Returns untagged memory.
_int_realloc: Takes and returns tagged memory.
So only _int_realloc and functions outside this list need care.
Alignment checks do not need the right tag and tcache works with
untagged memory.
tag_at was kept in realloc after an mremap, which is not strictly
necessary, since the pointer is only used to retag the memory, but this
way the tag is guaranteed to be different from the old tag.
Reviewed-by: DJ Delorie <dj@redhat.com>
The comment explained why different tag is used after mremap, but
for that correctly tagged pointer should be passed to tag_new_usable.
Use chunk2mem to get the tag.
Reviewed-by: DJ Delorie <dj@redhat.com>
This is a pure refactoring change that does not affect behaviour.
The CHUNK_AVAILABLE_SIZE name was unclear, the memsize name tries to
follow the existing convention of mem denoting the allocation that is
handed out to the user, while chunk is its internally used container.
The user owned memory for a given chunk starts at chunk2mem(p) and
the size is memsize(p). It is not valid to use on dumped heap chunks.
Moved the definition next to other chunk and mem related macros.
Reviewed-by: DJ Delorie <dj@redhat.com>
This is a target hook for memory tagging, the original was a naive
implementation. Uses the same algorithm as __libc_mtag_tag_region,
but with instructions that also zero the memory. This was not
benchmarked on real cpu, but expected to be faster than the naive
implementation.
This is a target hook for memory tagging, the original was a naive
implementation. The optimized version relies on "dc gva" to tag 64
bytes at a time for large allocations and optimizes small cases without
adding too many branches. This was not benchmarked on real cpu, but
expected to be faster than the naive implementation.
This is a common operation when heap tagging is enabled, so inline the
instruction instead of using an extern call.
The .inst directive is used instead of the name of the instruction (or
acle intrinsics) because malloc.c is not compiled for armv8.5-a+memtag
architecture, runtime cpu support detection is used.
Prototypes are removed from the comments as they were not always
correct.
Use the runtime check where possible: it should not cause slow down in
the !USE_MTAG case since then mtag_enabled is constant false, but it
allows compiling the tagging logic so it's less likely to break or
diverge when developers only test the !USE_MTAG case.
Reviewed-by: DJ Delorie <dj@redhat.com>
The branches may be better optimized since mtag_enabled is widely used.
Granule size larger than a chunk header is not supported since then we
cannot have both the chunk header and user area granule aligned. To
fix that for targets with large granule, the chunk layout has to change.
So code that attempted to handle the granule mask generally was changed.
This simplified CHUNK_AVAILABLE_SIZE and the logic in malloc_usable_size.
Reviewed-by: DJ Delorie <dj@redhat.com>