In _dl_tlsdesc_dynamic, there are three 'addi.d sp, sp, -size'
instructions to allocate stack size for Float/LSX/LASX registers.
Every 'addi.d sp, sp, -size' needs a cfi_adjust_cfa_offset because
of sp is used to compute CFA. But only one 'addi.d sp, sp, -size'
will be run according to HWCAP value. And all cfi_adjust_cfa_offset
will be executed in stack unwinding, it result in incorrect CFA.
Change _dl_tlsdesc_dynamic to _dl_tlsdesc_dynamic,
_dl_tlsdesc_dynamic_lsx and _dl_tlsdesc_dynamic_lasx.
Conflicting cfi instructions can be distributed to the three functions.
And cfi instructions can correspond to stack down instructions.
The original commit enabling non-temporal memset on Skylake Server had
erroneous benchmarks (actually done on ICX).
Further benchmarks indicate non-temporal stores may in fact by a
regression on Skylake Server.
This commit may be over-cautious in some cases, but should avoid any
regressions for 2.40.
Tested using qemu on all x86_64 cpu arch supported by both qemu +
GLIBC.
Reviewed-by: DJ Delorie <dj@redhat.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
We use thread_get_name and thread_set_name to get and set the thread
name, so nothing is stored in the thread structure since these functions
are supposed to be called sparingly.
One notable difference with Linux is that the thread name is up to 32
chars, whereas Linux's is 16.
Also added a mach_RPC_CHECK to check for the existing of gnumach RPCs.
save_data stores the start of the original message to be retried,
overwritten by the EINTR reply. In 64b builds the overwrite is however
rounded up to the 64b pointer size, so we have to save more than just
the 32b err.
Thanks a lot to Luca Dariz for the investigation!
Fix an issue with commit 2af4e3e566 ("Test of semaphores.") by making
the tst-sem11 and tst-sem12 tests use the test driver, preventing them
from ever causing testing to hang forever and never complete, such as
currently happening with the 'mips-linux-gnu' (o32 ABI) target. Adjust
the name of the PREPARE macro, which clashes with the interpretation of
its presence by the test driver, by using a TF_ prefix in reference to
the name of the 'tf' function.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add a copyright notice to the tst-sem11 and tst-sem12 tests, observing
that they have been originally contributed back in 2007, with commit
2af4e3e566 ("Test of semaphores.").
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The z13/vector-optimized wcsncmp implementation segfaults if n=1
and there is only one character (equal on both strings) before
the page end. Then it loads and compares one character and misses
to check n again. The following load fails.
This patch removes the extra load and compare of the first character
and just start with the loop which uses vector-load-to-block-boundary.
This code-path also checks n.
With this patch both tests are passing:
- the simplified one mentioned in the bugzilla 31934
- the full one in Florian Weimer's patch:
"manual: Document a GNU extension for strncmp/wcsncmp"
(https://patchwork.sourceware.org/project/glibc/patch/874j9eml6y.fsf@oldenburg.str.redhat.com/):
On s390x-linux-gnu (z16), the new wcsncmp test fails due to bug 31934.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The __rseq_size value is now the active area of struct rseq
(so 20 initially), not the full struct size including padding
at the end (32 initially).
Update misc/tst-rseq to print some additional diagnostics.
Reviewed-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
The purpose of this patch is to add some system calls that (1) aren't
otherwise documented, and (2) are merely redirected to the kernel, so
can refer to their documentation; and define a standard way of doing
so in the future. A more detailed explaination of how system calls
are wrapped is added along with reference to the Linux Man-Pages
project.
Default version of man-pages is in configure.ac but can be overridden
by --with-man-pages=X.Y
Reviewed-by: Alejandro Colomar <alx@kernel.org>
ldconfig already ignores files with the -gdb.py suffix, but GDB also
looks for -gdb.gdb and -gdb.scm files. These aren't as widely used, but
libguile at least comes with a -gdb.scm file.
Rename is_gdb_python_file to is_gdb_extension_file, and make it
recognise all three types of GDB extension.
Signed-off-by: Adam Sampson <ats@offog.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
is_gdb_python_file is doing a similar test, so it can use this helper
function as well.
Signed-off-by: Adam Sampson <ats@offog.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This hasn't been looked at for a loong time (already guessing from
the number of missing entries), and it ain't pretty.
There are some 9-ulps results for float.
- ZaZaZebra (qemu-system-m68k clone of PowerBook 190 system)
- GCC 13.3.1 20240614 (Gentoo 13.3.1_p20240614 p17)
- ld GNU ld (Gentoo 2.42 p6) 2.42.0
- Linux ZaZaZebra 4.19.0-5-m68k #1 Gentoo 4.19.37-5 (2019-06-19) m68k 68040 68040 GNU/Linux
- manual build
- ../glibc/configure --enable-fortify-source --prefix=/usr
- Tested by Immolo (via Andreas K. Hüttel)
Signed-off-by: Andreas K. Hüttel <dilfridge@gentoo.org>
The __getrandom_nocancel used by __arc4random_buf uses
INLINE_SYSCALL_CALL (which returns -1/errno) and the loop checks for
the return value instead of errno to fallback to /dev/urandom.
The malloc code now uses __getrandom_nocancel_nostatus, which uses
INTERNAL_SYSCALL_CALL, so there is no need to use the variant that does
not set errno (BZ#29624).
Checked on x86_64-linux-gnu.
Reviewed-by: Xi Ruoyao <xry111@xry111.site>
While working on a patch to add support for the extensible rseq ABI, we
came across an issue where a new 'const' variable would be merged with
the existing '__rseq_size' variable. We tracked this to the use of
'-fmerge-all-constants' which allows the compiler to merge identical
constant variables. This means that all 'const' variables in a compile
unit that are of the same size and are initialized to the same value can
be merged.
In this specific case, on 32 bit systems 'unsigned int' and 'ptrdiff_t'
are both 4 bytes and initialized to 0 which should trigger the merge.
However for reasons we haven't delved into when the attribute 'section
(".data.rel.ro")' is added to the mix, only variables of the same exact
types are merged. As far as we know this behavior is not specified
anywhere and could change with a new compiler version, hence this patch.
Move the definitions of these variables into an assembler file and add
hidden writable aliases for internal use. This has the added bonus of
removing the asm workaround to set the values on rseq registration.
Tested on Debian 12 with GCC 12.2.
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This new section in the manual provides recommendations for
use of glibc in environments with higher integrity requirements.
It's reflecting both current implementation shortcomings, and
challenges we inherit from ELF and psABI requirements.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Starting with commit
59974938fe
elf/rtld: Count skipped environment variables for enable_secure
The new testcase elf/tst-tunables-enable_secure-env segfaults on s390 (31bit).
There _start parses the auxiliary vector for some additional checks.
Therefore it skips over the zeros after the environment variables ...
0x7fffac20: 0x7fffbd17 0x7fffbd32 0x7fffbd69 0x00000000
------------------------------------------------^^^last environment variable
... and then it parses the auxiliary vector and stops at AT_NULL.
0x7fffac30: 0x00000000 0x00000021 0x00000000 0x00000000
--------------------------------^^^AT_SYSINFO_EHDR--------------^^^AT_NULL
----------------^^^newp-----------------------------------------^^^oldp
Afterwards it tries to access AT_PHDR which points to somewhere and segfaults.
Due to not incorporating the skip_env variable in the computation of oldp
when shuffling down the auxv in rtld.c, it just copies one entry with AT_NULL
and value 0x00000021 and stops the loop. In reality we have skipped
GLIBC_TUNABLES environment variable (=> skip_env=1). Thus we should copy from
here:
0x7fffac40: 0x00000021 0x7ffff000 0x00000010 0x007fffff
----------------^^^fixed-oldp
This patch fixes the computation of oldp when shuffling down auxiliary vector.
It also adds some checks in the testcase. Those checks also fail on
s390x (64bit) and x86_64 without the fix.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Adhemerval noticed that the gettimeofday() and 32-bit clock_gettime()
vDSO calls won't be used by glibc on hppa, so there is no need to
declare them. Both syscalls will be emulated by utilizing return values
of the 64-bit clock_gettime() vDSO instead.
Signed-off-by: Helge Deller <deller@gmx.de>
Suggested-by: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
The clang open fortify wrapper from 4228baef1a added
a restriction where open with 3 arguments where flags do not
contain O_CREAT or O_TMPFILE are handled as invalid. They are
not invalid, since the third argument is ignored, and the gcc
wrapper also allows it.
Checked x86_64-linux-gnu and with a yocto build for some affected
packages.
Tested-by: “Khem Raj <raj.khen@gmail.com>”
By default, if the C++ toolchain lacks support for static linking,
configure fails to find the C++ header files and the glibc build fails.
The --disable-static-c++-link-check option allows the glibc build to
finish, but static C++ tests will fail if the C++ toolchain doesn't
have the necessary static C++ libraries which may not be easily installed.
Add --disable-static-c++-tests option to skip the static C++ link check
and tests. This fixes BZ #31797.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
The current minimum GCC version of glibc build is GCC 6.2 or newer. But
building i686 glibc with GCC 6.4 on Fedora 40 failed since the C++ header
files couldn't be found which was caused by the static C++ link check
failure due to missing __divmoddi4 which was referenced in i686 libc.a
and added to GCC 7. Add --disable-static-c++-link-check configure option
to disable the static C++ link test. The newly built i686 libc.a can be
used by GCC 6.4 to create static C++ tests. This fixes BZ #31412.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Extend the list of MAP_* macros to include all macros available
to the average program (gcc -E -dM | grep MAP_*)
Extend the list of errno codes.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
MIPSr6 has MADDF.s/MADDF.d instructions, which are fused.
In MIPS ISA, double support can be subsetted. Only FMAF is enabled
for this case.
* sysdeps/mips/fpu/math-use-builtins-fma.h
Signed-off-by: YunQiang Su <syq@gcc.gnu.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It turns out that quite a few applications use bundled mallocs that
have been built to use global-dynamic TLS (instead of the recommended
initial-exec TLS). The previous workaround from
commit afe42e935b ("elf: Avoid some
free (NULL) calls in _dl_update_slotinfo") does not fix all
encountered cases unfortunatelly.
This change avoids the TLS generation update for recursive use
of TLS from a malloc that was called during a TLS update. This
is possible because an interposed malloc has a fixed module ID and
TLS slot. (It cannot be unloaded.) If an initially-loaded module ID
is encountered in __tls_get_addr and the dynamic linker is already
in the middle of a TLS update, use the outdated DTV, thus avoiding
another call into malloc. It's still necessary to update the
DTV to the most recent generation, to get out of the slow path,
which is why the check for recursion is needed.
The bookkeeping is done using a global counter instead of per-thread
flag because TLS access in the dynamic linker is tricky.
All this will go away once the dynamic linker stops using malloc
for TLS, likely as part of a change that pre-allocates all TLS
during pthread_create/dlopen.
Fixes commit d2123d6827 ("elf: Fix slow
tls access after dlopen [BZ #19924]").
Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
The conditionals for several mtrace-based tests in catgets, elf, libio,
malloc, misc, nptl, posix, and stdio-common were incorrect leading to
test failures when bootstrapping glibc without perl.
The correct conditional for mtrace-based tests requires three checks:
first checking for run-built-tests, then build-shared, and lastly that
PERL is not equal to "no" (missing perl).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This will make it easier to add tests later; see also the test section
in malloc/Makefile that inspires this.
Signed-off-by: Michel Lind <salimma@fedoraproject.org>
Suggested-by: Arjun Shankar <arjun@redhat.com>
Reviewed-by: Arjun Shankar <arjun@redhat.com>
Current 'non_temporal_threshold' set to 'non_temporal_threshold_lowbound'
on Zhaoxin processors without ERMS. The default
'non_temporal_threshold_lowbound' is too small for the KH-40000 and KX-7000
Zhaoxin processors, this patch updates the value to
'shared / cachesize_non_temporal_divisor'.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
This patch optimizes large size copy using normal store when src > dst
and overlap. Make it the same as the logic in memmove-vec-unaligned-erms.S.
Current memmove-ssse3 use '__x86_shared_cache_size_half' as the non-
temporal threshold, this patch updates that value to
'__x86_shared_non_temporal_threshold'. Currently, the
__x86_shared_non_temporal_threshold is cpu-specific, and different CPUs
will have different values based on the related nt-benchmark results.
However, in memmove-ssse3, the nontemporal threshold uses
'__x86_shared_cache_size_half', which sounds unreasonable.
The performance is not changed drastically although shows overall
improvements without any major regressions or gains.
Results on Zhaoxin KX-7000:
bench-memcpy geometric_mean(N=20) New / Original: 0.999
bench-memcpy-random geometric_mean(N=20) New / Original: 0.999
bench-memcpy-large geometric_mean(N=20) New / Original: 0.978
bench-memmove geometric_mean(N=20) New / Original: 1.000
bench-memmmove-large geometric_mean(N=20) New / Original: 0.962
Results on Intel Core i5-6600K:
bench-memcpy geometric_mean(N=20) New / Original: 1.001
bench-memcpy-random geometric_mean(N=20) New / Original: 0.999
bench-memcpy-large geometric_mean(N=20) New / Original: 1.001
bench-memmove geometric_mean(N=20) New / Original: 0.995
bench-memmmove-large geometric_mean(N=20) New / Original: 0.936
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Fix code formatting under the Zhaoxin branch and add comments for
different Zhaoxin models.
Unaligned AVX load are slower on KH-40000 and KX-7000, so disable
the AVX_Fast_Unaligned_Load.
Enable Prefer_No_VZEROUPPER and Fast_Unaligned_Load features to
use sse2_unaligned version of memset,strcpy and strcat.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Qualcom's new core, oryon-1, has a different characteristics for
memset than the current versions of memset. For non-zero, larger
sizes, using GPRs rather than the SIMD stores is ~30% faster.
For even larger sizes, using the nontemporal stores is needed
not to polute the L1/L2 caches.
For zero values, using `dc zva` should be used. Since we
know the size will always be 64 bytes, we don't need to figure
out the size there.
I started with the emag memset and added back the `dc zva` code.
Changes since v1:
* v3: Fix comment formating
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Qualcomm's new core (oryon-1) has a different performance characteristic
than other cores. For memcpy, it is faster to use the GPRs to
do the copy for large sizes (2x faster). For even larger sizes,
it is better to use the nontemporal load/store instructions so
we don't pollute the L1/L2 caches.
For smaller sizes, the characteristic are very similar to
other cores.
I used the thunderx memcpy as a starting point and expanded from there.
Changes since v1:
* v2: Fix ordering in Makefile.
* v3: Fix comment grammar about the ldnp/stnp instructions.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
The fcntl.h fortify wrapper for clang added by 86889e22db
missed the __fortify_clang_overload_arg and and also added the
mode argument for the __fortify_function_error_function function,
which leads clang to be able to correct resolve which overloaded
function it should emit.
Checked on x86_64-linux-gnu.
Reported-by: Khem Raj <raj.khem@gmail.com>
Tested-by: Khem Raj <raj.khem@gmail.com>
The mqueue.h fortify wrapper for clang added by c23107effb
is not fully correct, where correct 4 argument usage are not
being correctly handled. For instance, while building socat 1.8
with a yocto clang based system shows:
./socat-1.8.0.0/xio-posixmq.c:119:8: error: 'mq_open' is unavailable: mq_open can be called either with 2 or 4 arguments
119 | mqd = mq_open(name, oflag, opt_mode, NULL);
| ^
[...] /usr/include/bits/mqueue2.h:66:8: note: 'mq_open' has been explicitly marked unavailable here
66 | __NTH (mq_open (const char *__name, int __oflag, mode_t mode,
| ^
1 error generated.
The correct way to define the wrapper is to set invalid usage
with __fortify_clang_unavailable (for the case with 5 or more
arguments), followed by the expected ones. This fix make mq_open
similar to current open wrappers.
[1] http://www.dest-unreach.org/socat/
Reported-by: Khem Raj <raj.khem@gmail.com>
Acked-by: Khem Raj <raj.khem@gmail.com>
With gcc 14, I get this warning/werror when building the localedata tests:
tests-mbwc/tsp_common.c: In function ‘result.constprop.isra’:
tests-mbwc/tsp_common.c:55:43: error: ‘%s’ directive writing up to 92 bytes into a region of size between 0 and 114 [-Werror=format-overflow=]
55 | sprintf (result_rec, "%s:%s:%d:%d:%d:%c:%s\n", func, loc, rec_no, seq_no,
| ^~
In file included from ../include/bits/stdio2.h:1,
from ../libio/stdio.h:980,
from ../include/stdio.h:14,
from tests-mbwc/tsp_common.c:10:
In function ‘sprintf’,
inlined from ‘result.constprop.isra’ at tests-mbwc/tsp_common.c:55:3:
../libio/bits/stdio2.h:30:10: note: ‘__builtin___sprintf_chk’ output between 20 and 234 bytes into a destination of size 132
30 | return __builtin___sprintf_chk (__s, __USE_FORTIFY_LEVEL - 1,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
31 | __glibc_objsize (__s), __fmt,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
32 | __va_arg_pack ());
| ~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
This patch now gets rid of using sprintf and the result_rec buffer and just
prints to fp directly.
The test does not necessarily trigger the crash, depending on memcmp
behavior. A crash was observed in __memcmp_ia32 on i686 builds.
Reviewed-by: Paul Eggert <eggert@cs.ucla.edu>
* manual/string.texi: For strnlen (s, maxlen), do not say that s must
be of size maxlen, as it can be smaller if it is null-terminated.
This should help avoid confusion such as seen in
<https://lists.gnu.org/r/bug-gnulib/2024-06/msg00280.html>.
Mention that strnlen and wcsnlen have been in POSIX since
POSIX.1-2008.
This recently came up during a cleanup to remove misaligned accesses
from the RISC-V port.
Link: https://sourceware.org/pipermail/libc-alpha/2022-June/139961.html
Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Fangrui Song <maskray@google.com>
asm volatile ("movfcsr2gr $t0, $fcsr0" ::: "$t0");
asm volatile ("st.d $t0, %0" :"=m"(restore_fcsr));
generate to the following instructions with -Og flag:
movfcsr2gr $t0, $zero
addi.d $t0, $sp, 2047(0x7ff)
addi.d $t0, $t0, 77(0x4d)
st.w $t0, $t0, 0
fcsr0 register and restore_fcsr variable are both stored in t0 register.
Change to:
asm volatile ("movfcsr2gr %0, $fcsr0" :"=r"(restore_fcsr));
to avoid restore_fcsr address in t0.
Comparing float value using memcmp because float value cannot be
directly compared for equality.
Put LOAD_REGISTER_FCSR and SAVE_REGISTER_FCC after LOAD_REGISTER_FLOAT.
Some float instructions may change fcsr register.
If the pidfd_spawn/pidfd_spawnp helper process succeeds, but evecve
fails for some reason (either with an invalid/non-existent, memory
allocation, etc.) the resulting pidfd is never closed, nor returned
to caller (so it can call close).
Since the process creation failed, it should be up to posix_spawn to
also, close the file descriptor in this case (similar to what it
does to reap the process).
This patch also changes the waitpid with waitid (P_PIDFD) for pidfd
case, to avoid a possible pid re-use.
Checked on x86_64-linux-gnu.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
The atomic_spin_nop() macro can be used to run arch-specific
code in the body of a spin loop to potentially improve efficiency.
RISC-V's Zihintpause extension includes a PAUSE instruction for
this use-case, which is encoded as a HINT, which means that it
behaves like a NOP on systems that don't implement Zihintpause.
Binutils supports Zihintpause since 2.36, so this patch uses
the ".insn" directive to keep the code compatible with older
toolchains.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>