The value of PR_TAGGED_ADDR_ENABLE was incorrect in the installed
headers and the prctl command macros were missing that are needed
for it to be useful (PR_SET_TAGGED_ADDR_CTRL). Linux headers have
the definitions since 5.4 so it's widely available, we don't need
to repeat these definitions. The remaining definitions are from
Linux 5.10.
To build glibc with --enable-memory-tagging, Linux 5.4 headers and
binutils 2.33.1 or newer is needed.
Reviewed-by: DJ Delorie <dj@redhat.com>
Since Linux 4.13, kernel limits the maximum command line arguments
length to 6 MiB [1]. Normally the limit is still quarter of the maximum
stack size but if that limit exceeds 6 MiB it's clamped down.
glibc's __sysconf implementation for Linux platform is not aware of
this limitation and for stack sizes of over 24 MiB it returns higher
ARG_MAX than Linux will actually accept. This can be verified by
executing the following application on Linux 4.13 or newer:
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>
int main(void) {
const struct rlimit rlim = { 40 * 1024 * 1024,
40 * 1024 * 1024 };
if (setrlimit(RLIMIT_STACK, &rlim) < 0) {
perror("setrlimit: RLIMIT_STACK");
return 1;
}
printf("ARG_MAX : %8ld\n", sysconf(_SC_ARG_MAX));
printf("63 * 100 KiB: %8ld\n", 63L * 100 * 1024);
printf("6 MiB : %8ld\n", 6L * 1024 * 1024);
char str[100 * 1024], *argv[64], *envp[1];
memset(&str, 'A', sizeof str);
str[sizeof str - 1] = '\0';
for (size_t i = 0; i < sizeof argv / sizeof *argv - 1; ++i) {
argv[i] = str;
}
argv[sizeof argv / sizeof *argv - 1] = envp[0] = 0;
execve("/bin/true", argv, envp);
perror("execve");
return 1;
}
On affected systems the program will report ARG_MAX as 10 MiB but
despite that executing /bin/true with a bit over 6 MiB of command line
arguments will fail with E2BIG error. Expected result is that ARG_MAX
is reported as 6 MiB.
Update the __sysconf function to clamp ARG_MAX value to 6 MiB if it
would otherwise exceed it. This resolves bug #25305 which was market
WONTFIX as suggested solution was to cap ARG_MAX at 128 KiB.
As an aside and point of comparison, bionic (a libc implementation for
Android systems) decided to resolve this issue by always returning 128
KiB ignoring any potential xargs regressions [2].
On older kernels this results in returning overly conservative value
but that's a safer option than being aggressive and returning invalid
value on recent systems. It's also worth noting that at this point
all supported Linux releases have the 6 MiB barrier so only someone
running an unsupported kernel version would get incorrectly truncated
result.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
[1] See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=da029c11e6b12f321f36dac8771e833b65cec962
[2] See baed51ee3a
The commit 2433d39b69, which added time64 support to select, changed
the function to use __NR_pselect6 (or __NR_pelect6_time64) on all
architectures. However, on architectures where the symbol was
implemented with __NR_select the kernel normalizes the passed timeout
instead of return EINVAL. For instance, the input timeval
{ 0, 5000000 } is interpreted as { 5, 0 }.
And as indicated by BZ #27651, this semantic seems to be expected
and changing it results in some performance issues (most likely
the program does not check the return code and keeps issuing
select with unormalized tv_usec argument).
To avoid a different semantic depending whether which syscall the
architecture used to issue, select now always normalize the timeout
input. This is a slight change for some ABIs (for instance aarch64).
Checked on x86_64-linux-gnu and i686-linux-gnu.
An incorrect check in __longjmp_chk could fail on valid code causing
FAIL: debug/tst-longjmp_chk2
The original check was
altstack_sp + altstack_size - setjmp_sp > altstack_size
i.e. sp at setjmp was outside of the altstack range. Here we know that
longjmp is called from a signal handler on the altstack (SS_ONSTACK),
and that it jumps in the wrong direction (sp decreases), so the check
wants to ensure the jump goes to another stack.
The check is wrong when altstack_sp == setjmp_sp which can happen
when the altstack is a local buffer in the function that calls setjmp,
so the patch allows == too. This fixes bug 27709.
Note that the generic __longjmp_chk check seems to be different.
(it checks if longjmp was on the altstack but does not check setjmp,
so it would not catch incorrect longjmp use within the signal handler).
With this patch, the maximal known error for tgamma is now reduced to 9 ulps
for dbl-64, for all rounding modes. Since exhaustive testing is not possible
for dbl-64, it might be that there are still cases with an error larger than
9 ulps, but all known cases are fixed (intensive tests were done to find cases
with large errors).
Tested on x86_64 and powerpc (and by Adhemerval Zanella on aarch64, arm,
s390x, sparc, and i686).
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
DL_UNMAP_IS_SPECIAL and DL_UNMAP were not defined. The definitions are
now copied from arm, since the same is needed on aarch64. The cleanup
of tlsdesc data is handled by the custom _dl_unmap.
Fixes bug 27403.
For j0f/j1f/y0f/y1f, the largest error for all binary32
inputs is reduced to at most 9 ulps for all rounding modes.
The new code is enabled only when there is a cancellation at the very end of
the j0f/j1f/y0f/y1f computation, or for very large inputs, thus should not
give any visible slowdown on average. Two different algorithms are used:
* around the first 64 zeros of j0/j1/y0/y1, approximation polynomials of
degree 3 are used, computed using the Sollya tool (https://www.sollya.org/)
* for large inputs, an asymptotic formula from [1] is used
[1] Fast and Accurate Bessel Function Computation,
John Harrison, Proceedings of Arith 19, 2009.
Inputs yielding the new largest errors are added to auto-libm-test-in,
and ulps are regenerated for various targets (thanks Adhemerval Zanella).
Tested on x86_64 with --disable-multi-arch and on powerpc64le-linux-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
config/i386/constraints.md in GCC has
(define_constraint "e"
"32-bit signed integer constant, or a symbolic reference known
to fit that range (for immediate operands in sign-extending x86-64
instructions)."
(match_operand 0 "x86_64_immediate_operand"))
Since movq takes a signed 32-bit immediate or a register source operand,
use "er", instead of "nr"/"ir", constraint for 32-bit signed integer
constant or register on movq.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This fixes missing definition of math functions in libc in a static link
that are no longer built for libm after commit 4898d9712b ("Avoid adding
duplicated symbols into static libraries").
If building on s390 / i686 with -Os, various conformance
tests are failing with e.g.
conform/ISO/assert.h/linknamespace.out:
[initial] __assert_fail -> [libc.a(assert.o)] __dcgettext -> [libc.a(dcgettext.o)] __dcigettext -> [libc.a(dcigettext.o)] __getcwd -> [libc.a(getcwd.o)] __fstatat64 -> [libc.a(fstatat64.o)] gnu_dev_makedev
The usage of gnu_dev_makedev was recently introduced by
usage of the makedev makro in commit:
5b980d4809
linux: Use statx for MIPSn64
This patch is now linking against __gnu_dev_makedev as
also done in commit:
8b4a118222
Fix -Os gnu_dev_* linknamespace, localplt issues (bug 15105, bug 19463).
All of the isnan functions are in libc.so due to printf_fp, so move
__isnanf128 there too for consistency.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
UNREGISTER_ATFORK is now defined for all ports in register-atfork.h, so most
previous includes of fork.h actually only need register-atfork.h now, and
cxa_finalize.c does not need an ifdef UNREGISTER_ATFORK any more.
The nptl-specific fork generation counters can then go to pthreadP.h, and
fork.h be removed.
Checked on x86_64-linux-gnu and i686-gnu.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Update ifunc-memmove.h to select the function optimized with AVX512
instructions using ZMM16-ZMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with AVX512 instructions using ZMM16-ZMM31 registers to avoid RTM abort
with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort. When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation. Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
Since VZEROUPPER triggers RTM abort while VZEROALL won't, select AVX
optimized string/memory functions with
xtest
jz 1f
vzeroall
ret
1:
vzeroupper
ret
at function exit on processors with usable RTM, but without 256-bit EVEX
instructions to avoid VZEROUPPER inside a transactionally executing RTM
region.
Update ifunc-memcmp.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL, AVX512BW and MOVBE since VZEROUPPER isn't needed at function
exit.
Update ifunc-memset.h/ifunc-wmemset.h to select the function optimized
with 256-bit EVEX instructions using YMM16-YMM31 registers to avoid RTM
abort with usable AVX512VL and AVX512BW since VZEROUPPER isn't needed at
function exit.
Update ifunc-memmove.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL since VZEROUPPER isn't needed at function exit.
Update ifunc-strcpy.h to select the function optimized with 256-bit EVEX
instructions using YMM16-YMM31 registers to avoid RTM abort with usable
AVX512VL and AVX512BW since VZEROUPPER isn't needed at function exit.
Update ifunc-avx2.h, strchr.c, strcmp.c, strncmp.c and wcsnlen.c to
select the function optimized with 256-bit EVEX instructions using
YMM16-YMM31 registers to avoid RTM abort with usable AVX512VL, AVX512BW
and BMI2 since VZEROUPPER isn't needed at function exit.
For strcmp/strncmp, prefer AVX2 strcmp/strncmp if Prefer_AVX2_STRCMP
is set.
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
The tests are refactored to use a common skeleton that handles whether
the underlying filesystem supports 64 bit time, skips 64 bit time
tests when the TU only supports 32 bit, and also skip 64 bit time
tests larger than 32 unsigned int (y2106) if the system does not
support it (MIPSn64 on kernels without statx support).
Checked on x86_64-linux-gnu and i686-linux-gnu. I also checked
on a mips64el-linux-gnu with 4.1.4 and 5.10.0-4-5kc-malta kernel
to verify if the y2106 are indeed skipped.
MIPSn64 kernel ABI for legacy stat uses unsigned 32 bit for second
timestamp, which limits the maximum value to y2106. This patch
make mips64 use statx as for 32-bit architectures.
Thie __cp_stat64_t64_statx is open coded, its usage is solely on
fstatat64 and it avoid the need to redefine the name for mips64
(which will call __cp_stat64_statx since its does not use
__stat64_t64 internally).
If the minimum kernel supports statx there is no need to call the
fallback stat legacy syscalls.
The statx is also called on compat xstat syscall, but different
than the fstatat it calls no fallback and it is assumed to be
always present.
Checked on powerpc-linux-gnu (with and without --enable-kernel=4.11)
and on powerpc64-linux-gnu.
It makes fstatat use __NR_statx, which fix the s390 issue with
missing nanoxsecond support on compat stat syscalls (at least
on recent kernels) and limits the statx call to only one function
(which simplifies the __ASSUME_STATX support).
Checked on i686-linux-gnu and on powerpc-linux-gnu.
1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE.
2. Disable all features which depend on XSAVE:
a. If OSXSAVE is disabled by glibc tunables. Or
b. If both XSAVE and XSAVEC aren't usable.
The Linux version already target the current thread by using tgkill
along with getpid and gettid.
For arm, libpthread does not do a intra PLT since it will call the
raise from libc.
Checked on x86_64-linux-gnu.
The libc version is identical and built with same flags. The libc
version is set as the default version.
The libpthread compat symbol requires to mask it when building the
loader object otherwise ld might complain about a missing
versioned symbol (as for alpha).
Checked on x86_64-linux-gnu.
The libc version is identical and built with same flags. Both aarch64
and nios2 also requires to export __send and tt was done previously with
the HAVE_INTERNAL_SEND_SYMBOL (which forced the symbol creation).
All __send callers are internal to libc and the original issue that
required the symbol export was due a missing libc_hidden_def. So
a compat symbol is added for __send and the libc_hidden_def is
defined regardless.
Checked on x86_64-linux-gnu and i686-linux-gnu.
This is a target hook for memory tagging, the original was a naive
implementation. Uses the same algorithm as __libc_mtag_tag_region,
but with instructions that also zero the memory. This was not
benchmarked on real cpu, but expected to be faster than the naive
implementation.
This is a target hook for memory tagging, the original was a naive
implementation. The optimized version relies on "dc gva" to tag 64
bytes at a time for large allocations and optimizes small cases without
adding too many branches. This was not benchmarked on real cpu, but
expected to be faster than the naive implementation.
This is a common operation when heap tagging is enabled, so inline the
instruction instead of using an extern call.
The .inst directive is used instead of the name of the instruction (or
acle intrinsics) because malloc.c is not compiled for armv8.5-a+memtag
architecture, runtime cpu support detection is used.
Prototypes are removed from the comments as they were not always
correct.
The memset api is suboptimal and does not provide much benefit. Memory
tagging only needs a zeroing memset (and only for memory that's sized
and aligned to multiples of the tag granule), so change the internal
api and the target hooks accordingly. This is to simplify the
implementation of the target hook.
Reviewed-by: DJ Delorie <dj@redhat.com>
Use inline functions instead of macros, because macros can cause unused
variable warnings and type conversion issues. We assume these functions
may appear in the code but only in dead code paths (hidden by a runtime
check), so it's important that they can compile with correct types, but
if they are actually used that should be an error.
Currently the hooks are only used when USE_MTAG is true which only
happens on aarch64 and then the aarch64 specific code is used not this
generic header. However followup refactoring will allow the hooks to
be used with !USE_MTAG.
Note: the const qualifier in the comment was wrong: changing tags is a
write operation.
Reviewed-by: DJ Delorie <dj@redhat.com>
The arch13 memmove variant is currently selected by the ifunc selector
if the Miscellaneous-Instruction-Extensions Facility 3 facility bit
is present, but the function is also using vector instructions.
If the vector support is not present, one is receiving an operation
exception.
Therefore this patch also checks for vector support in the ifunc
selector and in ifunc-impl-list.c.
Just to be sure, the configure check is now also testing an arch13
vector instruction and an arch13 Miscellaneous-Instruction-Extensions
Facility 3 instruction.
This essentially folds compat_symbol_unique functionality into
compat_symbol.
This change eliminates the need for intermediate aliases for defining
multiple symbol versions, for both compat_symbol and versioned_symbol.
Some binutils versions do not suport multiple versions per symbol on
some targets, so aliases are automatically introduced, similar to what
compat_symbol_unique did. To reduce symbol table sizes, a configure
check is added to avoid these aliases if they are not needed.
The new mechanism works with data symbols as well as function symbols,
due to the way an assembler-level redirect is used. It is not
compatible with weak symbols for old binutils versions, which is why
the definition of __malloc_initialize_hook had to be changed. This
is not a loss of functionality because weak symbols do not matter
to dynamic linking.
The placeholder symbol needs repeating in nptl/libpthread-compat.c
now that compat_symbol is used, but that seems more obvious than
introducing yet another macro.
A subtle difference was that compat_symbol_unique made the symbol
global automatically. compat_symbol does not do this, so static
had to be removed from the definition of
__libpthread_version_placeholder.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
A subsequent change will require including <config.h> for defining
symbol_version_reference. <libc-symbol.h> should not include
<config.h> for _ISOMAC, so it cannot define symbol_version_reference
anymore, but symbol_version_reference is needed <shlib-compat.h> even
for _ISOMAC. Moving the definition of symbol_version_reference to a
separate file <libc-symver.h> makes it possible to use a single
definition for both cases.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2b47727c68 ("posix: Consolidate register-atfork") introduced a fork.h
header to declare the atfork unregister hook, but was missing adding it
for htl.
This fixes tst-atfork2.
During critical sections, signal handling is deferred and thus RPCs return
EINTR, even if SA_RESTART is set. We thus have to restart the whole critical
section in that case.
This also adds HURD_CRITICAL_UNLOCK in the cases where one wants to
break the section in the middle.
This change adds new test to assess sigtimedwait's timeout related
functionality - the sigset_t is configured for SIGUSR1, which will
not be triggered, so sigtimedwait just waits for timeout.
To be more specific - two use cases are checked:
- if sigtimedwait times out immediately when passed struct timespec has
zero values of tv_nsec and tv_sec.
- if sigtimedwait times out after timeout specified in passed argument
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This code provides test to check if time on target machine is properly
read via ntp_gettime syscall.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
After this patch applied the ntp_gettimex function is always declared
in the sys/timex.h header. Currently it is not when __REDIRECT_NTH is
defined (i.e. in ARM 32 bit port).
The generic implementation basically handle the system agnostic logic
(filtering out the invalid signals) while the __libc_sigaction is
the function with implements the system and architecture bits.
Checked on x86_64-linux-gnu and i686-linux-gnu.
The instructions xsxexpdp and xsxexpqp introduced on POWER9 extract
the exponent from a double-precision and quad-precision floating-point
respectively, thus they can be used to improve ilogb, ilogbf and ilogbf128.
It is not actually used by the legacy unwinder linked into
libc.so, and it conflicts with the unwind-link functionality
in libpthread.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
__x86_get_cpuid_feature_leaf is called during early startup, before
the stack check guard is initialized and is hence not safe to build
with stack-protector.
Additionally, IFUNC resolvers for static tst-ifunc-isa tests get
called too early for stack protector to be useful, so fix them to
disable stack protector for the resolver functions.
This fixes all failures seen with --enable-stack-protector=all
configuration.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
commit 2d651eb926
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Sep 18 07:55:14 2020 -0700
x86: Move x86 processor cache info to cpu_features
missed _SC_LEVEL1_ICACHE_LINESIZE.
1. Add level1_icache_linesize to struct cpu_features.
2. Initialize level1_icache_linesize by calling handle_intel,
handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE.
3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Both htl and nptl uses a different data structure to implement atfork
handlers. The nptl one was refactored by 27761a1042 to use a dynarray
which simplifies the code.
This patch moves the nptl one to be the generic implementation and
replace Hurd linked one. Different than previous NPTL, Hurd also uses
a global lock, so performance should be similar.
Checked on x86_64-linux-gnu, i686-linux-gnu, and with a build for
i686-gnu.
The nptl already expects a Linux syscall internally. Also
__is_internal_signal is used and the DEBUGGING_P check is removed.
Checked on x86_64-linux-gnu.