A not so recent kernel change[1] changed how the trampoline
`__kernel_sigtramp_rt64` is used to call signal handlers.
This was exposed on the test misc/tst-sigcontext-get_pc
Before kernel 5.9, the kernel set LR to the trampoline address and
jumped directly to the signal handler, and at the end the signal
handler, as any other function, would `blr` to the address set. In
other words, the trampoline was executed just at the end of the signal
handler and the only thing it did was call sigreturn. But since
kernel 5.9 the kernel set CTRL to the signal handler and calls to the
trampoline code, the trampoline then `bctrl` to the address in CTRL,
setting the LR to the next instruction in the middle of the
trampoline, when the signal handler returns, the rest of the
trampoline code executes the same code as before.
Here is the full trampoline code as of kernel 5.11.0-rc5 for
reference:
V_FUNCTION_BEGIN(__kernel_sigtramp_rt64)
.Lsigrt_start:
bctrl /* call the handler */
addi r1, r1, __SIGNAL_FRAMESIZE
li r0,__NR_rt_sigreturn
sc
.Lsigrt_end:
V_FUNCTION_END(__kernel_sigtramp_rt64)
This new behavior breaks how `backtrace()` uses to detect the
trampoline frame to correctly reconstruct the stack frame when it is
called from inside a signal handling.
This workaround rely on the fact that the trampoline code is at very
least two (maybe 3?) instructions in size (as it is in the 32 bits
version, only on `li` and `sc`), so it is safe to check the return
address be in the range __kernel_sigtramp_rt64 .. + 4.
[1] subject: powerpc/64/signal: Balance return predictor stack in signal trampoline
commit: 0138ba5783ae0dcc799ad401a1e8ac8333790df9
url: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0138ba5783ae0dcc799ad401a1e8ac8333790df9
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Some IFUNC variants are not compatible with BTI and MTE so don't
set them as usable for testing and benchmarking on a BTI or MTE
enabled system.
As far as IFUNC selectors are concerned a system is BTI enabled if
the cpu supports it and glibc was built with BTI branch protection.
Most IFUNC variants are BTI compatible, but thunderx2 memcpy and
memmove use a jump table with indirect jump, without a BTI j.
Fixes bug 26818.
The hwcap value is now in linux 5.10 and in glibc bits/hwcap.h, so use
that definition.
Move the definition to init-arch.h so all ifunc selectors can use it
and expose an "mte" shorthand for mte enabled runtime.
For now we allow user code to enable tag checks and use PROT_MTE
mappings without libc involvment, this is not guaranteed ABI, but
can be useful for testing and debugging with MTE.
GCC mainline shows the following error:
../sysdeps/unix/sysv/linux/mips/mips64/getdents64.c: In function '__getdents64':
../sysdeps/unix/sysv/linux/mips/mips64/getdents64.c:121:7: error: 'memcpy' forming offset [4, 7] is out of the bounds [0, 4] [-Werror=array-bounds]
121 | memcpy (((char *) dp + offsetof (struct dirent64, d_ino)),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
122 | KDP_MEMBER (kdp, d_ino), sizeof ((struct dirent64){0}.d_ino));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../sysdeps/unix/sysv/linux/mips/mips64/getdents64.c:123:7: error: 'memcpy' forming offset [4, 7] is out of the bounds [0, 4] [-Werror=array-bounds]
123 | memcpy (((char *) dp + offsetof (struct dirent64, d_off)),
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
124 | KDP_MEMBER (kdp, d_off), sizeof ((struct dirent64){0}.d_off));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The issue is due both d_ino and d_off fields for mips64-n32
kernel_dirent are 32-bits, while this is using memcpy to copy 64 bits
from it into the glibc dirent64.
The fix is to use a temporary buffer to read the correct type
from kernel_dirent.
Checked with a build-many-glibcs.py for mips64el-linux-gnu and I
also checked the tst-getdents64 on mips64el 4.1.4 kernel with
and without fallback enabled (by manually setting the
getdents64_supported).
It is not available with the baseline ISA.
Fixes commit 68ab82f566
("powerpc: Runtime selection between sc and scv for syscalls").
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@linux.ibm.com>
Check ifunc resolver with CPU_FEATURE_USABLE and tunables in dynamic and
static executables to verify that CPUID features are initialized early in
static PIE.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
This reverts commit 20b39d5946 for static
library. This avoids the need to rebuild the world for the case where
libstdc++ (and potentially other libraries) are linked to a old glibc.
To avoid requering to provide xstat symbols for newer ABIs (such as
riscv32) a new LIB_COMPAT macro is added. It is similar to SHLIB_COMPAT
but also works for static case (thus evaluating similar to SHLIB_COMPAT
for both shared and static case).
Checked with a check-abi on all affected ABIs. I also check if the
static library does contains the xstat symbols.
In commit 863d775c48, kunpeng920 is added to default memcpy version,
however, there is performance degradation when the copy size is some large bytes, eg: 100k.
This is the result, tested in glibc-2.28:
before backport after backport Performance improvement
memcpy_1k 0.005 0.005 0.00%
memcpy_10k 0.032 0.029 10.34%
memcpy_100k 0.356 0.429 -17.02%
memcpy_1m 7.470 11.153 -33.02%
This is the demo
#include "stdio.h"
#include "string.h"
#include "stdlib.h"
char a[1024*1024] = {12};
char b[1024*1024] = {13};
int main(int argc, char *argv[])
{
int i = atoi(argv[1]);
int j;
int size = atoi(argv[2]);
for (j = 0; j < i; j++)
memcpy(b, a, size*1024);
return 0;
}
# gcc -g -O0 memcpy.c -o memcpy
# time taskset -c 10 ./memcpy 100000 1024
Co-authored-by: liqingqing <liqingqing3@huawei.com>
Extern symbol access in position independent code usually involves GOT
indirection which needs RELATIVE reloc in a static linked PIE. (On
some targets this is avoided e.g. because the linker can relax a GOT
access to a pc-relative access, but this is not generally true.) Code
that runs before static PIE self relocation must avoid relying on
dynamic relocations which can be ensured by using hidden visibility.
However we cannot just make all symbols hidden:
On i386, all calls to IFUNC functions must go through PLT and calls to
hidden functions CANNOT go through PLT in PIE since EBX used in PIE PLT
may not be set up for local calls to hidden IFUNC functions.
This patch aims to make symbol references hidden in code that is used
before and by _dl_relocate_static_pie when building a static PIE libc.
Note: for an object that is used in the startup code, its references
and definition may not have consistent visibility: it is only forced
hidden in the startup code.
This is needed for fixing bug 27072.
Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add SUPPORT_STATIC_PIE that targets can define if they support
static PIE. This requires PI_STATIC_AND_HIDDEN support and various
linker features as described in
commit 9d7a3741c9
Add --enable-static-pie configure option to build static PIE [BZ #19574]
Currently defined on x86_64, i386 and aarch64 where static PIE is
known to work.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
In <sys/platform/x86.h>, define CPU features as enum instead of using
the C preprocessor magic to make it easier to wrap this functionality
in other languages. Move the C preprocessor magic to internal header
for better GCC codegen when more than one features are checked in a
single expression as in x86-64 dl-hwcaps-subdirs.c.
1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
3. Remove struct cpu_features and __x86_get_cpu_features from
<sys/platform/x86.h>.
4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it
in libc.
5. Make __get_cpu_features() private to glibc.
6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
8. Use a single enum index for each CPU feature detection.
9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
10. Return zero struct cpuid_feature for the older glibc binary with a
smaller CPUID_INDEX_MAX [BZ #27104].
11. Inside glibc, use the C preprocessor magic so that cpu_features data
can be loaded just once leading to more compact code for glibc.
256 bits are used for each CPUID leaf. Some leaves only contain a few
features. We can add exceptions to such leaves. But it will increase
code sizes and it is harder to provide backward/forward compatibilities
when new features are added to such leaves in the future.
When new leaves are added, _rtld_global_ro offsets will change which
leads to race condition during in-place updates. We may avoid in-place
updates by
1. Rename the old glibc.
2. Install the new glibc.
3. Remove the old glibc.
NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy
relocation issue with IFUNC resolver as shown in IFUNC resolver tests.
Since __libc_init_secure is called before ARCH_SETUP_TLS, it must use
"int $0x80" for system calls in i386 static PIE. Add startup_getuid,
startup_geteuid, startup_getgid and startup_getegid to <startup.h>.
Update __libc_init_secure to use them.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add extra-test-objs to test-extras so that they are compiled with
-DMODULE_NAME=testsuite instead of -DMODULE_NAME=libc.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so.
2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS.
3. Move x86 cache info initialization to dl-cacheinfo.h and initialize
x86 cache info in init_cpu_features ().
4. Put x86 cache info for libc in cacheinfo.h, which is included in
libc-start.c in libc.a and is included in cacheinfo.c in libc.so.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Store ISA level in the portion of the unused upper 32 bits of the hwcaps
field in cache and the unused pad field in aux cache. ISA level is stored
and checked only for shared objects in glibc-hwcaps subdirectories. The
shared objects in the default directories aren't checked since there are
no fallbacks for these shared objects.
Tested on x86-64-v2, x86-64-v3 and x86-64-v4 machines with
--disable-hardcoded-path-in-tests and --enable-hardcoded-path-in-tests.
The first getrandom is used only for __GT_NOCREATE, which is inherently
insecure and can use the entropy as a small improvement. On the
second and later attempts it might help against DoS attacks.
It sync with gnulib commit 854fbb81d91f7a0f2b463e7ace2499dee2f380f2.
Checked on x86_64-linux-gnu.
It syncs with gnulib commit b1268f22f443e8e4b9e. The try_tempname_len
now uses getrandom on each iteration to get entropy and only uses the
clock plus ASLR as source of entropy if getrandom fails.
Checked on x86_64-linux-gnu and i686-linux-gnu.
POSIX states that system returned code for failure to execute the shell
shall be as if the shell had terminated using _exit(127). This
behaviour was removed with 5fb7fc9635.
Checked on x86_64-linux-gnu.
The $gp register may be used to access the global variable in
the PDE program, so the $gp register should be initialized before
executing the IFUNC resolver of PDE program to avoid unexpected
error occurs.
AArch64 always uses pc relative access to static and hidden object
symbols, but the config setting was previously missing.
This affects ld.so start up code.
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with
GNU_PROPERTY_X86_ISA_1_V[234] marker:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13
Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by
commit b0ab06937385e0ae25cebf1991787d64f439bf12
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 30 06:49:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker
and
commit 32930e4edbc06bc6f10c435dbcc63131715df678
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 9 05:05:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker
GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the
micro-architecture ISA level required to execute the binary. The marker
must be added by programmers explicitly in one of 3 ways:
1. Pass -mneeded to GCC.
2. Add the marker in the linker inputs as this patch does.
3. Pass -z x86-64-v[234] to the linker.
Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker support to ld.so if binutils 2.32 or newer is used to build glibc:
1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
markers to elf.h.
2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker to abi-note.o based on the ISA level used to compile abi-note.o,
assuming that the same ISA level is used to compile the whole glibc.
3. Add isa_1 to cpu_features to record the supported x86 ISA level.
4. Rename _dl_process_cet_property_note to _dl_process_property_note and
add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
5. Update _rtld_main_check and _dl_open_check to check loaded objects
with the incompatible ISA level.
6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails
on lesser platforms.
7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.
Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and
x86-64-v4 machines.
Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine
and got:
[hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
./elf/tst-isa-level-1: CPU ISA level is lower than required
[hjl@gnu-cfl-2 build-x86_64-linux]$
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory. The second patch just moves all wordsize-64 files and removes a
few wordsize-64 uses in comments and Implies files.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Remove the wordsize-64 implementations by merging them into the main dbl-64
directory. The first patch adds special cases needed for 32-bit targets
(FIX_INT_FP_CONVERT_ZERO and FIX_DBL_LONG_CONVERT_OVERFLOW) to the
wordsize-64 versions. This has no effect on 64-bit targets since they don't
define these macros.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
It sync with gnulib version ae9fb3d66. The testcase for BZ#23741
(stdlib/test-bz22786.c) is adjusted to check also for ENOMEM.
The patch fixes multiple realpath issues:
- Portability fixes for errno clobbering on free (BZ#10635). The
function does not call free directly anymore, although it might be
done through scratch_buffer_free. The free errno clobbering is
being tracked by BZ#17924.
- Pointer arithmetic overflows in realpath (BZ#26592).
- Realpath cyclically call __alloca(path_max) to consume too much
stack space (BZ#26341).
- Realpath mishandles EOVERFLOW; stat not needed anyway (BZ#24970).
The check is done through faccessat now.
Checked on x86_64-linux-gnu and i686-linux-gnu.
This ia regression from 09153638cf, versioned_symbol acts as
weak_alias for !SHARED but it is undefined to avoid non versioned
alias from the generic implementation.
Checked with a build for alpha-linux-gnu.
Calling an IFUNC function defined in unrelocated executable also leads to
segfault. Issue a fatal error message when calling IFUNC function defined
in the unrelocated executable from a shared library.
In the !MAP_FIXED case, when a bogus address is given mmap should pick up a
valide address rather than returning EINVAL: Posix only talks about
EINVAL for the MAP_FIXED case.
This fixes long-running ghc processes.
When copying with "rep movsb", if the distance between source and
destination is N*4GB + [1..63] with N >= 0, performance may be very
slow. This patch updates memmove-vec-unaligned-erms.S for AVX and
AVX512 versions with the distance in RCX:
cmpl $63, %ecx
// Don't use "rep movsb" if ECX <= 63
jbe L(Don't use rep movsb")
Use "rep movsb"
Benchtests data with bench-memcpy, bench-memcpy-large, bench-memcpy-random
and bench-memcpy-walk on Skylake, Ice Lake and Tiger Lake show that its
performance impact is within noise range as "rep movsb" is only used for
data size >= 4KB.
After sp is updated, the CFA offset should be set before next instruction.
Tested in glibc-2.28:
Thread 2 "xxxxxxx" hit Breakpoint 1, _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149
149 stp x1, x2, [sp, #-32]!
Missing separate debuginfos, use: dnf debuginfo-install libgcc-7.3.0-20190804.h24.aarch64
(gdb) bt
#0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:149
#1 0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184)
at /home/test/test_function.c:30
#2 0x0000000000400c08 in initaaa () at thread.c:58
#3 0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71
#4 0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486
#5 0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
(gdb) ni
_dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150
150 stp x3, x4, [sp, #16]
(gdb) bt
#0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:150
#1 0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184)
at /home/test/test_function.c:30
#2 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) ni
_dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157
157 mrs x4, tpidr_el0
(gdb) bt
#0 _dl_tlsdesc_dynamic () at ../sysdeps/aarch64/dl-tlsdesc.S:157
#1 0x0000ffffbe4fbb44 in OurFunction (threadId=3194870184)
at /home/test/test_function.c:30
#2 0x0000000000400c08 in initaaa () at thread.c:58
#3 0x0000000000400c50 in thread_proc (param=0x0) at thread.c:71
#4 0x0000ffffbf6918bc in start_thread (arg=0xfffffffff29f) at pthread_create.c:486
#5 0x0000ffffbf5669ec in thread_start () at ../sysdeps/unix/sysv/linux/aarch64/clone.S:78
Signed-off-by: liqingqing <liqingqing3@huawei.com>
Signed-off-by: Shuo Wang <wangshuo47@huawei.com>
Make the tests use TEST_COND_intel96 to decide on whether to build the
unnormal tests instead of the macro in nan-pseudo-number.h and then
drop the header inclusion. This unbreaks test runs on all
architectures that do not have ldbl-96.
Also drop the HANDLE_PSEUDO_NUMBERS macro since it is not used
anywhere.
I've updated copyright dates in glibc for 2021. This is the patch for
the changes not generated by scripts/update-copyrights and subsequent
build / regeneration of generated files. As well as the usual annual
updates, mainly dates in --version output (minus csu/version.c which
previously had to be handled manually but is now successfully updated
by update-copyrights), there is a small change to the copyright notice
in NEWS which should let NEWS get updated automatically next year.
Please remember to include 2021 in the dates for any new files added
in future (which means updating any existing uncommitted patches you
have that add new files to use the new copyright dates in them).