The x86 defines optimized THREAD_ATOMIC_* macros where reference always
the current thread instead of the one indicated by input 'descr' argument.
It work as long the input is the self thread pointer, however it generates
wrong code if the semantic is to set a bit atomicialy from another thread.
This is not an issue for current GLIBC usage, however the new cancellation
code expects that some synchronization code to atomically set bits from
different threads.
The generic code generates an additional load to reference to TLS segment,
for instance the code:
THREAD_ATOMIC_BIT_SET (THREAD_SELF, cancelhandling, CANCELED_BIT);
Compiles to:
lock;orl $4, %fs:776
Where with patch changes it now compiles to:
mov %fs:16,%rax
lock;orl $4, 776(%rax)
If some usage indeed proves to be a hotspot we can add an extra macro
with a more descriptive name (THREAD_ATOMIC_BIT_SET_SELF for instance)
where x86_64 might optimize it.
Checked on x86_64-linux-gnu.
* sysdeps/x86_64/nptl/tls.h (THREAD_ATOMIC_CMPXCHG_VAL,
THREAD_ATOMIC_AND, THREAD_ATOMIC_BIT_SET): Remove macros.
This will be used to record the current shadow stack base for shadow
stack switching by getcontext, makecontext, setcontext and swapcontext.
If the target shadow stack base is the same as the current shadow stack
base, we unwind the shadow stack. Otherwise it is a stack switch and
we look for a restore token to restore the target shadow stack.
* sysdeps/i386/nptl/tcb-offsets.sym (SSP_BASE_OFFSET): New.
* sysdeps/i386/nptl/tls.h (tcbhead_t): Replace __glibc_reserved2
with ssp_base.
* sysdeps/x86_64/nptl/tcb-offsets.sym (SSP_BASE_OFFSET): New.
* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Replace __glibc_reserved2
with ssp_base.
feature_1 has X86_FEATURE_1_IBT and X86_FEATURE_1_SHSTK bits for CET
run-time control.
CET_ENABLED, IBT_ENABLED and SHSTK_ENABLED are defined to 1 or 0 to
indicate that if CET, IBT and SHSTK are enabled.
<tls-setup.h> is added to set up thread-local data.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #22563]
* nptl/pthread_create.c: Include <tls-setup.h>.
(__pthread_create_2_1): Call tls_setup_tcbhead.
* sysdeps/generic/tls-setup.h: New file.
* sysdeps/x86/nptl/tls-setup.h: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET):
Likewise.
* sysdeps/i386/nptl/tls.h (tcbhead_t): Rename __glibc_reserved1
to feature_1.
* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
* sysdeps/x86/sysdep.h (X86_FEATURE_1_IBT): New.
(X86_FEATURE_1_SHSTK): Likewise.
(CET_ENABLED): Likewise.
(IBT_ENABLED): Likewise.
(SHSTK_ENABLED): Likewise.
sysdeps/i386/nptl/tls.h has
typedef struct
{
void *tcb; /* Pointer to the TCB. Not necessarily the
thread descriptor used by libpthread. */
dtv_t *dtv;
void *self; /* Pointer to the thread descriptor. */
int multiple_threads;
uintptr_t sysinfo;
uintptr_t stack_guard;
uintptr_t pointer_guard;
int gscope_flag;
int __glibc_reserved1;
/* Reservation of some values for the TM ABI. */
void *__private_tm[4];
/* GCC split stack support. */
void *__private_ss;
} tcbhead_t;
The offset of __private_ss is 0x34. But GCC defines
/* We steal the last transactional memory word. */
#define TARGET_THREAD_SPLIT_STACK_OFFSET 0x30
and libgcc/config/i386/morestack.S has
cmpl %gs:0x30,%eax # See if we have enough space.
movl %eax,%gs:0x30 # Save the new stack boundary.
movl %eax,%gs:0x30 # Save the new stack boundary.
movl %ecx,%gs:0x30 # Save new stack boundary.
movl %eax,%gs:0x30
movl %gs:0x30,%eax
movl %eax,%gs:0x30
Since update TARGET_THREAD_SPLIT_STACK_OFFSET changes split stack ABI,
this patch updates tcbhead_t to match GCC.
[BZ #23250]
[BZ #10686]
* sysdeps/i386/nptl/tls.h (tcbhead_t): Change __private_tm[4]
to _private_tm[3] and add __glibc_reserved2.
Add _Static_assert of offset of __private_ss == 0x30.
* sysdeps/x86_64/nptl/tls.h: Add _Static_assert of offset of
__private_ss == 0x40 for ILP32 and == 0x70 for LP64.
In commit cba595c350 and commit
f81ddabffd, ABI compatibility with
applications was broken by increasing the size of the on-stack
allocated __pthread_unwind_buf_t beyond the oringal size.
Applications only have the origianl space available for
__pthread_unwind_register, and __pthread_unwind_next to use,
any increase in the size of __pthread_unwind_buf_t causes these
functions to write beyond the original structure into other
on-stack variables leading to segmentation faults in common
applications like vlc. The only workaround is to version those
functions which operate on the old sized objects, but this must
happen in glibc 2.28.
Thank you to Andrew Senkevich, H.J. Lu, and Aurelien Jarno, for
submitting reports and tracking the issue down.
The commit reverts the above mentioned commits and testing on
x86_64 shows that the ABI compatibility is restored. A tst-cleanup1
regression test linked with an older glibc now passes when run
with the newly built glibc. Previously a tst-cleanup1 linked with
an older glibc would segfault when run with an affected glibc build.
Tested on x86_64 with no regressions.
Signed-off-by: Carlos O'Donell <carlos@redhat.com>
On x86, padding in struct __jmp_buf_tag is used for shadow stack pointer
to support Shadow Stack in Intel Control-flow Enforcemen Technology.
cancel_jmp_buf has been updated to include saved_mask so that it is as
large as struct __jmp_buf_tag. We must suport the old cancel_jmp_buf
in existing binaries. Since symbol versioning doesn't work on
cancel_jmp_buf, feature_1 is added to tcbhead_t so that setjmp and
longjmp can check if shadow stack is enabled. NB: Shadow stack is
enabled only if all modules are shadow stack enabled.
[BZ #22563]
* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
* sysdeps/i386/nptl/tls.h (tcbhead_t): Add feature_1.
* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Rename __glibc_unused1
to feature_1.
This patch removes CALL_THREAD_FCT macro usage and its defition for
x86. For 32 bits it usage is only for force 16 stack alignment,
however stack is already explicit aligned in clone syscall. For
64 bits and x32 it just a function call and there is no need to
code it with inline assembly.
Checked on i686-linux-gnu, x86_64-linux-gnu, and x86_64-linux-gnu-x32.
* nptl/pthread_create.c (START_THREAD_DEFN): Remove
CALL_THREAD_FCT macro usage.
* sysdeps/i386/nptl/tls.h (CALL_THREAD_FCT): Remove definition.
* sysdeps/x86_64/nptl/tls.h (CALL_THREAD_FCT): Likewise.
* sysdeps/x86_64/32/nptl/tls.h: Remove file.
posix/wordexp-test.c used libc-internal.h for PTR_ALIGN_DOWN; similar
to what was done with libc-diag.h, I have split the definitions of
cast_to_integer, ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and PTR_ALIGN_DOWN
to a new header, libc-pointer-arith.h.
It then occurred to me that the remaining declarations in libc-internal.h
are mostly to do with early initialization, and probably most of the
files including it, even in the core code, don't need it anymore. Indeed,
only 19 files actually need what remains of libc-internal.h. 23 others
need libc-diag.h instead, and 12 need libc-pointer-arith.h instead.
No file needs more than one of them, and 16 don't need any of them!
So, with this patch, libc-internal.h stops including libc-diag.h as
well as losing the pointer arithmetic macros, and all including files
are adjusted.
* include/libc-pointer-arith.h: New file. Define
cast_to_integer, ALIGN_UP, ALIGN_DOWN, PTR_ALIGN_UP, and
PTR_ALIGN_DOWN here.
* include/libc-internal.h: Definitions of above macros
moved from here. Don't include libc-diag.h anymore either.
* posix/wordexp-test.c: Include stdint.h and libc-pointer-arith.h.
Don't include libc-internal.h.
* debug/pcprofile.c, elf/dl-tunables.c, elf/soinit.c, io/openat.c
* io/openat64.c, misc/ptrace.c, nptl/pthread_clock_gettime.c
* nptl/pthread_clock_settime.c, nptl/pthread_cond_common.c
* string/strcoll_l.c, sysdeps/nacl/brk.c
* sysdeps/unix/clock_settime.c
* sysdeps/unix/sysv/linux/i386/get_clockfreq.c
* sysdeps/unix/sysv/linux/ia64/get_clockfreq.c
* sysdeps/unix/sysv/linux/powerpc/get_clockfreq.c
* sysdeps/unix/sysv/linux/sparc/sparc64/get_clockfreq.c:
Don't include libc-internal.h.
* elf/get-dynamic-info.h, iconv/loop.c
* iconvdata/iso-2022-cn-ext.c, locale/weight.h, locale/weightwc.h
* misc/reboot.c, nis/nis_table.c, nptl_db/thread_dbP.h
* nscd/connections.c, resolv/res_send.c, soft-fp/fmadf4.c
* soft-fp/fmasf4.c, soft-fp/fmatf4.c, stdio-common/vfscanf.c
* sysdeps/ieee754/dbl-64/e_lgamma_r.c
* sysdeps/ieee754/dbl-64/k_rem_pio2.c
* sysdeps/ieee754/flt-32/e_lgammaf_r.c
* sysdeps/ieee754/flt-32/k_rem_pio2f.c
* sysdeps/ieee754/ldbl-128/k_tanl.c
* sysdeps/ieee754/ldbl-128ibm/k_tanl.c
* sysdeps/ieee754/ldbl-96/e_lgammal_r.c
* sysdeps/ieee754/ldbl-96/k_tanl.c, sysdeps/nptl/futex-internal.h:
Include libc-diag.h instead of libc-internal.h.
* elf/dl-load.c, elf/dl-reloc.c, locale/programs/locarchive.c
* nptl/nptl-init.c, string/strcspn.c, string/strspn.c
* malloc/malloc.c, sysdeps/i386/nptl/tls.h
* sysdeps/nacl/dl-map-segments.h, sysdeps/x86_64/atomic-machine.h
* sysdeps/unix/sysv/linux/spawni.c
* sysdeps/x86_64/nptl/tls.h:
Include libc-pointer-arith.h instead of libc-internal.h.
* elf/get-dynamic-info.h, sysdeps/nacl/dl-map-segments.h
* sysdeps/x86_64/atomic-machine.h:
Add multiple include guard.
This patch adds SSE, AVX and AVX512 versions of _dl_runtime_resolve
and _dl_runtime_profile, which save and restore the first 8 vector
registers used for parameter passing. elf_machine_runtime_setup
selects the proper _dl_runtime_resolve or _dl_runtime_profile based
on _dl_x86_cpu_features. It avoids race condition caused by
FOREIGN_CALL macros, which are only used for x86-64.
Performance impact of saving and restoring 8 vector registers are
negligible on Nehalem, Sandy Bridge, Ivy Bridge and Haswell when
ld.so is optimized with SSE2.
[BZ #15128]
* sysdeps/x86_64/Makefile [$(subdir) == elf] (tests): Add
ifuncmain8.
(modules-names): Add ifuncmod8.
($(objpfx)ifuncmain8): New rule.
* sysdeps/x86_64/dl-machine.h: Include <dl-procinfo.h> and
<cpuid.h>.
(elf_machine_runtime_setup): Use _dl_runtime_resolve_sse,
_dl_runtime_resolve_avx, or _dl_runtime_resolve_avx512,
_dl_runtime_profile_sse, _dl_runtime_profile_avx, or
_dl_runtime_profile_avx512, based on HAS_ARCH_FEATURE.
* sysdeps/x86_64/dl-trampoline.S: Rewrite.
* sysdeps/x86_64/dl-trampoline.h: Likewise.
* sysdeps/x86_64/ifuncmain8.c: New file.
* sysdeps/x86_64/ifuncmod8.c: Likewise.
* sysdeps/x86_64/nptl/tcb-offsets.sym (RTLD_SAVESPACE_SSE):
Removed.
* sysdeps/x86_64/nptl/tls.h (__128bits): Removed.
(tcbhead_t): Change rtld_must_xmm_save to __glibc_unused1.
Change rtld_savespace_sse to __glibc_unused2.
(RTLD_CHECK_FOREIGN_CALL): Removed.
(RTLD_ENABLE_FOREIGN_CALL): Likewise.
(RTLD_PREPARE_FOREIGN_CALL): Likewise.
(RTLD_FINALIZE_FOREIGN_CALL): Likewise.