On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h. As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
#include_next <math_private.h>. x86_64 additionally has the
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.
This patch makes the fenv_private.h name an architecture-independent
convention in glibc. The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed). Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>. The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64. The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/aarch64/fpu/fenv_private.h: New file. Based on ....
* sysdeps/aarch64/fpu/math_private.h: ... this file. All contents
moved to fenv_private.h except for ...
(TOINT_INTRINSICS): Kept in math_private.h.
(roundtoint): Likewise.
(converttoint): Likewise.
* sysdeps/arm/fenv_private.h: Change multiple-include guard to
[ARM_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/arm/math_private.h: Remove.
* sysdeps/generic/fenv_private.h: New file. Contents moved from
....
* sysdeps/generic/math_private.h: ... this file. Include
<stdbool.h>. Do not include <fenv.h> or <get-rounding-mode.h>.
Include <fenv_private.h>. Remove functions and macros moved to
fenv_private.h.
* sysdeps/i386/fpu/math_private.h: Remove.
* sysdeps/mips/math_private.h: Move to ....
* sysdeps/mips/fpu/fenv_private.h: ... here. Change
multiple-include guard to [MIPS_FENV_PRIVATE_H]. Remove
[__mips_hard_float] conditional. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
guard to [POWERPC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/math_private.h: Do not include
<fenv_private.h>.
* sysdeps/riscv/rvf/math_private.h: Move to ....
* sysdeps/riscv/rvf/fenv_private.h: ... here. Change
multiple-include guard to [RISCV_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
to [SPARC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/sparc/fpu/math_private.h: Remove.
* sysdeps/i386/fpu/fenv_private.h: Move to ....
* sysdeps/x86/fpu/fenv_private.h: ... here. Change
multiple-include guard to [X86_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/x86_64/fpu/math_private.h: Do not include
<sysdeps/i386/fpu/fenv_private.h>.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the SNAN_TESTS_* macros for individual types out to their own
sysdeps header (while the type-generic SNAN_TESTS wrapper for those
macros remains in math-tests.h).
Tested for x86_64 and x86, and with build-many-glibcs.py.
* sysdeps/generic/math-tests-snan.h: New file.
* sysdeps/generic/math-tests.h: Include <math-tests-snan.h>.
(SNAN_TESTS_float): Do not define here.
(SNAN_TESTS_double): Likewise.
(SNAN_TESTS_long_double): Likewise.
(SNAN_TESTS_float128): Likewise.
* sysdeps/i386/fpu/math-tests-snan.h: New file.
* sysdeps/i386/fpu/math-tests.h: Remove file.
* sysdeps/ia64/math-tests-snan.h: New file.
* sysdeps/ia64/math-tests.h: Remove file.
* sysdeps/x86/math-tests.h: Likewise.
* sysdeps/x86_64/fpu/math-tests-snan.h: New file.
Move STATE_SAVE_OFFSET and STATE_SAVE_MASK to sysdep.h to make
sysdeps/x86/cpu-features.h a C header file.
* sysdeps/x86/cpu-features.h (STATE_SAVE_OFFSET): Removed.
(STATE_SAVE_MASK): Likewise.
Don't check __ASSEMBLER__ to include <cpu-features-offsets.h>.
* sysdeps/x86/sysdep.h (STATE_SAVE_OFFSET): New.
(STATE_SAVE_MASK): Likewise.
* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features-offsets.h>
instead of <cpu-features.h>.
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to. Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses. Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.
* NEWS: Mention the change.
* elf/dl-tunables.list: Rename tune namespace to cpu.
* sysdeps/powerpc/dl-tunables.list: Likewise.
* sysdeps/x86/dl-tunables.list: Likewise.
* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
cpu.name.
* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
* manual/README.tunables: Likewise.
* manual/tunables.texi: Likewise.
* sysdeps/powerpc/cpu-features.c: Likewise.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
(init_cpu_features): Likewise.
* sysdeps/x86/cpu-features.c: Likewise.
* sysdeps/x86/cpu-features.h: Likewise.
* sysdeps/x86/cpu-tunables.c: Likewise.
* sysdeps/x86_64/Makefile: Likewise.
* sysdeps/x86/dl-cet.c: Likewise.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
GNU_PROPERTY_X86_FEATURE_1_AND may not be the first property item. We
need to check each property item until we reach the end of the property
or find GNU_PROPERTY_X86_FEATURE_1_AND.
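The scan described above can be sketched as follows. This is a minimal
illustration, not the code in dl-prop.h: the helper name
find_feature_1_and is hypothetical, and the item layout (4-byte type,
4-byte data size, data padded to the note alignment) is assumed from
the x86 psABI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Property type value from the x86 psABI.  */
#define GNU_PROPERTY_X86_FEATURE_1_AND 0xc0000002u

/* Hypothetical helper, not the glibc function.  Walk the property items
   in [ptr, ptr_end).  Each item is a 4-byte type, a 4-byte data size and
   the data padded to ALIGN bytes.  Return 1 and store the feature word
   in *FEATURE_1 if the item is found, otherwise return 0.  */
int
find_feature_1_and (const unsigned char *ptr, const unsigned char *ptr_end,
                    size_t align, uint32_t *feature_1)
{
  assert (align == 4 || align == 8);

  while ((size_t) (ptr_end - ptr) >= 8)
    {
      uint32_t type, datasz;
      memcpy (&type, ptr, 4);
      memcpy (&datasz, ptr + 4, 4);
      ptr += 8;

      /* Reject an item whose data would run past the end.  */
      if (datasz > (size_t) (ptr_end - ptr))
        return 0;

      if (type == GNU_PROPERTY_X86_FEATURE_1_AND)
        {
          if (datasz != 4)
            return 0;
          memcpy (feature_1, ptr, 4);
          return 1;
        }

      /* Not the item we want: skip its padded data and keep looking.  */
      size_t step = ((size_t) datasz + align - 1) & ~(align - 1);
      if (step > (size_t) (ptr_end - ptr))
        return 0;
      ptr += step;
    }
  return 0;
}
```

The point of the loop is that an unrelated item in front of
GNU_PROPERTY_X86_FEATURE_1_AND is skipped rather than ending the search.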
This patch adds 2 tests. The first test checks if IBT is enabled and
the second test reads the output from the first test to check if IBT
is enabled. The second test fails if IBT isn't enabled properly.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23467]
* sysdeps/unix/sysv/linux/x86/Makefile (tests): Add
tst-cet-property-1 and tst-cet-property-2 if CET is enabled.
(CFLAGS-tst-cet-property-1.o): New.
(ASFLAGS-tst-cet-property-dep-2.o): Likewise.
($(objpfx)tst-cet-property-2): Likewise.
($(objpfx)tst-cet-property-2.out): Likewise.
* sysdeps/unix/sysv/linux/x86/tst-cet-property-1.c: New file.
* sysdeps/unix/sysv/linux/x86/tst-cet-property-2.c: Likewise.
* sysdeps/unix/sysv/linux/x86/tst-cet-property-dep-2.S: Likewise.
* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Parse
each property item until GNU_PROPERTY_X86_FEATURE_1_AND is found.
All tests should be added to $(tests).
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23458]
* sysdeps/x86/Makefile (tests): Add tst-get-cpu-features-static.
Simply check if "ptr < ptr_end" since "ptr" is always incremented by 8.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Don't
parse beyond the note end.
cpu-features.h has
#define bit_cpu_LZCNT (1 << 5)
#define index_cpu_LZCNT COMMON_CPUID_INDEX_1
#define reg_LZCNT
But the LZCNT feature bit is in COMMON_CPUID_INDEX_80000001:
Initial EAX Value: 80000001H
ECX Extended Processor Signature and Feature Bits:
Bit 05: LZCNT available
index_cpu_LZCNT should be COMMON_CPUID_INDEX_80000001, not
COMMON_CPUID_INDEX_1. The VMX feature bit is in COMMON_CPUID_INDEX_1:
Initial EAX Value: 01H
Feature Information Returned in the ECX Register:
5 VMX
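Why the index matters can be shown with a simplified model. The table
layout and the has_vmx/has_lzcnt helpers below are hypothetical and much
smaller than the real cpu-features.h; only the bit position and the two
CPUID leaves come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical model of the cpu-features.h tables.  */
enum
{
  COMMON_CPUID_INDEX_1,          /* CPUID leaf 0x1.  */
  COMMON_CPUID_INDEX_80000001,   /* CPUID leaf 0x80000001.  */
  COMMON_CPUID_INDEX_MAX
};

struct cpuid_registers
{
  uint32_t eax, ebx, ecx, edx;
};

/* Both features use ECX bit 5, so only the index tells them apart.  */
#define bit_cpu_VMX      (1u << 5)
#define index_cpu_VMX    COMMON_CPUID_INDEX_1
#define bit_cpu_LZCNT    (1u << 5)
#define index_cpu_LZCNT  COMMON_CPUID_INDEX_80000001

static_assert (index_cpu_LZCNT != index_cpu_VMX,
               "LZCNT and VMX are reported by different CPUID leaves");

int
has_vmx (const struct cpuid_registers *cpuid)
{
  return (cpuid[index_cpu_VMX].ecx & bit_cpu_VMX) != 0;
}

int
has_lzcnt (const struct cpuid_registers *cpuid)
{
  return (cpuid[index_cpu_LZCNT].ecx & bit_cpu_LZCNT) != 0;
}
```

With the old index, a processor reporting VMX but not LZCNT would have
been treated as supporting LZCNT.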
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23456]
* sysdeps/x86/cpu-features.h (index_cpu_LZCNT): Set to
COMMON_CPUID_INDEX_80000001.
CET arch_prctl bits should be defined in <asm/prctl.h> from Linux kernel
header files. Add x86 <include/asm/prctl.h> for pre-CET kernel header
files.
Note: sysdeps/unix/sysv/linux/x86/include/asm/prctl.h should be removed
if <asm/prctl.h> from the required kernel header files contains CET
arch_prctl bits.
/* CET features:
IBT: GNU_PROPERTY_X86_FEATURE_1_IBT
SHSTK: GNU_PROPERTY_X86_FEATURE_1_SHSTK
*/
/* Return CET features in unsigned long long *addr:
features: addr[0].
shadow stack base address: addr[1].
shadow stack size: addr[2].
*/
# define ARCH_CET_STATUS 0x3001
/* Disable CET features in unsigned int features. */
# define ARCH_CET_DISABLE 0x3002
/* Lock all CET features. */
# define ARCH_CET_LOCK 0x3003
/* Allocate a new shadow stack with unsigned long long *addr:
IN: requested shadow stack size: *addr.
OUT: allocated shadow stack address: *addr.
*/
# define ARCH_CET_ALLOC_SHSTK 0x3004
/* Return legacy region bitmap info in unsigned long long *addr:
address: addr[0].
size: addr[1].
*/
# define ARCH_CET_LEGACY_BITMAP 0x3005
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/unix/sysv/linux/x86/include/asm/prctl.h: New file.
* sysdeps/unix/sysv/linux/x86/cpu-features.c: Include
<sys/prctl.h> and <asm/prctl.h>.
(get_cet_status): Call arch_prctl with ARCH_CET_STATUS.
* sysdeps/unix/sysv/linux/x86/dl-cet.h: Include <sys/prctl.h>
and <asm/prctl.h>.
(dl_cet_allocate_legacy_bitmap): Call arch_prctl with
ARCH_CET_LEGACY_BITMAP.
(dl_cet_disable_cet): Call arch_prctl with ARCH_CET_DISABLE.
(dl_cet_lock_cet): Call arch_prctl with ARCH_CET_LOCK.
* sysdeps/x86/libc-start.c: Include <startup.h>.
Add <bits/indirect-return.h> and include it in <ucontext.h>.
__INDIRECT_RETURN defined in <bits/indirect-return.h> indicates if
swapcontext requires special compiler treatment. The default
__INDIRECT_RETURN is empty.
On x86, when shadow stack is enabled, __INDIRECT_RETURN is defined
with indirect_return attribute, which has been added to GCC 9, to
indicate that swapcontext returns via indirect branch. Otherwise
__INDIRECT_RETURN is defined with returns_twice attribute.
When shadow stack is enabled, remove always_inline attribute from
prepare_test_buffer in string/tst-xbzero-opt.c to avoid:
tst-xbzero-opt.c: In function ‘prepare_test_buffer’:
tst-xbzero-opt.c:105:1: error: function ‘prepare_test_buffer’ can never be inlined because it uses setjmp
prepare_test_buffer (unsigned char *buf)
when indirect_return attribute isn't available.
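The attribute selection described above might look roughly like the
following sketch. All DEMO_ and demo_ names are hypothetical stand-ins
(for __glibc_has_attribute, __INDIRECT_RETURN and swapcontext), and the
use of bit 1 of the compiler-defined __CET__ macro to detect shadow
stack is an assumption about the build flags.

```c
#include <assert.h>

/* Hypothetical stand-in for the __glibc_has_attribute helper.  */
#ifdef __has_attribute
# define DEMO_HAS_ATTRIBUTE(attr) __has_attribute (attr)
#else
# define DEMO_HAS_ATTRIBUTE(attr) 0
#endif

/* Bit 1 of __CET__ is assumed to mean that shadow stack protection was
   requested (for example with -fcf-protection).  */
#if defined __CET__ && (__CET__ & 2) != 0
# if DEMO_HAS_ATTRIBUTE (__indirect_return__)
#  define DEMO_INDIRECT_RETURN __attribute__ ((__indirect_return__))
# else
#  define DEMO_INDIRECT_RETURN __attribute__ ((__returns_twice__))
# endif
#else
# define DEMO_INDIRECT_RETURN
#endif

/* Declared in the style of swapcontext in <ucontext.h>; the function is
   a placeholder that only echoes its argument.  */
extern int demo_switch_context (int token) DEMO_INDIRECT_RETURN;

int
demo_switch_context (int token)
{
  return token;
}

/* Usage: callers are written as usual; only the compiler's treatment of
   the call site changes.  */
void
demo_indirect_return_check (void)
{
  assert (demo_switch_context (7) == 7);
}
```

Because the returns_twice fallback makes a caller behave like a setjmp
user, an always_inline caller can no longer be inlined, which is the
error quoted above.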
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* bits/indirect-return.h: New file.
* misc/sys/cdefs.h (__glibc_has_attribute): New.
* sysdeps/x86/bits/indirect-return.h: Likewise.
* stdlib/Makefile (headers): Add bits/indirect-return.h.
* stdlib/ucontext.h: Include <bits/indirect-return.h>.
(swapcontext): Add __INDIRECT_RETURN.
* string/tst-xbzero-opt.c (ALWAYS_INLINE): New.
(prepare_test_buffer): Use it.
Always include <dl-cet.h> and <cet-tunables.h> when CET is enabled.
Otherwise, configuring glibc with --enable-cet --disable-tunables will
fail to build.
* sysdeps/x86/cpu-features.c: Always include <dl-cet.h> and
<cet-tunables.h> when CET is enabled.
Intel Control-flow Enforcement Technology (CET) instructions:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-enforcement-technology-preview.pdf
includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK).
GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to
indicate that all executable sections are compatible with IBT when
ENDBR instruction starts each valid target where an indirect branch
instruction can land. Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on
output only if it is set on all relocatable inputs.
On an IBT capable processor, the following steps should be taken:
1. When loading an executable without an interpreter, enable IBT and
lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable.
2. When loading an executable with an interpreter, enable IBT if
GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter.
a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable,
disable IBT.
b. Lock IBT.
3. If IBT is enabled, when loading a shared object without
GNU_PROPERTY_X86_FEATURE_1_IBT:
a. If legacy interwork is allowed, then mark all pages in executable
PT_LOAD segments in legacy code page bitmap. Failure of legacy code
page bitmap allocation causes an error.
b. If legacy interwork isn't allowed, it causes an error.
GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to
indicate that all executable sections are compatible with SHSTK where
return address popped from shadow stack always matches return address
popped from normal stack. Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK
on output only if it is set on all relocatable inputs.
On a SHSTK capable processor, the following steps should be taken:
1. When loading an executable without an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable.
2. When loading an executable with an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter.
a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable
or any shared objects loaded via the DT_NEEDED tag, disable SHSTK.
b. Otherwise lock SHSTK.
3. After SHSTK is enabled, it is an error to load a shared object
without GNU_PROPERTY_X86_FEATURE_1_SHSTK.
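The SHSTK steps for an executable with an interpreter can be sketched as
a pure decision function. The struct and function names are
hypothetical; the real logic lives in the dynamic loader and also talks
to the kernel, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit value from the x86 psABI.  */
#define GNU_PROPERTY_X86_FEATURE_1_SHSTK (1u << 1)

/* Hypothetical summary of the run-time SHSTK state.  */
struct shstk_state
{
  bool enabled;
  bool locked;
};

/* Step 2: an executable with an interpreter.  Each argument is the
   feature word of one ELF module; NEEDED holds the DT_NEEDED objects.  */
struct shstk_state
shstk_startup (unsigned int interp, unsigned int exec,
               const unsigned int *needed, size_t n_needed)
{
  assert (needed != NULL || n_needed == 0);

  struct shstk_state st = { false, false };
  if ((interp & GNU_PROPERTY_X86_FEATURE_1_SHSTK) == 0)
    return st;
  st.enabled = true;

  bool all_marked = (exec & GNU_PROPERTY_X86_FEATURE_1_SHSTK) != 0;
  for (size_t i = 0; i < n_needed; i++)
    if ((needed[i] & GNU_PROPERTY_X86_FEATURE_1_SHSTK) == 0)
      all_marked = false;

  if (all_marked)
    st.locked = true;      /* Step 2b.  */
  else
    st.enabled = false;    /* Step 2a.  */
  return st;
}

/* Step 3: may a shared object with feature word FEATURE_1 be loaded?  */
bool
shstk_may_load (struct shstk_state st, unsigned int feature_1)
{
  return !st.enabled || (feature_1 & GNU_PROPERTY_X86_FEATURE_1_SHSTK) != 0;
}
```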
To enable CET support in glibc, --enable-cet is required to configure
glibc. When CET is enabled, both compiler and assembler must support
CET. Otherwise, it is a configure-time error.
To support CET run-time control,
1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate
if IBT or SHSTK are enabled at run-time. It should be initialized by
init_cpu_features.
2. For dynamic executables:
a. A l_cet field is added to struct link_map to indicate if IBT or
SHSTK is enabled in an ELF module. _dl_process_pt_note or
_rtld_process_pt_note is called to process PT_NOTE segment for
GNU program property and set l_cet.
b. _dl_open_check is added to check IBT and SHSTK compatibility when
dlopening a shared object.
3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with
_dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if
SHSTK is enabled.
CET run-time control can be changed via GLIBC_TUNABLES with
$ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off]
$ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off]
1. permissive: SHSTK is disabled when dlopening a legacy ELF module.
2. on: IBT or SHSTK is always enabled, regardless of whether the IBT or
SHSTK bits are set in the GNU program property.
3. off: IBT or SHSTK is always disabled, regardless of whether the IBT or
SHSTK bits are set in the GNU program property.
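As an illustration of the three values, a tunable string could be mapped
to a control mode as below. The enum, its constants and the parser are
hypothetical, and treating any other string as "follow the ELF property
bits" is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical control modes for one CET feature.  */
enum demo_cet_control
{
  DEMO_CET_ELF_PROPERTY,  /* Follow the GNU program property bits.  */
  DEMO_CET_PERMISSIVE,    /* Disable when a legacy module is dlopened.  */
  DEMO_CET_ALWAYS_ON,     /* Enable regardless of the property bits.  */
  DEMO_CET_ALWAYS_OFF     /* Disable regardless of the property bits.  */
};

/* Map the value of glibc.tune.x86_ibt or glibc.tune.x86_shstk to a
   mode.  Unknown strings fall back to the ELF property (assumption).  */
enum demo_cet_control
demo_parse_cet_tunable (const char *value)
{
  assert (value != NULL);

  if (strcmp (value, "permissive") == 0)
    return DEMO_CET_PERMISSIVE;
  if (strcmp (value, "on") == 0)
    return DEMO_CET_ALWAYS_ON;
  if (strcmp (value, "off") == 0)
    return DEMO_CET_ALWAYS_OFF;
  return DEMO_CET_ELF_PROPERTY;
}
```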
<cet.h> from CET-enabled GCC is automatically included by assembly codes
to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK
to GNU program property. _CET_ENDBR is added at the entrance of all
assembly functions whose address may be taken. _CET_NOTRACK is used to
insert NOTRACK prefix with indirect jump table to support IBT. It is
defined as notrack when _CET_NOTRACK is defined in <cet.h>.
[BZ #21598]
* configure.ac: Add --enable-cet.
* configure: Regenerated.
* elf/Makefile (all-built-dso): Add a comment.
* elf/dl-load.c (filebuf): Moved before "dynamic-link.h".
Include <dl-prop.h>.
(_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE
segment.
* elf/dl-open.c: Include <dl-prop.h>.
(dl_open_worker): Call _dl_open_check.
* elf/rtld.c: Include <dl-prop.h>.
(dl_main): Call _rtld_process_pt_note on PT_NOTE segment. Call
_rtld_main_check.
* sysdeps/generic/dl-prop.h: New file.
* sysdeps/i386/dl-cet.c: Likewise.
* sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise.
* sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise.
* sysdeps/x86/cet-tunables.h: Likewise.
* sysdeps/x86/check-cet.awk: Likewise.
* sysdeps/x86/configure: Likewise.
* sysdeps/x86/configure.ac: Likewise.
* sysdeps/x86/dl-cet.c: Likewise.
* sysdeps/x86/dl-procruntime.c: Likewise.
* sysdeps/x86/dl-prop.h: Likewise.
* sysdeps/x86/libc-start.h: Likewise.
* sysdeps/x86/link_map.h: Likewise.
* sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add
_CET_ENDBR.
(_dl_runtime_profile): Likewise.
(_dl_runtime_resolve_shstk): New.
(_dl_runtime_profile_shstk): Likewise.
* sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet
if CET is enabled.
(CFLAGS-.o): Add -fcf-protection if CET is enabled.
(CFLAGS-.os): Likewise.
(CFLAGS-.op): Likewise.
(CFLAGS-.oS): Likewise.
(asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET
is enabled.
(tests-special): Add $(objpfx)check-cet.out.
(cet-built-dso): New.
(+$(cet-built-dso:=.note)): Likewise.
(common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note).
($(objpfx)check-cet.out): New.
(generated): Add check-cet.out.
* sysdeps/x86/cpu-features.c: Include <dl-cet.h> and
<cet-tunables.h>.
(TUNABLE_CALLBACK (set_x86_ibt)): New prototype.
(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
(init_cpu_features): Call get_cet_status to check CET status
and update dl_x86_feature_1 with CET status. Call
TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK
(set_x86_shstk). Disable and lock CET in libc.a.
* sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>.
(TUNABLE_CALLBACK (set_x86_ibt)): New function.
(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
* sysdeps/x86/sysdep.h (_CET_NOTRACK): New.
(_CET_ENDBR): Define if not defined.
(ENTRY): Add _CET_ENDBR.
* sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and
x86_shstk.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add
_CET_ENDBR.
(_dl_runtime_profile): Likewise.
Save and restore shadow stack pointer in setjmp and longjmp to support
shadow stack in Intel CET. Use feature_1 in tcbhead_t to check if
shadow stack is enabled before saving and restoring shadow stack pointer.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/i386/__longjmp.S: Include <jmp_buf-ssp.h>.
(__longjmp): Restore shadow stack pointer if shadow stack is
enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
isn't defined for __longjmp_cancel.
* sysdeps/i386/bsd-_setjmp.S: Include <jmp_buf-ssp.h>.
(_setjmp): Save shadow stack pointer if shadow stack is enabled
and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/i386/bsd-setjmp.S: Include <jmp_buf-ssp.h>.
(setjmp): Save shadow stack pointer if shadow stack is enabled
and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/i386/setjmp.S: Include <jmp_buf-ssp.h>.
(__sigsetjmp): Save shadow stack pointer if shadow stack is
enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/unix/sysv/linux/i386/____longjmp_chk.S: Include
<jmp_buf-ssp.h>.
(____longjmp_chk): Restore shadow stack pointer if shadow stack
is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/unix/sysv/linux/x86/Makefile (gen-as-const-headers):
Remove jmp_buf-ssp.sym.
* sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S: Include
<jmp_buf-ssp.h>.
(____longjmp_chk): Restore shadow stack pointer if shadow stack
is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/x86/Makefile (gen-as-const-headers): Add
jmp_buf-ssp.sym.
* sysdeps/x86/jmp_buf-ssp.sym: New dummy file.
* sysdeps/x86_64/__longjmp.S: Include <jmp_buf-ssp.h>.
(__longjmp): Restore shadow stack pointer if shadow stack is
enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
isn't defined for __longjmp_cancel.
* sysdeps/x86_64/setjmp.S: Include <jmp_buf-ssp.h>.
(__sigsetjmp): Save shadow stack pointer if shadow stack is
enabled and SHADOW_STACK_POINTER_OFFSET is defined.
feature_1 has X86_FEATURE_1_IBT and X86_FEATURE_1_SHSTK bits for CET
run-time control.
CET_ENABLED, IBT_ENABLED and SHSTK_ENABLED are defined to 1 or 0 to
indicate whether CET, IBT and SHSTK are enabled.
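A possible shape for these macros is sketched below, assuming they are
derived from the compiler-defined __CET__ macro. The shstk_active helper
is hypothetical and only illustrates combining the build-time macro with
a run-time feature_1 word.

```c
#include <assert.h>

#define X86_FEATURE_1_IBT    (1U << 0)
#define X86_FEATURE_1_SHSTK  (1U << 1)

/* __CET__ is assumed to be predefined by the compiler when
   -fcf-protection is active.  */
#ifdef __CET__
# define CET_ENABLED    1
# define IBT_ENABLED    ((__CET__ & X86_FEATURE_1_IBT) != 0)
# define SHSTK_ENABLED  ((__CET__ & X86_FEATURE_1_SHSTK) != 0)
#else
# define CET_ENABLED    0
# define IBT_ENABLED    0
# define SHSTK_ENABLED  0
#endif

static_assert (CET_ENABLED || (!IBT_ENABLED && !SHSTK_ENABLED),
               "IBT and SHSTK imply CET");

/* Hypothetical run-time check: shadow stack is active only if it was
   built in and the thread's feature_1 word has the SHSTK bit set.  */
int
shstk_active (unsigned int feature_1)
{
  return SHSTK_ENABLED && (feature_1 & X86_FEATURE_1_SHSTK) != 0;
}
```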
<tls-setup.h> is added to set up thread-local data.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #22563]
* nptl/pthread_create.c: Include <tls-setup.h>.
(__pthread_create_2_1): Call tls_setup_tcbhead.
* sysdeps/generic/tls-setup.h: New file.
* sysdeps/x86/nptl/tls-setup.h: Likewise.
* sysdeps/i386/nptl/tcb-offsets.sym (FEATURE_1_OFFSET): New.
* sysdeps/x86_64/nptl/tcb-offsets.sym (FEATURE_1_OFFSET):
Likewise.
* sysdeps/i386/nptl/tls.h (tcbhead_t): Rename __glibc_reserved1
to feature_1.
* sysdeps/x86_64/nptl/tls.h (tcbhead_t): Likewise.
* sysdeps/x86/sysdep.h (X86_FEATURE_1_IBT): New.
(X86_FEATURE_1_SHSTK): Likewise.
(CET_ENABLED): Likewise.
(IBT_ENABLED): Likewise.
(SHSTK_ENABLED): Likewise.
From Zen onwards this will be enabled. It was disabled for the
Excavator case and will remain disabled.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Although the REP MOVSB implementations of memmove, memcpy and mempcpy
aren't used by the current processors, this patch adds Prefer_FSRM
check in ifunc-memmove.h so that they can be used in the future.
* sysdeps/x86/cpu-features.h (bit_arch_Prefer_FSRM): New.
(index_arch_Prefer_FSRM): Likewise.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Also check Prefer_FSRM.
* sysdeps/x86_64/multiarch/ifunc-memmove.h (IFUNC_SELECTOR):
Also return OPTIMIZE (erms) for Prefer_FSRM.
The newer Intel processors support Fast Short REP MOVSB which has a
feature bit in CPUID. This patch adds the Fast Short REP MOVSB (FSRM)
bit to x86 cpu-features.
* sysdeps/x86/cpu-features.h (bit_cpu_FSRM): New.
(index_cpu_FSRM): Likewise.
(reg_FSRM): Likewise.
This patch continues cleaning up math_private.h by moving the
math_opt_barrier and math_force_eval macros to a separate header
math-barriers.h.
At present, those macros are inside a "#ifndef math_opt_barrier" in
math_private.h to allow architectures to override them and then use
#include_next to include the generic math_private.h. With a separate
math-barriers.h header, no such #ifndef or #include_next is needed;
architectures just have their own alternative version of
math-barriers.h when providing their own optimized versions that avoid
going through memory unnecessarily. The generic math-barriers.h has a
comment added to document these two macros.
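For readers unfamiliar with the two macros, a generic memory-based
version can be sketched as follows. The demo_ names are hypothetical,
and the bodies assume GNU C (statement expressions and empty asm
statements); an architecture-specific version would use register
constraints instead of "m" to avoid the trip through memory.

```c
#include <assert.h>

/* Return X while hiding its value from the optimizer, so operations on
   the result cannot be moved across or folded away.  */
#define demo_opt_barrier(x) \
  ({ __typeof__ (x) demo_x = (x); __asm__ ("" : "+m" (demo_x)); demo_x; })

/* Force X to be evaluated (for example to raise floating-point
   exceptions) even though its value is not used.  */
#define demo_force_eval(x) \
  ({ __typeof__ (x) demo_x = (x); \
     __asm__ __volatile__ ("" : : "m" (demo_x)); })

double
demo_round_trip (double x)
{
  double y = demo_opt_barrier (x);
  demo_force_eval (y * y);
  return y;
}

/* Usage: the barriers never change the value itself.  */
void
demo_barrier_check (void)
{
  assert (demo_round_trip (1.5) == 1.5);
}
```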
In this patch, math_private.h is made to #include <math-barriers.h>,
so files using these macros do not need updating yet. That is because
of uses of math_force_eval in math_check_force_underflow and
math_check_force_underflow_nonneg, which are still defined in
math_private.h. Once those are moved out to a separate header, that
separate header can be made to include <math-barriers.h>, as can the
other files directly using these barrier macros, and then the include
of <math-barriers.h> from math_private.h can be removed.
Tested for x86_64 and x86. Also tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by this patch.
* sysdeps/generic/math-barriers.h: New file.
* sysdeps/generic/math_private.h [!math_opt_barrier]
(math_opt_barrier): Move to math-barriers.h.
[!math_opt_barrier] (math_force_eval): Likewise.
* sysdeps/aarch64/fpu/math-barriers.h: New file.
* sysdeps/aarch64/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/alpha/fpu/math-barriers.h: New file.
* sysdeps/alpha/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/x86/fpu/math-barriers.h: New file.
* sysdeps/i386/fpu/fenv_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
* sysdeps/m68k/m680x0/fpu/math_private.h: Move to....
* sysdeps/m68k/m680x0/fpu/math-barriers.h: ... here. Adjust
multiple-include guard for rename.
* sysdeps/powerpc/fpu/math-barriers.h: New file.
* sysdeps/powerpc/fpu/math_private.h (math_opt_barrier): Move to
math-barriers.h.
(math_force_eval): Likewise.
The pad array in struct pthread_unwind_buf is used by setjmp to save
shadow stack register. We assert that size of struct pthread_unwind_buf
is no less than offset of shadow stack pointer + shadow stack pointer
size.
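The size check can be expressed as a compile-time assertion. The layout
below is a hypothetical approximation of struct pthread_unwind_buf, and
placing the shadow stack pointer in the first pad slot is an assumption
made only for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: saved registers plus a flag, followed by a pad
   array.  */
struct demo_unwind_buf
{
  struct
  {
    uintptr_t jmp_buf[8];
    int mask_was_saved;
  } cancel_jmp_buf[1];
  void *pad[4];
};

/* Assumed for illustration: setjmp stores the shadow stack pointer in
   the first pad slot.  */
#define DEMO_SHADOW_STACK_POINTER_OFFSET \
  offsetof (struct demo_unwind_buf, pad)
#define DEMO_SHADOW_STACK_POINTER_SIZE sizeof (uintptr_t)

static_assert (sizeof (struct demo_unwind_buf)
               >= (DEMO_SHADOW_STACK_POINTER_OFFSET
                   + DEMO_SHADOW_STACK_POINTER_SIZE),
               "unwind buffer too small for the shadow stack pointer");
```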
Since functions such as LIBC_START_MAIN and START_THREAD_DEFN, as well
as those with thread cancellation, call setjmp but never return after
__libc_unwind_longjmp, __libc_unwind_longjmp, which is defined as
__libc_longjmp on x86, doesn't need to restore shadow stack register.
__libc_longjmp, which is a private interface for thread cancellation
implementation in libpthread, is changed to call __longjmp_cancel,
instead of __longjmp. __longjmp_cancel is a new internal function
in libc, which is similar to __longjmp, but doesn't restore shadow
stack register.
The compatibility longjmp and siglongjmp in libpthread.so are changed
to call __libc_siglongjmp, instead of __libc_longjmp, so that they will
restore shadow stack register.
Tested with build-many-glibcs.py.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* nptl/pthread_create.c (START_THREAD_DEFN): Clear previous
handlers after setjmp.
* setjmp/longjmp.c (__libc_longjmp): Don't define alias if
defined.
* sysdeps/unix/sysv/linux/x86/setjmpP.h: Include
<libc-pointer-arith.h>.
(_JUMP_BUF_SIGSET_BITS_PER_WORD): New.
(_JUMP_BUF_SIGSET_NSIG): Changed to 96.
(_JUMP_BUF_SIGSET_NWORDS): Changed to use ALIGN_UP and
_JUMP_BUF_SIGSET_BITS_PER_WORD.
* sysdeps/x86/Makefile (sysdep_routines): Add __longjmp_cancel.
* sysdeps/x86/__longjmp_cancel.S: New file.
* sysdeps/x86/longjmp.c: Likewise.
* sysdeps/x86/nptl/pt-longjmp.c: Likewise.
Continuing the removals of inline functions from the x86
bits/mathinline.h, this patch removes an inline of __finite (which was
not actually architecture-specific at all beyond its
endianness-dependence).
This inline is not normally used with GCC 4.4 or later, because
isfinite now uses __builtin_isfinite except for -fsignaling-nans.
Allowing __builtin_isfinite etc. to work properly even for
-fsignaling-nans, by implementing versions of those built-in functions
that use integer arithmetic in GCC, is
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66462> (a patch was
committed but had to be reverted because it caused problems, and that
patch didn't address all formats for all architectures, only some, so
by itself would not have been sufficient to allow glibc to use
__builtin_isfinite unconditionally for new-enough GCC).
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h [__USE_MISC] (__finite):
Remove inline function.
Remove the now unused target-specific __ieee754_sqrt(f/l) inlines.
Also remove inlines of sqrt which are for really old GCC versions.
Removing these is desirable, under the general principle of leaving
such inlining to the compiler rather than trying to do it in installed
headers, especially when only very old compilers are affected.
Note that removing inlines for __ieee754_sqrt disables inlining in the
sqrt wrapper functions. Given the sqrt function will typically only be
called for negative arguments, it doesn't matter whether the inlining
happens or not.
* sysdeps/aarch64/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/alpha/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/generic/math-type-macros.h (M_SQRT): Use sqrt.
* sysdeps/m68k/m680x0/fpu/mathimpl.h (__ieee754_sqrt): Remove.
* sysdeps/powerpc/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
* sysdeps/s390/fpu/bits/mathinline.h: Remove file.
* sysdeps/sparc/fpu/bits/mathinline.h (sqrt): Remove.
(sqrtf): Remove.
(sqrtl): Remove.
(__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
(__ieee754_sqrtl): Remove.
* sysdeps/x86/fpu/math_private.h (__ieee754_sqrt): Remove.
* sysdeps/x86_64/fpu/math_private.h (__ieee754_sqrt): Remove.
(__ieee754_sqrtf): Remove.
(__ieee754_sqrtl): Remove.
This patch removes further parts of sysdeps/x86/fpu/bits/mathinline.h
that are only of value for optimization with older compiler versions,
in accordance with general principles of preferring to let the
compiler deal with such inlining through built-in functions.
In general, GCC supports inlining all these functions as of version
4.3 or earlier. However, some inlines in GCC may have had excessively
restrictive conditions in past GCC versions (e.g. requiring
-ffast-math when the inline is valid under broader conditions). (In
particular, GCC had, before GCC 7, unnecessarily restrictive
conditions on when it could apply floor and ceil inlines corresponding
to the ones removed here. The same was true for rint, but
bits/mathinline.h *also* was excessively restrictive there.)
The removed sincos inlines are for __sincos etc. functions (not a
public interface and not currently used in this header either; not in
a part of the header ever used for building glibc itself). Likewise,
the atan2 inlines included one for __atan2l, also not a public
interface and not used for building glibc itself (calls inside glibc
generally use __ieee754_atan2l, for which there is a separate
__LIBC_INTERNAL_MATH_INLINES case in this header).
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h [__FAST_MATH__]
(__sincos_code): Remove define and undefine.
[__FAST_MATH__] (__sincos): Remove inline function.
[__FAST_MATH__] (__sincosf): Remove inline function.
[__FAST_MATH__] (__sincosl): Remove inline function.
(__atan2l): Remove inline functions.
[!__GNUC_PREREQ (3, 4)] (__atan2_code): Remove macro.
[!__GNUC_PREREQ (3, 4) && __FAST_MATH__] (atan2): Remove inline
function.
(floor): Remove inline function.
(ceil): Likewise.
[__FAST_MATH__] (__ldexp_code): Remove macro.
[__FAST_MATH__] (ldexp): Remove inline function.
[__FAST_MATH__ && __USE_ISOC99] (ldexpf): Likewise.
[__FAST_MATH__ && __USE_ISOC99] (ldexpl): Likewise.
[__FAST_MATH__ && __USE_ISOC99] (rint): Likewise.
[__USE_ISOC99] (__lrint_code): Remove macro.
[__USE_ISOC99] (__llrint_code): Likewise.
[__USE_ISOC99] (lrintf): Remove inline function.
[__USE_ISOC99] (lrint): Likewise.
[__USE_ISOC99] (lrintl): Likewise.
[__USE_ISOC99] (llrint): Likewise.
[__USE_ISOC99] (llrintf): Likewise.
[__USE_ISOC99] (llrintl): Likewise.
In accordance with the general principle of preferring to let the
compiler optimize function calls based on their standard semantics
rather than putting inline definitions of such functions in installed
headers, this patch removes various such inline definitions in the x86
bits/mathinline.h that were already disabled for GCC 3.5 or later and
so were only used with very old compilers (for which good optimization
is particularly unimportant); along with those inlines, a definition
of __M_SQRT2, which was only used in such inline functions, is also
removed. This is similar to an early step in removing the string.h
inlines; I intend to follow up with further removals of
bits/mathinline.h inline definitions in appropriate logical groups
(with GCC bugs filed in cases where GCC doesn't already support
corresponding optimizations).
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h [!__GNUC_PREREQ (3, 4)]
(lrintf): Remove definitions used only with old GCC.
[!__GNUC_PREREQ (3, 4)] (lrint): Likewise.
[!__GNUC_PREREQ (3, 4)] (llrintf): Likewise.
[!__GNUC_PREREQ (3, 4)] (llrint): Likewise.
[!__GNUC_PREREQ (3, 4)] (fmaxf): Likewise.
[!__GNUC_PREREQ (3, 4)] (fmax): Likewise.
[!__GNUC_PREREQ (3, 4)] (fminf): Likewise.
[!__GNUC_PREREQ (3, 4)] (fmin): Likewise.
[!__GNUC_PREREQ (3, 4)] (rint): Likewise.
[!__GNUC_PREREQ (3, 4)] (rintf): Likewise.
[!__GNUC_PREREQ (3, 4)] (nearbyint): Likewise.
[!__GNUC_PREREQ (3, 4)] (nearbyintf): Likewise.
[!__GNUC_PREREQ (3, 4)] (ceil): Likewise.
[!__GNUC_PREREQ (3, 4)] (ceilf): Likewise.
[!__GNUC_PREREQ (3, 4)] (floor): Likewise.
[!__GNUC_PREREQ (3, 4)] (floorf): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (tan): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (fmod): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (sin): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (cos): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log10): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (asin): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (acos): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 4)] (atan): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log1p): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (logb): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (log2): Likewise.
[__FAST_MATH__ && !__GNUC_PREREQ (3, 5)] (drem): Likewise.
[__FAST_MATH__] (__M_SQRT2): Remove macro.
We have a general principle of preferring optimizations for library
facilities to use compiler built-in functions rather than being
located in library headers, where the compiler can reasonably optimize
code without needing to know glibc implementation details.
This patch applies this principle to bits/byteswap.h, eliminating all
the architecture-specific variants and bits/byteswap-16.h. The
__bswap_16, __bswap_32 and __bswap_64 interfaces all become inline
functions, never macros, using the GCC built-in functions where
available and otherwise a single architecture-independent definition
using shifts and masking (which compilers may well be able to detect
and optimize; GCC has detection of various byte-swapping idioms).
The __bswap_constant_32 macro needs to stay around because of uses in
static initializers within glibc and its tests, and so for consistency
all __bswap_constant_* are kept rather than just being inlined into
the old-GCC-or-non-GCC parts of the __bswap_* inline function
definitions.
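As an illustration only (not the code in this patch), the shape
described above can be sketched as follows. The sketch_* and SKETCH_*
names are hypothetical stand-ins for __bswap_32 and
__bswap_constant_32, and the version test is spelled out by hand
rather than via __GNUC_PREREQ.

```c
#include <stdint.h>

/* Hypothetical names; the real interfaces are __bswap_32 and
   __bswap_constant_32 in bits/byteswap.h.  The macro form stays
   usable in static initializers.  */
#define SKETCH_BSWAP_CONSTANT_32(x) \
  ((((x) & 0xff000000u) >> 24) | (((x) & 0x00ff0000u) >> 8) \
   | (((x) & 0x0000ff00u) << 8) | (((x) & 0x000000ffu) << 24))

static inline uint32_t
sketch_bswap_32 (uint32_t x)
{
#if defined __GNUC__ \
    && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 3))
  /* GCC 4.3 and later provide the built-in function.  */
  return __builtin_bswap32 (x);
#else
  /* Architecture-independent fallback using shifts and masking.  */
  return SKETCH_BSWAP_CONSTANT_32 (x);
#endif
}
```

Both paths are expected to give the same result, so callers do not
need to know which one the compiler selected.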
Various open bugs are addressed by this cleanup, with caveats about
exactly what is covered by those bugs and when the bugs applied at
all.
Bug 14508 reports -Wformat warnings building glibc because __bswap_*
sometimes returned the wrong types. Obviously we already don't have
such warnings any more or the build would be failing, given -Werror,
and I suspect that bug was originally for wrong types for x86_64, as
fixed by commit d394eb742a (glibc 2.17).
The only case I saw removed by this patch where the types would still
have been wrong was the non-__GNUC__ case of __bswap_64 in the s390
header (using unsigned long long int, but uint64_t would be unsigned
long int for 64-bit). In any case, the single header consistently
uses __uintN_t types after this patch, thereby eliminating all such
bugs. The existing string/test-endian-types.c test already suffices
to verify that the types are correct with the compiler used to build
glibc and its tests.
Bug 15512 reports an error from __bswap_constant_16 with -Werror
-Wsign-conversion. I am unable to reproduce this with any GCC version
supporting -Wsign-conversion - all seem to be able to avoid warning
for ((x) >> 8) & 0xffu, where x is uint16_t, which while it formally
does involve an implicit conversion from int to unsigned int, is also
a case where it should be easy for the compiler to see that the value
converted is never negative. But in this patch __bswap_constant_16 is
changed to use signed 0xff so that no such implicit conversion occurs
at all, and a test with -Werror -Wsign-conversion is added.
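A minimal sketch of the signed-mask form, under the assumption that
the argument has type uint16_t (the names here are hypothetical, not
the ones in bits/byteswap.h). Because the operand promotes to int and
the masks are plain int constants, every intermediate value is an int
and the only conversion is the explicit cast on the result.

```c
#include <stdint.h>

/* Hypothetical name.  The masks are the signed constant 0xff, so
   no implicit int -> unsigned int conversion takes place.  */
#define SKETCH_BSWAP_CONSTANT_16(x) \
  ((uint16_t) ((((x) >> 8) & 0xff) | (((x) & 0xff) << 8)))

static inline uint16_t
sketch_bswap_16 (uint16_t x)
{
  return SKETCH_BSWAP_CONSTANT_16 (x);
}
```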
Bug 17082 objects to the use of ({}) statement expressions in these
macros preventing use at file scope (in C, that's in sizeof etc.; in
C++, more generally in static initializers). The particular case of
these interfaces is fixed by this patch as it changes them to inline
functions, eliminating all uses of ({}) in bits/byteswap.h, and a
corresponding testcase is added. The bug tries to raise a more
general policy question about use of ({}) in macros in installed
headers, referring to "many other libc functions" (unspecified which
functions are being considered).
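To illustrate the file-scope case with a sketch (hypothetical names,
not the new testcase itself): a call to an inline function is
acceptable as an unevaluated sizeof operand outside any function,
whereas GCC rejects a ({}) statement expression there.

```c
#include <stdint.h>

static inline uint32_t
sketch_swap32 (uint32_t x)
{
  return ((x & 0xff000000u) >> 24) | ((x & 0x00ff0000u) >> 8)
         | ((x & 0x0000ff00u) << 8) | ((x & 0x000000ffu) << 24);
}

/* File scope: the call is never evaluated, only its type is used.
   A macro expanding to ({ ... }) would not be accepted by GCC
   here.  */
char sketch_buf[sizeof (sketch_swap32 (0))];
```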
Since such policy questions belong on libc-alpha, and since there
*are* macros in installed headers which can't really avoid using ({})
(where they are type-generic, so can't use an inline function, but
need a temporary variable, and a few where the interface involves
returning memory from alloca so can't use an inline function either),
I propose to consider that bug fixed with this change. That is
without prejudice to any other new bugs anyone wishes to file *for
precisely defined sets of macros* requesting moving away from ({})
*where it is clearly possible for those interfaces*. Where ({}) can
be avoided, typically by use of an inline function, I think that's a
good idea - that inline functions are typically to be preferred to
({}) for header interfaces where such optimizations are useful but the
interface is suited to being defined using an inline function.
Bug 20530 requests use of __builtin_bswap16 when available (GCC 4.8
and later), which this patch implements.
Tested for x86_64, and with build-many-glibcs.py. Also did an x86_64
test with the __GNUC_PREREQ conditionals changed to "#if 0" to verify
the old-GCC/non-GCC case in the headers. (There are already existing
tests for correctness of results of these interfaces.)
[BZ #14508]
[BZ #15512]
[BZ #17082]
[BZ #20530]
* bits/byteswap.h: Update file comment. Do not include
<bits/byteswap-16.h>.
(__bswap_constant_16): Cast result to __uint16_t. Use signed 0xff
constant.
(__bswap_16): Define as inline function.
(__bswap_constant_32): Reformat definition.
(__bswap_32): Always define as inline function, not macro, using
__uint32_t. Use __builtin_bswap32 if [__GNUC_PREREQ (4, 3)],
otherwise __bswap_constant_32.
(__bswap_constant_64): Reformat definition. Do not use
__extension__ here.
(__bswap_64): Always define as inline function, not macro. Use
__extension__ on function definition. Use __builtin_bswap64 if
[__GNUC_PREREQ (4, 3)], otherwise __bswap_constant_64.
* string/test-endian-file-scope.c: New file.
* string/test-endian-sign-conversion.c: Likewise.
* string/Makefile (headers): Remove bits/byteswap-16.h.
(tests): Add test-endian-file-scope and
test-endian-sign-conversion.
(CFLAGS-test-endian-sign-conversion.c): New variable.
* bits/byteswap-16.h: Remove file.
* sysdeps/ia64/bits/byteswap-16.h: Likewise.
* sysdeps/ia64/bits/byteswap.h: Likewise.
* sysdeps/m68k/bits/byteswap.h: Likewise.
* sysdeps/s390/bits/byteswap-16.h: Likewise.
* sysdeps/s390/bits/byteswap.h: Likewise.
* sysdeps/tile/bits/byteswap.h: Likewise.
* sysdeps/x86/bits/byteswap-16.h: Likewise.
* sysdeps/x86/bits/byteswap.h: Likewise.
This patch continues filling out TS 18661-3 support by adding *f64x
function aliases on platforms with _Float64x support. (It so happens
the set of such platforms is exactly the same as the set of platforms
with _Float128 support, although on x86_64, x86 and ia64 the _Float64x
format is Intel extended rather than binary128.) The API provided
corresponds exactly to that provided for _Float128, mostly coming from
TS 18661-3. As these functions always alias those for another type
(long double, _Float128 or both), __* function names are not provided,
as in other cases of alias types.
Given the preparation done in previous patches, this one just enables
the feature via Makeconfig and bits/floatn.h, adds symbol versions,
and updates documentation and ABI baselines. The symbol versions are
present unconditionally as GLIBC_2.27 in the relevant Versions files,
as it's OK for those to specify versions for functions that may not be
present in some configurations; no additional complexity is needed
unless in future some configuration gains support for this type that
didn't have such support in 2.27. The Makeconfig additions for ia64
and x86 aren't strictly needed, as those configurations also get
float64x-alias-fcts definitions from
sysdeps/ieee754/float128/Makeconfig, but still seem appropriate given
that _Float64x is not _Float128 for those configurations.
A libm-test-ulps update for x86 is included. This is because
bits/mathinline.h does not have _Float64x support added and for two
functions the use of out-of-line functions results in increased ulps
(ifloat64x shares ulps with ildouble / ifloat128 as appropriate).
Given that we'd like generally to eliminate bits/mathinline.h
optimizations, preferring to have such optimizations in GCC instead,
it seems reasonable not to add such support there for new types. GCC
support for _FloatN / _FloatNx built-in functions is limited, but has
been improved in GCC 8, and at some point I hope the full set of libm
built-in functions in GCC, and other optimizations with
per-floating-type aspects, will be enabled for all _FloatN / _FloatNx
types.
Tested for x86_64 and x86, and with build-many-glibcs.py, with both
GCC 6 and GCC 7.
* sysdeps/ia64/Makeconfig (float64x-alias-fcts): New variable.
* sysdeps/ieee754/float128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/ieee754/ldbl-128/Makeconfig (float64x-alias-fcts):
Likewise.
* sysdeps/x86/Makeconfig: New file.
* bits/floatn-common.h (__HAVE_FLOAT64X): Remove macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* bits/floatn.h (__HAVE_FLOAT64X): New macro.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ia64/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h (__HAVE_FLOAT64X):
Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/mips/ieee754/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/powerpc/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* sysdeps/x86/bits/floatn.h (__HAVE_FLOAT64X): Likewise.
(__HAVE_FLOAT64X_LONG_DOUBLE): Likewise.
* manual/math.texi (Mathematics): Document support for _Float64x.
* math/Versions (GLIBC_2.27): Add _Float64x functions.
* stdlib/Versions (GLIBC_2.27): Likewise.
* wcsmbs/Versions (GLIBC_2.27): Likewise.
* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Update.
* sysdeps/unix/sysv/linux/aarch64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/alpha/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/i386/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/ia64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libc-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/powerpc/powerpc64/libm-le.abilist:
Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/s390/s390-64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc32/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/sparc/sparc64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/libm.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
* sysdeps/unix/sysv/linux/x86_64/x32/libm.abilist: Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Likewise.
* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Likewise.
Further _FloatN / _FloatNx type alias support will involve making
architecture-specific .S files use the common macros for libm function
aliases. Making them use those macros will also serve to simplify
existing code for aliases / symbol versions in various cases, similar
to such simplifications for ldbl-opt code.
The libm-alias-*.h files sometimes need to include <bits/floatn.h> to
determine which aliases they should define. At present, this does not
work for inclusion from .S files because <bits/floatn.h> can define
typedefs for old compilers. This patch changes all the
<bits/floatn.h> and <bits/floatn-common.h> headers to include
__ASSEMBLER__ conditionals. Those conditionals disable everything
related to C syntax in the __ASSEMBLER__ case, not just the problem
typedefs, as that seemed cleanest. The __HAVE_* definitions remain in
the __ASSEMBLER__ case, as those provide information that is required
to define the correct set of aliases.
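The resulting header shape can be sketched roughly as below. The
SKETCH_* and sketch_* names are hypothetical; the real headers define
the __HAVE_* macros and, for old compilers, typedefs for the _FloatN
type names.

```c
/* Hypothetical header fragment; names are illustrative only.  */

/* Availability information stays visible to both C and assembler
   sources, since alias selection depends on it.  */
#define SKETCH_HAVE_FLOAT64X 1

#ifndef __ASSEMBLER__
/* Anything that is C syntax is hidden from .S files.  */
typedef long double sketch_float64x;
#endif
```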
Tested with build-many-glibcs.py for a representative set of
configurations (x86_64-linux-gnu i686-linux-gnu ia64-linux-gnu
powerpc64le-linux-gnu mips64-linux-gnu-n64 sparc64-linux-gnu) with GCC
6. Also tested with GCC 6 for i686-linux-gnu in conjunction with
changes to use alias macros in .S files.
* bits/floatn-common.h [!__ASSEMBLER__]: Disable everything related
to C syntax instead of availability and properties of types.
* bits/floatn.h [!__ASSEMBLER__]: Likewise.
* sysdeps/ia64/bits/floatn.h [!__ASSEMBLER__]: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h [!__ASSEMBLER__]: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h [!__ASSEMBLER__]: Likewise.
* sysdeps/powerpc/bits/floatn.h [!__ASSEMBLER__]: Likewise.
* sysdeps/x86/bits/floatn.h [!__ASSEMBLER__]: Likewise.
This patch adds two new internal defines to set the internal
pthread_mutex_t layout required by the supported ABIs:
1. __PTHREAD_MUTEX_NUSERS_AFTER_KIND, which controls whether the
__nusers field is defined before or after __kind. The preferred
value for new ports is 0, which places __nusers before __kind.
2. __PTHREAD_MUTEX_USE_UNION, which controls whether the internal
__spins and __list members are placed inside a union for linuxthreads
compatibility. The preferred value for new ports is 0, which defines
both fields without a union.
It fixes the wrong offset of __kind on x86_64-linux-gnu-x32.
Checked with a make check run-built-tests=no on all affected ABIs.
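A much simplified sketch of how two such defines can drive the
layout, using the values suggested for new ports. All names are
hypothetical and the struct omits padding, alignment and elision
details of the real __pthread_mutex_s.

```c
#include <stddef.h>

/* Values a new port would be expected to pick.  */
#define SKETCH_NUSERS_AFTER_KIND 0
#define SKETCH_USE_UNION 0

struct sketch_list
{
  struct sketch_list *prev;
  struct sketch_list *next;
};

struct sketch_mutex
{
  int lock;
  unsigned int count;
  int owner;
#if !SKETCH_NUSERS_AFTER_KIND
  unsigned int nusers;
#endif
  /* The position of kind must match static initializers.  */
  int kind;
#if SKETCH_NUSERS_AFTER_KIND
  unsigned int nusers;
#endif
#if SKETCH_USE_UNION
  union
  {
    int spins;
    struct sketch_list list;
  } u;
#else
  int spins;
  struct sketch_list list;
#endif
};
```

Testing the defines with #if rather than #ifdef means a port must
define them to 0 or 1 explicitly, which is the same style of check
the patch adopts for __PTHREAD_MUTEX_HAVE_PREV.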
[BZ #22298]
* nptl/allocatestack.c (allocate_stack): Check whether
__PTHREAD_MUTEX_HAVE_PREV is non-zero, instead of checking whether
__PTHREAD_MUTEX_HAVE_PREV is defined.
* nptl/descr.h (pthread): Likewise.
* nptl/nptl-init.c (__pthread_initialize_minimal_internal):
Likewise.
* nptl/pthread_create.c (START_THREAD_DEFN): Likewise.
* sysdeps/nptl/fork.c (__libc_fork): Likewise.
* sysdeps/nptl/pthread.h (PTHREAD_MUTEX_INITIALIZER): Likewise.
* sysdeps/nptl/bits/thread-shared-types.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
(__pthread_internal_list): Check __PTHREAD_MUTEX_USE_UNION instead
of __WORDSIZE for internal layout.
(__pthread_mutex_s): Check __PTHREAD_MUTEX_NUSERS_AFTER_KIND instead
of __WORDSIZE for internal __nusers layout and __PTHREAD_MUTEX_USE_UNION
instead of __WORDSIZE to decide whether to use a union for the __spins
and __list fields.
(__PTHREAD_MUTEX_HAVE_PREV): Define also for __PTHREAD_MUTEX_USE_UNION
case.
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION): New
defines.
* sysdeps/alpha/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/hppa/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/ia64/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/powerpc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/s390/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/sparc/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/tile/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
* sysdeps/x86/nptl/bits/pthreadtypes-arch.h
(__PTHREAD_MUTEX_NUSERS_AFTER_KIND, __PTHREAD_MUTEX_USE_UNION):
Likewise.
Signed-off-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add a new header file, sysdeps/x86/sysdep.h, for common assembly code
macros between i386 and x86-64. Tested on i686 and x86-64. There are
no differences in outputs of "readelf -a" and "objdump -dw" on all glibc
shared objects before and after the patch.
* sysdeps/i386/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
* sysdeps/x86/sysdep.h: New file.
* sysdeps/x86_64/sysdep.h: Include <sysdeps/x86/sysdep.h> instead
of <sysdeps/generic/sysdep.h>.
(ALIGNARG): Removed.
(ASM_SIZE_DIRECTIVE): Likewise.
(ENTRY): Likewise.
(END): Likewise.
(ENTRY_CHK): Likewise.
(END_CHK): Likewise.
(syscall_error): Likewise.
(mcount): Likewise.
(PSEUDO_END): Likewise.
(L): Likewise.
(atom_text_section): Likewise.
The glibc implementation of iseqsig relies on ordered comparison
operators raising the "invalid" exception for quiet NaN operands, with
a workaround on platforms where a GCC bug means that exception is not
raised. For x86, that bug has now been fixed for GCC 8, so this patch
disables the workaround in that case. If and when the corresponding
bugs for powerpc and s390 are fixed, the headers for those platforms
should of course be updated similarly.
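The dependence on ordered comparisons can be sketched as follows.
This is a hypothetical helper for double only, not the glibc
implementation, which also handles errno and the other types.

```c
/* Hypothetical helper.  <= and >= are ordered (signaling)
   comparisons, so a conforming compiler raises "invalid" when
   either operand is a NaN, including a quiet NaN.  Where the
   compiler emits a quiet comparison instead, a workaround has to
   raise the exception explicitly.  */
static inline int
sketch_iseqsig (double x, double y)
{
  return (x <= y) && (x >= y);
}
```

The return value is correct either way; only the raising of the
exception depends on the compiler behaviour discussed above.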
Tested for x86_64 and x86, including with GCC mainline. Note that
other failures appear with GCC mainline because of spurious use of
ordered comparison instructions for unordered operations
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82692>.
* sysdeps/x86/fpu/fix-fp-int-compare-invalid.h
(FIX_COMPARE_INVALID): Define to 0 if [__GNUC_PREREQ (8, 0)].
The bits/floatn.h header currently only has defines relating to
_Float128. This patch adds defines relating to other _FloatN /
_FloatNx types.
The approach taken is to add defines for all _FloatN / _FloatNx types
known to GCC, and to put them in a common bits/floatn-common.h header
included at the end of all the individual bits/floatn.h headers. If
in future some defines become different for different glibc
configurations, they will move out into the separate bits/floatn.h
headers.
Some defines are expected always to be the same across glibc ports.
Corresponding defines are nevertheless put in this header. The intent
is that where there are conditionals (in headers or in non-installed
files) that can just repeat the same or nearly the same logic for each
floating-point type, they should do so, even if in fact the cases for
some types could be unconditionally present or absent because the same
conditionals are true or false for all glibc configurations. This
should make the glibc code with such conditionals easier to read,
because the reader can just see that the same conditionals are
repeated for each type, rather than seeing different conditionals for
different types and needing to reason, at each location with such
differences, why those differences are indeed correct there. (Cases
involving per-format rather than per-type logic are more likely still
to need differences in how they handle different types.)
Having such defines and conditionals also helps in incremental
preparation for adding _Float32 / _Float64 / _Float32x / _Float64x
function aliases. I intend subsequent patches to add such
conditionals corresponding to those already present for _Float128, as
well as making more architecture-specific function implementations use
common macros to define aliases in preparation for adding such _FloatN
/ _FloatNx aliases.
Tested for x86_64.
* bits/floatn-common.h: New file.
* math/Makefile (headers): Add bits/floatn-common.h.
* bits/floatn.h: Include <bits/floatn-common.h>.
* sysdeps/ia64/bits/floatn.h: Likewise.
* sysdeps/ieee754/ldbl-128/bits/floatn.h: Likewise.
* sysdeps/mips/ieee754/bits/floatn.h: Likewise.
* sysdeps/powerpc/bits/floatn.h: Likewise.
* sysdeps/x86/bits/floatn.h: Likewise.
In _dl_runtime_resolve, use fxsave/xsave/xsavec to preserve all vector,
mask and bound registers. It simplifies _dl_runtime_resolve and supports
different calling conventions. ld.so code size is reduced by more than
1 KB. However, using fxsave/xsave/xsavec takes a few more cycles
than saving and restoring vector and bound registers individually.
Latency for _dl_runtime_resolve to lookup the function, foo, from one
shared library plus libc.so:
                             Before  After  Change
Westmere (SSE)/fxsave           345    866    151%
IvyBridge (AVX)/xsave           420    643     53%
Haswell (AVX)/xsave             713   1252     75%
Skylake (AVX+MPX)/xsavec        559    719     28%
Skylake (AVX512+MPX)/xsavec     145    272     87%
Ryzen (AVX)/xsavec              280    553     97%
This is the worst case, where the portion of time spent saving and
restoring registers is larger than in the majority of cases. With the
smaller _dl_runtime_resolve code size, the overall performance impact
is negligible.
On IvyBridge, differences in build and test time of binutils with lazy
binding GCC and binutils are within the noise. On Westmere,
differences in bootstrap and "make check" time of GCC 7 with lazy
binding GCC and binutils are also within the noise.
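For reference, the Change column in the latency table is consistent
with the relative increase (After - Before) / Before, truncated to a
whole percentage. A small sketch to recompute it (the helper name is
hypothetical):

```c
/* Relative increase in percent, truncated toward zero.  */
static inline int
percent_increase (int before, int after)
{
  return (after - before) * 100 / before;
}
```

For example, the Westmere row is (866 - 345) * 100 / 345, which
truncates to 151.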
[BZ #21265]
* sysdeps/x86/cpu-features-offsets.sym (XSAVE_STATE_SIZE_OFFSET):
New.
* sysdeps/x86/cpu-features.c: Include <libc-pointer-arith.h>.
(get_common_indeces): Set xsave_state_size, xsave_state_full_size
and bit_arch_XSAVEC_Usable if needed.
(init_cpu_features): Remove bit_arch_Use_dl_runtime_resolve_slow
and bit_arch_Use_dl_runtime_resolve_opt.
* sysdeps/x86/cpu-features.h (bit_arch_Use_dl_runtime_resolve_opt):
Removed.
(bit_arch_Use_dl_runtime_resolve_slow): Likewise.
(bit_arch_Prefer_No_AVX512): Updated.
(bit_arch_MathVec_Prefer_No_AVX512): Likewise.
(bit_arch_XSAVEC_Usable): New.
(STATE_SAVE_OFFSET): Likewise.
(STATE_SAVE_MASK): Likewise.
[__ASSEMBLER__]: Include <cpu-features-offsets.h>.
(cpu_features): Add xsave_state_size and xsave_state_full_size.
(index_arch_Use_dl_runtime_resolve_opt): Removed.
(index_arch_Use_dl_runtime_resolve_slow): Likewise.
(index_arch_XSAVEC_Usable): New.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Support XSAVEC_Usable. Remove Use_dl_runtime_resolve_slow.
* sysdeps/x86_64/Makefile (tst-x86_64-1-ENV): New if tunables
is enabled.
* sysdeps/x86_64/dl-machine.h (elf_machine_runtime_setup):
Replace _dl_runtime_resolve_sse, _dl_runtime_resolve_avx,
_dl_runtime_resolve_avx_slow, _dl_runtime_resolve_avx_opt,
_dl_runtime_resolve_avx512 and _dl_runtime_resolve_avx512_opt
with _dl_runtime_resolve_fxsave, _dl_runtime_resolve_xsave and
_dl_runtime_resolve_xsavec.
* sysdeps/x86_64/dl-trampoline.S (DL_RUNTIME_UNALIGNED_VEC_SIZE):
Removed.
(DL_RUNTIME_RESOLVE_REALIGN_STACK): Check STATE_SAVE_ALIGNMENT
instead of VEC_SIZE.
(REGISTER_SAVE_BND0): Removed.
(REGISTER_SAVE_BND1): Likewise.
(REGISTER_SAVE_BND3): Likewise.
(REGISTER_SAVE_RAX): Always defined to 0.
(VMOV): Removed.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_slow): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_avx512): Likewise.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(USE_FXSAVE): New.
(_dl_runtime_resolve_fxsave): Likewise.
(USE_XSAVE): Likewise.
(_dl_runtime_resolve_xsave): Likewise.
(USE_XSAVEC): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve_avx512):
Removed.
(_dl_runtime_resolve_avx512_opt): Likewise.
(_dl_runtime_resolve_avx): Likewise.
(_dl_runtime_resolve_avx_opt): Likewise.
(_dl_runtime_resolve_sse): Likewise.
(_dl_runtime_resolve_sse_vex): Likewise.
(_dl_runtime_resolve_fxsave): New.
(_dl_runtime_resolve_xsave): Likewise.
(_dl_runtime_resolve_xsavec): Likewise.
This patch moves the generic definition from x86_64 init-arch
to a common header ifunc-init.h. No functional change is expected.
Checked on an x86_64-linux-gnu build.
* sysdeps/generic/ifunc-init.h: New file.
* sysdeps/x86/init-arch.h: Use generic ifunc-init.h.
Simplify the C99 isgreater macros. Although some support was added
in GCC 2.97, not all targets added support until GCC 3.1. Therefore
only use the builtins in math.h from GCC 3.1 onwards, and defer to
generic macros otherwise. Improve the generic isunordered macro
to use compares rather than call fpclassify twice - this is not only
faster but also correct for signaling NaNs.
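A sketch of a compare-based check, written here as a hypothetical
inline function for double rather than as the type-generic macro in
math.h. It uses only != comparisons and no classification calls.

```c
/* Hypothetical helper.  Two values are unordered exactly when at
   least one of them is a NaN, and a NaN is the only value that
   compares unequal to itself.  */
static inline int
sketch_isunordered (double x, double y)
{
  return x != y && (x != x || y != y);
}
```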
* math/math.h: Improve handling of C99 isgreater macros.
* sysdeps/alpha/fpu/bits/mathinline.h: Remove isgreater macros.
* sysdeps/m68k/m680x0/fpu/bits/mathinline.h: Likewise.
* sysdeps/powerpc/bits/mathinline.h: Likewise.
* sysdeps/sparc/fpu/bits/mathinline.h: Likewise.
* sysdeps/x86/fpu/bits/mathinline.h: Likewise.
AVX512 functions in mathvec are used on machines with AVX512. An AVX2
wrapper is also provided and it can be used when the AVX512 version
isn't profitable. MathVec_Prefer_No_AVX512 is added to cpu-features.
If glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 is set in GLIBC_TUNABLES
environment variable, the AVX2 wrapper will be used.
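The selection logic can be pictured with the following sketch. The
struct, enum and function names are hypothetical stand-ins; the real
IFUNC_SELECTOR consults the cpu_features bits and may distinguish
further cases.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the CPU feature bits consulted by the
   real selector.  */
struct sketch_features
{
  bool avx512_usable;
  bool mathvec_prefer_no_avx512;
};

enum sketch_impl
{
  SKETCH_IMPL_AVX2_WRAPPER,
  SKETCH_IMPL_AVX512
};

static inline enum sketch_impl
sketch_select (const struct sketch_features *f)
{
  /* The preference bit overrides AVX512 even when it is usable.  */
  if (f->mathvec_prefer_no_avx512 || !f->avx512_usable)
    return SKETCH_IMPL_AVX2_WRAPPER;
  return SKETCH_IMPL_AVX512;
}
```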
Tested on x86-64 machines with and without AVX512. Also verified
glibc.tune.hwcaps=MathVec_Prefer_No_AVX512 on AVX512 machine.
[BZ #21967]
* sysdeps/x86/cpu-features.h (bit_arch_MathVec_Prefer_No_AVX512):
New.
(index_arch_MathVec_Prefer_No_AVX512): Likewise.
* sysdeps/x86/cpu-tunables.c (TUNABLE_CALLBACK (set_hwcaps)):
Handle MathVec_Prefer_No_AVX512.
* sysdeps/x86_64/fpu/multiarch/ifunc-mathvec-avx512.h
(IFUNC_SELECTOR): Return AVX2 version if MathVec_Prefer_No_AVX512
is set.
Before glibc 2.26, ld.so set dl_platform to "x86_64" and searched the
"x86_64" subdirectory when loading a shared library. ld.so in glibc
2.26 was changed to set dl_platform to "haswell" or "xeon_phi", based
on supported ISAs. This led to shared library loading failure for
shared libraries placed under the "x86_64" subdirectory.
This patch adds "x86_64" to x86-64 dl_hwcap so that ld.so will always
search the "x86_64" subdirectory when loading a shared library.
NB: We can't set x86-64 dl_platform to "x86_64" since ld.so would then
skip the "haswell" and "xeon_phi" subdirectories on "haswell" and
"xeon_phi" machines.
Tested on i686 and x86-64.
[BZ #22093]
* sysdeps/x86/cpu-features.c (init_cpu_features): Initialize
GLRO(dl_hwcap) to HWCAP_X86_64 for x86-64.
* sysdeps/x86/dl-hwcap.h (HWCAP_COUNT): Updated.
(HWCAP_IMPORTANT): Likewise.
(HWCAP_X86_64): New enum.
(HWCAP_X86_AVX512_1): Updated.
* sysdeps/x86/dl-procinfo.c (_dl_x86_hwcap_flags): Add "x86_64".
* sysdeps/x86_64/Makefile (tests): Add tst-x86_64-1.
(modules-names): Add x86_64/tst-x86_64mod-1.
(LDFLAGS-tst-x86_64mod-1.so): New.
($(objpfx)tst-x86_64-1): Likewise.
($(objpfx)x86_64/tst-x86_64mod-1.os): Likewise.
(tst-x86_64-1-clean): Likewise.
* sysdeps/x86_64/tst-x86_64-1.c: New file.
* sysdeps/x86_64/tst-x86_64mod-1.c: Likewise.
There are various bits/huge_val*.h headers to define HUGE_VAL and
related macros. All of them use __builtin_huge_val etc. for GCC 3.3
and later. Then there are various fallbacks, such as using a large
hex float constant for GCC 2.96 and later, or using unions (with or
without compound literals) to construct the bytes of an infinity, with
this last being the reason for having architecture-specific files.
Supporting TS 18661-3 _FloatN / _FloatNx types that have the same
format as other supported types will mean adding more such macros;
needing to add more headers for them doesn't seem very desirable.
The fallbacks based on bytes of the representation of an infinity do
not meet the standard requirements for a constant expression. At
least one of them is also wrong: sysdeps/sh/bits/huge_val.h is
producing a mixed-endian representation which does not match what GCC
does.
This patch eliminates all those headers, defining the macros directly
in math.h. For GCC 3.3 and later, the built-in functions are used as
now. For other compilers, a large constant 1e10000 (with appropriate
suffix) is used. This is like the fallback for GCC 2.96 and later,
but without using hex floats (which have no apparent advantage here).
It is unambiguously valid standard C for all floating-point formats
with infinities, which covers all formats supported by glibc or likely
to be supported by glibc in future (C90 DR#025 said that if a
floating-point format represents infinities, all real values lie
within the range of representable values, so the constraints for
constant expressions are not violated), but may generate compiler
warnings and wouldn't handle the TS 18661-1 FENV_ROUND pragma
correctly. If someone is actually using a compiler with glibc that
does not claim to be GCC 3.3 or later, but which has a better way to
define the HUGE_VAL macros, we can always add compiler conditionals in
with alternative definitions.
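The two-way definition can be sketched as below, with hypothetical
SKETCH_* names and the version test spelled out by hand instead of
using __GNUC_PREREQ. Only the branch for GCC 3.3 and later is
exercised when building with GCC; the fallback may draw an overflow
warning on other compilers, as noted above.

```c
#if defined __GNUC__ \
    && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 3))
/* GCC 3.3 and later: use the built-in function.  */
# define SKETCH_HUGE_VAL (__builtin_huge_val ())
#else
/* Other compilers: a constant too large for any supported format,
   which becomes infinity where the format has infinities.  */
# define SKETCH_HUGE_VAL 1e10000
#endif

/* Either form is intended to be usable in a static initializer,
   unlike the union-based fallbacks being removed.  */
const double sketch_huge = SKETCH_HUGE_VAL;
```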
I intend to make similar changes for INF and NAN. The SNAN macros
already just use __builtin_nans etc. with no fallback for compilers
not claiming to be GCC 3.3 or later.
Tested for x86_64.
* math/math.h: Do not include bits/huge_val.h, bits/huge_valf.h,
bits/huge_vall.h or bits/huge_val_flt128.h.
(HUGE_VAL): Define directly here.
[__USE_ISOC99] (HUGE_VALF): Likewise.
[__USE_ISOC99] (HUGE_VALL): Likewise.
[__HAVE_FLOAT128 && __GLIBC_USE (IEC_60559_TYPES_EXT)]
(HUGE_VAL_F128): Likewise.
* math/Makefile (headers): Remove bits/huge_val.h,
bits/huge_valf.h, bits/huge_vall.h and bits/huge_val_flt128.h.
* bits/huge_val.h: Remove.
* bits/huge_val_flt128.h: Likewise.
* bits/huge_valf.h: Likewise.
* bits/huge_vall.h: Likewise.
* sysdeps/ia64/bits/huge_vall.h: Likewise.
* sysdeps/ieee754/bits/huge_val.h: Likewise.
* sysdeps/ieee754/bits/huge_valf.h: Likewise.
* sysdeps/m68k/m680x0/bits/huge_vall.h: Likewise.
* sysdeps/sh/bits/huge_val.h: Likewise.
* sysdeps/sparc/bits/huge_vall.h: Likewise.
* sysdeps/x86/bits/huge_vall.h: Likewise.
Since assembly versions of HAS_CPU_FEATURE and HAS_ARCH_FEATURE have
been removed, assembly versions of index_cpu_* and index_arch_* can
also be removed.
Tested on i686 and x86-64 with and without --disable-multi-arch.
* sysdeps/x86/cpu-features.h [__ASSEMBLER__]
(index_cpu_*, index_arch_*): Removed.