This patch adds a new generic __pthread_rwlock_arch_t definition meant
to be used by new ports. Its layout mimics the current usage on some
64 bits ports and it allows some ports to use the generic definition.
The arch __pthread_rwlock_arch_t definition is moved from
pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h).
Also the static intialization macro for pthread_rwlock_t is set to use
an arch defined on (__PTHREAD_RWLOCK_INITIALIZER) which simplifies its
implementation.
The default pthread_rwlock_t layout differs from current ports with:
1. Internal layout is the same for 32 bits and 64 bits.
2. Internal flag is an unsigned short so it should not required
additional padding to align for word boundary (if it is the case
for the ABI).
Checked with a build on affected abis.
Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0
The current way of defining the common mutex definition for POSIX and
C11 on pthreadtypes-arch.h (added by commit 06be6368da) is
not really the best options for newer ports. It requires define some
misleading flags that should be always defined as 0
(__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it
exposes options used solely for linuxthreads compat mode
(__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and
requires newer ports to explicit define them (adding more boilerplate
code).
This patch adds a new default __pthread_mutex_s definition meant to
be used by newer ports. Its layout mimics the current usage on both
32 and 64 bits ports and it allows most ports to use the generic
definition. Only ports that use some arch-specific definition (such
as hardware lock-elision or linuxthreads compat) requires specific
headers.
For 32 bit, the generic definitions mimic the other 32-bit ports
of using an union to define the fields uses on adaptive and robust
mutexes (thus not allowing both usage at same time) and by using a
single linked-list for robust mutexes. Both decisions seemed to
follow what recent ports have done and make the resulting
pthread_mutex_t/mtx_t object smaller.
Also the static intialization macro for pthread_mutex_t is set to use
a macro __PTHREAD_MUTEX_INITIALIZER where the architecture can redefine
in its struct_mutex.h if it requires additional fields to be
initialized.
Checked with a build on affected abis.
Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13
The new rwlock implementation added by cc25c8b4c1 (2.25) removed
support for lock-elision. This patch removes remaining the
arch-specific unused definitions.
Checked with a build against all affected ABIs.
Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6
With only two exceptions (sys/types.h and sys/param.h, both of which
historically might have defined BYTE_ORDER) the public headers that
include <endian.h> only want to be able to test __BYTE_ORDER against
__*_ENDIAN.
This patch creates a new bits/endian.h that can be included by any
header that wants to be able to test __BYTE_ORDER and/or
__FLOAT_WORD_ORDER against the __*_ENDIAN constants, or needs
__LONG_LONG_PAIR. It only defines macros in the implementation
namespace.
The existing bits/endian.h (which could not be included independently
of endian.h, and only defines __BYTE_ORDER and maybe __FLOAT_WORD_ORDER)
is renamed to bits/endianness.h. I also took the opportunity to
canonicalize the form of this header, which we are stuck with having
one copy of per architecture. Since they are so short, this means git
doesn’t understand that they were renamed from existing headers, sigh.
endian.h itself is a nonstandard header and its only remaining use
from a standard header is guarded by __USE_MISC, so I dropped the
__USE_MISC conditionals from around all of the public-namespace things
it defines. (This means, an application that requests strict library
conformance but includes endian.h will still see the definition of
BYTE_ORDER.)
A few changes to specific bits/endian(ness).h variants deserve
mention:
- sysdeps/unix/sysv/linux/ia64/bits/endian.h is moved to
sysdeps/ia64/bits/endianness.h. If I remember correctly, ia64 did
have selectable endianness, but we have assembly code in
sysdeps/ia64 that assumes it’s little-endian, so there is no reason
to treat the ia64 endianness.h as linux-specific.
- The C-SKY port does not fully support big-endian mode, the compile
will error out if __CSKYBE__ is defined.
- The PowerPC port had extra logic in its bits/endian.h to detect a
broken compiler, which strikes me as unnecessary, so I removed it.
- The only files that defined __FLOAT_WORD_ORDER always defined it to
the same value as __BYTE_ORDER, so I removed those definitions.
The SH bits/endian(ness).h had comments inconsistent with the
actual setting of __FLOAT_WORD_ORDER, which I also removed.
- I *removed* copyright boilerplate from the few bits/endian(ness).h
headers that had it; these files record a single fact in a fashion
dictated by an external spec, so I do not think they are copyrightable.
As long as I was changing every copy of ieee754.h in the tree, I
noticed that only the MIPS variant includes float.h, because it uses
LDBL_MANT_DIG to decide among three different versions of
ieee854_long_double. This patch makes it not include float.h when
GCC’s intrinsic __LDBL_MANT_DIG__ is available.
* string/endian.h: Unconditionally define LITTLE_ENDIAN,
BIG_ENDIAN, PDP_ENDIAN, and BYTE_ORDER. Condition byteswapping
macros only on !__ASSEMBLER__. Move the definitions of
__BIG_ENDIAN, __LITTLE_ENDIAN, __PDP_ENDIAN, __FLOAT_WORD_ORDER,
and __LONG_LONG_PAIR to...
* string/bits/endian.h: ...this new file, which includes
the renamed header bits/endianness.h for the definition of
__BYTE_ORDER and possibly __FLOAT_WORD_ORDER.
* string/Makefile: Install bits/endianness.h.
* include/bits/endian.h: New wrapper.
* bits/endian.h: Rename to bits/endianness.h.
Add multiple-include guard. Rewrite the comment explaining what
the machine-specific variants of this file should do.
* sysdeps/unix/sysv/linux/ia64/bits/endian.h:
Move to sysdeps/ia64.
* sysdeps/aarch64/bits/endian.h
* sysdeps/alpha/bits/endian.h
* sysdeps/arm/bits/endian.h
* sysdeps/csky/bits/endian.h
* sysdeps/hppa/bits/endian.h
* sysdeps/ia64/bits/endian.h
* sysdeps/m68k/bits/endian.h
* sysdeps/microblaze/bits/endian.h
* sysdeps/mips/bits/endian.h
* sysdeps/nios2/bits/endian.h
* sysdeps/powerpc/bits/endian.h
* sysdeps/riscv/bits/endian.h
* sysdeps/s390/bits/endian.h
* sysdeps/sh/bits/endian.h
* sysdeps/sparc/bits/endian.h
* sysdeps/x86/bits/endian.h:
Rename to endianness.h; canonicalize form of file; remove
redundant definitions of __FLOAT_WORD_ORDER.
* sysdeps/powerpc/bits/endianness.h: Remove logic to check for
broken compilers.
* ctype/ctype.h
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
* sysdeps/csky/nptl/bits/pthreadtypes-arch.h
* sysdeps/ia64/ieee754.h
* sysdeps/ieee754/ieee754.h
* sysdeps/ieee754/ldbl-128/ieee754.h
* sysdeps/ieee754/ldbl-128ibm/ieee754.h
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
* sysdeps/mips/ieee754/ieee754.h
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
* sysdeps/nptl/pthread.h
* sysdeps/riscv/nptl/bits/pthreadtypes-arch.h
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
* sysdeps/sparc/sparc32/ieee754.h
* sysdeps/unix/sysv/linux/generic/bits/stat.h
* sysdeps/unix/sysv/linux/generic/bits/statfs.h
* sysdeps/unix/sysv/linux/sys/acct.h
* wctype/bits/wctype-wchar.h:
Include bits/endian.h, not endian.h.
* sysdeps/unix/sysv/linux/hppa/pthread.h: Don’t include endian.h.
* sysdeps/mips/ieee754/ieee754.h: Use __LDBL_MANT_DIG__
in ifdefs, instead of LDBL_MANT_DIG. Only include float.h
when __LDBL_MANT_DIG__ is not predefined, in which case
define __LDBL_MANT_DIG__ to equal LDBL_MANT_DIG.
Since sysdeps/i386/dl-lookupcfg.h and sysdeps/x86_64/dl-lookupcfg.h are
identical, we can replace them with sysdeps/x86/dl-lookupcfg.h.
* sysdeps/i386/dl-lookupcfg.h: Moved to ...
* sysdeps/x86/dl-lookupcfg.h: Here.
* sysdeps/x86_64/dl-lookupcfg.h: Removed.
This patch refactor how hp-timing is used on loader code for statistics
report. The HP_TIMING_AVAIL and HP_SMALL_TIMING_AVAIL are removed and
HP_TIMING_INLINE is used instead to check for hp-timing avaliability.
For alpha, which only defines HP_SMALL_TIMING_AVAIL, the HP_TIMING_INLINE
is set iff for IS_IN(rtld).
Checked on aarch64-linux-gnu, x86_64-linux-gnu, and i686-linux-gnu. I also
checked the builds for all afected ABIs.
* benchtests/bench-timing.h: Replace HP_TIMING_AVAIL with
HP_TIMING_INLINE.
* nptl/descr.h: Likewise.
* elf/rtld.c (RLTD_TIMING_DECLARE, RTLD_TIMING_NOW, RTLD_TIMING_DIFF,
RTLD_TIMING_ACCUM_NT, RTLD_TIMING_SET): Define.
(dl_start_final_info, _dl_start_final, dl_main, print_statistics):
Abstract hp-timing usage with RTLD_* macros.
* sysdeps/alpha/hp-timing.h (HP_TIMING_INLINE): Define iff IS_IN(rtld).
(HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL): Remove.
* sysdeps/generic/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL,
HP_TIMING_NONAVAIL): Likewise.
* sysdeps/ia64/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
Likewise.
* sysdeps/powerpc/powerpc32/power4/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/powerpc/powerpc64/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/sparc/sparc32/sparcv9/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/sparc/sparc64/hp-timing.h (HP_TIMING_AVAIL,
HP_SMALL_TIMING_AVAIL): Likewise.
* sysdeps/x86/hp-timing.h (HP_TIMING_AVAIL, HP_SMALL_TIMING_AVAIL):
Likewise.
* sysdeps/generic/hp-timing-common.h: Update comment with
HP_TIMING_AVAIL removal.
The stub implementations are turned into compat symbols.
Linux actually has two reserved system call numbers (for getpmsg
and putpmsg), but these system calls have never been implemented,
and there are no plans to implement them, so this patch replaces
the wrappers with the generic stubs.
According to <https://bugzilla.redhat.com/show_bug.cgi?id=436349>,
the presence of the XSI STREAMS declarations is a minor portability
hazard because they are not actually implemented.
This commit does not change the TIRPC support code in
sunrpc/rpc_svcout.c. It uses additional XTI functionality and
therefore never worked with glibc.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This patch adds fall-through comments in some cases where -Wextra
produces implicit-fallthrough warnings.
The patch is non-exhaustive. Apart from architecture-specific code
for non-x86_64 architectures, it does not change sunrpc/xdr.c (legacy
code, probably should have such changes, but left to be dealt with
separately), or places that already had comments about the
fall-through but not matching the form expected by
-Wimplicit-fallthrough=3 (the default level with -Wextra; my
inclination is to adjust those comments to match rather than
downgrading to -Wimplicit-fallthrough=1 to allow any comment), or one
place where I thought the implicit fallthrough was not correct and so
should be handled separately as a bug fix. I think the key thing to
consider in review of this patch is whether the fall-through is indeed
intended and correct in each place where such a comment is added.
Tested for x86_64.
* elf/dl-exception.c (_dl_exception_create_format): Add
fall-through comments.
* elf/ldconfig.c (parse_conf_include): Likewise.
* elf/rtld.c (print_statistics): Likewise.
* locale/programs/charmap.c (parse_charmap): Likewise.
* misc/mntent_r.c (__getmntent_r): Likewise.
* posix/wordexp.c (parse_arith): Likewise.
(parse_backtick): Likewise.
* resolv/ns_ttl.c (ns_parse_ttl): Likewise.
* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
* sysdeps/x86_64/dl-machine.h (elf_machine_rela): Likewise.
Add <sincosf_poly.h> and include it in s_sincosf.h to allow vectorized
sincosf_poly. Add x86 sincosf_poly.h to vectorize sincosf_poly. On
Broadwell, bench-sincosf shows:
Before After Improvement
max 160.273 114.198 40%
min 6.25 5.625 11%
mean 13.0325 10.6462 22%
Vectorized sincosf_poly shows
Before After Improvement
max 138.653 114.198 21%
min 5.004 5.625 -11%
mean 11.5934 10.6462 9%
Tested on x86-64 and i686 as well as with build-many-glibcs.py.
* sysdeps/ieee754/flt-32/s_sincosf.h: Include <sincosf_poly.h>.
(sincos_t, sincosf_poly, sinf_poly): Moved to ...
* sysdeps/ieee754/flt-32/sincosf_poly.h: Here. New file.
* sysdeps/x86/fpu/s_sincosf_data.c: New file.
* sysdeps/x86/fpu/sincosf_poly.h: Likewise.
* sysdeps/x86_64/fpu/multiarch/s_sincosf-fma.c: Just include
<sysdeps/ieee754/flt-32/s_sincosf.c>.
After previous cleanups, the only code in the x86 bits/mathinline.h
that is relevant with current compilers is the inline of
__ieee754_atan2l that is conditional on __LIBC_INTERNAL_MATH_INLINES
(i.e. for when libm itself is being built).
This inline is something that does belong in glibc not GCC, since
__ieee754_atan2l is a purely internal function name. This patch moves
that inline to a new sysdeps/x86/fpu/math_private.h, removing the
bits/mathinline.h header.
Note that previously the inline was only for non-SSE 32-bit x86. That
condition does not make sense, however, for a long double function; if
it's not inlined, exactly the same x87 instruction will end up getting
used by the out-of-line function, for both 32-bit and 64-bit. So that
condition is not retained in the new version.
Tested for x86_64 and x86. As expected, installed stripped shared
libraries are unchanged for 32-bit x86, but installed stripped libm.so
is changed for x86_64 because calls to __ieee754_atan2l start being
inlined where previously they were out of line calls. (The same
change to start inlining the function would presumably also apply for
32-bit built with -mfpmath=sse, but that's not a configuration I've
tested.)
* sysdeps/x86/fpu/math_private.h: New file.
* sysdeps/x86/fpu/bits/mathinline.h: Remove.
Continuing the removal of bits/mathinline.h inlines that would better
be done by the compiler, this patch removes x86 inlines for sinh, cosh
and tanh functions (inlines only previously present for fast-math,
non-SSE 32-bit x86). I've filed
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88556> for adding such
inlines as an optimization in GCC.
I believe the only remaining part of the x86 bits/mathinline.h that
does anything useful with current compilers after this patch is the
__LIBC_INTERNAL_MATH_INLINES inline of __ieee754_atan2l; I intend to
remove the whole header and move that inline to a sysdeps
math_private.h header in a subsequent patch.
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h (sinh): Remove inline
definition.
(cosh): Likewise.
(tanh): Likewise.
Merge i386 and x86_64 atomic-machine.h to x86 atomic-machine.h.
Tested on i686 and x86_64 as well as with build-many-glibcs.py.
* sysdeps/i386/atomic-machine.h: Merged with ...
* sysdeps/x86_64/atomic-machine.h: To ...
* sysdeps/x86/atomic-machine.h: This. New file.
Continuing the removal of bits/mathinline.h inlines that would better
be done by the compiler, this patch removes x86 inlines for asinh,
acosh and atanh functions (only for fast-math, non-SSE 32-bit x86).
I've filed <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88502> for
adding such inlines as an optimization in GCC.
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h (asinh): Remove inline
definition.
(acosh): Likewise.
(atanh): Likewise.
This patch fix Hygon Dhyana processor CPU Vendor ID detection
problem in glibc sysdep module, current glibc codes doesn't
recognize Dhyana CPU Vendor ID("HygonGenuine") and set kind to
arch_kind_other, which result to incorrect zero value for
__cache_sysconf() syscall. As Hygon Dhyana share most
architecture feature as AMD Family 17h, this patch add Hygon CPU
Vendor ID check and setup kind to arch_kind_amd and reuse AMD
code path, which lead to correct return value in
__cache_sysconf() syscall. we run the glibc test suite for both
Hygon Dhyana and AMD EPYC and found no failure case.
Background:
Chengdu Haiguang IC Design Co., Ltd (Hygon) is a Joint Venture
between AMD and Haiguang Information Technology Co.,Ltd., aims at
providing high performance x86 processor for China server market.
Its first generation processor codename is Dhyana, which
originates from AMD technology and shares most of the
architecture with AMD's family 17h, but with different CPU Vendor
ID("HygonGenuine")/Family series number(Family 18h).
Related Hygon kernel patch can be found on
http://lkml.kernel.org/r/5ce86123a7b9dad925ac583d88d2f921040e859b.1538583282.git.puwen@hygon.cn
Signed-off-by: fanjinke <fanjinke@hygon.cn>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Continuing the removal of bits/mathinline.h inlines that would better
be done by the compiler, this patch removes an x86 inline for hypot
functions (only for fast-math, only for non-SSE 32-bit x86). I've
filed <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88474> for adding
such an inline as an optimization in GCC.
Tested for x86_64 and x86.
* sysdeps/x86/fpu/bits/mathinline.h (hypot): Remove inline
definition.
Extend CPUID support for all feature bits from CPUID. Add a new macro,
CPU_FEATURE_USABLE, which can be used to check if a feature is usable at
run-time, instead of HAS_CPU_FEATURE and HAS_ARCH_FEATURE.
Add COMMON_CPUID_INDEX_D_ECX_1, COMMON_CPUID_INDEX_80000007 and
COMMON_CPUID_INDEX_80000008 to check CPU feature bits in them.
Tested on i686 and x86-64 as well as using build-many-glibcs.py with
x86 targets.
* sysdeps/x86/cacheinfo.c (intel_check_word): Updated for
cpu_features_basic.
(__cache_sysconf): Likewise.
(init_cacheinfo): Likewise.
* sysdeps/x86/cpu-features.c (get_extended_indeces): Also
populate COMMON_CPUID_INDEX_80000007 and
COMMON_CPUID_INDEX_80000008.
(get_common_indices): Also populate COMMON_CPUID_INDEX_D_ECX_1.
Use CPU_FEATURES_CPU_P (cpu_features, XSAVEC) to check if
XSAVEC is available. Set the bit_arch_XXX_Usable bits.
(init_cpu_features): Use _Static_assert on
index_arch_Fast_Unaligned_Load.
__get_cpuid_registers and __get_arch_feature. Updated for
cpu_features_basic. Set stepping in cpu_features.
* sysdeps/x86/cpu-features.h: (FEATURE_INDEX_1): Changed to enum.
(FEATURE_INDEX_2): New.
(FEATURE_INDEX_MAX): Changed to enum.
(COMMON_CPUID_INDEX_D_ECX_1): New.
(COMMON_CPUID_INDEX_80000007): Likewise.
(COMMON_CPUID_INDEX_80000008): Likewise.
(cpuid_registers): Likewise.
(cpu_features_basic): Likewise.
(CPU_FEATURE_USABLE): Likewise.
(bit_arch_XXX_Usable): Likewise.
(cpu_features): Use cpuid_registers and cpu_features_basic.
(bit_arch_XXX): Reweritten.
(bit_cpu_XXX): Likewise.
(index_cpu_XXX): Likewise.
(reg_XXX): Likewise.
* sysdeps/x86/tst-get-cpu-features.c: Include <stdio.h> and
<support/check.h>.
(CHECK_CPU_FEATURE): New.
(CHECK_CPU_FEATURE_USABLE): Likewise.
(cpu_kinds): Likewise.
(do_test): Print vendor, family, model and stepping. Check
HAS_CPU_FEATURE and CPU_FEATURE_USABLE.
(TEST_FUNCTION): Removed.
Include <support/test-driver.c> instead of
"../../test-skeleton.c".
* sysdeps/x86_64/multiarch/sched_cpucount.c (__sched_cpucount):
Check POPCNT instead of POPCOUNT.
* sysdeps/x86_64/multiarch/test-multiarch.c (do_test): Likewise.
Add a re-exec test with legacy bitmap to verify that legacy bitmap is
properly hanlded by kernel.
* sysdeps/x86/Makefile (tests): Add tst-cet-legacy-1a.
(tst-cet-legacy-1a-ARGS): New.
($(objpfx)tst-cet-legacy-1a): New target.
* sysdeps/x86/tst-cet-legacy-1a.c: New file.
Linkers group input note sections with the same name into one output
note section with the same name. One output note section is placed in
one PT_NOTE segment. Since new linkers merge input .note.gnu.property
sections into one output .note.gnu.property section, there is only
one NT_GNU_PROPERTY_TYPE_0 note in one PT_NOTE segment with new linkers.
Since older linkers treat input .note.gnu.property section as a generic
note section and just concatenate all input .note.gnu.property sections
into one output .note.gnu.property section without merging them, we may
see multiple NT_GNU_PROPERTY_TYPE_0 notes in one PT_NOTE segment with
older linkers.
When an older linker is used to created the program on CET-enabled OS,
the linker output has a single .note.gnu.property section with multiple
NT_GNU_PROPERTY_TYPE_0 notes, some of which have IBT and SHSTK enable
bits set even if the program isn't CET enabled. Such programs will
crash on CET-enabled machines. This patch updates the note parser:
1. Skip note parsing if a NT_GNU_PROPERTY_TYPE_0 note has been processed.
2. Check multiple NT_GNU_PROPERTY_TYPE_0 notes.
[BZ #23509]
* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Skip
note parsing if a NT_GNU_PROPERTY_TYPE_0 note has been processed.
Update the l_cet field when processing NT_GNU_PROPERTY_TYPE_0 note.
Check multiple NT_GNU_PROPERTY_TYPE_0 notes.
* sysdeps/x86/link_map.h (l_cet): Expand to 3 bits, Add
lc_unknown.
RDTSCP waits until all previous instructions have executed and all
previous loads are globally visible before reading the counter. RDTSC
doesn't wait until all previous instructions have been executed before
reading the counter. All x86 processors since 2010 support RDTSCP
instruction. This patch adds RDTSCP support to benchtests.
* benchtests/Makefile (CPPFLAGS-nonlib): Add -DUSE_RDTSCP if
USE_RDTSCP is defined.
* sysdeps/x86/hp-timing.h (HP_TIMING_NOW): Use RDTSCP if
USE_RDTSCP is defined.
Th commit 'Disable TSX on some Haswell processors.' (2702856bf4) changed the
default flags for Haswell models. Previously, new models were handled by the
default switch path, which assumed a Core i3/i5/i7 if AVX is available. After
the patch, Haswell models (0x3f, 0x3c, 0x45, 0x46) do not set the flags
Fast_Rep_String, Fast_Unaligned_Load, Fast_Unaligned_Copy, and
Prefer_PMINUB_for_stringop (only the TSX one).
This patch fixes it by disentangle the TSX flag handling from the memory
optimization ones. The strstr case cited on patch now selects the
__strstr_sse2_unaligned as expected for the Haswell cpu.
Checked on x86_64-linux-gnu.
[BZ #23709]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set TSX bits
independently of other flags.
Use __builtin_ia32_rdtsc directly since including <x86intrin.h> makes
building glibc very slow. On Intel Core i5-6260U, this patch reduces
x86-64 build time from 8 minutes 33 seconds to 3 minutes 48 seconds
with "make -j4" and GCC 8.2.1.
* sysdeps/x86/hp-timing.h: Don't include <x86intrin.h>.
(HP_TIMING_NOW): Replace _rdtsc with __builtin_ia32_rdtsc.
Since _rdtsc intrinsic is supported in GCC 4.9, we can use it for
HP_TIMING_NOW. This patch
1. Create x86 hp-timing.h to replace i686 and x86_64 hp-timing.h.
2. Move MINIMUM_ISA from init-arch.h to isa.h so that x86 hp-timing.h
can check minimum x86 ISA to decide if _rdtsc can be used.
NB: Checking if __i686__ isn't sufficient since __i686__ may not be
defined when building for i686 class processors.
* sysdeps/i386/init-arch.h: Removed.
* sysdeps/i386/i586/init-arch.h: Likewise.
* sysdeps/i386/i686/init-arch.h: Likewise.
* sysdeps/i386/i686/hp-timing.h: Likewise.
* sysdeps/x86_64/hp-timing.h: Likewise.
* sysdeps/i386/isa.h: New file.
* sysdeps/i386/i586/isa.h: Likewise.
* sysdeps/i386/i686/isa.h: Likewise.
* sysdeps/x86_64/isa.h: Likewise.
* sysdeps/x86/hp-timing.h: New file.
* sysdeps/x86/init-arch.h: Include <isa.h>.
Continuing the move to use, within libm, public names for libm
functions that can be inlined as built-in functions on many
architectures, this patch moves calls to __round functions to call the
corresponding round names instead, with asm redirection to __round
when the calls are not inlined.
An additional complication arises in
sysdeps/ieee754/ldbl-128ibm/e_expl.c, where a call to roundl, with the
result converted to int, gets converted by the compiler to call
lroundl in the case of 32-bit long, so resulting in localplt test
failures. It's logically correct to let the compiler make such an
optimization; an appropriate asm redirection of lroundl to __lroundl
is thus added to that file (it's not needed anywhere else).
Tested for x86_64, and with build-many-glibcs.py.
* include/math.h [!_ISOMAC && !(__FINITE_MATH_ONLY__ &&
__FINITE_MATH_ONLY__ > 0) && !NO_MATH_REDIRECT] (round): Redirect
using MATH_REDIRECT.
* sysdeps/aarch64/fpu/s_round.c: Define NO_MATH_REDIRECT before
header inclusion.
* sysdeps/aarch64/fpu/s_roundf.c: Likewise.
* sysdeps/ieee754/dbl-64/s_round.c: Likewise.
* sysdeps/ieee754/dbl-64/wordsize-64/s_round.c: Likewise.
* sysdeps/ieee754/float128/s_roundf128.c: Likewise.
* sysdeps/ieee754/flt-32/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128/s_roundl.c: Likewise.
* sysdeps/ieee754/ldbl-96/s_roundl.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc32/power4/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_round.c: Likewise.
* sysdeps/powerpc/powerpc64/fpu/multiarch/s_roundf.c: Likewise.
* sysdeps/riscv/rv64/rvd/s_round.c: Likewise.
* sysdeps/riscv/rvf/s_roundf.c: Likewise.
* sysdeps/ieee754/ldbl-128ibm/s_roundl.c: Likewise.
(round): Redirect to __round.
(__roundl): Call round instead of __round.
* sysdeps/powerpc/fpu/math_private.h [_ARCH_PWR5X] (__round):
Remove macro.
[_ARCH_PWR5X] (__roundf): Likewise.
* sysdeps/ieee754/dbl-64/e_gamma_r.c (gamma_positive): Use round
functions instead of __round variants.
* sysdeps/ieee754/flt-32/e_gammaf_r.c (gammaf_positive): Likewise.
* sysdeps/ieee754/ldbl-128/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/ieee754/ldbl-96/e_gammal_r.c (gammal_positive):
Likewise.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_expl.c (lroundl): Redirect to
__lroundl.
(__ieee754_expl): Call roundl instead of __roundl.
I noticed that sysdeps/x86/cpu-features.h had conditionals on whether
to define HAS_CPUID, HAS_I586 and HAS_I686 with a long list of
preprocessor macros for i686-and-later processors which however was
out of date. This patch avoids the problem of the list getting out of
date by instead having conditionals on all the (few, old) pre-i686
processors for which GCC has preprocessor macros, rather than the
(many, expanding list) i686-and-later processors. It seems HAS_I586
and HAS_I686 are unused so the only effect of these macros being
missing is that 32-bit glibc built for one of these processors would
end up doing runtime detection of CPUID availability.
i386 builds are prevented by a configure test so there is no need to
allow for them here. __geode__ (no long nops?) and __k6__ (no CMOV,
at least according to GCC) are conservatively handled as i586, not
i686, here (as noted above, this is a theoretical distinction at
present in that only HAS_CPUID appears to be used).
Tested for x86.
* sysdeps/x86/cpu-features.h [__geode__ || __k6__]: Handle like
[__i586__ || __pentium__].
[__i486__]: Handle explicitly.
(HAS_CPUID): Define to 1 if above macros are undefined.
(HAS_I586): Likewise.
(HAS_I686): Likewise.
On some architectures, the parts of math_private.h relating to the
floating-point environment are in a separate file fenv_private.h
included from math_private.h. As this is purely an
architecture-specific convention used by several architectures,
however, all such architectures still need their own math_private.h,
even if it has nothing to do beyond #include <fenv_private.h> and
peculiarity of including the i386 file directly instead of having a
shared file in sysdeps/x86.
This patch makes the fenv_private.h name an architecture-independent
convention in glibc. The include of fenv_private.h from
math_private.h becomes architecture-independent (until callers are
updated to include fenv_private.h directly so the include from
math_private.h is no longer needed). Some architecture math_private.h
headers are removed if no longer needed, or renamed to fenv_private.h
if all they define belongs in that header; architecture fenv_private.h
headers now do require #include_next <fenv_private.h>. The i386
fenv_private.h file moves to sysdeps/x86/fpu/ to reflect how it is
actually shared with x86_64. The generic math_private.h gets a new
include of <stdbool.h>, as needed for bool in some prototypes in that
header (previously that was indirectly included via include/fenv.h,
which now only gets included too late in math_private.h, after those
prototypes).
Tested for x86_64 and x86, and tested with build-many-glibcs.py that
installed stripped shared libraries are unchanged by the patch.
* sysdeps/aarch64/fpu/fenv_private.h: New file. Based on ....
* sysdeps/aarch64/fpu/math_private.h: ... this file. All contents
moved to fenv_private.h except for ...
(TOINT_INTRINSICS): Kept in math_private.h.
(roundtoint): Likewise.
(converttoint): Likewise.
* sysdeps/arm/fenv_private.h: Change multiple-include guard to
[ARM_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/arm/math_private.h: Remove.
* sysdeps/generic/fenv_private.h: New file. Contents moved from
....
* sysdeps/generic/math_private.h: ... this file. Include
<stdbool.h>. Do not include <fenv.h> or <get-rounding-mode.h>.
Include <fenv_private.h>. Remove functions and macros moved to
fenv_private.h.
* sysdeps/i386/fpu/math_private.h: Remove.
* sysdeps/mips/math_private.h: Move to ....
* sysdeps/mips/fpu/fenv_private.h: ... here. Change
multiple-include guard to [MIPS_FENV_PRIVATE_H]. Remove
[__mips_hard_float] conditional. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/fenv_private.h: Change multiple-include
guard to [POWERPC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/powerpc/fpu/math_private.h: Do not include
<fenv_private.h>.
* sysdeps/riscv/rvf/math_private.h: Move to ....
* sysdeps/riscv/rvf/fenv_private.h: ... here. Change
multiple-include guard to [RISCV_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/sparc/fpu/fenv_private.h: Change multiple-include guard
to [SPARC_FENV_PRIVATE_H]. Include next <fenv_private.h>.
* sysdeps/sparc/fpu/math_private.h: Remove.
* sysdeps/i386/fpu/fenv_private.h: Move to ....
* sysdeps/x86/fpu/fenv_private.h: ... here. Change
multiple-include guard to [X86_FENV_PRIVATE_H]. Include next
<fenv_private.h>.
* sysdeps/x86_64/fpu/math_private.h: Do not include
<sysdeps/i386/fpu/fenv_private.h>.
Continuing moving macros out of math-tests.h to smaller headers
following typo-proof conventions instead of using #ifndef, this patch
moves the SNAN_TESTS_* macros for individual types out to their own
sysdeps header (while the type-generic SNAN_TESTS wrapper for those
macros remains in math-tests.h).
Tested for x86_64 and x86, and with build-many-glibcs.py.
* sysdeps/generic/math-tests-snan.h: New file.
* sysdeps/generic/math-tests.h: Include <math-tests-snan.h>.
(SNAN_TESTS_float): Do not define here.
(SNAN_TESTS_double): Likewise.
(SNAN_TESTS_long_double): Likewise.
(SNAN_TESTS_float128): Likewise.
* sysdeps/i386/fpu/math-tests-snan.h: New file.
* sysdeps/i386/fpu/math-tests.h: Remove file.
* sysdeps/ia64/math-tests-snan.h: New file.
* sysdeps/ia64/math-tests.h: Remove file.
* sysdeps/x86/math-tests.h: Likewise.
* sysdeps/x86_64/fpu/math-tests-snan.h: New file.
Move STATE_SAVE_OFFSET and STATE_SAVE_MASK to sysdep.h to make
sysdeps/x86/cpu-features.h a C header file.
* sysdeps/x86/cpu-features.h (STATE_SAVE_OFFSET): Removed.
(STATE_SAVE_MASK): Likewise.
Don't check __ASSEMBLER__ to include <cpu-features-offsets.h>.
* sysdeps/x86/sysdep.h (STATE_SAVE_OFFSET): New.
(STATE_SAVE_MASK): Likewise.
* sysdeps/x86_64/dl-trampoline.S: Include <cpu-features-offsets.h>
instead of <cpu-features.h>.
The glibc.tune namespace is vaguely named since it is a 'tunable', so
give it a more specific name that describes what it refers to. Rename
the tunable namespace to 'cpu' to more accurately reflect what it
encompasses. Also rename glibc.tune.cpu to glibc.cpu.name since
glibc.cpu.cpu is weird.
* NEWS: Mention the change.
* elf/dl-tunables.list: Rename tune namespace to cpu.
* sysdeps/powerpc/dl-tunables.list: Likewise.
* sysdeps/x86/dl-tunables.list: Likewise.
* sysdeps/aarch64/dl-tunables.list: Rename tune.cpu to
cpu.name.
* elf/dl-hwcaps.c (_dl_important_hwcaps): Adjust.
* elf/dl-hwcaps.h (GET_HWCAP_MASK): Likewise.
* manual/README.tunables: Likewise.
* manual/tunables.texi: Likewise.
* sysdeps/powerpc/cpu-features.c: Likewise.
* sysdeps/unix/sysv/linux/aarch64/cpu-features.c
(init_cpu_features): Likewise.
* sysdeps/x86/cpu-features.c: Likewise.
* sysdeps/x86/cpu-features.h: Likewise.
* sysdeps/x86/cpu-tunables.c: Likewise.
* sysdeps/x86_64/Makefile: Likewise.
* sysdeps/x86/dl-cet.c: Likewise.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
GNU_PROPERTY_X86_FEATURE_1_AND may not be the first property item. We
need to check each property item until we reach the end of the property
or find GNU_PROPERTY_X86_FEATURE_1_AND.
This patch adds 2 tests. The first test checks if IBT is enabled and
the second test reads the output from the first test to check if IBT
is is enabled. The second second test fails if IBT isn't enabled
properly.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23467]
* sysdeps/unix/sysv/linux/x86/Makefile (tests): Add
tst-cet-property-1 and tst-cet-property-2 if CET is enabled.
(CFLAGS-tst-cet-property-1.o): New.
(ASFLAGS-tst-cet-property-dep-2.o): Likewise.
($(objpfx)tst-cet-property-2): Likewise.
($(objpfx)tst-cet-property-2.out): Likewise.
* sysdeps/unix/sysv/linux/x86/tst-cet-property-1.c: New file.
* sysdeps/unix/sysv/linux/x86/tst-cet-property-2.c: Likewise.
* sysdeps/unix/sysv/linux/x86/tst-cet-property-dep-2.S: Likewise.
* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Parse
each property item until GNU_PROPERTY_X86_FEATURE_1_AND is found.
All tests should be added to $(tests).
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ #23458]
* sysdeps/x86/Makefile (tests): Add tst-get-cpu-features-static.
Simply check if "ptr < ptr_end" since "ptr" is always incremented by 8.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/x86/dl-prop.h (_dl_process_cet_property_note): Don't
parse beyond the note end.
cpu-features.h has
#define bit_cpu_LZCNT (1 << 5)
#define index_cpu_LZCNT COMMON_CPUID_INDEX_1
#define reg_LZCNT
But the LZCNT feature bit is in COMMON_CPUID_INDEX_80000001:
Initial EAX Value: 80000001H
ECX Extended Processor Signature and Feature Bits:
Bit 05: LZCNT available
index_cpu_LZCNT should be COMMON_CPUID_INDEX_80000001, not
COMMON_CPUID_INDEX_1. The VMX feature bit is in COMMON_CPUID_INDEX_1:
Initial EAX Value: 01H
Feature Information Returned in the ECX Register:
5 VMX
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
[BZ # 23456]
* sysdeps/x86/cpu-features.h (index_cpu_LZCNT): Set to
COMMON_CPUID_INDEX_80000001.
CET arch_prctl bits should be defined in <asm/prctl.h> from Linux kernel
header files. Add x86 <include/asm/prctl.h> for pre-CET kernel header
files.
Note: sysdeps/unix/sysv/linux/x86/include/asm/prctl.h should be removed
if <asm/prctl.h> from the required kernel header files contains CET
arch_prctl bits.
/* CET features:
IBT: GNU_PROPERTY_X86_FEATURE_1_IBT
SHSTK: GNU_PROPERTY_X86_FEATURE_1_SHSTK
*/
/* Return CET features in unsigned long long *addr:
features: addr[0].
shadow stack base address: addr[1].
shadow stack size: addr[2].
*/
# define ARCH_CET_STATUS 0x3001
/* Disable CET features in unsigned int features. */
# define ARCH_CET_DISABLE 0x3002
/* Lock all CET features. */
# define ARCH_CET_LOCK 0x3003
/* Allocate a new shadow stack with unsigned long long *addr:
IN: requested shadow stack size: *addr.
OUT: allocated shadow stack address: *addr.
*/
# define ARCH_CET_ALLOC_SHSTK 0x3004
/* Return legacy region bitmap info in unsigned long long *addr:
address: addr[0].
size: addr[1].
*/
# define ARCH_CET_LEGACY_BITMAP 0x3005
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/unix/sysv/linux/x86/include/asm/prctl.h: New file.
* sysdeps/unix/sysv/linux/x86/cpu-features.c: Include
<sys/prctl.h> and <asm/prctl.h>.
(get_cet_status): Call arch_prctl with ARCH_CET_STATUS.
* sysdeps/unix/sysv/linux/x86/dl-cet.h: Include <sys/prctl.h>
and <asm/prctl.h>.
(dl_cet_allocate_legacy_bitmap): Call arch_prctl with
ARCH_CET_LEGACY_BITMAP.
(dl_cet_disable_cet): Call arch_prctl with ARCH_CET_DISABLE.
(dl_cet_lock_cet): Call arch_prctl with ARCH_CET_LOCK.
* sysdeps/x86/libc-start.c: Include <startup.h>.
Add <bits/indirect-return.h> and include it in <ucontext.h>.
__INDIRECT_RETURN defined in <bits/indirect-return.h> indicates if
swapcontext requires special compiler treatment. The default
__INDIRECT_RETURN is empty.
On x86, when shadow stack is enabled, __INDIRECT_RETURN is defined
with indirect_return attribute, which has been added to GCC 9, to
indicate that swapcontext returns via indirect branch. Otherwise
__INDIRECT_RETURN is defined with returns_twice attribute.
When shadow stack is enabled, remove always_inline attribute from
prepare_test_buffer in string/tst-xbzero-opt.c to avoid:
tst-xbzero-opt.c: In function ‘prepare_test_buffer’:
tst-xbzero-opt.c:105:1: error: function ‘prepare_test_buffer’ can never be inlined because it uses setjmp
prepare_test_buffer (unsigned char *buf)
when indirect_return attribute isn't available.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* bits/indirect-return.h: New file.
* misc/sys/cdefs.h (__glibc_has_attribute): New.
* sysdeps/x86/bits/indirect-return.h: Likewise.
* stdlib/Makefile (headers): Add bits/indirect-return.h.
* stdlib/ucontext.h: Include <bits/indirect-return.h>.
(swapcontext): Add __INDIRECT_RETURN.
* string/tst-xbzero-opt.c (ALWAYS_INLINE): New.
(prepare_test_buffer): Use it.
Always include <dl-cet.h> and cet-tunables.h> when CET is enabled.
Otherwise, configure glibc with --enable-cet --disable-tunables will
fail to build.
* sysdeps/x86/cpu-features.c: Always include <dl-cet.h> and
cet-tunables.h> when CET is enabled.
Intel Control-flow Enforcement Technology (CET) instructions:
https://software.intel.com/sites/default/files/managed/4d/2a/control-flow-en
forcement-technology-preview.pdf
includes Indirect Branch Tracking (IBT) and Shadow Stack (SHSTK).
GNU_PROPERTY_X86_FEATURE_1_IBT is added to GNU program property to
indicate that all executable sections are compatible with IBT when
ENDBR instruction starts each valid target where an indirect branch
instruction can land. Linker sets GNU_PROPERTY_X86_FEATURE_1_IBT on
output only if it is set on all relocatable inputs.
On an IBT capable processor, the following steps should be taken:
1. When loading an executable without an interpreter, enable IBT and
lock IBT if GNU_PROPERTY_X86_FEATURE_1_IBT is set on the executable.
2. When loading an executable with an interpreter, enable IBT if
GNU_PROPERTY_X86_FEATURE_1_IBT is set on the interpreter.
a. If GNU_PROPERTY_X86_FEATURE_1_IBT isn't set on the executable,
disable IBT.
b. Lock IBT.
3. If IBT is enabled, when loading a shared object without
GNU_PROPERTY_X86_FEATURE_1_IBT:
a. If legacy interwork is allowed, then mark all pages in executable
PT_LOAD segments in legacy code page bitmap. Failure of legacy code
page bitmap allocation causes an error.
b. If legacy interwork isn't allowed, it causes an error.
GNU_PROPERTY_X86_FEATURE_1_SHSTK is added to GNU program property to
indicate that all executable sections are compatible with SHSTK where
return address popped from shadow stack always matches return address
popped from normal stack. Linker sets GNU_PROPERTY_X86_FEATURE_1_SHSTK
on output only if it is set on all relocatable inputs.
On a SHSTK capable processor, the following steps should be taken:
1. When loading an executable without an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on the executable.
2. When loading an executable with an interpreter, enable SHSTK if
GNU_PROPERTY_X86_FEATURE_1_SHSTK is set on interpreter.
a. If GNU_PROPERTY_X86_FEATURE_1_SHSTK isn't set on the executable
or any shared objects loaded via the DT_NEEDED tag, disable SHSTK.
b. Otherwise lock SHSTK.
3. After SHSTK is enabled, it is an error to load a shared object
without GNU_PROPERTY_X86_FEATURE_1_SHSTK.
To enable CET support in glibc, --enable-cet is required to configure
glibc. When CET is enabled, both compiler and assembler must support
CET. Otherwise, it is a configure-time error.
To support CET run-time control,
1. _dl_x86_feature_1 is added to the writable ld.so namespace to indicate
if IBT or SHSTK are enabled at run-time. It should be initialized by
init_cpu_features.
2. For dynamic executables:
a. A l_cet field is added to struct link_map to indicate if IBT or
SHSTK is enabled in an ELF module. _dl_process_pt_note or
_rtld_process_pt_note is called to process PT_NOTE segment for
GNU program property and set l_cet.
b. _dl_open_check is added to check IBT and SHSTK compatibilty when
dlopening a shared object.
3. Replace i386 _dl_runtime_resolve and _dl_runtime_profile with
_dl_runtime_resolve_shstk and _dl_runtime_profile_shstk, respectively if
SHSTK is enabled.
CET run-time control can be changed via GLIBC_TUNABLES with
$ export GLIBC_TUNABLES=glibc.tune.x86_shstk=[permissive|on|off]
$ export GLIBC_TUNABLES=glibc.tune.x86_ibt=[permissive|on|off]
1. permissive: SHSTK is disabled when dlopening a legacy ELF module.
2. on: IBT or SHSTK are always enabled, regardless if there are IBT or
SHSTK bits in GNU program property.
3. off: IBT or SHSTK are always disabled, regardless if there are IBT or
SHSTK bits in GNU program property.
<cet.h> from CET-enabled GCC is automatically included by assembly codes
to add GNU_PROPERTY_X86_FEATURE_1_IBT and GNU_PROPERTY_X86_FEATURE_1_SHSTK
to GNU program property. _CET_ENDBR is added at the entrance of all
assembly functions whose address may be taken. _CET_NOTRACK is used to
insert NOTRACK prefix with indirect jump table to support IBT. It is
defined as notrack when _CET_NOTRACK is defined in <cet.h>.
[BZ #21598]
* configure.ac: Add --enable-cet.
* configure: Regenerated.
* elf/Makefille (all-built-dso): Add a comment.
* elf/dl-load.c (filebuf): Moved before "dynamic-link.h".
Include <dl-prop.h>.
(_dl_map_object_from_fd): Call _dl_process_pt_note on PT_NOTE
segment.
* elf/dl-open.c: Include <dl-prop.h>.
(dl_open_worker): Call _dl_open_check.
* elf/rtld.c: Include <dl-prop.h>.
(dl_main): Call _rtld_process_pt_note on PT_NOTE segment. Call
_rtld_main_check.
* sysdeps/generic/dl-prop.h: New file.
* sysdeps/i386/dl-cet.c: Likewise.
* sysdeps/unix/sysv/linux/x86/cpu-features.c: Likewise.
* sysdeps/unix/sysv/linux/x86/dl-cet.h: Likewise.
* sysdeps/x86/cet-tunables.h: Likewise.
* sysdeps/x86/check-cet.awk: Likewise.
* sysdeps/x86/configure: Likewise.
* sysdeps/x86/configure.ac: Likewise.
* sysdeps/x86/dl-cet.c: Likewise.
* sysdeps/x86/dl-procruntime.c: Likewise.
* sysdeps/x86/dl-prop.h: Likewise.
* sysdeps/x86/libc-start.h: Likewise.
* sysdeps/x86/link_map.h: Likewise.
* sysdeps/i386/dl-trampoline.S (_dl_runtime_resolve): Add
_CET_ENDBR.
(_dl_runtime_profile): Likewise.
(_dl_runtime_resolve_shstk): New.
(_dl_runtime_profile_shstk): Likewise.
* sysdeps/linux/x86/Makefile (sysdep-dl-routines): Add dl-cet
if CET is enabled.
(CFLAGS-.o): Add -fcf-protection if CET is enabled.
(CFLAGS-.os): Likewise.
(CFLAGS-.op): Likewise.
(CFLAGS-.oS): Likewise.
(asm-CPPFLAGS): Add -fcf-protection -include cet.h if CET
is enabled.
(tests-special): Add $(objpfx)check-cet.out.
(cet-built-dso): New.
(+$(cet-built-dso:=.note)): Likewise.
(common-generated): Add $(cet-built-dso:$(common-objpfx)%=%.note).
($(objpfx)check-cet.out): New.
(generated): Add check-cet.out.
* sysdeps/x86/cpu-features.c: Include <dl-cet.h> and
<cet-tunables.h>.
(TUNABLE_CALLBACK (set_x86_ibt)): New prototype.
(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
(init_cpu_features): Call get_cet_status to check CET status
and update dl_x86_feature_1 with CET status. Call
TUNABLE_CALLBACK (set_x86_ibt) and TUNABLE_CALLBACK
(set_x86_shstk). Disable and lock CET in libc.a.
* sysdeps/x86/cpu-tunables.c: Include <cet-tunables.h>.
(TUNABLE_CALLBACK (set_x86_ibt)): New function.
(TUNABLE_CALLBACK (set_x86_shstk)): Likewise.
* sysdeps/x86/sysdep.h (_CET_NOTRACK): New.
(_CET_ENDBR): Define if not defined.
(ENTRY): Add _CET_ENDBR.
* sysdeps/x86/dl-tunables.list (glibc.tune): Add x86_ibt and
x86_shstk.
* sysdeps/x86_64/dl-trampoline.h (_dl_runtime_resolve): Add
_CET_ENDBR.
(_dl_runtime_profile): Likewise.
Save and restore shadow stack pointer in setjmp and longjmp to support
shadow stack in Intel CET. Use feature_1 in tcbhead_t to check if
shadow stack is enabled before saving and restoring shadow stack pointer.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
* sysdeps/i386/__longjmp.S: Include <jmp_buf-ssp.h>.
(__longjmp): Restore shadow stack pointer if shadow stack is
enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
isn't defined for __longjmp_cancel.
* sysdeps/i386/bsd-_setjmp.S: Include <jmp_buf-ssp.h>.
(_setjmp): Save shadow stack pointer if shadow stack is enabled
and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/i386/bsd-setjmp.S: Include <jmp_buf-ssp.h>.
(setjmp): Save shadow stack pointer if shadow stack is enabled
and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/i386/setjmp.S: Include <jmp_buf-ssp.h>.
(__sigsetjmp): Save shadow stack pointer if shadow stack is
enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/unix/sysv/linux/i386/____longjmp_chk.S: Include
<jmp_buf-ssp.h>.
(____longjmp_chk): Restore shadow stack pointer if shadow stack
is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/unix/sysv/linux/x86/Makefile (gen-as-const-headers):
Remove jmp_buf-ssp.sym.
* sysdeps/unix/sysv/linux/x86_64/____longjmp_chk.S: Include
<jmp_buf-ssp.h>.
(____longjmp_chk): Restore shadow stack pointer if shadow stack
is enabled and SHADOW_STACK_POINTER_OFFSET is defined.
* sysdeps/x86/Makefile (gen-as-const-headers): Add
jmp_buf-ssp.sym.
* sysdeps/x86/jmp_buf-ssp.sym: New dummy file.
* sysdeps/x86_64/__longjmp.S: Include <jmp_buf-ssp.h>.
(__longjmp): Restore shadow stack pointer if shadow stack is
enabled, SHADOW_STACK_POINTER_OFFSET is defined and __longjmp
isn't defined for __longjmp_cancel.
* sysdeps/x86_64/setjmp.S: Include <jmp_buf-ssp.h>.
(__sigsetjmp): Save shadow stack pointer if shadow stack is
enabled and SHADOW_STACK_POINTER_OFFSET is defined.