The definitions in libc are sufficient, the forwarders are no longer
needed.
The symbols have been moved using scripts/move-symbol-to-libc.py.
s390-linux-gnu and s390x-linux-gnu need a new version placeholder
to keep the GLIBC_2.19 symbol version in libpthread.
Tested on i386-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu,
x86_64-linux-gnu. Built with build-many-glibcs.py.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
All of the isnan functions are in libc.so due to printf_fp, so move
__isnanf128 there too for consistency.
Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers RTM abort. When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation. Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. Since to compare 2 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs while AVX2 strcmp requires 1 load, 2 VPCMPEQs,
1 VPMINU and 1 VPMOVMSKB, AVX2 strcmp is faster than EVEX strcmp. Add
Prefer_AVX2_STRCMP to prefer AVX2 strcmp family functions.
1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE.
2. Disable all features which depend on XSAVE:
a. If OSXSAVE is disabled by glibc tunables. Or
b. If both XSAVE and XSAVEC aren't usable.
__x86_get_cpuid_feature_leaf is called during early startup, before
the stack check guard is initialized and is hence not safe to build
with stack-protector.
Additionally, IFUNC resolvers for static tst-ifunc-isa tests get
called too early for stack protector to be useful, so fix them to
disable stack protector for the resolver functions.
This fixes all failures seen with --enable-stack-protector=all
configuration.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
commit 2d651eb926
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Sep 18 07:55:14 2020 -0700
x86: Move x86 processor cache info to cpu_features
missed _SC_LEVEL1_ICACHE_LINESIZE.
1. Add level1_icache_linesize to struct cpu_features.
2. Initialize level1_icache_linesize by calling handle_intel,
handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE.
3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Since the full ISA set used in an ELF binary is unknown to compiler,
an x86-64 ISA level marker indicates the minimum, not maximum, ISA set
required to run such an ELF binary. We never guarantee a library with
an x86-64 ISA level v3 marker doesn't contain other ISAs beyond x86-64
ISA level v3, like AVX VNNI. We check the x86-64 ISA level marker for
the minimum ISA set. Since -march=sandybridge enables only some ISAs
in x86-64 ISA level v3, we should set the needed ISA marker to v2.
Otherwise, libc is compiled with -march=sandybridge will fail to run on
Sandy Bridge:
$ ./elf/ld.so ./libc.so
./libc.so: (p) CPU ISA level is lower than required: needed: 7; got: 3
Set the minimum, instead of maximum, x86-64 ISA level marker should have
no impact on the glibc-hwcaps directory assignment logic in ldconfig nor
ld.so.
Remove the extra space between "# endif" left over from
commit f380868f6d
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Thu Dec 24 15:43:34 2020 -0800
Remove _ISOMAC check from <cpu-features.h>
The TUNABLE_SET interface took a primitive C type argument, which
resulted in inconsistent type conversions internally due to incorrect
dereferencing of types, especialy on 32-bit architectures. This
change simplifies the TUNABLE setting logic along with the interfaces.
Now all numeric tunable values are stored as signed numbers in
tunable_num_t, which is intmax_t. All calls to set tunables cast the
input value to its primitive type and then to tunable_num_t for
storage. This relies on gcc-specific (although I suspect other
compilers woul also do the same) unsigned to signed integer conversion
semantics, i.e. the bit pattern is conserved. The reverse conversion
is guaranteed by the standard.
1. Add CPUID_INDEX_14_ECX_0 for CPUID leaf 0x14 to detect PTWRITE feature
in EBX of CPUID leaf 0x14 with ECX == 0.
2. Add PTWRITE detection to CPU feature tests.
3. Add 2 static CPU feature tests.
In the process of optimizing memcpy for AMD machines, we have found the
vector move operations are outperforming enhanced REP MOVSB for data
transfers above the L2 cache size on Zen3 architectures.
To handle this use case, we are adding an upper bound parameter on
enhanced REP MOVSB:'__x86_rep_movsb_stop_threshold'.
As per large-bench results, we are configuring this parameter to the
L2 cache size for AMD machines and applicable from Zen3 architecture
supporting the ERMS feature.
For architectures other than AMD, it is the computed value of
non-temporal threshold parameter.
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
Add _SC_MINSIGSTKSZ for the minimum signal stack size derived from
AT_MINSIGSTKSZ, which is the minimum number of bytes of free stack
space required in order to gurantee successful, non-nested handling
of a single signal whose handler is an empty function, and _SC_SIGSTKSZ
which is the suggested minimum number of bytes of stack space required
for a signal stack.
If AT_MINSIGSTKSZ isn't available, sysconf (_SC_MINSIGSTKSZ) returns
MINSIGSTKSZ. On Linux/x86 with XSAVE, the signal frame used by kernel
is composed of the following areas and laid out as:
------------------------------
| alignment padding |
------------------------------
| xsave buffer |
------------------------------
| fsave header (32-bit only) |
------------------------------
| siginfo + ucontext |
------------------------------
Compute AT_MINSIGSTKSZ value as size of xsave buffer + size of fsave
header (32-bit only) + size of siginfo and ucontext + alignment padding.
If _SC_SIGSTKSZ_SOURCE or _GNU_SOURCE are defined, MINSIGSTKSZ and SIGSTKSZ
are redefined as
/* Default stack size for a signal handler: sysconf (SC_SIGSTKSZ). */
# undef SIGSTKSZ
# define SIGSTKSZ sysconf (_SC_SIGSTKSZ)
/* Minimum stack size for a signal handler: SIGSTKSZ. */
# undef MINSIGSTKSZ
# define MINSIGSTKSZ SIGSTKSZ
Compilation will fail if the source assumes constant MINSIGSTKSZ or
SIGSTKSZ.
The reason for not simply increasing the kernel's MINSIGSTKSZ #define
(apart from the fact that it is rarely used, due to glibc's shadowing
definitions) was that userspace binaries will have baked in the old
value of the constant and may be making assumptions about it.
For example, the type (char [MINSIGSTKSZ]) changes if this #define
changes. This could be a problem if an newly built library tries to
memcpy() or dump such an object defined by and old binary.
Bounds-checking and the stack sizes passed to things like sigaltstack()
and makecontext() could similarly go wrong.
commit 94cd37ebb2
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Sep 16 05:27:32 2020 -0700
x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625]
broke
GLIBC_TUNABLES=glibc.cpu.hwcaps=-IBT,-SHSTK
since it can no longer disable IBT nor SHSTK. Handle IBT and SHSTK with:
1. Revert commit 94cd37ebb2.
2. Clears the usable CET feature bits if kernel doesn't support CET.
3. Add GLIBC_TUNABLES tests without dlopen.
4. Add tests to verify that CPU_FEATURE_USABLE on IBT and SHSTK matches
_get_ssp.
5. Update GLIBC_TUNABLES tests with dlopen to verify that CET is disabled
with GLIBC_TUNABLES.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Check ifunc resolver with CPU_FEATURE_USABLE and tunables in dynamic and
static executables to verify that CPUID features are initialized early in
static PIE.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Extern symbol access in position independent code usually involves GOT
indirection which needs RELATIVE reloc in a static linked PIE. (On
some targets this is avoided e.g. because the linker can relax a GOT
access to a pc-relative access, but this is not generally true.) Code
that runs before static PIE self relocation must avoid relying on
dynamic relocations which can be ensured by using hidden visibility.
However we cannot just make all symbols hidden:
On i386, all calls to IFUNC functions must go through PLT and calls to
hidden functions CANNOT go through PLT in PIE since EBX used in PIE PLT
may not be set up for local calls to hidden IFUNC functions.
This patch aims to make symbol references hidden in code that is used
before and by _dl_relocate_static_pie when building a static PIE libc.
Note: for an object that is used in the startup code, its references
and definition may not have consistent visibility: it is only forced
hidden in the startup code.
This is needed for fixing bug 27072.
Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
In <sys/platform/x86.h>, define CPU features as enum instead of using
the C preprocessor magic to make it easier to wrap this functionality
in other languages. Move the C preprocessor magic to internal header
for better GCC codegen when more than one features are checked in a
single expression as in x86-64 dl-hwcaps-subdirs.c.
1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
3. Remove struct cpu_features and __x86_get_cpu_features from
<sys/platform/x86.h>.
4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it
in libc.
5. Make __get_cpu_features() private to glibc.
6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
8. Use a single enum index for each CPU feature detection.
9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
10. Return zero struct cpuid_feature for the older glibc binary with a
smaller CPUID_INDEX_MAX [BZ #27104].
11. Inside glibc, use the C preprocessor magic so that cpu_features data
can be loaded just once leading to more compact code for glibc.
256 bits are used for each CPUID leaf. Some leaves only contain a few
features. We can add exceptions to such leaves. But it will increase
code sizes and it is harder to provide backward/forward compatibilities
when new features are added to such leaves in the future.
When new leaves are added, _rtld_global_ro offsets will change which
leads to race condition during in-place updates. We may avoid in-place
updates by
1. Rename the old glibc.
2. Install the new glibc.
3. Remove the old glibc.
NB: A function, __x86_get_cpuid_feature_leaf , is used to avoid the copy
relocation issue with IFUNC resolver as shown in IFUNC resolver tests.
1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so.
2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS.
3. Move x86 cache info initialization to dl-cacheinfo.h and initialize
x86 cache info in init_cpu_features ().
4. Put x86 cache info for libc in cacheinfo.h, which is included in
libc-start.c in libc.a and is included in cacheinfo.c in libc.so.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Store ISA level in the portion of the unused upper 32 bits of the hwcaps
field in cache and the unused pad field in aux cache. ISA level is stored
and checked only for shared objects in glibc-hwcaps subdirectories. The
shared objects in the default directories aren't checked since there are
no fallbacks for these shared objects.
Tested on x86-64-v2, x86-64-v3 and x86-64-v4 machines with
--disable-hardcoded-path-in-tests and --enable-hardcoded-path-in-tests.
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250
and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with
GNU_PROPERTY_X86_ISA_1_V[234] marker:
https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13
Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by
commit b0ab06937385e0ae25cebf1991787d64f439bf12
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 30 06:49:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker
and
commit 32930e4edbc06bc6f10c435dbcc63131715df678
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Oct 9 05:05:57 2020 -0700
x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker
GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the
micro-architecture ISA level required to execute the binary. The marker
must be added by programmers explicitly in one of 3 ways:
1. Pass -mneeded to GCC.
2. Add the marker in the linker inputs as this patch does.
3. Pass -z x86-64-v[234] to the linker.
Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker support to ld.so if binutils 2.32 or newer is used to build glibc:
1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
markers to elf.h.
2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker to abi-note.o based on the ISA level used to compile abi-note.o,
assuming that the same ISA level is used to compile the whole glibc.
3. Add isa_1 to cpu_features to record the supported x86 ISA level.
4. Rename _dl_process_cet_property_note to _dl_process_property_note and
add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
5. Update _rtld_main_check and _dl_open_check to check loaded objects
with the incompatible ISA level.
6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails
on lesser platforms.
7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.
Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and
x86-64-v4 machines.
Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine
and got:
[hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
./elf/tst-isa-level-1: CPU ISA level is lower than required
[hjl@gnu-cfl-2 build-x86_64-linux]$
Make the tests use TEST_COND_intel96 to decide on whether to build the
unnormal tests instead of the macro in nan-pseudo-number.h and then
drop the header inclusion. This unbreaks test runs on all
architectures that do not have ldbl-96.
Also drop the HANDLE_PSEUDO_NUMBERS macro since it is not used
anywhere.
I used these shell commands:
../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")
and then ignored the output, which consisted lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
Add support to treat pseudo-numbers specially and implement x86
version to consider all of them as signaling.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Do not attempt to fix the significand top bit in long double input
received in printf. The code should never reach here because isnan
should now detect unnormals as NaN. This is already a NOP for glibc
since it uses the gcc __builtin_isnan, which detects unnormals as NaN.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This syncs up isnanl behaviour with gcc. Also move the isnanl
implementation to sysdeps/x86 and remove the sysdeps/x86_64 version.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Also move sysdeps/i386/fpu/s_fpclassifyl.c to
sysdeps/x86/fpu/s_fpclassifyl.c and remove
sysdeps/x86_64/fpu/s_fpclassifyl.c
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add Intel Linear Address Masking (LAM) support to <sys/platform/x86.h>.
HAS_CPU_FEATURE (LAM) can be used to detect if LAM is enabled in CPU.
LAM modifies the checking that is applied to 64-bit linear addresses,
allowing software to use of the untranslated address bits for metadata.
Since we can't tell if the tunable value is set by user or not:
https://sourceware.org/bugzilla/show_bug.cgi?id=27069
remove the default REP MOVSB threshold tunable value so that the correct
default value will be set correctly by init_cacheinfo ().
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
To handle GNU property notes on aarch64 some segments need to
be mmaped again, so the fd of the loaded ELF module is needed.
When the fd is not available (kernel loaded modules), then -1
is passed.
The fd is passed to both _dl_process_pt_gnu_property and
_dl_process_pt_note for consistency. Target specific note
processing functions are updated accordingly.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Check HAS_CPU_FEATURE instead of CPU_FEATURE_USABLE for FSGSBASE, IBT,
LM, SHSTK and XSAVES since FSGSBASE requires kernel support, IBT/SHSTK/LM
require OS support and XSAVES is supervisor-mode only.
The UP macro is never defined. Also define LOCK_PREFIX
unconditionally, to the same string.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Modifying the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold' to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
CCX(Core-Complex)' has brought in performance gains.
As per the large bench variant results, this patch also addresses the
regression problem on AMD Zen architectures.
Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start. But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called. Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late. Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.
Key points:
1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch. Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocation
have been performed,
3. The x86 CPU features in ld.so is initialized by DL_PLATFORM_INIT
in dynamic executable and by IFUNC relocation in dlopen in static
executable.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.
Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
The __x86_shared_non_temporal_threshold determines when memcpy on x86
uses non_temporal stores to avoid pushing other data out of the last
level cache.
This patch proposes to revert the calculation change made by H.J. Lu's
patch of June 2, 2017.
H.J. Lu's patch selected a threshold suitable for a single thread
getting maximum performance. It was tuned using the single threaded
large memcpy micro benchmark on an 8 core processor. The last change
changes the threshold from using 3/4 of one thread's share of the
cache to using 3/4 of the entire cache of a multi-threaded system
before switching to non-temporal stores. Multi-threaded systems with
more than a few threads are server-class and typically have many
active threads. If one thread consumes 3/4 of the available cache for
all threads, it will cause other active threads to have data removed
from the cache. Two examples show the range of the effect. John
McCalpin's widely parallel Stream benchmark, which runs in parallel
and fetches data sequentially, saw a 20% slowdown with this patch on
an internal system test of 128 threads. This regression was discovered
when comparing OL8 performance to OL7. An example that compares
normal stores to non-temporal stores may be found at
https://vgatherps.github.io/2018-09-02-nontemporal/. A simple test
shows performance loss of 400 to 500% due to a failure to use
nontemporal stores. These performance losses are most likely to occur
when the system load is heaviest and good performance is critical.
The tunable x86_non_temporal_threshold can be used to override the
default for the knowledgable user who really wants maximum cache
allocation to a single thread in a multi-threaded system.
The manual entry for the tunable has been expanded to provide
more information about its purpose.
modified: sysdeps/x86/cacheinfo.c
modified: manual/tunables.texi
The behavior of isnan/__builtin_isnan on bit patterns that do not
correspond to something that the CPU would produce from valid inputs
is currently under-defined in the toolchain. (The GCC built-in and
glibc disagree.)
The isnan check in PRINTF_FP_FETCH in stdio-common/printf_fp.c
assumes the GCC behavior that returns true for non-normal numbers
which are not specified as NaN. (The glibc implementation returns
false for such numbers.)
At present, passing non-normal numbers to __mpn_extract_long_double
causes this function to produce irregularly shaped multi-precision
integers, triggering undefined behavior in __printf_fp_l.
With GCC 10 and glibc 2.32, this behavior is not visible because
__builtin_isnan is used, which avoids calling
__mpn_extract_long_double in this case. This commit updates the
implementation of __mpn_extract_long_double so that regularly shaped
multi-precision integers are produced in this case, avoiding
undefined behavior in __printf_fp_l.
commit 04bba1e5d8
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Aug 5 13:51:56 2020 -0700
x86: Set CPU usable feature bits conservatively [BZ #26552]
Set CPU usable feature bits only for CPU features which are usable in
user space and whose usability can be detected from user space, excluding
features like FSGSBASE whose enable bit can only be checked in the kernel.
no longer turns on the usable bits of IBT and SHSTK since we don't know
if IBT and SHSTK are usable until much later. Use HAS_CPU_FEATURE to
check if the processor supports IBT and SHSTK.
Add Intel Key Locker:
https://software.intel.com/content/www/us/en/develop/download/intel-key-locker-specification.html
support to <sys/platform/x86.h>. Intel Key Locker has
1. KL: AES Key Locker instructions.
2. WIDE_KL: AES wide Key Locker instructions.
3. AESKLE: AES Key Locker instructions are enabled by OS.
Applications should use
if (CPU_FEATURE_USABLE (KL))
and
if (CPU_FEATURE_USABLE (WIDE_KL))
to check if AES Key Locker instructions and AES wide Key Locker
instructions are usable.
Install <sys/platform/x86.h> so that programmers can do
#if __has_include(<sys/platform/x86.h>)
#include <sys/platform/x86.h>
#endif
...
if (CPU_FEATURE_USABLE (SSE2))
...
if (CPU_FEATURE_USABLE (AVX2))
...
<sys/platform/x86.h> exports only:
enum
{
COMMON_CPUID_INDEX_1 = 0,
COMMON_CPUID_INDEX_7,
COMMON_CPUID_INDEX_80000001,
COMMON_CPUID_INDEX_D_ECX_1,
COMMON_CPUID_INDEX_80000007,
COMMON_CPUID_INDEX_80000008,
COMMON_CPUID_INDEX_7_ECX_1,
/* Keep the following line at the end. */
COMMON_CPUID_INDEX_MAX
};
struct cpuid_features
{
struct cpuid_registers cpuid;
struct cpuid_registers usable;
};
struct cpu_features
{
struct cpu_features_basic basic;
struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
};
/* Get a pointer to the CPU features structure. */
extern const struct cpu_features *__x86_get_cpu_features
(unsigned int max) __attribute__ ((const));
Since all feature checks are done through macros, programs compiled with
a newer <sys/platform/x86.h> are compatible with the older glibc binaries
as long as the layout of struct cpu_features is identical. The features
array can be expanded with backward binary compatibility for both .o and
.so files. When COMMON_CPUID_INDEX_MAX is increased to support new
processor features, __x86_get_cpu_features in the older glibc binaries
returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the
new processor feature. No new symbol version is neeeded.
Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided. HAS_CPU_FEATURE
can be used to identify processor features.
Note: Although GCC has __builtin_cpu_supports, it only supports a subset
of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE. It
doesn't support HAS_CPU_FEATURE.
Set CPU usable feature bits only for CPU features which are usable in
user space and whose usability can be detected from user space, excluding
features like FSGSBASE whose enable bit can only be checked in the kernel.
Intel64 and IA-32 Architectures Software Developer’s Manual has changed
the following CPU feature names:
1. The CPU feature of Enhanced Intel SpeedStep Technology is renamed
from EST to EIST.
2. The CPU feature which supports Platform Quality of Service Monitoring
(PQM) capability is changed to Intel Resource Director Technology
(Intel RDT) Monitoring capability, i.e. PQM is renamed to RDT_M.
3. The CPU feature which supports Platform Quality of Service
Enforcement (PQE) capability is changed to Intel Resource Director
Technology (Intel RDT) Allocation capability, i.e. PQE is renamed to
RDT_A.
Support usable check for all CPU features with the following changes:
1. Change struct cpu_features to
struct cpuid_features
{
struct cpuid_registers cpuid;
struct cpuid_registers usable;
};
struct cpu_features
{
struct cpu_features_basic basic;
struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};
so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits. EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.
The results are
1. If the feature is known and doesn't requre OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to supported by OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
Since
commit 430388d5dc
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Aug 3 08:04:49 2018 -0700
x86: Don't include <init-arch.h> in assembly codes
removed all usages of <init-arch.h> from assembly codes, we can remove
__ASSEMBLER__ check in init-arch.h.
Since
commit c867597bff
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jun 8 13:57:50 2016 -0700
X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove
removed the only usage of __x86_prefetchw, we can remove the unused
__x86_prefetchw.
Add generic code to handle PT_GNU_PROPERTY notes. Invalid
content is ignored, _dl_process_pt_gnu_property is always called
after PT_LOAD segments are mapped and it has no failure modes.
Currently only one NT_GNU_PROPERTY_TYPE_0 note is handled, which
contains target specific properties: the _dl_process_gnu_property
hook is called for each property.
The old _dl_process_pt_note and _rtld_process_pt_note differ in how
the program header is read. The old _dl_process_pt_note is called
before PT_LOAD segments are mapped and _rtld_process_pt_note is called
after PT_LOAD segments are mapped. The old _rtld_process_pt_note is
removed and _dl_process_pt_note is always called after PT_LOAD
segments are mapped and now it has no failure modes.
The program headers are scanned backwards so that PT_NOTE can be
skipped if PT_GNU_PROPERTY exists.
Co-Authored-By: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Add x86_rep_movsb_threshold and x86_rep_stosb_threshold to tunables
to update thresholds for "rep movsb" and "rep stosb" at run-time.
Note that the user specified threshold for "rep movsb" smaller than
the minimum threshold will be ignored.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
An extension called extended feature disable (XFD) is an extension added
for Intel AMX to the XSAVE feature set that allows an operating system
to enable a feature while preventing specific user threads from using
the feature.
Intel Advanced Matrix Extensions (Intel AMX) is a new programming
paradigm consisting of two components: a set of 2-dimensional registers
(tiles) representing sub-arrays from a larger 2-dimensional memory image,
and accelerators able to operate on tiles. Intel AMX is an extensible
architecture. New accelerators can be added and the existing accelerator
may be enhanced to provide higher performance. The initial features are
AMX-BF16, AMX-TILE and AMX-INT8, which are usable only if the operating
system supports both XTILECFG state and XTILEDATA state.
Add AMX-BF16, AMX-TILE and AMX-INT8 support to HAS_CPU_FEATURE and
CPU_FEATURE_USABLE.
1. Divide architecture features into the usable features and the preferred
features. The usable features are for correctness and can be exported in
a stable ABI. The preferred features are for performance and only for
glibc internal use.
2. Change struct cpu_features to
struct cpu_features
{
struct cpu_features_basic basic;
unsigned int *usable_p;
struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
unsigned int usable[USABLE_FEATURE_INDEX_MAX];
unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};
and initialize usable_p to pointer to the usable arary so that
struct cpu_features
{
struct cpu_features_basic basic;
unsigned int *usable_p;
struct cpuid_registers cpuid[COMMON_CPUID_INDEX_MAX];
};
can be exported via a stable ABI. The cpuid and usable arrays can be
expanded with backward binary compatibility for both .o and .so files.
3. Add COMMON_CPUID_INDEX_7_ECX_1 for AVX512_BF16.
4. Detect ENQCMD, PKS, AVX512_VP2INTERSECT, MD_CLEAR, SERIALIZE, HYBRID,
TSXLDTRK, L1D_FLUSH, CORE_CAPABILITIES and AVX512_BF16.
5. Rename CAPABILITIES to ARCH_CAPABILITIES.
6. Check if AVX512_VP2INTERSECT, AVX512_BF16 and PKU are usable.
7. Update CPU feature detection test.
When CET is enabled, it is an error to dlopen a non CET enabled shared
library in CET enabled application. It may be desirable to make CET
permissive, that is disable CET when dlopening a non CET enabled shared
library. With the new --enable-cet=permissive configure option, CET is
disabled when dlopening a non CET enabled shared library.
Add DEFAULT_DL_X86_CET_CONTROL to config.h.in:
/* The default value of x86 CET control. */
#define DEFAULT_DL_X86_CET_CONTROL cet_elf_property
which enables CET features based on ELF property note.
--enable-cet=permissive it to
/* The default value of x86 CET control. */
#define DEFAULT_DL_X86_CET_CONTROL cet_permissive
which enables CET features permissively.
Update tst-cet-legacy-5a, tst-cet-legacy-5b, tst-cet-legacy-6a and
tst-cet-legacy-6b to check --enable-cet and --enable-cet=permissive.
1. Include <dl-procruntime.c> to get architecture specific initializer in
rtld_global.
2. Change _dl_x86_feature_1[2] to _dl_x86_feature_1.
3. Add _dl_x86_feature_control after _dl_x86_feature_1, which is a
struct of 2 bitfields for IBT and SHSTK control
This fixes [BZ #25887].
This consolidates the copy-pasted arch specific semaphore header into
single version (based on s390) which suffices 32-bit and and 64-bit
arch/ABI based on the canonical WORDSIZE.
For now I've left out arches which use alternate defines to choose for
32 vs 64-bit builds (aarch64, mips) which in theory can also use the same
header.
Passes build-many for
aarch64-linux-gnu arm-linux-gnueabi arm-linux-gnueabihf
riscv64-linux-gnu-rv64imac-lp64 riscv64-linux-gnu-rv64imafdc-lp64
x86_64-linux-gnu microblaze-linux-gnu nios2-linux-gnu
Suggested-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Commit a98dc92dd1 ("x86: Add cache
information support for Zhaoxin processors") introduced an unused
variable warning in the default i686-linux-gnu build:
In file included from ../sysdeps/i386/cacheinfo.c:3:
../sysdeps/x86/cacheinfo.c: In function 'init_cacheinfo':
../sysdeps/x86/cacheinfo.c:762:16: error: unused variable 'eax' [-Werror=unused-variable]
762 | unsigned int eax;
| ^~~
To obtain Zhaoxin CPU cache information, add a new function
handle_zhaoxin().
Add a new function get_common_cache_info() that extracts the code
in init_cacheinfo() to get the value of the variable shared, threads.
Add Zhaoxin branch in init_cacheinfo() for initializing variables,
such as __x86_shared_cache_size.
The __sfp_handle_exceptions is not fully correct regarding raising
exceptions, since there is no direct way to raise only FP_EX_OVERFLOW
nor FP_EX_UNDERFLOW for SSE mode. Both libgcc and feraiseexcept rely
on x87 mode to accomplish it.
This reverts commit 460ee50de0.
Checked on x86_64.
The exported x86_64 fenv.h functions operate on both i387 and SSE (since
they should work on both float, double, and long double) while the
internal libc_fe* set either SSE (float, double, and float128) or
i387 (long double).
The libgcc __sfp_handle_exceptions (used on float128 implementation),
however, will set either SEE or i387 exception depending of the
exception to raise. This broke the internal assumption of float128
where only SSE operations will be used.
This patch reimplements the libgcc __sfp_handle_exceptions to use only
SSE operations and sets libgcc to use it instead of its own
implementation.
And I think we should fix libgcc in a similar manner, since checking on
config/i386/64/sfp-machine.h it already only supports SSE rounding mode
and x86_64 ABI also expectes float128 to use SSE registers [1]
(although it is not clear on how future implementation might implement
it).
Checked on x86_64-linux-gnu.
[1] https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI
Similar to fenvinline.h removal, this kind of optimization is better
implemented by the compiler. Also newer code avoid setting exceptions
directly (for instance the code to make new logf, log2f and powf
implementatation to now support SVID compat).
The BZ#94194 [1] the corresponding GCC bug for adding replacements
for these on x86.
Checked on x86_64-linux-gnu and i686-linux-gnu.
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94194
Since legacy bitmap doesn't cover jitted code generated by legacy JIT
engine, it isn't very useful. This patch removes ARCH_CET_LEGACY_BITMAP
and treats indirect branch tracking similar to shadow stack by removing
legacy bitmap support.
Tested on CET Linux/x86-64 and non-CET Linux/x86-64.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
This supersedes the init_array sysdeps directory. It allows us to
check for ELF_INITFINI in both C and assembler code, and skip DT_INIT
and DT_FINI processing completely on newer architectures.
A new header file is needed because <dl-machine.h> is incompatible
with assembler code. <sysdep.h> is compatible with assembler code,
but it cannot be included in all assembler files because on some
architectures, it redefines register names, and some assembler files
conflict with that.
<elf-initfini.h> is replicated for legacy architectures which need
DT_INIT/DT_FINI support. New architectures follow the generic default
and disable it.
Particularly on CPUs without ERMS, the string instructions are slow,
so it is unclear whether this architecture-specific implementation is
in fact an optimization.
We shouldn't make 2 calls to dlerror () in a row since the first call
will clear the error. We should just use the return value from the
first call.
Tested on Linux/x86-64.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This patch adds a new macro, libm_alias_finite, to define all _finite
symbol. It sets all _finite symbol as compat symbol based on its first
version (obtained from the definition at built generated first-versions.h).
The <fn>f128_finite symbols were introduced in GLIBC 2.26 and so need
special treatment in code that is shared between long double and float128.
It is done by adding a list, similar to internal symbol redifinition,
on sysdeps/ieee754/float128/float128_private.h.
Alpha also needs some tricky changes to ensure we still emit 2 compat
symbols for sqrt(f).
Passes buildmanyglibc.
Co-authored-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
This patch adds a new generic __pthread_rwlock_arch_t definition meant
to be used by new ports. Its layout mimics the current usage on some
64 bits ports and it allows some ports to use the generic definition.
The arch __pthread_rwlock_arch_t definition is moved from
pthreadtypes-arch.h to another arch-specific header (struct_rwlock.h).
Also the static intialization macro for pthread_rwlock_t is set to use
an arch defined on (__PTHREAD_RWLOCK_INITIALIZER) which simplifies its
implementation.
The default pthread_rwlock_t layout differs from current ports with:
1. Internal layout is the same for 32 bits and 64 bits.
2. Internal flag is an unsigned short so it should not required
additional padding to align for word boundary (if it is the case
for the ABI).
Checked with a build on affected abis.
Change-Id: I776a6a986c23199929d28a3dcd30272db21cd1d0
The current way of defining the common mutex definition for POSIX and
C11 on pthreadtypes-arch.h (added by commit 06be6368da) is
not really the best options for newer ports. It requires define some
misleading flags that should be always defined as 0
(__PTHREAD_COMPAT_PADDING_MID and __PTHREAD_COMPAT_PADDING_END), it
exposes options used solely for linuxthreads compat mode
(__PTHREAD_MUTEX_USE_UNION and __PTHREAD_MUTEX_NUSERS_AFTER_KIND), and
requires newer ports to explicit define them (adding more boilerplate
code).
This patch adds a new default __pthread_mutex_s definition meant to
be used by newer ports. Its layout mimics the current usage on both
32 and 64 bits ports and it allows most ports to use the generic
definition. Only ports that use some arch-specific definition (such
as hardware lock-elision or linuxthreads compat) requires specific
headers.
For 32 bit, the generic definitions mimic the other 32-bit ports
of using an union to define the fields uses on adaptive and robust
mutexes (thus not allowing both usage at same time) and by using a
single linked-list for robust mutexes. Both decisions seemed to
follow what recent ports have done and make the resulting
pthread_mutex_t/mtx_t object smaller.
Also the static intialization macro for pthread_mutex_t is set to use
a macro __PTHREAD_MUTEX_INITIALIZER where the architecture can redefine
in its struct_mutex.h if it requires additional fields to be
initialized.
Checked with a build on affected abis.
Change-Id: I30a22c3e3497805fd6e52994c5925897cffcfe13
The new rwlock implementation added by cc25c8b4c1 (2.25) removed
support for lock-elision. This patch removes remaining the
arch-specific unused definitions.
Checked with a build against all affected ABIs.
Change-Id: I5dec8af50e3cd56d7351c52ceff4aa3771b53cd6
With only two exceptions (sys/types.h and sys/param.h, both of which
historically might have defined BYTE_ORDER) the public headers that
include <endian.h> only want to be able to test __BYTE_ORDER against
__*_ENDIAN.
This patch creates a new bits/endian.h that can be included by any
header that wants to be able to test __BYTE_ORDER and/or
__FLOAT_WORD_ORDER against the __*_ENDIAN constants, or needs
__LONG_LONG_PAIR. It only defines macros in the implementation
namespace.
The existing bits/endian.h (which could not be included independently
of endian.h, and only defines __BYTE_ORDER and maybe __FLOAT_WORD_ORDER)
is renamed to bits/endianness.h. I also took the opportunity to
canonicalize the form of this header, which we are stuck with having
one copy of per architecture. Since they are so short, this means git
doesn’t understand that they were renamed from existing headers, sigh.
endian.h itself is a nonstandard header and its only remaining use
from a standard header is guarded by __USE_MISC, so I dropped the
__USE_MISC conditionals from around all of the public-namespace things
it defines. (This means, an application that requests strict library
conformance but includes endian.h will still see the definition of
BYTE_ORDER.)
A few changes to specific bits/endian(ness).h variants deserve
mention:
- sysdeps/unix/sysv/linux/ia64/bits/endian.h is moved to
sysdeps/ia64/bits/endianness.h. If I remember correctly, ia64 did
have selectable endianness, but we have assembly code in
sysdeps/ia64 that assumes it’s little-endian, so there is no reason
to treat the ia64 endianness.h as linux-specific.
- The C-SKY port does not fully support big-endian mode, the compile
will error out if __CSKYBE__ is defined.
- The PowerPC port had extra logic in its bits/endian.h to detect a
broken compiler, which strikes me as unnecessary, so I removed it.
- The only files that defined __FLOAT_WORD_ORDER always defined it to
the same value as __BYTE_ORDER, so I removed those definitions.
The SH bits/endian(ness).h had comments inconsistent with the
actual setting of __FLOAT_WORD_ORDER, which I also removed.
- I *removed* copyright boilerplate from the few bits/endian(ness).h
headers that had it; these files record a single fact in a fashion
dictated by an external spec, so I do not think they are copyrightable.
As long as I was changing every copy of ieee754.h in the tree, I
noticed that only the MIPS variant includes float.h, because it uses
LDBL_MANT_DIG to decide among three different versions of
ieee854_long_double. This patch makes it not include float.h when
GCC’s intrinsic __LDBL_MANT_DIG__ is available.
* string/endian.h: Unconditionally define LITTLE_ENDIAN,
BIG_ENDIAN, PDP_ENDIAN, and BYTE_ORDER. Condition byteswapping
macros only on !__ASSEMBLER__. Move the definitions of
__BIG_ENDIAN, __LITTLE_ENDIAN, __PDP_ENDIAN, __FLOAT_WORD_ORDER,
and __LONG_LONG_PAIR to...
* string/bits/endian.h: ...this new file, which includes
the renamed header bits/endianness.h for the definition of
__BYTE_ORDER and possibly __FLOAT_WORD_ORDER.
* string/Makefile: Install bits/endianness.h.
* include/bits/endian.h: New wrapper.
* bits/endian.h: Rename to bits/endianness.h.
Add multiple-include guard. Rewrite the comment explaining what
the machine-specific variants of this file should do.
* sysdeps/unix/sysv/linux/ia64/bits/endian.h:
Move to sysdeps/ia64.
* sysdeps/aarch64/bits/endian.h
* sysdeps/alpha/bits/endian.h
* sysdeps/arm/bits/endian.h
* sysdeps/csky/bits/endian.h
* sysdeps/hppa/bits/endian.h
* sysdeps/ia64/bits/endian.h
* sysdeps/m68k/bits/endian.h
* sysdeps/microblaze/bits/endian.h
* sysdeps/mips/bits/endian.h
* sysdeps/nios2/bits/endian.h
* sysdeps/powerpc/bits/endian.h
* sysdeps/riscv/bits/endian.h
* sysdeps/s390/bits/endian.h
* sysdeps/sh/bits/endian.h
* sysdeps/sparc/bits/endian.h
* sysdeps/x86/bits/endian.h:
Rename to endianness.h; canonicalize form of file; remove
redundant definitions of __FLOAT_WORD_ORDER.
* sysdeps/powerpc/bits/endianness.h: Remove logic to check for
broken compilers.
* ctype/ctype.h
* sysdeps/aarch64/nptl/bits/pthreadtypes-arch.h
* sysdeps/arm/nptl/bits/pthreadtypes-arch.h
* sysdeps/csky/nptl/bits/pthreadtypes-arch.h
* sysdeps/ia64/ieee754.h
* sysdeps/ieee754/ieee754.h
* sysdeps/ieee754/ldbl-128/ieee754.h
* sysdeps/ieee754/ldbl-128ibm/ieee754.h
* sysdeps/m68k/nptl/bits/pthreadtypes-arch.h
* sysdeps/microblaze/nptl/bits/pthreadtypes-arch.h
* sysdeps/mips/ieee754/ieee754.h
* sysdeps/mips/nptl/bits/pthreadtypes-arch.h
* sysdeps/nios2/nptl/bits/pthreadtypes-arch.h
* sysdeps/nptl/pthread.h
* sysdeps/riscv/nptl/bits/pthreadtypes-arch.h
* sysdeps/sh/nptl/bits/pthreadtypes-arch.h
* sysdeps/sparc/sparc32/ieee754.h
* sysdeps/unix/sysv/linux/generic/bits/stat.h
* sysdeps/unix/sysv/linux/generic/bits/statfs.h
* sysdeps/unix/sysv/linux/sys/acct.h
* wctype/bits/wctype-wchar.h:
Include bits/endian.h, not endian.h.
* sysdeps/unix/sysv/linux/hppa/pthread.h: Don’t include endian.h.
* sysdeps/mips/ieee754/ieee754.h: Use __LDBL_MANT_DIG__
in ifdefs, instead of LDBL_MANT_DIG. Only include float.h
when __LDBL_MANT_DIG__ is not predefined, in which case
define __LDBL_MANT_DIG__ to equal LDBL_MANT_DIG.
Since sysdeps/i386/dl-lookupcfg.h and sysdeps/x86_64/dl-lookupcfg.h are
identical, we can replace them with sysdeps/x86/dl-lookupcfg.h.
* sysdeps/i386/dl-lookupcfg.h: Moved to ...
* sysdeps/x86/dl-lookupcfg.h: Here.
* sysdeps/x86_64/dl-lookupcfg.h: Removed.