Commit Graph

419 Commits

Author SHA1 Message Date
Sunil K Pandey
76ddc74e86 x86-64: Add vector expm1/expm1f implementation to libmvec
Implement vectorized expm1/expm1f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector expm1/expm1f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:49 -08:00
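These libmvec entry points are normally reached through compiler auto-vectorization rather than direct calls.  A minimal caller sketch (the loop, pragma, and flags are illustrative, not part of the patch):

/* With GCC options such as -O2 -fopenmp-simd -ffast-math -mavx2, a loop
   like this may be vectorized into calls to the new libmvec entry points
   (e.g. _ZGVdN4v_expm1) per the x86-64 vector ABI.  */
#include <math.h>

void
expm1_array (double *restrict out, const double *restrict in, int n)
{
#pragma omp simd
  for (int i = 0; i < n; i++)
    out[i] = expm1 (in[i]);
}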
Sunil K Pandey
ef7ea9c132 x86-64: Add vector cosh/coshf implementation to libmvec
Implement vectorized cosh/coshf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector cosh/coshf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:42 -08:00
Sunil K Pandey
8b726453d5 x86-64: Add vector exp10/exp10f implementation to libmvec
Implement vectorized exp10/exp10f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector exp10/exp10f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:35 -08:00
Sunil K Pandey
3fc9ccc20b x86-64: Add vector exp2/exp2f implementation to libmvec
Implement vectorized exp2/exp2f containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector exp2/exp2f with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:29 -08:00
Sunil K Pandey
37475ba883 x86-64: Add vector hypot/hypotf implementation to libmvec
Implement vectorized hypot/hypotf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector hypot/hypotf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:21 -08:00
Sunil K Pandey
11c01de14c x86-64: Add vector asin/asinf implementation to libmvec
Implement vectorized asin/asinf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector asin/asinf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:37:03 -08:00
Sunil K Pandey
146310177a x86-64: Add vector atan/atanf implementation to libmvec
Implement vectorized atan/atanf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector atan/atanf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-29 11:36:46 -08:00
Florian Weimer
5d28a8962d elf: Add _dl_find_object function
It can be used to speed up the libgcc unwinder, and the internal
_dl_find_dso_for_object function (which is used for caller
identification in dlopen and related functions, and in dladdr).

_dl_find_object is in the internal namespace due to bug 28503.
If libgcc switches to _dl_find_object, this namespace issue will
be fixed.  It is located in libc for two reasons: it is necessary
to forward the call to the static libc after static dlopen, and
there is a link ordering issue with -static-libgcc and libgcc_eh.a
because libc.so is not a linker script that includes ld.so in the
glibc build tree (so that GCC's internal -lc after libgcc_eh.a does
not pick up ld.so).

It is necessary to do the i386 customization in the
sysdeps/x86/bits/dl_find_object.h header shared with x86-64 because
otherwise, multilib installations are broken.

The implementation uses software transactional memory, as suggested
by Torvald Riegel.  Two copies of the supporting data structures are
used, also achieving full async-signal-safety.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-12-28 22:52:56 +01:00
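A sketch of how application code might call the new function, assuming the <dlfcn.h> declaration this commit installs (glibc 2.35, _GNU_SOURCE):

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

int
main (void)
{
  struct dl_find_object dlfo;
  /* Look up the mapping containing a code address; main itself here.
     Returns 0 on success and is async-signal-safe.  */
  if (_dl_find_object ((void *) &main, &dlfo) == 0)
    printf ("object [%p, %p), EH frame data at %p\n",
            dlfo.dlfo_map_start, dlfo.dlfo_map_end, dlfo.dlfo_eh_frame);
  return 0;
}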
Adhemerval Zanella
92ff345137 Remove atomic-machine.h atomic typedefs
Now that memusage.c uses generic types we can remove them.
2021-12-28 14:57:57 -03:00
Sunil K Pandey
f20f980c71 x86-64: Add vector acos/acosf implementation to libmvec
Implement vectorized acos/acosf containing SSE, AVX, AVX2 and
AVX512 versions for libmvec as per vector ABI.  It also contains
accuracy and ABI tests for vector acos/acosf with regenerated ulps.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-12-22 13:03:14 -08:00
Aurelien Jarno
94058f6cde elf: Fix tst-cpu-features-cpuinfo for KVM guests on some AMD systems [BZ #28704]
On KVM guests running on some AMD systems, the IBRS feature is reported
as a synthetic feature using the Intel feature, while the cpuinfo entry
stays the same. Handle that by first checking the presence of the Intel
feature on AMD systems.

Fixes bug 28704.
2021-12-17 20:20:15 +01:00
H.J. Lu
ea5814467a x86-64: Remove LD_PREFER_MAP_32BIT_EXEC support [BZ #28656]
Remove the LD_PREFER_MAP_32BIT_EXEC environment variable support since
the first PT_LOAD segment is no longer executable due to defaulting to
-z separate-code.

This fixes [BZ #28656].

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-12-10 14:01:34 -08:00
Florian Weimer
8dbeb0561e nptl: Add <thread_pointer.h> for defining __thread_pointer
<tls.h> already contains a definition that is quite similar,
but it is not consistent across architectures.

Only architectures for which rseq support is added are covered.

Reviewed-by: Szabolcs Nagy <szabolcs.nagy@arm.com>
2021-12-09 09:49:32 +01:00
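The new header boils down to one small inline per architecture; roughly this shape (a sketch, not the exact glibc source):

/* Return the TLS base register.  On recent GCC this builtin maps to
   %fs on x86-64, tpidr_el0 on AArch64, and so on for the covered
   architectures.  */
static inline void *
__thread_pointer (void)
{
  return __builtin_thread_pointer ();
}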
H.J. Lu
ceeffe968c x86: Don't set Prefer_No_AVX512 for processors with AVX512 and AVX-VNNI
Don't set Prefer_No_AVX512 on processors with AVX512 and AVX-VNNI since
they won't lower CPU frequency when ZMM load and store instructions are
used.
2021-12-06 07:14:12 -08:00
Noah Goldstein
475b63702e x86: Double size of ERMS rep_movsb_threshold in dl-cacheinfo.h
No bug.

This patch doubles the rep_movsb_threshold when using ERMS. Based on
benchmarks, the vector copy loop, especially now that it handles 4k
aliasing, is better in this medium size range.

On Skylake with ERMS:

Size,   Align1, Align2, dst>src,(rep movsb) / (vec copy)
4096,   0,      0,      0,      0.975
4096,   0,      0,      1,      0.953
4096,   12,     0,      0,      0.969
4096,   12,     0,      1,      0.872
4096,   44,     0,      0,      0.979
4096,   44,     0,      1,      0.83
4096,   0,      12,     0,      1.006
4096,   0,      12,     1,      0.989
4096,   0,      44,     0,      0.739
4096,   0,      44,     1,      0.942
4096,   12,     12,     0,      1.009
4096,   12,     12,     1,      0.973
4096,   44,     44,     0,      0.791
4096,   44,     44,     1,      0.961
4096,   2048,   0,      0,      0.978
4096,   2048,   0,      1,      0.951
4096,   2060,   0,      0,      0.986
4096,   2060,   0,      1,      0.963
4096,   2048,   12,     0,      0.971
4096,   2048,   12,     1,      0.941
4096,   2060,   12,     0,      0.977
4096,   2060,   12,     1,      0.949
8192,   0,      0,      0,      0.85
8192,   0,      0,      1,      0.845
8192,   13,     0,      0,      0.937
8192,   13,     0,      1,      0.939
8192,   45,     0,      0,      0.932
8192,   45,     0,      1,      0.927
8192,   0,      13,     0,      0.621
8192,   0,      13,     1,      0.62
8192,   0,      45,     0,      0.53
8192,   0,      45,     1,      0.516
8192,   13,     13,     0,      0.664
8192,   13,     13,     1,      0.659
8192,   45,     45,     0,      0.593
8192,   45,     45,     1,      0.575
8192,   2048,   0,      0,      0.854
8192,   2048,   0,      1,      0.834
8192,   2061,   0,      0,      0.863
8192,   2061,   0,      1,      0.857
8192,   2048,   13,     0,      0.63
8192,   2048,   13,     1,      0.629
8192,   2061,   13,     0,      0.627
8192,   2061,   13,     1,      0.62

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-11-06 16:18:08 -05:00
Florian Weimer
cca75bd8b5 i386: Explain why __HAVE_64B_ATOMICS has to be 0 2021-11-02 10:26:23 +01:00
H.J. Lu
14dbbf46a0 x86-64: Remove Prefer_AVX2_STRCMP
Remove Prefer_AVX2_STRCMP to enable EVEX strcmp.  When comparing 2 32-byte
strings, EVEX strcmp has been improved to require 1 load, 1 VPTESTM, 1
VPCMP, 1 KMOVD and 1 INCL instead of 2 loads, 3 VPCMPs, 2 KORDs, 1 KMOVD
and 1 TESTL while AVX2 strcmp requires 1 load, 2 VPCMPEQs, 1 VPMINU, 1
VPMOVMSKB and 1 TESTL.  EVEX strcmp is now faster than AVX2 strcmp by up
to 40% on Tiger Lake and Ice Lake.
2021-11-01 07:53:04 -07:00
Fangrui Song
bf433b849a elf: Remove Intel MPX support (lazy PLT, ld.so profile, and LD_AUDIT)
Intel MPX failed to gain wide adoption and has been deprecated for a
while. GCC 9.1 removed Intel MPX support. Linux kernel removed MPX in
2019.

This patch removes the support code from the dynamic loader.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-10-11 11:14:02 -07:00
Noah Goldstein
fc5bd179ef x86: Modify ENTRY in sysdep.h so that p2align can be specified
No bug.

This change adds a new macro ENTRY_P2ALIGN which takes a second
argument, log2 of the desired function alignment.

The old ENTRY(name) macro is just ENTRY_P2ALIGN(name, 4) so this
doesn't affect any existing functionality.

Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
2021-10-08 11:30:52 -05:00
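Approximately the macro pair this adds, omitting details such as CFI and CET markers (C_SYMBOL_NAME and C_LABEL are glibc's existing sysdep.h helpers; a sketch, not the exact patch):

/* Define an assembly entry point aligned to 1 << alignment bytes.  */
#define ENTRY_P2ALIGN(name, alignment)      \
  .globl C_SYMBOL_NAME (name);              \
  .type C_SYMBOL_NAME (name), @function;    \
  .p2align alignment;                       \
  C_LABEL (name)                            \
  CALL_MCOUNT

/* The historical 16-byte alignment remains the default.  */
#define ENTRY(name) ENTRY_P2ALIGN (name, 4)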
H.J. Lu
1bd888d0b7 Initial support for GNU_PROPERTY_1_NEEDED
1. Add GNU_PROPERTY_1_NEEDED:

 #define GNU_PROPERTY_1_NEEDED      GNU_PROPERTY_UINT32_OR_LO

to indicate the needed properties by the object file.
2. Add GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS:

 #define GNU_PROPERTY_1_NEEDED_INDIRECT_EXTERN_ACCESS (1U << 0)

to indicate that the object file requires canonical function pointers and
cannot be used with copy relocation.
3. Scan GNU_PROPERTY_1_NEEDED property and store it in l_1_needed.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-10-07 10:26:08 -07:00
Joseph Myers
b26901b26e Fix sysdeps/x86/fpu/s_ffma.c for 32-bit FMA processor case
It turns out the __SSE2_MATH__ conditional in sysdeps/x86/fpu/s_ffma.c
does not cover all cases where the x86 fenv_private.h macros might
manipulate one of the SSE and 387 floating-point state, while the
actual fma implementation uses the other.  Specifically, in the 32-bit
case, with a compiler not defaulting to -mfpmath=sse, but testing on a
processor with hardware FMA support, the multiarch fma function
implementations will end up using SSE, while the fenv_private.h macros
will use the 387 state for double.  Change the conditional to use the
default macros rather than the optimized ones in all cases except when
the compiler inlines an fma instruction (in which case, since all
those instructions are SSE instructions and -mfpmath=sse must be in
effect for them to be inlined, the optimized macros will only use the
SSE state and it's OK for them to only use the SSE state).

Tested for x86_64 and x86.  H.J. reports in
<https://sourceware.org/pipermail/libc-alpha/2021-September/131367.html>
that it fixes the problems he observed.
2021-09-24 17:59:22 +00:00
Joseph Myers
4ed7a383f9 Fix ffma use of round-to-odd on x86
On 32-bit x86 with -mfpmath=sse, and on x86_64 with
--disable-multi-arch, the tests of ffma and its aliases (fma narrowing
from binary64 to binary32) fail.  This is probably the issue reported
by H.J. in
<https://sourceware.org/pipermail/libc-alpha/2021-September/131277.html>.

The problem is the use of fenv_private.h macros in the round-to-odd
implementation.  Those macros are set up to manipulate only one of the
SSE and 387 floating-point state, whichever is relevant for the type
indicated by the suffix on the macro name.  But x86 configurations
sometimes use the ldbl-96 implementation of binary64 fma (that's where
--disable-multi-arch is relevant for x86_64: it causes the ldbl-96
implementation to be used, instead of an IFUNC implementation that
falls back to the dbl-64 version), contrary to the expectations of
those macros for functions operating on double when __SSE2_MATH__ is
defined.

This can be addressed by using the default versions of those macros
(giving x86 its own version of s_ffma.c), as is done for the *f128
macro variants where it depends on the details of how GCC was
configured when building libgcc which floating-point state is affected
by _Float128 arithmetic.  The issue only applies when __SSE2_MATH__ is
defined, and doesn't apply when __FP_FAST_FMA is defined (because in
that case, fma will be inlined by the compiler, meaning it's
definitely an SSE operation; for the same reason, this is not an issue
for narrowing sqrt, as hardware sqrt is always inlined in that
implementation for x86), but in other cases it's safest to use the
default versions of the fenv_private.h macros to ensure things work
whichever fma implementation is used.

Tested for x86_64 (with and without --disable-multi-arch) and x86
(with and without -mfpmath=sse).
2021-09-23 21:18:31 +00:00
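For reference, the round-to-odd narrowing these implementations perform looks roughly like this when written against the public <fenv.h> API (a sketch, not glibc's code; ffma_sketch is a hypothetical name):

#include <fenv.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* fma narrowed from double to float via round-to-odd, so the final
   conversion cannot double-round.  */
float
ffma_sketch (double x, double y, double z)
{
  fenv_t env;
  feholdexcept (&env);            /* Save FP state, clear flags.  */
  fesetround (FE_TOWARDZERO);     /* Truncate the intermediate.  */
  double r = fma (x, y, z);
  if (fetestexcept (FE_INEXACT))  /* Round to odd: force the LSB on
                                     when truncation discarded bits.  */
    {
      uint64_t bits;
      memcpy (&bits, &r, sizeof bits);
      bits |= 1;
      memcpy (&r, &bits, sizeof bits);
    }
  feupdateenv (&env);             /* Restore rounding mode and flags.  */
  return (float) r;               /* Correctly rounded narrowing.  */
}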
Siddhesh Poyarekar
30891f35fa Remove "Contributed by" lines
We stopped adding "Contributed by" or similar lines in sources in 2012
in favour of git logs and keeping the Contributors section of the
glibc manual up to date.  Removing these lines makes the license
header a bit more consistent across files and also removes the
possibility of error in attribution when license blocks or files are
copied across since the contributed-by lines don't actually reflect
reality in those cases.

Move all "Contributed by" and similar lines (Written by, Test by,
etc.) into a new file CONTRIBUTED-BY to retain record of these
contributions.  These contributors are also mentioned in
manual/contrib.texi, so we just maintain this additional record as a
courtesy to the earlier developers.

The following scripts were used to filter a list of files to edit in
place and to clean up the CONTRIBUTED-BY file respectively.  These
were not added to the glibc sources because they're not expected to be
of any use in future given that this is a one time task:

https://gist.github.com/siddhesh/b5ecac94eabfd72ed2916d6d8157e7dc
https://gist.github.com/siddhesh/15ea1f5e435ace9774f485030695ee02

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-09-03 22:06:44 +05:30
Fangrui Song
224edada60 configure: Allow LD to be LLD 13.0.0 or above [BZ #26558]
When using LLD (LLVM linker) as the linker, configure prints a confusing
message.

    *** These critical programs are missing or too old: GNU ld

LLD>=13.0.0 can build glibc --enable-static-pie. (8.0.0 needs one
workaround for -Wl,-defsym=_begin=0. 9.0.0 works with
--disable-static-pie).

XFAIL two tests sysdeps/x86/tst-ifunc-isa-* which have the BZ #28154
issue (LLD follows the PowerPC port of GNU ld for ifunc by placing
IRELATIVE relocations in .rela.dyn, triggering a glibc ifunc fragility).

The set of dynamic symbols is the same with GNU ld and LLD,
modulo unused SHN_ABS version node symbols.

For comparison, gold does not support --enable-static-pie
yet (--no-dynamic-linker is unsupported, BZ #22221) and
has 6 more failures than LLD.  A gold-linked libc.so has
larger .dynsym differences from GNU ld and LLD
(non-default version symbols are changed to default versions
by a version script, BZ #28196).
2021-08-31 20:23:34 -07:00
Matt Whitlock
0835c0f0ba x86: fix Autoconf caching of instruction support checks [BZ #27991]
The Autoconf documentation for the AC_CACHE_CHECK macro states:

  The commands-to-set-it must have no side effects except for setting
  the variable cache-id, see below.

However, the tests for support of -msahf and -mmovbe were embedded in
the commands-to-set-it for libc_cv_include_x86_isa_level. This had the
consequence that libc_cv_have_x86_lahf_sahf and libc_cv_have_x86_movbe
were not defined whenever libc_cv_include_x86_isa_level was read from
cache. These variables' being undefined meant that their unquoted use
in later test expressions led to the 'test' built-in's misparsing its
arguments and emitting errors like "test: =: unexpected operator" or
"test: =: unary operator expected", depending on the particular shell.

This commit refactors the tests for LAHF/SAHF and MOVBE instruction
support into their own AC_CACHE_CHECK macro invocations to obey the
rule that the commands-to-set-it must have no side effects other than
setting the variable named by cache-id.

Signed-off-by: Matt Whitlock <sourceware@mattwhitlock.name>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-08-19 09:11:35 -03:00
H.J. Lu
91cc803d27 x86-64: Add Avoid_Short_Distance_REP_MOVSB
commit 3ec5d83d2a
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Sat Jan 25 14:19:40 2020 -0800

    x86-64: Avoid rep movsb with short distance [BZ #27130]

introduced some regressions on Intel processors without Fast Short REP
MOV (FSRM).  Add Avoid_Short_Distance_REP_MOVSB to avoid rep movsb with
short distance only on Intel processors with FSRM.  bench-memmove-large
on Skylake server shows that cycles of __memmove_evex_unaligned_erms
improves for the following data size:

                                  before    after    Improvement
length=4127, align1=3, align2=0:  479.38    349.25      27%
length=4223, align1=9, align2=5:  405.62    333.25      18%
length=8223, align1=3, align2=0:  786.12    496.38      37%
length=8319, align1=9, align2=5:  727.50    501.38      31%
length=16415, align1=3, align2=0: 1436.88   840.00      41%
length=16511, align1=9, align2=5: 1375.50   836.38      39%
length=32799, align1=3, align2=0: 2890.00   1860.12     36%
length=32895, align1=9, align2=5: 2891.38   1931.88     33%
2021-07-28 13:23:57 -07:00
H.J. Lu
7c124e3714 x86: Install <bits/platform/x86.h> [BZ #27958]
1. Install <bits/platform/x86.h> for <sys/platform/x86.h> which includes
<bits/platform/x86.h>.
2. Rename HAS_CPU_FEATURE to CPU_FEATURE_PRESENT which checks if the
processor has the feature.
3. Rename CPU_FEATURE_USABLE to CPU_FEATURE_ACTIVE which checks if the
feature is active.  There may be other preconditions, like sufficient
stack space or further setup for AMX, which must be satisfied before the
feature can be used.

This fixes BZ #27958.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-23 05:12:51 -07:00
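Usage after the rename, sketched against the installed <sys/platform/x86.h> from this glibc:

#include <sys/platform/x86.h>
#include <stdio.h>

int
main (void)
{
  if (CPU_FEATURE_PRESENT (AVX2))
    puts ("CPU implements AVX2");
  if (CPU_FEATURE_ACTIVE (AVX2))
    puts ("AVX2 is active (all preconditions satisfied)");
  return 0;
}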
Siddhesh Poyarekar
5b8d271571 Fix build and tests with --disable-tunables
Remove unused code and declare __libc_mallopt when !IS_IN (libc) to
allow the debug hook to build with --disable-tunables.

Also, run the tst-ifunc-isa-2* tests only when tunables are enabled,
since the results depend on them.

Tested on x86_64.

Reported-by: Matheus Castanho <msc@linux.ibm.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-07-23 13:57:56 +05:30
Adhemerval Zanella
469761eac8 elf: Fix tst-cpu-features-cpuinfo on some AMD systems (BZ #28090)
The SSBD feature is implemented in 2 different ways on AMD processors:
newer systems (Zen3) provides AMD_SSBD (function 8000_0008, EBX[24]),
while older system provides AMD_VIRT_SSBD (function 8000_0008, EBX[25]).
However, for AMD_VIRT_SSBD the kernel shows both 'ssbd' and 'virt_ssbd' in
/proc/cpuinfo, while for AMD_SSBD only 'ssbd' is shown.

The check now tests for 'ssbd' if AMD_SSBD is set, and otherwise tests
for 'virt_ssbd' if AMD_VIRT_SSBD is set.

Checked on x86_64-linux-gnu on a Ryzen 9 5900x.

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-07-19 14:12:29 -03:00
H.J. Lu
ea8e465a6b x86: Check RTM_ALWAYS_ABORT for RTM [BZ #28033]
From

https://www.intel.com/content/www/us/en/support/articles/000059422/processors.html

* Intel TSX will be disabled by default.
* The processor will force abort all Restricted Transactional Memory (RTM)
  transactions by default.
* A new CPUID bit CPUID.07H.0H.EDX[11](RTM_ALWAYS_ABORT) will be enumerated,
  which is set to indicate to updated software that the loaded microcode is
  forcing RTM abort.
* On processors that enumerate support for RTM, the CPUID enumeration bits
  for Intel TSX (CPUID.07H.0H.EBX[11] and CPUID.07H.0H.EBX[4]) continue to
  be set by default after microcode update.
* Workloads that benefited from Intel TSX might experience a change
  in performance.
* System software may use a new bit in Model-Specific Register (MSR) 0x10F
  TSX_FORCE_ABORT[TSX_CPUID_CLEAR] functionality to clear the Hardware Lock
  Elision (HLE) and RTM bits to indicate to software that Intel TSX is
  disabled.

1. Add RTM_ALWAYS_ABORT to CPUID features.
2. Set RTM usable only if RTM_ALWAYS_ABORT isn't set.  This skips the
string/tst-memchr-rtm etc. testcases on the affected processors, which
always fail after a microcode update.
3. Check RTM feature, instead of usability, against /proc/cpuinfo.

This fixes BZ #28033.
2021-07-01 10:47:35 -07:00
Adhemerval Zanella
e3e3eb0a2e x86: Fix tst-cpu-features-cpuinfo on Ryzen 9 (BZ #27873)
AMD defines different flags for IBPB, IBRS, and STIBP [1], so new
x86_64_cpu flags are added, and IBRS_IBPB is only tested for Intel.

SSBD is also defined and implemented differently on AMD [2],
so a new AMD_SSBD flag is added.  It should map to the
cpuinfo 'ssbd' on recent AMD cpus.

It fixes tst-cpu-features-cpuinfo and tst-cpu-features-cpuinfo-static
on recent AMD cpus.

Checked on x86_64-linux-gnu on AMD Ryzen 9 5900X.

[1] https://developer.amd.com/wp-content/resources/Architecture_Guidelines_Update_Indirect_Branch_Control.pdf
[2] https://bugzilla.kernel.org/show_bug.cgi?id=199889

Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
2021-06-24 09:57:46 -03:00
H.J. Lu
ea26ff0322 x86: Copy IBT and SHSTK usable only if CET is enabled
The IBT and SHSTK usable bits are copied from CPUID feature bits and
later cleared if the kernel doesn't support CET.  Copy the IBT and SHSTK
usable bits only if CET is enabled, so that they aren't set on
CET-capable processors running a glibc built without CET.
2021-06-23 17:35:47 -07:00
Florian Weimer
6f1c701026 dlfcn: Cleanups after -ldl is no longer required
This commit removes the ELF constructor and internal variables from
dlfcn/dlfcn.c.  The file now serves the same purpose as
nptl/libpthread-compat.c, so it is renamed to dlfcn/libdl-compat.c.
The use of libdl-shared-only-routines ensures that libdl.a is empty.

This commit adjusts the test suite not to use $(libdl).  The libdl.so
symbolic link is no longer installed.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-06-03 09:11:45 +02:00
H.J. Lu
79aec84102 Properly check stack alignment [BZ #27901]
1. Replace

if ((((uintptr_t) &_d) & (__alignof (double) - 1)) != 0)

which may be optimized out by compiler, with

int
__attribute__ ((weak, noclone, noinline))
is_aligned (void *p, int align)
{
  return (((uintptr_t) p) & (align - 1)) != 0;
}

2. Add TEST_STACK_ALIGN_INIT to TEST_STACK_ALIGN.
3. Add a common TEST_STACK_ALIGN_INIT to check 16-byte stack alignment
for both i386 and x86-64.
4. Update powerpc to use TEST_STACK_ALIGN_INIT.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-05-24 07:42:12 -07:00
H.J. Lu
cf2c57526b x86: Set rep_movsb_threshold to 2112 on processors with FSRM
The glibc memcpy benchmark on Intel Core i7-1065G7 (Ice Lake) showed
that REP MOVSB became faster after 2112 bytes:

                                      Vector Move       REP MOVSB
length=2112, align1=0, align2=0:        24.20             24.40
length=2112, align1=1, align2=0:        26.07             23.13
length=2112, align1=0, align2=1:        27.18             28.13
length=2112, align1=1, align2=1:        26.23             25.16
length=2176, align1=0, align2=0:        23.18             22.52
length=2176, align1=2, align2=0:        25.45             22.52
length=2176, align1=0, align2=2:        27.14             27.82
length=2176, align1=2, align2=2:        22.73             25.56
length=2240, align1=0, align2=0:        24.62             24.25
length=2240, align1=3, align2=0:        29.77             27.15
length=2240, align1=0, align2=3:        35.55             29.93
length=2240, align1=3, align2=3:        34.49             25.15
length=2304, align1=0, align2=0:        34.75             26.64
length=2304, align1=4, align2=0:        32.09             22.63
length=2304, align1=0, align2=4:        28.43             31.24

Use REP MOVSB for data size > 2112 bytes in memcpy on processors with
fast short REP MOVSB (FSRM).

	* sysdeps/x86/dl-cacheinfo.h (dl_init_cacheinfo): Set
	rep_movsb_threshold to 2112 on processors with fast short REP
	MOVSB (FSRM).
2021-05-03 05:08:22 -07:00
H.J. Lu
7fc9152e83 x86: tst-cpu-features-supports.c: Update AMX check
Pass "amx-bf16", "amx-int8" and "amx-tile", instead of "amx_bf16",
"amx_int8" and "amx_tile", to __builtin_cpu_supports for GCC 11.
2021-04-22 10:09:49 -07:00
Florian Weimer
81dfc6694c nptl: Remove longjmp, siglongjmp from libpthread
The definitions in libc are sufficient, the forwarders are no longer
needed.

The symbols have been moved using scripts/move-symbol-to-libc.py.
s390-linux-gnu and s390x-linux-gnu need a new version placeholder
to keep the GLIBC_2.19 symbol version in libpthread.

Tested on i386-linux-gnu, powerpc64le-linux-gnu, s390x-linux-gnu,
x86_64-linux-gnu.  Built with build-many-glibcs.py.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-04-21 19:49:50 +02:00
Siddhesh Poyarekar
abadbef5c8 Move __isnanf128 to libc.so
All of the isnan functions are in libc.so due to printf_fp, so move
__isnanf128 there too for consistency.

Reviewed-by: Tulio Magno Quites Machado Filho <tuliom@ascii.art.br>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
2021-03-30 14:58:19 +05:30
H.J. Lu
4bd660be40 x86: Add string/memory function tests in RTM region
At function exit, AVX optimized string/memory functions have VZEROUPPER
which triggers an RTM abort.  When such functions are called inside a
transactionally executing RTM region, RTM abort causes severe performance
degradation.  Add tests to verify that string/memory functions won't
cause RTM abort in RTM region.
2021-03-29 07:40:17 -07:00
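The shape of such a test, sketched with the RTM intrinsics (compile with -mrtm; the function name is illustrative, not the test's actual code):

#include <immintrin.h>
#include <string.h>

/* Returns 0 if the string function committed inside the transaction,
   1 if the transaction aborted (e.g. because of a VZEROUPPER).  */
int
memset_aborts_in_rtm (void)
{
  char buf[256];
  if (_xbegin () == _XBEGIN_STARTED)
    {
      memset (buf, 0, sizeof buf);   /* may dispatch to an AVX memset */
      _xend ();
      return 0;
    }
  return 1;
}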
H.J. Lu
1da50d4bda x86: Set Prefer_No_VZEROUPPER and add Prefer_AVX2_STRCMP
1. Set Prefer_No_VZEROUPPER if RTM is usable to avoid RTM abort triggered
by VZEROUPPER inside a transactionally executing RTM region.
2. To compare two 32-byte strings, 256-bit EVEX strcmp requires 2
loads, 3 VPCMPs and 2 KORDs, while AVX2 strcmp requires only 1 load, 2
VPCMPEQs, 1 VPMINU and 1 VPMOVMSKB, so AVX2 strcmp is faster than EVEX
strcmp.  Add Prefer_AVX2_STRCMP to prefer the AVX2 strcmp family functions.
2021-03-29 07:40:17 -07:00
H.J. Lu
27f7463675 x86: Properly disable XSAVE related features [BZ #27605]
1. Support GLIBC_TUNABLES=glibc.cpu.hwcaps=-XSAVE.
2. Disable all features which depend on XSAVE:
   a. If OSXSAVE is disabled by glibc tunables, or
   b. If neither XSAVE nor XSAVEC is usable.
2021-03-29 06:04:17 -07:00
Samuel Thibault
16b597807d elf: Fix not compiling ifunc tests that need gcc ifunc support 2021-03-24 01:52:46 +01:00
Siddhesh Poyarekar
941ea10f80 Build get-cpuid-feature-leaf.c without stack-protector [BZ #27555]
__x86_get_cpuid_feature_leaf is called during early startup, before
the stack check guard is initialized and is hence not safe to build
with stack-protector.

Additionally, IFUNC resolvers for static tst-ifunc-isa tests get
called too early for stack protector to be useful, so fix them to
disable stack protector for the resolver functions.

This fixes all failures seen with --enable-stack-protector=all
configuration.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-03-15 20:24:45 +05:30
H.J. Lu
f53ffc9b90 x86: Handle _SC_LEVEL1_ICACHE_LINESIZE [BZ #27444]
commit 2d651eb926
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Sep 18 07:55:14 2020 -0700

    x86: Move x86 processor cache info to cpu_features

missed _SC_LEVEL1_ICACHE_LINESIZE.

1. Add level1_icache_linesize to struct cpu_features.
2. Initialize level1_icache_linesize by calling handle_intel,
handle_zhaoxin and handle_amd with _SC_LEVEL1_ICACHE_LINESIZE.
3. Return level1_icache_linesize for _SC_LEVEL1_ICACHE_LINESIZE.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-03-15 05:43:26 -07:00
H.J. Lu
339bf918ea x86: Set minimum x86-64 level marker [BZ #27318]
Since the full ISA set used in an ELF binary is unknown to the compiler,
an x86-64 ISA level marker indicates the minimum, not maximum, ISA set
required to run such an ELF binary.  We never guarantee a library with
an x86-64 ISA level v3 marker doesn't contain other ISAs beyond x86-64
ISA level v3, like AVX VNNI.  We check the x86-64 ISA level marker for
the minimum ISA set.  Since -march=sandybridge enables only some ISAs
in x86-64 ISA level v3, we should set the needed ISA marker to v2.
Otherwise, a libc compiled with -march=sandybridge will fail to run on
Sandy Bridge:

$ ./elf/ld.so ./libc.so
./libc.so: (p) CPU ISA level is lower than required: needed: 7; got: 3

Setting the minimum, instead of the maximum, x86-64 ISA level marker
should have no impact on the glibc-hwcaps directory assignment logic in
ldconfig or ld.so.
2021-03-06 07:49:30 -08:00
Florian Weimer
01a5746b6c x86: Add CPU-specific diagnostics to ld.so --list-diagnostics 2021-03-02 15:01:10 +01:00
Florian Weimer
e4933c8a92 x86: Automate generation of PREFERRED_FEATURE_INDEX_1 bitfield
Use a .def file to define the bitfield layout, so that it is possible
to iterate over field members using the preprocessor.
2021-03-02 15:01:06 +01:00
H.J. Lu
89de9d3958 x86: Use x86/nptl/pthreaddef.h
1. Move sysdeps/i386/nptl/pthreaddef.h to sysdeps/x86/nptl/pthreaddef.h.
2. Remove sysdeps/x86_64/nptl/pthreaddef.h.

Reviewed-by: DJ Delorie <dj@redhat.com>
2021-02-22 15:52:56 -08:00
Florian Weimer
feb741bb81 x86: Remove unused variables for raw cache sizes from cacheinfo.h 2021-02-22 17:36:03 +01:00
H.J. Lu
ba230b6387 <bits/platform/x86.h>: Correct x86_cpu_TBM
x86_cpu_TBM should be x86_cpu_index_80000001_ecx + 21.
2021-02-22 04:31:51 -08:00
H.J. Lu
ce4a94b12e x86: Remove the extra space between "# endif"
Remove the extra space between "# endif" left over from

commit f380868f6d
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Thu Dec 24 15:43:34 2020 -0800

    Remove _ISOMAC check from <cpu-features.h>
2021-02-12 07:50:29 -08:00
Siddhesh Poyarekar
a1b8b06a55 x86: Use SIZE_MAX instead of (long int)-1 for tunable range value
The tunable types are SIZE_T, so set the ranges to the correct maximum
value, i.e. SIZE_MAX.
2021-02-10 19:08:33 +05:30
Siddhesh Poyarekar
61117bfa1b tunables: Simplify TUNABLE_SET interface
The TUNABLE_SET interface took a primitive C type argument, which
resulted in inconsistent type conversions internally due to incorrect
dereferencing of types, especialy on 32-bit architectures.  This
change simplifies the TUNABLE setting logic along with the interfaces.

Now all numeric tunable values are stored as signed numbers in
tunable_num_t, which is intmax_t.  All calls to set tunables cast the
input value to its primitive type and then to tunable_num_t for
storage.  This relies on gcc-specific (although I suspect other
compilers would also do the same) unsigned to signed integer conversion
semantics, i.e. the bit pattern is conserved.  The reverse conversion
is guaranteed by the standard.
2021-02-10 19:08:33 +05:30
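The conversion property the last paragraph relies on, in miniature (a standalone sketch, not the tunables code):

#include <assert.h>
#include <stddef.h>
#include <stdint.h>

int
main (void)
{
  size_t in = SIZE_MAX - 5;         /* unsigned value above INTMAX_MAX */
  intmax_t stored = (intmax_t) in;  /* implementation-defined, but GCC
                                       preserves the bit pattern */
  size_t out = (size_t) stored;     /* guaranteed by the standard:
                                       reduction modulo 2^N */
  assert (out == in);
  return 0;
}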
H.J. Lu
5ab25c8875 x86: Add PTWRITE feature detection [BZ #27346]
1. Add CPUID_INDEX_14_ECX_0 for CPUID leaf 0x14 to detect PTWRITE feature
in EBX of CPUID leaf 0x14 with ECX == 0.
2. Add PTWRITE detection to CPU feature tests.
3. Add 2 static CPU feature tests.
2021-02-07 08:01:14 -08:00
Sajan Karumanchi
6e02b3e932 x86: Adding an upper bound for Enhanced REP MOVSB.
In the process of optimizing memcpy for AMD machines, we have found the
vector move operations are outperforming enhanced REP MOVSB for data
transfers above the L2 cache size on Zen3 architectures.
To handle this use case, we are adding an upper bound parameter on
enhanced REP MOVSB: '__x86_rep_movsb_stop_threshold'.
As per the large-bench results, we are configuring this parameter to the
L2 cache size for AMD machines from the Zen3 architecture onwards that
support the ERMS feature.
For architectures other than AMD, it is the computed value of
non-temporal threshold parameter.

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
2021-02-02 12:42:15 +01:00
H.J. Lu
6c57d32048 sysconf: Add _SC_MINSIGSTKSZ/_SC_SIGSTKSZ [BZ #20305]
Add _SC_MINSIGSTKSZ for the minimum signal stack size derived from
AT_MINSIGSTKSZ, which is the minimum number of bytes of free stack
space required in order to guarantee successful, non-nested handling
of a single signal whose handler is an empty function, and _SC_SIGSTKSZ
which is the suggested minimum number of bytes of stack space required
for a signal stack.

If AT_MINSIGSTKSZ isn't available, sysconf (_SC_MINSIGSTKSZ) returns
MINSIGSTKSZ.  On Linux/x86 with XSAVE, the signal frame used by the kernel
is composed of the following areas and laid out as:

 ------------------------------
 | alignment padding          |
 ------------------------------
 | xsave buffer               |
 ------------------------------
 | fsave header (32-bit only) |
 ------------------------------
 | siginfo + ucontext         |
 ------------------------------

Compute AT_MINSIGSTKSZ value as size of xsave buffer + size of fsave
header (32-bit only) + size of siginfo and ucontext + alignment padding.

If _SC_SIGSTKSZ_SOURCE or _GNU_SOURCE are defined, MINSIGSTKSZ and SIGSTKSZ
are redefined as

/* Default stack size for a signal handler: sysconf (SC_SIGSTKSZ).  */
 # undef SIGSTKSZ
 # define SIGSTKSZ sysconf (_SC_SIGSTKSZ)

/* Minimum stack size for a signal handler: SIGSTKSZ.  */
 # undef MINSIGSTKSZ
 # define MINSIGSTKSZ SIGSTKSZ

Compilation will fail if the source assumes constant MINSIGSTKSZ or
SIGSTKSZ.

The reason for not simply increasing the kernel's MINSIGSTKSZ #define
(apart from the fact that it is rarely used, due to glibc's shadowing
definitions) was that userspace binaries will have baked in the old
value of the constant and may be making assumptions about it.

For example, the type (char [MINSIGSTKSZ]) changes if this #define
changes.  This could be a problem if a newly built library tries to
memcpy() or dump such an object defined by an old binary.
Bounds-checking and the stack sizes passed to things like sigaltstack()
and makecontext() could similarly go wrong.
2021-02-01 11:00:52 -08:00
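A sketch of the intended usage pattern for signal-stack allocation under the new interfaces (error handling abbreviated):

#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main (void)
{
  long minsz = sysconf (_SC_MINSIGSTKSZ);  /* floor from AT_MINSIGSTKSZ */
  long sz = sysconf (_SC_SIGSTKSZ);        /* suggested allocation */
  if (minsz < 0 || sz < 0)
    return 1;
  stack_t ss = { .ss_sp = malloc (sz), .ss_size = (size_t) sz };
  if (ss.ss_sp == NULL || sigaltstack (&ss, NULL) != 0)
    {
      perror ("sigaltstack");
      return 1;
    }
  printf ("minimum %ld, suggested %ld bytes\n", minsz, sz);
  return 0;
}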
H.J. Lu
04dff6fc0d x86: Properly set usable CET feature bits [BZ #26625]
commit 94cd37ebb2
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Sep 16 05:27:32 2020 -0700

    x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625]

broke

GLIBC_TUNABLES=glibc.cpu.hwcaps=-IBT,-SHSTK

since it can no longer disable IBT or SHSTK.  Handle IBT and SHSTK with:

1. Revert commit 94cd37ebb2.
2. Clear the usable CET feature bits if the kernel doesn't support CET.
3. Add GLIBC_TUNABLES tests without dlopen.
4. Add tests to verify that CPU_FEATURE_USABLE on IBT and SHSTK matches
_get_ssp.
5. Update GLIBC_TUNABLES tests with dlopen to verify that CET is disabled
with GLIBC_TUNABLES.

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2021-01-29 03:58:11 -08:00
Andreas Schwab
31f6488722 Fix misplaced const
Constify __x86_cacheinfo_p and __x86_cpu_features_p, not their pointer
target types.
2021-01-25 15:09:02 +01:00
H.J. Lu
5f478eb0fb x86: Properly match CPU features in /proc/cpuinfo [BZ #27222]
Search " YYY " and " YYY\n", instead of "YYY", to avoid matching
"XXXYYYZZZ" with "YYY".

Update /proc/cpuinfo CPU feature names:

/proc/cpuinfo                     glibc
------------------------------------------------
avx512vbmi                        AVX512_VBMI
dts                               DS
pni                               SSE3
tsc_deadline_timer                TSC_DEADLINE
2021-01-22 10:15:46 -08:00
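The matching problem and its fix, in miniature (a sketch of whole-word search, not the test's actual code):

#include <stdbool.h>
#include <string.h>

/* Match NAME only as a whole word in a space-separated flags line, so
   "ssbd" does not match inside "virt_ssbd".  */
static bool
has_cpu_flag (const char *flags, const char *name)
{
  size_t len = strlen (name);
  for (const char *p = flags; (p = strstr (p, name)) != NULL; p += len)
    if ((p == flags || p[-1] == ' ')
        && (p[len] == ' ' || p[len] == '\n' || p[len] == '\0'))
      return true;
  return false;
}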
H.J. Lu
7a5ab88e21 x86: Check ifunc resolver with CPU_FEATURE_USABLE [BZ #27072]
Check ifunc resolver with CPU_FEATURE_USABLE and tunables in dynamic and
static executables to verify that CPUID features are initialized early in
static PIE.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2021-01-21 10:22:26 -08:00
Szabolcs Nagy
47618209d0 Use hidden visibility for early static PIE code
Extern symbol access in position independent code usually involves GOT
indirection which needs RELATIVE reloc in a static linked PIE. (On
some targets this is avoided e.g. because the linker can relax a GOT
access to a pc-relative access, but this is not generally true.) Code
that runs before static PIE self relocation must avoid relying on
dynamic relocations which can be ensured by using hidden visibility.
However we cannot just make all symbols hidden:

On i386, all calls to IFUNC functions must go through PLT and calls to
hidden functions CANNOT go through PLT in PIE since EBX used in PIE PLT
may not be set up for local calls to hidden IFUNC functions.

This patch aims to make symbol references hidden in code that is used
before and by _dl_relocate_static_pie when building a static PIE libc.
Note: for an object that is used in the startup code, its references
and definition may not have consistent visibility: it is only forced
hidden in the startup code.

This is needed for fixing bug 27072.

Co-authored-by: H.J. Lu <hjl.tools@gmail.com>
Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-21 15:55:01 +00:00
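The mechanism in miniature (symbol names are hypothetical; this shows the general pattern, not the glibc declarations):

#define attribute_hidden __attribute__ ((visibility ("hidden")))

/* Defined here to keep the sketch self-contained; in a static PIE it
   could equally live in another object linked into the program.  */
int early_flag attribute_hidden = 1;

int
read_before_relocation (void)
{
  /* Hidden visibility lets the compiler emit a PC-relative access
     instead of a GOT load, so no RELATIVE dynamic relocation is
     required before _dl_relocate_static_pie has run.  */
  return early_flag;
}

int
main (void)
{
  return read_before_relocation () ? 0 : 1;
}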
H.J. Lu
ff6d62e9ed <sys/platform/x86.h>: Remove the C preprocessor magic
In <sys/platform/x86.h>, define CPU features as an enum instead of using
the C preprocessor magic, to make it easier to wrap this functionality
in other languages.  Move the C preprocessor magic to an internal header
for better GCC codegen when more than one feature is checked in a
single expression, as in x86-64 dl-hwcaps-subdirs.c.

1. Rename COMMON_CPUID_INDEX_XXX to CPUID_INDEX_XXX.
2. Move CPUID_INDEX_MAX to sysdeps/x86/include/cpu-features.h.
3. Remove struct cpu_features and __x86_get_cpu_features from
<sys/platform/x86.h>.
4. Add __x86_get_cpuid_feature_leaf to <sys/platform/x86.h> and put it
in libc.
5. Make __get_cpu_features() private to glibc.
6. Replace __x86_get_cpu_features(N) with __get_cpu_features().
7. Add _dl_x86_get_cpu_features to GLIBC_PRIVATE.
8. Use a single enum index for each CPU feature detection.
9. Pass the CPUID feature leaf to __x86_get_cpuid_feature_leaf.
10. Return zero struct cpuid_feature for the older glibc binary with a
smaller CPUID_INDEX_MAX [BZ #27104].
11. Inside glibc, use the C preprocessor magic so that cpu_features data
can be loaded just once leading to more compact code for glibc.

256 bits are used for each CPUID leaf.  Some leaves only contain a few
features.  We can add exceptions for such leaves, but it will increase
code size and make it harder to provide backward/forward compatibility
when new features are added to such leaves in the future.

When new leaves are added, _rtld_global_ro offsets will change, which
leads to a race condition during in-place updates.  We may avoid in-place
updates by

1. Rename the old glibc.
2. Install the new glibc.
3. Remove the old glibc.

NB: A function, __x86_get_cpuid_feature_leaf, is used to avoid the copy
relocation issue with IFUNC resolvers, as shown in the IFUNC resolver tests.
2021-01-21 05:58:17 -08:00
H.J. Lu
2d651eb926 x86: Move x86 processor cache info to cpu_features
1. Move x86 processor cache info to _dl_x86_cpu_features in ld.so.
2. Update tunable bounds with TUNABLE_SET_WITH_BOUNDS.
3. Move x86 cache info initialization to dl-cacheinfo.h and initialize
x86 cache info in init_cpu_features ().
4. Put x86 cache info for libc in cacheinfo.h, which is included in
libc-start.c in libc.a and is included in cacheinfo.c in libc.so.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2021-01-14 11:38:45 -08:00
Adhemerval Zanella
d18f59bf92 Fix x86 build with --enable-tunable=no
Checked on x86_64-linux-gnu.
2021-01-14 16:04:05 -03:00
H.J. Lu
efbbd9c33a ldconfig/x86: Store ISA level in cache and aux cache
Store the ISA level in the unused upper 32 bits of the hwcaps field in
the cache and in the unused pad field in the aux cache.  The ISA level is
stored
and checked only for shared objects in glibc-hwcaps subdirectories.  The
shared objects in the default directories aren't checked since there are
no fallbacks for these shared objects.

Tested on x86-64-v2, x86-64-v3 and x86-64-v4 machines with
--disable-hardcoded-path-in-tests and --enable-hardcoded-path-in-tests.
2021-01-13 05:51:17 -08:00
H.J. Lu
2ef23b5205 x86: Set header.feature_1 in TCB for always-on CET [BZ #27177]
Update dl_cet_check() to set header.feature_1 in TCB when both IBT and
SHSTK are always on.
2021-01-13 05:03:34 -08:00
H.J. Lu
ecce11aa07 x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker [BZ #26717]
GCC 11 supports -march=x86-64-v[234] to enable x86 micro-architecture ISA
levels:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97250

and -mneeded to emit GNU_PROPERTY_X86_ISA_1_NEEDED property with
GNU_PROPERTY_X86_ISA_1_V[234] marker:

https://gitlab.com/x86-psABIs/x86-64-ABI/-/merge_requests/13

Binutils support for GNU_PROPERTY_X86_ISA_1_V[234] marker were added by

commit b0ab06937385e0ae25cebf1991787d64f439bf12
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Oct 30 06:49:57 2020 -0700

    x86: Support GNU_PROPERTY_X86_ISA_1_BASELINE marker

and

commit 32930e4edbc06bc6f10c435dbcc63131715df678
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Oct 9 05:05:57 2020 -0700

    x86: Support GNU_PROPERTY_X86_ISA_1_V[234] marker

GNU_PROPERTY_X86_ISA_1_NEEDED property in x86 ELF binaries indicate the
micro-architecture ISA level required to execute the binary.  The marker
must be added by programmers explicitly in one of 3 ways:

1. Pass -mneeded to GCC.
2. Add the marker in the linker inputs as this patch does.
3. Pass -z x86-64-v[234] to the linker.

Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker support to ld.so if binutils 2.32 or newer is used to build glibc:

1. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
markers to elf.h.
2. Add GNU_PROPERTY_X86_ISA_1_BASELINE and GNU_PROPERTY_X86_ISA_1_V[234]
marker to abi-note.o based on the ISA level used to compile abi-note.o,
assuming that the same ISA level is used to compile the whole glibc.
3. Add isa_1 to cpu_features to record the supported x86 ISA level.
4. Rename _dl_process_cet_property_note to _dl_process_property_note and
add GNU_PROPERTY_X86_ISA_1_V[234] marker detection.
5. Update _rtld_main_check and _dl_open_check to check loaded objects
with the incompatible ISA level.
6. Add a testcase to verify that dlopen an x86-64-v4 shared object fails
on lesser platforms.
7. Use <get-isa-level.h> in dl-hwcaps-subdirs.c and tst-glibc-hwcaps.c.

Tested under i686, x32 and x86-64 modes on x86-64-v2, x86-64-v3 and
x86-64-v4 machines.

Marked elf/tst-isa-level-1 with x86-64-v4, ran it on x86-64-v3 machine
and got:

[hjl@gnu-cfl-2 build-x86_64-linux]$ ./elf/tst-isa-level-1
./elf/tst-isa-level-1: CPU ISA level is lower than required
[hjl@gnu-cfl-2 build-x86_64-linux]$
2021-01-07 13:10:13 -08:00
Siddhesh Poyarekar
8cc1e39a36 Drop nan-pseudo-number.h usage from tests
Make the tests use TEST_COND_intel96 to decide on whether to build the
unnormal tests instead of the macro in nan-pseudo-number.h and then
drop the header inclusion.  This unbreaks test runs on all
architectures that do not have ldbl-96.

Also drop the HANDLE_PSEUDO_NUMBERS macro since it is not used
anywhere.
2021-01-04 20:49:56 +05:30
Paul Eggert
2b778ceb40 Update copyright dates with scripts/update-copyrights
I used these shell commands:

../glibc/scripts/update-copyrights $PWD/../gnulib/build-aux/update-copyright
(cd ../glibc && git commit -am"[this commit message]")

and then ignored the output, which consisted of lines saying "FOO: warning:
copyright statement not found" for each of 6694 files FOO.
I then removed trailing white space from benchtests/bench-pthread-locks.c
and iconvdata/tst-iconv-big5-hkscs-to-2ucs4.c, to work around this
diagnostic from Savannah:
remote: *** pre-commit check failed ...
remote: *** error: lines with trailing whitespace found
remote: error: hook declined to update refs/heads/master
2021-01-02 12:17:34 -08:00
Siddhesh Poyarekar
7525c1c71d x86 long double: Consider pseudo numbers as signaling
Add support to treat pseudo-numbers specially and implement x86
version to consider all of them as signaling.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-12-30 10:52:45 +05:30
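What a pseudo-number (unnormal) looks like, sketched for the x86 ldbl-96 layout (the little-endian union overlay is illustrative, not the glibc test code):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* x86 extended precision keeps an explicit integer bit in the 64-bit
     significand.  Clearing it while the biased exponent stays nonzero
     produces an "unnormal", which the CPU never generates from valid
     inputs; with this change isnanl reports it as NaN.  */
  union
  {
    long double ld;
    struct { uint64_t significand; uint16_t sign_exp; } parts;
  } u = { .ld = 1.5L };
  u.parts.significand &= ~(1ULL << 63);  /* clear the integer bit */
  printf ("isnan (unnormal) = %d\n", isnan (u.ld));
  return 0;
}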
H.J. Lu
f380868f6d Remove _ISOMAC check from <cpu-features.h>
Remove the _ISOMAC check from <cpu-features.h> since it isn't an installed
header file.
2020-12-24 15:43:34 -08:00
H.J. Lu
45dcd1af09 x86: Remove the duplicated CPU_FEATURE_CPU_P
CPU_FEATURE_CPU_P is defined in sysdeps/x86/sys/platform/x86.h.  Remove
the duplicated CPU_FEATURE_CPU_P in sysdeps/x86/include/cpu-features.h.
2020-12-24 04:39:08 -08:00
Siddhesh Poyarekar
41290b6e84 Partially revert 681900d296
Do not attempt to fix the significand top bit in long double input
received in printf.  The code should never reach here because isnan
should now detect unnormals as NaN.  This is already a NOP for glibc
since it uses the gcc __builtin_isnan, which detects unnormals as NaN.

Reviewed-by: Florian Weimer <fweimer@redhat.com>
2020-12-24 06:05:46 +05:30
Siddhesh Poyarekar
94547d9209 x86 long double: Support pseudo numbers in isnanl
This syncs up isnanl behaviour with gcc.  Also move the isnanl
implementation to sysdeps/x86 and remove the sysdeps/x86_64 version.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-24 06:05:40 +05:30
Siddhesh Poyarekar
b7f8815617 x86 long double: Support pseudo numbers in fpclassifyl
Also move sysdeps/i386/fpu/s_fpclassifyl.c to
sysdeps/x86/fpu/s_fpclassifyl.c and remove
sysdeps/x86_64/fpu/s_fpclassifyl.c

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-12-24 06:05:26 +05:30
H.J. Lu
a2e5da2cf4 <sys/platform/x86.h>: Add Intel LAM support
Add Intel Linear Address Masking (LAM) support to <sys/platform/x86.h>.
HAS_CPU_FEATURE (LAM) can be used to detect if LAM is enabled in CPU.

LAM modifies the checking that is applied to 64-bit linear addresses,
allowing software to use the untranslated address bits for metadata.
2020-12-22 03:45:47 -08:00
H.J. Lu
2ee7711bdd x86: Remove the default REP MOVSB threshold tunable value [BZ #27061]
Since we can't tell if the tunable value is set by the user or not:

https://sourceware.org/bugzilla/show_bug.cgi?id=27069

remove the default REP MOVSB threshold tunable value so that the correct
default value will be set by init_cacheinfo ().

Reviewed-by: Carlos O'Donell <carlos@redhat.com>
2020-12-14 07:31:00 -08:00
Szabolcs Nagy
c00452d775 elf: Pass the fd to note processing
To handle GNU property notes on aarch64 some segments need to
be mmaped again, so the fd of the loaded ELF module is needed.

When the fd is not available (kernel loaded modules), then -1
is passed.

The fd is passed to both _dl_process_pt_gnu_property and
_dl_process_pt_note for consistency. Target specific note
processing functions are updated accordingly.

Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
2020-12-11 15:45:37 +00:00
H.J. Lu
93fda28693 x86: Adjust tst-cpu-features-supports.c for GCC 11
Check HAS_CPU_FEATURE instead of CPU_FEATURE_USABLE for FSGSBASE, IBT,
LM, SHSTK and XSAVES since FSGSBASE requires kernel support, IBT/SHSTK/LM
require OS support and XSAVES is supervisor-mode only.
2020-12-04 05:12:55 -08:00
H.J. Lu
2976082a38 x86: Set RDRAND usable if CPU supports RDRAND
Set RDRAND usable if CPU supports RDRAND.
2020-12-04 05:03:42 -08:00
Florian Weimer
0f34d426ac x86: Remove UP macro. Define LOCK_PREFIX unconditionally.
The UP macro is never defined.  Also define LOCK_PREFIX
unconditionally, to the same string.

Reviewed-by: Adhemerval Zanella  <adhemerval.zanella@linaro.org>
2020-11-13 15:20:03 +01:00
Florian Weimer
cb3a749a22 x86: Restore processing of cache size tunables in init_cacheinfo
Fixes and partially reverts commit 59803e81f9
("x86: Optimizing memcpy for AMD Zen architecture.").
2020-10-28 15:53:26 +01:00
Sajan Karumanchi
59803e81f9 x86: Optimizing memcpy for AMD Zen architecture.
Modify the shareable cache '__x86_shared_cache_size', which is a
factor in computing the non-temporal threshold parameter
'__x86_shared_non_temporal_threshold', to optimize memcpy for AMD Zen
architectures.
In the existing implementation, the shareable cache is computed as 'L3
per thread, L2 per core'. Recomputing this shareable cache as 'L3 per
CCX (Core-Complex)' has brought in performance gains.
As per the large bench variant results, this patch also addresses the
regression problem on AMD Zen architectures.

Reviewed-by: Premachandra Mallappa <premachandra.mallappa@amd.com>
2020-10-28 09:57:14 +01:00
H.J. Lu
0f09154c64 x86: Initialize CPU info via IFUNC relocation [BZ 26203]
X86 CPU features in ld.so are initialized by init_cpu_features, which is
invoked by DL_PLATFORM_INIT from _dl_sysdep_start.  But when ld.so is
loaded by static executable, DL_PLATFORM_INIT is never called.  Also
x86 cache info in libc.o and libc.a is initialized by a constructor
which may be called too late.  Since some fields in _rtld_global_ro
in ld.so are initialized by dynamic relocation, we can also initialize
x86 CPU features in _rtld_global_ro in ld.so and cache info in libc.so
by initializing dummy function pointers in ld.so and libc.so via IFUNC
relocation.

Key points:

1. IFUNC is always supported, independent of --enable-multi-arch or
--disable-multi-arch.  Linker generates IFUNC relocations from input
IFUNC objects and ld.so performs IFUNC relocations.
2. There are no IFUNC dependencies in ld.so before dynamic relocations
have been performed.
3. The x86 CPU features in ld.so are initialized by DL_PLATFORM_INIT
in dynamic executable and by IFUNC relocation in dlopen in static
executable.
4. The x86 cache info in libc.o is initialized by IFUNC relocation.
5. In libc.a, both x86 CPU features and cache info are initialized from
ARCH_INIT_CPU_FEATURES, not by IFUNC relocation, before __libc_early_init
is called.

Note: _dl_x86_init_cpu_features can be called more than once from
DL_PLATFORM_INIT and during relocation in ld.so.
2020-10-16 16:17:53 -07:00
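The dummy-function-pointer trick relies on ordinary IFUNC resolution; in miniature (names are hypothetical; a user-level sketch of the mechanism, not the glibc code):

#include <stdio.h>

static int initialized;

static void
init_features (void)
{
  initialized = 1;   /* stands in for init_cpu_features */
}

typedef void (*dummy_fn) (void);

/* The resolver runs while ld.so processes IRELATIVE relocations,
   before any constructor and before main.  */
static dummy_fn
resolve_dummy (void)
{
  init_features ();
  return init_features;
}

void dummy (void) __attribute__ ((ifunc ("resolve_dummy")));

/* Initializing a function pointer with the IFUNC's address creates the
   IRELATIVE relocation that triggers the resolver at startup.  */
static dummy_fn dummy_ptr = dummy;

int
main (void)
{
  printf ("initialized during relocation: %d\n", initialized);
  dummy_ptr ();
  return 0;
}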
H.J. Lu
428985c436 <sys/platform/x86.h>: Add FSRCS/FSRS/FZLRM support
Add Fast Short REP CMP and SCA (FSRCS), Fast Short REP STO (FSRS) and
Fast Zero-Length REP MOV (FZLRM) support to <sys/platform/x86.h>.
2020-10-09 11:52:30 -07:00
H.J. Lu
c712401bc6 <sys/platform/x86.h>: Add Intel HRESET support
Add Intel HRESET support to <sys/platform/x86.h>.
2020-10-09 11:52:30 -07:00
H.J. Lu
875a50ff63 <sys/platform/x86.h>: Add AVX-VNNI support
Add AVX-VNNI support to <sys/platform/x86.h>.
2020-10-09 11:52:30 -07:00
H.J. Lu
ebe454bcca <sys/platform/x86.h>: Add AVX512_FP16 support
Add AVX512_FP16 support to <sys/platform/x86.h>.
2020-10-09 11:52:30 -07:00
H.J. Lu
7674695cf7 <sys/platform/x86.h>: Add Intel UINTR support
Add Intel UINTR support to <sys/platform/x86.h>.
2020-10-09 11:52:30 -07:00
Patrick McGehearty
d3c5702747 Reversing calculation of __x86_shared_non_temporal_threshold
The __x86_shared_non_temporal_threshold determines when memcpy on x86
uses non_temporal stores to avoid pushing other data out of the last
level cache.

This patch proposes to revert the calculation change made by H.J. Lu's
patch of June 2, 2017.

H.J. Lu's patch selected a threshold suitable for a single thread
getting maximum performance. It was tuned using the single threaded
large memcpy micro benchmark on an 8-core processor.  That change moved
the threshold from using 3/4 of one thread's share of the
cache to using 3/4 of the entire cache of a multi-threaded system
before switching to non-temporal stores. Multi-threaded systems with
more than a few threads are server-class and typically have many
active threads. If one thread consumes 3/4 of the available cache for
all threads, it will cause other active threads to have data removed
from the cache. Two examples show the range of the effect. John
McCalpin's widely parallel Stream benchmark, which runs in parallel
and fetches data sequentially, saw a 20% slowdown with this patch on
an internal system test of 128 threads. This regression was discovered
when comparing OL8 performance to OL7.  An example that compares
normal stores to non-temporal stores may be found at
https://vgatherps.github.io/2018-09-02-nontemporal/.  A simple test
shows a 4x to 5x performance loss due to a failure to use
nontemporal stores. These performance losses are most likely to occur
when the system load is heaviest and good performance is critical.

The tunable x86_non_temporal_threshold can be used to override the
default for the knowledgeable user who really wants maximum cache
allocation to a single thread in a multi-threaded system.
The manual entry for the tunable has been expanded to provide
more information about its purpose.

	modified: sysdeps/x86/cacheinfo.c
	modified: manual/tunables.texi
2020-09-28 22:10:39 +00:00
Florian Weimer
681900d296 x86: Harden printf against non-normal long double values (bug 26649)
The behavior of isnan/__builtin_isnan on bit patterns that do not
correspond to something that the CPU would produce from valid inputs
is currently under-defined in the toolchain. (The GCC built-in and
glibc disagree.)

The isnan check in PRINTF_FP_FETCH in stdio-common/printf_fp.c
assumes the GCC behavior that returns true for non-normal numbers
which are not specified as NaN. (The glibc implementation returns
false for such numbers.)

At present, passing non-normal numbers to __mpn_extract_long_double
causes this function to produce irregularly shaped multi-precision
integers, triggering undefined behavior in __printf_fp_l.

With GCC 10 and glibc 2.32, this behavior is not visible because
__builtin_isnan is used, which avoids calling
__mpn_extract_long_double in this case.  This commit updates the
implementation of __mpn_extract_long_double so that regularly shaped
multi-precision integers are produced in this case, avoiding
undefined behavior in __printf_fp_l.
2020-09-22 19:07:49 +02:00
Florian Weimer
90ccfdf176 x86: Use one ldbl2mpn.c file for both i386 and x86_64 2020-09-22 17:58:39 +02:00
H.J. Lu
94cd37ebb2 x86: Use HAS_CPU_FEATURE with IBT and SHSTK [BZ #26625]
commit 04bba1e5d8
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Aug 5 13:51:56 2020 -0700

    x86: Set CPU usable feature bits conservatively [BZ #26552]

    Set CPU usable feature bits only for CPU features which are usable in
    user space and whose usability can be detected from user space, excluding
    features like FSGSBASE whose enable bit can only be checked in the kernel.

no longer turns on the usable bits of IBT and SHSTK since we don't know
if IBT and SHSTK are usable until much later.  Use HAS_CPU_FEATURE to
check if the processor supports IBT and SHSTK.
2020-09-17 05:18:36 -07:00
H.J. Lu
f2c679d4b2 <sys/platform/x86.h>: Add Intel Key Locker support
Add Intel Key Locker:

https://software.intel.com/content/www/us/en/develop/download/intel-key-locker-specification.html

support to <sys/platform/x86.h>.  Intel Key Locker has

1. KL: AES Key Locker instructions.
2. WIDE_KL: AES wide Key Locker instructions.
3. AESKLE: AES Key Locker instructions are enabled by OS.

Applications should use

if (CPU_FEATURE_USABLE (KL))

and

if (CPU_FEATURE_USABLE (WIDE_KL))

to check if AES Key Locker instructions and AES wide Key Locker
instructions are usable.
2020-09-16 05:56:10 -07:00
H.J. Lu
9620398097 x86: Install <sys/platform/x86.h> [BZ #26124]
Install <sys/platform/x86.h> so that programmers can do

 #if __has_include(<sys/platform/x86.h>)
 #include <sys/platform/x86.h>
 #endif
 ...

   if (CPU_FEATURE_USABLE (SSE2))
 ...
   if (CPU_FEATURE_USABLE (AVX2))
 ...

<sys/platform/x86.h> exports only:

enum
{
  COMMON_CPUID_INDEX_1 = 0,
  COMMON_CPUID_INDEX_7,
  COMMON_CPUID_INDEX_80000001,
  COMMON_CPUID_INDEX_D_ECX_1,
  COMMON_CPUID_INDEX_80000007,
  COMMON_CPUID_INDEX_80000008,
  COMMON_CPUID_INDEX_7_ECX_1,
  /* Keep the following line at the end.  */
  COMMON_CPUID_INDEX_MAX
};

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
};

/* Get a pointer to the CPU features structure.  */
extern const struct cpu_features *__x86_get_cpu_features
  (unsigned int max) __attribute__ ((const));

Since all feature checks are done through macros, programs compiled with
a newer <sys/platform/x86.h> are compatible with the older glibc binaries
as long as the layout of struct cpu_features is identical.  The features
array can be expanded with backward binary compatibility for both .o and
.so files.  When COMMON_CPUID_INDEX_MAX is increased to support new
processor features, __x86_get_cpu_features in the older glibc binaries
returns NULL and HAS_CPU_FEATURE/CPU_FEATURE_USABLE return false on the
new processor feature.  No new symbol version is needed.

Both CPU_FEATURE_USABLE and HAS_CPU_FEATURE are provided.  HAS_CPU_FEATURE
can be used to identify processor features.

Note: Although GCC has __builtin_cpu_supports, it only supports a subset
of <sys/platform/x86.h> and it is equivalent to CPU_FEATURE_USABLE.  It
doesn't support HAS_CPU_FEATURE.
2020-09-11 17:20:52 -07:00
H.J. Lu
04bba1e5d8 x86: Set CPU usable feature bits conservatively [BZ #26552]
Set CPU usable feature bits only for CPU features which are usable in
user space and whose usability can be detected from user space, excluding
features like FSGSBASE whose enable bit can only be checked in the kernel.
2020-09-03 04:36:20 -07:00
H.J. Lu
ac3bda9a25 x86: Rename Intel CPU feature names
The Intel 64 and IA-32 Architectures Software Developer’s Manual has changed
the following CPU feature names:

1. The CPU feature of Enhanced Intel SpeedStep Technology is renamed
from EST to EIST.
2. The CPU feature which supports Platform Quality of Service Monitoring
(PQM) capability is changed to Intel Resource Director Technology
(Intel RDT) Monitoring capability, i.e. PQM is renamed to RDT_M.
3. The CPU feature which supports Platform Quality of Service
Enforcement (PQE) capability is changed to Intel Resource Director
Technology (Intel RDT) Allocation capability, i.e. PQE is renamed to
RDT_A.
2020-08-05 11:48:46 -07:00
H.J. Lu
107e6a3c22 x86: Support usable check for all CPU features
Support usable check for all CPU features with the following changes:

1. Change struct cpu_features to

struct cpuid_features
{
  struct cpuid_registers cpuid;
  struct cpuid_registers usable;
};

struct cpu_features
{
  struct cpu_features_basic basic;
  struct cpuid_features features[COMMON_CPUID_INDEX_MAX];
  unsigned int preferred[PREFERRED_FEATURE_INDEX_MAX];
...
};

so that there is a usable bit for each cpuid bit.
2. After the cpuid bits have been initialized, copy the known bits to the
usable bits.  EAX/EBX from INDEX_1 and EAX from INDEX_7 aren't used for
CPU feature detection.
3. Clear the usable bits which require OS support.
4. If the feature is supported by OS, copy its cpuid bit to its usable
bit.
5. Replace HAS_CPU_FEATURE and CPU_FEATURES_CPU_P with CPU_FEATURE_USABLE
and CPU_FEATURE_USABLE_P to check if a feature is usable.
6. Add DEPR_FPU_CS_DS for INDEX_7_EBX_13.
7. Unset MPX feature since it has been deprecated.

The results are

1. If the feature is known and doesn't require OS support, its usable bit
is copied from the cpuid bit.
2. Otherwise, its usable bit is copied from the cpuid bit only if the
feature is known to be supported by the OS.
3. CPU_FEATURE_USABLE/CPU_FEATURE_USABLE_P are used to check if the
feature can be used.
4. HAS_CPU_FEATURE/CPU_FEATURE_CPU_P are used to check if CPU supports
the feature.
2020-07-13 06:05:16 -07:00
H.J. Lu
43530ba1dc x86: Remove __ASSEMBLER__ check in init-arch.h
Since

commit 430388d5dc
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Fri Aug 3 08:04:49 2018 -0700

    x86: Don't include <init-arch.h> in assembly codes

removed all usages of <init-arch.h> from assembly codes, we can remove
__ASSEMBLER__ check in init-arch.h.
2020-07-11 10:03:05 -07:00
H.J. Lu
9016b6f389 x86: Remove the unused __x86_prefetchw
Since

commit c867597bff
Author: H.J. Lu <hjl.tools@gmail.com>
Date:   Wed Jun 8 13:57:50 2016 -0700

    X86-64: Remove previous default/SSE2/AVX2 memcpy/memmove

removed the only usage of __x86_prefetchw, we can remove the unused
__x86_prefetchw.
2020-07-11 09:34:03 -07:00