ARM _dl_tlsdesc_dynamic slow path has two issues:
* The ip/r12 is defined by AAPCS as a scratch register, and gcc is
used to save the stack pointer before on some function calls. So it
should also be saved/restored as well. It fixes the tst-gnu2-tls2.
* None of the possible VFP registers are saved/restored. ARM has the
additional complexity to have different VFP bank sizes (depending of
VFP support by the chip).
The tst-gnu2-tls2 test is extended to check for VFP registers, although
only for hardfp builds. Different than setcontext, _dl_tlsdesc_dynamic
does not have HWCAP_ARM_IWMMXT (I don't have a way to properly test
it and it is almost a decade since newer hardware was released).
With this patch there is no need to mark tst-gnu2-tls2 as XFAIL.
Checked on arm-linux-gnueabihf.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
_dl_tlsdesc_dynamic preserves RDI, RSI and RBX before realigning stack.
After realigning stack, it saves RCX, RDX, R8, R9, R10 and R11. Define
TLSDESC_CALL_REGISTER_SAVE_AREA to allocate space for RDI, RSI and RBX
to avoid clobbering saved RDI, RSI and RBX values on stack by xsave to
STATE_SAVE_OFFSET(%rsp).
+==================+<- stack frame start aligned at 8 or 16 bytes
| |<- RDI saved in the red zone
| |<- RSI saved in the red zone
| |<- RBX saved in the red zone
| |<- paddings for stack realignment of 64 bytes
|------------------|<- xsave buffer end aligned at 64 bytes
| |<-
| |<-
| |<-
|------------------|<- xsave buffer start at STATE_SAVE_OFFSET(%rsp)
| |<- 8-byte padding for 64-byte alignment
| |<- 8-byte padding for 64-byte alignment
| |<- R11
| |<- R10
| |<- R9
| |<- R8
| |<- RDX
| |<- RCX
+==================+<- RSP aligned at 64 bytes
Define TLSDESC_CALL_REGISTER_SAVE_AREA, the total register save area size
for all integer registers by adding 24 to STATE_SAVE_OFFSET since RDI, RSI
and RBX are saved onto stack without adjusting stack pointer first, using
the red-zone. This fixes BZ #31501.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
Originally, nptl/descr.h included <sys/rseq.h>, but we removed that
in commit 2c6b4b272e ("nptl:
Unconditionally use a 32-byte rseq area"). After that, it was
not ensured that the RSEQ_SIG macro was defined during sched_getcpu.c
compilation that provided a definition. This commit always checks
the rseq area for CPU number information before using the other
approaches.
This adds an unnecessary (but well-predictable) branch on
architectures which do not define RSEQ_SIG, but its cost is small
compared to the system call. Most architectures that have vDSO
acceleration for getcpu also have rseq support.
Fixes: 2c6b4b272e
Fixes: 1d350aa060
Reviewed-by: Arjun Shankar <arjun@redhat.com>
Due to GCC bug 110901 -mcpu can override -march setting when compiling
asm code and thus a compiler targetting a specific cpu can fail the
configure check even when binutils gas supports SVE.
The workaround is that explicit .arch directive overrides both -mcpu
and -march, and since that's what the actual SVE memcpy uses the
configure check should use that too even if the GCC issue is fixed
independently.
Reviewed-by: Florian Weimer <fweimer@redhat.com>
This patch updates the kernel version in the tests tst-mman-consts.py,
tst-mount-consts.py and tst-pidfd-consts.py to 6.8. (There are no new
constants covered by these tests in 6.8 that need any other header
changes.)
Tested with build-many-glibcs.py.
Linux 6.8 adds five new syscalls. Update syscall-names.list and
regenerate the arch-syscall.h headers with build-many-glibcs.py
update-syscalls.
Tested with build-many-glibcs.py.
Similar to strstr (1e9a550ba4), power8 strcasestr does not show much
improvement compared to the generic implementation. The geomean
on bench-strcasestr shows:
__strcasestr_power8 __strcasestr_ppc
power10 1159 1120
power9 1640 1469
power8 1787 1904
The strcasestr uses the same 'trick' as power7 strstr to detect
potential quadradic behavior, which only adds overheads for input
that trigger quadradic behavior and it is really a hack.
Checked on powerpc64le-linux-gnu.
Reviewed-by: DJ Delorie <dj@redhat.com>
The memcpy optimization (commit 587a1290a1) has a series
of mistakes:
- The implementation is wrong: the chunk size calculation is wrong
leading to invalid memory access.
- It adds ifunc supports as default, so --disable-multi-arch does
not work as expected for riscv.
- It mixes Linux files (memcpy ifunc selection which requires the
vDSO/syscall mechanism) with generic support (the memcpy
optimization itself).
- There is no __libc_ifunc_impl_list, which makes testing only
check the selected implementation instead of all supported
by the system.
This patch also simplifies the required bits to enable ifunc: there
is no need to memcopy.h; nor to add Linux-specific files.
The __memcpy_noalignment tail handling now uses a branchless strategy
similar to aarch64 (overlap 32-bits copies for sizes 4..7 and byte
copies for size 1..3).
Checked on riscv64 and riscv32 by explicitly enabling the function
on __libc_ifunc_impl_list on qemu-system.
Changes from v1:
* Implement the memcpy in assembly to correctly handle RISCV
strict-alignment.
Reviewed-by: Evan Green <evan@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Each mask in the sigset array is an unsigned long, so fix __sigisemptyset
to use that instead of int. The __sigword function returns a simple array
index, so it can return int instead of unsigned long.
Protect the global locale from being modified while we compute the size of
the locale category names. That allows the use of the global locale in a
single thread, while all other threads use the thread safe locale
functions.
Replace minimum ISA check ifdef conditional with if. Since
MINIMUM_X86_ISA_LEVEL and AVX_X86_ISA_LEVEL are compile time constants,
compiler will perform constant folding optimization, getting same
results.
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
For CPU implementations that can perform unaligned accesses with little
or no performance penalty, create a memcpy implementation that does not
bother aligning buffers. It will use a block of integer registers, a
single integer register, and fall back to bytewise copy for the
remainder.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add a little helper method so it's easier to fetch a single value from
the hwprobe function when used within an ifunc selector.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
RISC-V is apparently the first architecture to pass more than one
argument to ifunc resolvers. The helper macros in libc-symbols.h,
__ifunc_resolver(), __ifunc(), and __ifunc_hidden(), are incompatible
with this. These macros have an "arg" (non-final) parameter that
represents the parameter signature of the ifunc resolver. The result is
an inability to pass the required comma through in a single preprocessor
argument.
Rearrange the __ifunc_resolver() macro to be variadic, and pass the
types as those variable parameters. Move the guts of __ifunc() and
__ifunc_hidden() into new macros, __ifunc_args(), and
__ifunc_args_hidden(), that pass the variable arguments down through to
__ifunc_resolver(). Then redefine __ifunc() and __ifunc_hidden(), which
are used in a bunch of places, to simply shuffle the arguments down into
__ifunc_args[_hidden]. Finally, define a riscv-ifunc.h header, which
provides convenience macros to those looking to write ifunc selectors
that use both arguments.
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The new __riscv_hwprobe() function is designed to be used by ifunc
selector functions. This presents a challenge for applications and
libraries, as ifunc selectors are invoked before all relocations have
been performed, so an external call to __riscv_hwprobe() from an ifunc
selector won't work. To address this, pass a pointer to the
__riscv_hwprobe() function into ifunc selectors as the second
argument (alongside dl_hwcap, which was already being passed).
Include a typedef as well for convenience, so that ifunc users don't
have to go through contortions to call this routine. Users will need to
remember to check the second argument for NULL, to account for older
glibcs that don't pass the function.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
The new riscv_hwprobe syscall also comes with a vDSO for faster answers
to your most common questions. Call in today to speak with a kernel
representative near you!
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add an INTERNAL_VSYSCALL() macro that makes a vDSO call, falling back to
a regular syscall, but without setting errno. Instead, the return value
is plumbed straight out of the macro.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add awareness and a thin wrapper function around a new Linux system call
that allows callers to get architecture and microarchitecture
information about the CPUs from the kernel. This can be used to
do things like dynamically choose a memcpy implementation.
Signed-off-by: Evan Green <evan@rivosinc.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Add a tunable for setting __libc_enable_secure to 1. Do not set
__libc_enable_secure to 0 if the tunable is set to 0. Ignore all
tunables if glib.rtld.enable_secure is set. One use-case for this
addition is to enable testing code paths that depend on
__libc_enable_secure being set without the need to use setxid binaries.
Reviewed-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
_dl_tlsdesc_dynamic should also preserve AMX registers which are
caller-saved. Add X86_XSTATE_TILECFG_ID and X86_XSTATE_TILEDATA_ID
to x86-64 TLSDESC_CALL_STATE_SAVE_MASK. Compute the AMX state size
and save it in xsave_state_full_size which is only used by
_dl_tlsdesc_dynamic_xsave and _dl_tlsdesc_dynamic_xsavec. This fixes
the AMX part of BZ #31372. Tested on AMX processor.
AMX test is enabled only for compilers with the fix for
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114098
GCC 14 and GCC 11/12/13 branches have the bug fix.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
When strcmp-avx2.S is used as the default, elf/tst-valgrind-smoke fails
with
==1272761== Conditional jump or move depends on uninitialised value(s)
==1272761== at 0x4022C98: strcmp (strcmp-avx2.S:462)
==1272761== by 0x400B05B: _dl_name_match_p (dl-misc.c:75)
==1272761== by 0x40085F3: _dl_map_object (dl-load.c:1966)
==1272761== by 0x401AEA4: map_doit (rtld.c:644)
==1272761== by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==1272761== by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==1272761== by 0x401B38F: do_preload (rtld.c:816)
==1272761== by 0x401C116: handle_preload_list (rtld.c:892)
==1272761== by 0x401EDF5: dl_main (rtld.c:1842)
==1272761== by 0x401A79E: _dl_sysdep_start (dl-sysdep.c:140)
==1272761== by 0x401BEEE: _dl_start_final (rtld.c:494)
==1272761== by 0x401BEEE: _dl_start (rtld.c:581)
==1272761== by 0x401AD87: ??? (in */elf/ld.so)
The assembly codes are:
0x0000000004022c80 <+144>: vmovdqu 0x20(%rdi),%ymm0
0x0000000004022c85 <+149>: vpcmpeqb 0x20(%rsi),%ymm0,%ymm1
0x0000000004022c8a <+154>: vpcmpeqb %ymm0,%ymm15,%ymm2
0x0000000004022c8e <+158>: vpandn %ymm1,%ymm2,%ymm1
0x0000000004022c92 <+162>: vpmovmskb %ymm1,%ecx
0x0000000004022c96 <+166>: inc %ecx
=> 0x0000000004022c98 <+168>: jne 0x4022c32 <strcmp+66>
strcmp-avx2.S has 32-byte vector loads of strings which are shorter than
32 bytes:
(gdb) p (char *) ($rdi + 0x20)
$6 = 0x1ffeffea20 "memcheck-amd64-linux.so"
(gdb) p (char *) ($rsi + 0x20)
$7 = 0x4832640 "core-amd64-linux.so"
(gdb) call (int) strlen ((char *) ($rsi + 0x20))
$8 = 19
(gdb) call (int) strlen ((char *) ($rdi + 0x20))
$9 = 23
(gdb)
It triggers the valgrind error. The above code is safe since the loads
don't cross the page boundary. Update tst-valgrind-smoke.sh to accept
an optional suppression file and pass a suppression file to valgrind when
strcmp-avx2.S is the default implementation of strcmp.
Reviewed-by: Sunil K Pandey <skpgkp2@gmail.com>
When glibc is built with ISA level 3 or above enabled, SSE resolvers
aren't available and glibc fails to build:
ld: .../elf/librtld.os: in function `init_cpu_features':
.../elf/../sysdeps/x86/cpu-features.c:1200:(.text+0x1445f): undefined reference to `_dl_runtime_resolve_fxsave'
ld: .../elf/librtld.os: relocation R_X86_64_PC32 against undefined hidden symbol `_dl_runtime_resolve_fxsave' can not be used when making a shared object
/usr/local/bin/ld: final link failed: bad value
For ISA level 3 or above, don't use _dl_runtime_resolve_fxsave nor
_dl_tlsdesc_dynamic_fxsave.
This fixes BZ #31429.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
Compiler generates the following instruction sequence for GNU2 dynamic
TLS access:
leaq tls_var@TLSDESC(%rip), %rax
call *tls_var@TLSCALL(%rax)
or
leal tls_var@TLSDESC(%ebx), %eax
call *tls_var@TLSCALL(%eax)
CALL instruction is transparent to compiler which assumes all registers,
except for EFLAGS and RAX/EAX, are unchanged after CALL. When
_dl_tlsdesc_dynamic is called, it calls __tls_get_addr on the slow
path. __tls_get_addr is a normal function which doesn't preserve any
caller-saved registers. _dl_tlsdesc_dynamic saved and restored integer
caller-saved registers, but didn't preserve any other caller-saved
registers. Add _dl_tlsdesc_dynamic IFUNC functions for FNSAVE, FXSAVE,
XSAVE and XSAVEC to save and restore all caller-saved registers. This
fixes BZ #31372.
Add GLRO(dl_x86_64_runtime_resolve) with GLRO(dl_x86_tlsdesc_dynamic)
to optimize elf_machine_runtime_setup.
Reviewed-by: Noah Goldstein <goldstein.w.n@gmail.com>
When passed a pointer to a zero-sized struct, the access attribute
without the third argument misleads -Wstringop-overflow diagnostics to
think that a function is writing 1 byte into the zero-sized structs.
The attribute doesn't add that much value in this context, so drop it
completely for _FORTIFY_SOURCE=3.
Resolves: BZ #31383
Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Instead of tying based on the linker name and version, check for the
required support:
* whether it does not generate dynamic TLS relocations in PIE
(binutils PR ld/22263);
* if it accepts --no-dynamic-linker (by using -static-pie);
* and if it adds a DT_JMPREL pointing to .rela.iplt with static pie.
The patch also trims the comments, for binutils one of the tests should
already cover it. The kernel ones are not clear which version should
have the backport, nor it is something that glibc can do much about
it. Finally, the glibc is somewhat confusing, since it refers
to commits not related to s390x.
Checked with a build for s390x-linux-gnu.
Reviewed-by: Stefan Liebler <stli@linux.ibm.com>
It improve mq_open. The compile and runtime checks have similar
coverage as with GCC.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
It improves open, open64, openat, and openat64. The compile and runtime
checks have similar coverage as with GCC.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
It improve fortify checks for wmemcpy, wmemmove, wmemset, wcscpy,
wcpcpy, wcsncpy, wcpncpy, wcscat, wcsncat, wcslcpy, wcslcat, swprintf,
fgetws, fgetws_unlocked, wcrtomb, mbsrtowcs, wcsrtombs, mbsnrtowcs, and
wcsnrtombs. The compile and runtime checks have similar coverage as
with GCC.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
It improve fortify checks for syslog and vsyslog. The compile
and runtime hecks have similar coverage as with GCC.
The syslog fortify wrapper calls the va_arg version, since clang does
not support __va_arg_pack.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
It improve fortify checks recv, recvfrom, poll, and ppoll. The compile
and runtime hecks have similar coverage as with GCC.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
It improve fortify checks for read, pread, pread64, readlink,
readlinkat, getcwd, getwd, confstr, getgroups, ttyname_r, getlogin_r,
gethostname, and getdomainname. The compile and runtime checks have
similar coverage as with GCC.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
It improve fortify checks for realpath, ptsname_r, wctomb, mbstowcs,
and wcstombs. The runtime and compile checks have similar coverage as
with GCC.
Checked on aarch64, armhf, x86_64, and i686.
Tested-by: Carlos O'Donell <carlos@redhat.com>
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
It improve fortify checks for strcpy, stpcpy, strncpy, stpncpy, strcat,
strncat, strlcpy, and strlcat. The runtime and compile checks have
similar coverage as with GCC.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
It improve fortify checks for sprintf, vsprintf, vsnsprintf, fprintf,
dprintf, asprintf, __asprintf, obstack_printf, gets, fgets,
fgets_unlocked, fread, and fread_unlocked. The runtime checks have
similar support coverage as with GCC.
For function with variadic argument (sprintf, snprintf, fprintf, printf,
dprintf, asprintf, __asprintf, obstack_printf) the fortify wrapper calls
the va_arg version since clang does not support __va_arg_pack.
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
For instance, the read wrapper is currently expanded as:
extern __inline
__attribute__((__always_inline__))
__attribute__((__artificial__))
__attribute__((__warn_unused_result__))
ssize_t read (int __fd, void *__buf, size_t __nbytes)
{
return __glibc_safe_or_unknown_len (__nbytes,
sizeof (char),
__glibc_objsize0 (__buf))
? __read_alias (__fd, __buf, __nbytes)
: __glibc_unsafe_len (__nbytes,
sizeof (char),
__glibc_objsize0 (__buf))
? __read_chk_warn (__fd,
__buf,
__nbytes,
__builtin_object_size (__buf, 0))
: __read_chk (__fd,
__buf,
__nbytes,
__builtin_object_size (__buf, 0));
}
The wrapper relies on __builtin_object_size call lowers to a constant at
compile-time and many other operations in the wrapper depends on
having a single, known value for parameters. Because this is
impossible to have for function parameters, the wrapper depends heavily
on inlining to work and While this is an entirely viable approach on
GCC, it is not fully reliable on clang. This is because by the time llvm
gets to inlining and optimizing, there is a minimal reliable source and
type-level information available (more information on a more deep
explanation on how to fortify wrapper works on clang [1]).
To allow the wrapper to work reliably and with the same functionality as
with GCC, clang requires a different approach:
* __attribute__((diagnose_if(c, “str”, “warning”))) which is a function
level attribute; if the compiler can determine that 'c' is true at
compile-time, it will emit a warning with the text 'str1'. If it would
be better to emit an error, the wrapper can use "error" instead of
"warning".
* __attribute__((overloadable)) which is also a function-level attribute;
and it allows C++-style overloading to occur on C functions.
* __attribute__((pass_object_size(n))) which is a parameter-level
attribute; and it makes the compiler evaluate
__builtin_object_size(param, n) at each call site of the function
that has the parameter, and passes it in as a hidden parameter.
This attribute has two side-effects that are key to how FORTIFY works:
1. It can overload solely on pass_object_size (e.g. there are two
overloads of foo in
void foo(char * __attribute__((pass_object_size(0))) c);
void foo(char *);
(The one with pass_object_size attribute has precende over the
default one).
2. A function with at least one pass_object_size parameter can never
have its address taken (and overload resolution respects this).
Thus the read wrapper can be implemented as follows, without
hindering any fortify coverage compile and runtime:
extern __inline
__attribute__((__always_inline__))
__attribute__((__artificial__))
__attribute__((__overloadable__))
__attribute__((__warn_unused_result__))
ssize_t read (int __fd,
void *const __attribute__((pass_object_size (0))) __buf,
size_t __nbytes)
__attribute__((__diagnose_if__ ((((__builtin_object_size (__buf, 0)) != -1ULL
&& (__nbytes) > (__builtin_object_size (__buf, 0)) / (1))),
"read called with bigger length than size of the destination buffer",
"warning")))
{
return (__builtin_object_size (__buf, 0) == (size_t) -1)
? __read_alias (__fd,
__buf,
__nbytes)
: __read_chk (__fd,
__buf,
__nbytes,
__builtin_object_size (__buf, 0));
}
To avoid changing the current semantic for GCC, a set of macros is
defined to enable the clang required attributes, along with some changes
on internal macros to avoid the need to issue the symbol_chk symbols
(which are done through the __diagnose_if__ attribute for clang).
The read wrapper is simplified as:
__fortify_function __attribute_overloadable__ __wur
ssize_t read (int __fd,
__fortify_clang_overload_arg0 (void *, ,__buf),
size_t __nbytes)
__fortify_clang_warning_only_if_bos0_lt (__nbytes, __buf,
"read called with bigger length than "
"size of the destination buffer")
{
return __glibc_fortify (read, __nbytes, sizeof (char),
__glibc_objsize0 (__buf),
__fd, __buf, __nbytes);
}
There is no expected semantic or code change when using GCC.
Also, clang does not support __va_arg_pack, so variadic functions are
expanded to call va_arg implementations. The error function must not
have bodies (address takes are expanded to nonfortified calls), and
with the __fortify_function compiler might still create a body with the
C++ mangling name (due to the overload attribute). In this case, the
function is defined with __fortify_function_error_function macro
instead.
[1] https://docs.google.com/document/d/1DFfZDICTbL7RqS74wJVIJ-YnjQOj1SaoqfhbgddFYSM/edit
Checked on aarch64, armhf, x86_64, and i686.
Reviewed-by: Carlos O'Donell <carlos@redhat.com>
Tested-by: Carlos O'Donell <carlos@redhat.com>
This includes a fix for big-endian in AdvSIMD log, some cosmetic
changes, and numerous small optimisations mainly around inlining and
using indexed variants of MLA intrinsics.
Reviewed-by: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Starting with commit e57d8fc97b
"S390: Always use svc 0"
clone clobbers the call-saved register r7 in error case:
function or stack is NULL.
This patch restores the saved registers also in the error case.
Furthermore the existing test misc/tst-clone is extended to check
all error cases and that clone does not clobber registers in this
error case.
When glibc is built with ISA level 3 or higher by default, the resulting
glibc binaries won't run on SSE or FMA4 processors. Exclude SSE, AVX and
FMA4 variants in libm multiarch when ISA level 3 or higher is enabled by
default.
When glibc is built with ISA level 2 enabled by default, only keep SSE4.1
variant.
Fixes BZ 31335.
NB: elf/tst-valgrind-smoke test fails with ISA level 4, because valgrind
doesn't support AVX512 instructions:
https://bugs.kde.org/show_bug.cgi?id=383010
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>