index_* and bit_* macros are used to access cpuid and feature arrays o
struct cpu_features. It is very easy to use bits and indices of cpuid
array on feature array, especially in assembly codes. For example,
sysdeps/i386/i686/multiarch/bcopy.S has
HAS_CPU_FEATURE (Fast_Rep_String)
which should be
HAS_ARCH_FEATURE (Fast_Rep_String)
We change index_* and bit_* to index_cpu_*/index_arch_* and
bit_cpu_*/bit_arch_* so that we can catch such error at build time.
[BZ #19762]
* sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h
(EXTRA_LD_ENVVARS): Add _arch_ to index_*/bit_*.
* sysdeps/x86/cpu-features.c (init_cpu_features): Likewise.
* sysdeps/x86/cpu-features.h (bit_*): Renamed to ...
(bit_arch_*): This for feature array.
(bit_*): Renamed to ...
(bit_cpu_*): This for cpu array.
(index_*): Renamed to ...
(index_arch_*): This for feature array.
(index_*): Renamed to ...
(index_cpu_*): This for cpu array.
[__ASSEMBLER__] (HAS_FEATURE): Add and use field.
[__ASSEMBLER__] (HAS_CPU_FEATURE)): Pass cpu to HAS_FEATURE.
[__ASSEMBLER__] (HAS_ARCH_FEATURE)): Pass arch to HAS_FEATURE.
[!__ASSEMBLER__] (HAS_CPU_FEATURE): Replace index_##name and
bit_##name with index_cpu_##name and bit_cpu_##name.
[!__ASSEMBLER__] (HAS_ARCH_FEATURE): Replace index_##name and
bit_##name with index_arch_##name and bit_arch_##name.
Since x86 has an optimized mempcpy and GCC can inline mempcpy on x86,
define _HAVE_STRING_ARCH_mempcpy to 1 for x86.
[BZ #19759]
* sysdeps/x86/bits/string.h (_HAVE_STRING_ARCH_mempcpy): New.
As discussed in
https://sourceware.org/ml/libc-alpha/2015-10/msg00403.html
the setting of _STRING_ARCH_unaligned currently controls the external
GLIBC ABI as well as selecting the use of unaligned accesses withing
GLIBC.
Since _STRING_ARCH_unaligned was recently changed for AArch64, this
would potentially break the ABI in GLIBC 2.23, so split the uses and add
_STRING_INLINE_unaligned to select the string ABI. This setting must be
fixed for each target, while _STRING_ARCH_unaligned may be changed from
release to release. _STRING_ARCH_unaligned is used unconditionally in
glibc. But <bits/string.h>, which defines _STRING_ARCH_unaligned, isn't
included with -Os. Since _STRING_ARCH_unaligned is internal to glibc and
may change between glibc releases, it should be made private to glibc.
_STRING_ARCH_unaligned should defined in the new string_private.h heade
file which is included unconditionally from internal <string.h> for glibc
build.
[BZ #19462]
* bits/string.h (_STRING_ARCH_unaligned): Renamed to ...
(_STRING_INLINE_unaligned): This.
* include/string.h: Include <string_private.h>.
* string/bits/string2.h: Replace _STRING_ARCH_unaligned with
_STRING_INLINE_unaligned.
* sysdeps/aarch64/bits/string.h (_STRING_ARCH_unaligned): Removed.
(_STRING_INLINE_unaligned): New.
* sysdeps/aarch64/string_private.h: New file.
* sysdeps/generic/string_private.h: Likewise.
* sysdeps/m68k/m680x0/m68020/string_private.h: Likewise.
* sysdeps/s390/string_private.h: Likewise.
* sysdeps/x86/string_private.h: Likewise.
* sysdeps/m68k/m680x0/m68020/bits/string.h
(_STRING_ARCH_unaligned): Renamed to ...
(_STRING_INLINE_unaligned): This.
* sysdeps/s390/bits/string.h (_STRING_ARCH_unaligned): Renamed
to ...
(_STRING_INLINE_unaligned): This.
* sysdeps/sparc/bits/string.h (_STRING_ARCH_unaligned): Renamed
to ...
(_STRING_INLINE_unaligned): This.
* sysdeps/x86/bits/string.h (_STRING_ARCH_unaligned): Renamed
to ...
(_STRING_INLINE_unaligned): This.
GLIBC benchtest testcases shows SSE2_Unaligned based implementations
are performing faster compare to SSE2 based implementations for
routines: strcmp, strcat, strncat, stpcpy, stpncpy, strcpy, strncpy
and strstr. Flag index_Fast_Unaligned_Load is set for Excavator family
0x15h CPU's. This makes SSE2_Unaligned based implementations as
default for these routines.
[BZ #19467]
* sysdeps/x86/cpu-features.c (init_cpu_features): Set
index_Fast_Unaligned_Load flag for Excavator family CPUs.
It shows improvement up to 28% over AVX2 memset (performance results
attached at <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>).
* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
* sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch.
* sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
* sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER,
index_Prefer_No_VZEROUPPER): New.
* sysdeps/x86/cpu-features.c (init_cpu_features): Set the
Prefer_No_VZEROUPPER for Knights Landing.
According to Silvermont software optimization guide, for 64-bit
applications, branch prediction performance can be negatively impacted
when the target of a branch is more than 4GB away from the branch. Add
the Prefer_MAP_32BIT_EXEC bit so that mmap will try to map executable
pages with MAP_32BIT first. NB: MAP_32BIT will map to lower 2GB, not
lower 4GB, address. Prefer_MAP_32BIT_EXEC reduces bits available for
address space layout randomization (ASLR), which is always disabled for
SUID programs and can only be enabled by setting environment variable,
LD_PREFER_MAP_32BIT_EXEC.
On Fedora 23, this patch speeds up GCC 5 testsuite by 3% on Silvermont.
[BZ #19367]
* sysdeps/unix/sysv/linux/wordsize-64/mmap.c: New file.
* sysdeps/unix/sysv/linux/x86_64/64/dl-librecon.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/64/mmap.c: Likewise.
* sysdeps/x86/cpu-features.h (bit_Prefer_MAP_32BIT_EXEC): New.
(index_Prefer_MAP_32BIT_EXEC): Likewise.
Knights Landing processor is based on Silvermont. This patch enables
Silvermont optimizations for Knights Landing.
* sysdeps/x86/cpu-features.c (init_cpu_features): Enable
Silvermont optimizations for Knights Landing.
This patch allows to use x86_64 vector math functions with GCC 6.*
without OpenMP SIMD constructs. For additional details please visit
<https://sourceware.org/glibc/wiki/libmvec#Example_2>.
* sysdeps/x86/fpu/bits/math-vector.h: W/o -fopenmp declare vector math
functions with GCC 6.* __attribute__ ((__simd__)).
AMD CPUs uses the similar encoding scheme for extended family and model
as Intel CPUs as shown in:
http://support.amd.com/TechDocs/25481.pdf
This patch updates get_common_indeces to get family and model for both
Intel and AMD CPUs when family == 0x0f.
[BZ #19214]
* sysdeps/x86/cpu-features.c (get_common_indeces): Add an
argument to return extended model. Update family and model
with extended family and model when family == 0x0f.
(init_cpu_features): Updated.
Old workaround based on assembly aliases can lead to link fail (bug 19058).
This patch makes workaround in another way to avoid it.
[BZ #19058]
* math/Makefile ($(inst_libdir)/libm.so): Added libmvec_nonshared.a
to AS_NEEDED.
* sysdeps/x86/fpu/bits/math-vector.h: Removed code with old workaround.
* sysdeps/x86_64/fpu/Makefile (libmvec-support,
libmvec-static-only-routines): Added new file.
* sysdeps/x86_64/fpu/svml_finite_alias.S: New file.
This test applies to i386 and x86_64 which set R_386_GLOB_DAT and
R_X86_64_GLOB_DAT to ELF_RTYPE_CLASS_EXTERN_PROTECTED_DATA.
[BZ #19178]
* sysdeps/x86/Makefile (tests): Add tst-prelink.
(tst-prelink-ENV): New.
($(objpfx)tst-prelink-conflict.out): Likewise.
($(objpfx)tst-prelink-cmp.out): Likewise.
(tests-special): Add $(objpfx)tst-prelink-cmp.out.
* sysdeps/x86/tst-prelink.c: New file.
* sysdeps/x86/tst-prelink.exp: Likewise.
fenv_t should include architecture-specific floating-point modes and
status flags. i386 and x86_64 fesetenv limit which bits they use from
the x87 status and control words, when using saved state, and limit
which parts of the state they set to fixed values, when using
FE_DFL_ENV / FE_NOMASK_ENV. The following should be included but are
excluded in at least some cases: status and masking for the "denormal
operand" exception (which isn't part of FE_ALL_EXCEPT); precision
control (explicitly mentioned in Annex F as something that counts as
part of the floating-point environment); MXCSR FZ and DAZ bits (for
FE_DFL_ENV and FE_NOMASK_ENV). This patch arranges for this extra
state to be handled by fesetenv (and thereby by feupdateenv, which
calls fesetenv).
(Note that glibc functions using floating point are not generally
expected to work correctly with non-default values of this state,
especially precision control, but it is still logically part of the
floating-point environment and should be handled as such by fesetenv.
Changes to the state relating to subnormals ought generally to work
with libm functions when the arguments aren't subnormal and neither
are the expected results; that's a consequence of functions avoiding
spurious internal underflows.)
A question arising from this is whether FE_NOMASK_ENV should or should
not mask the "denormal operand" exception. I decided it should mask
that exception. This is the status quo - previously that exception
could only be unmasked by direct manipulation of control registers
(possibly via <fpu_control.h>). In addition, it means that use of
FE_NOMASK_ENV leaves a floating-point environment the same as could be
obtained by fesetenv (FE_DFL_ENV); feenableexcept (FE_ALL_EXCEPT);,
rather than an environment in which an exception is unmasked that
could only be masked again by using fesetenv with FE_DFL_ENV (or a
previously saved environment) - this exception not being usable with
other <fenv.h> functions because it's outside FE_ALL_EXCEPT.
Tested for x86_64 and x86.
[BZ #16068]
* sysdeps/i386/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86_64/fpu/fesetenv.c: Include <fpu_control.h>.
(FE_ALL_EXCEPT_X86): New macro.
(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
FE_ALL_EXCEPT. Ensure precision control is included in
floating-point state. Ensure that FE_DFL_ENV and FE_NOMASK_ENV
handle "denormal operand exception" and clear FZ and DAZ bits.
* sysdeps/x86/fpu/test-fenv-sse-2.c: New file.
* sysdeps/x86/fpu/test-fenv-x87.c: Likewise.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-x87 and test-fenv-sse-2.
[$(subdir) = math] (CFLAGS-test-fenv-sse-2.c): New variable.
The i386 and x86_64 versions of fesetenv, when called with FE_DFL_ENV
or FE_NOMASK_ENV as argument, do not clear SSE exceptions raised in
MXCSR. These arguments should, like other fenv_t values, represent
the whole of the floating-point state, so such exceptions should be
cleared; this patch adds the required clearing. (Discovered while
working on bug 16068.)
Tested for x86_64 and x86.
[BZ #19181]
* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Clear already-raised
SSE exceptions when argument is FE_DFL_ENV or FE_NOMASK_ENV.
* sysdeps/x86_64/fpu/fesetenv.c (__fesetenv): Likewise.
* math/test-fenv-clear-main.c: New file.
* math/test-fenv-clear.c: Likewise.
* math/Makefile (tests): Add test-fenv-clear.
* sysdeps/x86/fpu/test-fenv-clear-sse.c: New file.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-clear-sse.
[$(subdir) = math] (CFLAGS-test-fenv-clear-sse.c): New variable.
Similar to various other bugs in this area, pow functions can fail to
raise the underflow exception when the result is tiny and inexact but
one or more low bits of the intermediate result that is scaled down
(or, in the i386 case, converted from a wider evaluation format) are
zero. This patch forces the exception in a similar way to previous
fixes, thereby concluding the fixes for known bugs with missing
underflow exceptions currently filed in Bugzilla.
Tested for x86_64, x86, mips64 and powerpc.
[BZ #18825]
* sysdeps/i386/fpu/i386-math-asm.h (FLT_NARROW_EVAL_UFLOW_NONNAN):
New macro.
(DBL_NARROW_EVAL_UFLOW_NONNAN): Likewise.
(LDBL_CHECK_FORCE_UFLOW_NONNAN): Likewise.
* sysdeps/i386/fpu/e_pow.S: Use DEFINE_DBL_MIN.
(__ieee754_pow): Use DBL_NARROW_EVAL_UFLOW_NONNAN instead of
DBL_NARROW_EVAL, reloading the PIC register as needed.
* sysdeps/i386/fpu/e_powf.S: Use DEFINE_FLT_MIN.
(__ieee754_powf): Use FLT_NARROW_EVAL_UFLOW_NONNAN instead of
FLT_NARROW_EVAL. Use separate return path for case when first
argument is NaN.
* sysdeps/i386/fpu/e_powl.S: Include <i386-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN, reloading the
PIC register.
* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Use
math_check_force_underflow_nonneg.
* sysdeps/ieee754/flt-32/e_powf.c (__ieee754_powf): Force
underflow for subnormal result.
* sysdeps/ieee754/ldbl-128/e_powl.c (__ieee754_powl): Likewise.
* sysdeps/ieee754/ldbl-128ibm/e_powl.c (__ieee754_powl): Use
math_check_force_underflow_nonneg.
* sysdeps/x86/fpu/powl_helper.c (__powl_helper): Use
math_check_force_underflow.
* sysdeps/x86_64/fpu/x86_64-math-asm.h
(LDBL_CHECK_FORCE_UFLOW_NONNAN): New macro.
* sysdeps/x86_64/fpu/e_powl.S: Include <x86_64-math-asm.h>. Use
DEFINE_LDBL_MIN.
(__ieee754_powl): Use LDBL_CHECK_FORCE_UFLOW_NONNAN.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
It was noted in
<https://sourceware.org/ml/libc-alpha/2012-09/msg00305.html> that the
bits/*.h naming scheme should only be used for installed headers.
This patch renames bits/linkmap.h to plain linkmap.h to follow that
convention.
Tested for x86_64 (testsuite, and that installed stripped shared
libraries are unchanged by the patch).
[BZ #14912]
* bits/linkmap.h: Move to ...
* sysdeps/generic/linkmap.h: ...here.
* sysdeps/aarch64/bits/linkmap.h: Move to ...
* sysdeps/aarch64/linkmap.h: ...here.
* sysdeps/arm/bits/linkmap.h: Move to ...
* sysdeps/arm/linkmap.h: ...here.
* sysdeps/hppa/bits/linkmap.h: Move to ...
* sysdeps/hppa/linkmap.h: ...here.
* sysdeps/ia64/bits/linkmap.h: Move to ...
* sysdeps/ia64/linkmap.h: ...here.
* sysdeps/mips/bits/linkmap.h: Move to ...
* sysdeps/mips/linkmap.h: ...here.
* sysdeps/s390/bits/linkmap.h: Move to ...
* sysdeps/s390/linkmap.h: ...here.
* sysdeps/sh/bits/linkmap.h: Move to ...
* sysdeps/sh/linkmap.h: ...here.
* sysdeps/x86/bits/linkmap.h: Move to ...
* sysdeps/x86/linkmap.h: ...here.
* include/link.h: Include <linkmap.h> instead of <bits/linkmap.h>.
We detect i586 and i686 features at run-time by checking CX8 and CMOV
CPUID features bits. We can use these information to select the best
implementation in ix86 multiarch. HAS_I586/HAS_I686 is true if i586/i686
instructions are available on the processor.
Due to the reordering and the other nifty extensions in i686, it is not
really good to use heavily i586 optimized code on an i686. It's better
to use i486 code if it isn't an i586. USE_I586/USE_I686 is true if
i586/i686 implementation should be used for the processor. USE_I586
is true only if i686 instructions aren't available. If i686 instructions
are available, we always choose i686 or i486 implementation, in that order,
and we never choose i586 implementation for i686-class processors.
* sysdeps/i386/init-arch.h: New file.
* sysdeps/i386/i586/init-arch.h: Likewise.
* sysdeps/i386/i686/init-arch.h: Likewise.
* sysdeps/x86/cpu-features.c (init_cpu_features): Set bit_I586
bit if CX8 is available. Set bit_I686 bit if CMOV is available.
* sysdeps/x86/cpu-features.h (bit_I586): New.
(bit_I686): Likewise.
(bit_CX8): Likewise.
(bit_CMOV): Likewise.
(index_CX8): Likewise.
(index_CMOV): Likewise.
(index_I586): Likewise.
(index_I686): Likewise.
(reg_CX8): Likewise.
(reg_CMOV): Likewise.
(HAS_I586): Defined as HAS_ARCH_FEATURE (I586) if i586 isn't
available at compile-time.
(HAS_I686): Defined as HAS_ARCH_FEATURE (I686) if i686 isn't
available at compile-time.
* sysdeps/x86/init-arch.h (USE_I586): New macro.
(USE_I686): Likewise.
Since x86-64 ld.so preserves vector registers now, we can use SSE in
x86-64 ld.so. We should run tst-ld-sse-use.sh only on i386.
* sysdeps/x86/Makefile [$(subdir) == elf] (CFLAGS-.os,
tests-special, $(objpfx)tst-ld-sse-use.out): Moved to ...
* sysdeps/i386/Makefile [$(subdir) == elf] (CFLAGS-.os,
tests-special, $(objpfx)tst-ld-sse-use.out): Here. Update
comments.
* sysdeps/x86_64/Makefile [$(subdir) == elf] (CFLAGS-.os): Add
-mno-mmx for $(all-rtld-routines).
* sysdeps/x86/tst-ld-sse-use.sh: Moved to ...
* sysdeps/i386/tst-ld-sse-use.sh: Here. Replace x86-64 with
i386.
Move sysdeps/x86_64/multiarch/init-arch.h to sysdeps/x86/init-arch.h
which can be used for both i386 and x86_64.
* sysdeps/i386/i686/multiarch/init-arch.h: Removed.
* sysdeps/unix/sysv/linux/x86/init-arch.h: Likewise.
* sysdeps/x86_64/cacheinfo.c: Include <init-arch.h> instead
of "multiarch/init-arch.h".
* sysdeps/x86_64/multiarch/init-arch.h: Renamed to ...
* sysdeps/x86/init-arch.h: This.
cpuid, i586 and i686 instructions are available if the processor
specified by -march= supports them. We can use this information
to determine whether those instructions can be used safely.
* sysdeps/x86/cpu-features.c (init_cpu_features): Check
whether cpuid is available only if HAS_CPUID is 0.
* sysdeps/x86/cpu-features.h (HAS_CPUID): New.
(HAS_I586): Likewise.
(HAS_I686): Likewise.
Since not all i486 processors support cpuid, we call __get_cpuid_max to
check if cpuid is available before using it if not compiling for i586,
i686 nor x86-64.
* sysdeps/x86/cpu-features.c (init_cpu_features): Call
__get_cpuid_max if not compiling for i586, i686 nor x86-64.
We need to save/restore bound registers and add a BND prefix before
branches in _dl_runtime_profile so that bound registers for pointer
pass and return are preserved when LD_AUDIT is used.
[BZ #18134]
* sysdeps/i386/configure.ac: Set HAVE_MPX_SUPPORT.
* sysdeps/i386/configure: Regenerated.
* sysdeps/i386/dl-trampoline.S (PRESERVE_BND_REGS_PREFIX): New.
(_dl_runtime_profile): Save and restore Intel MPX return bound
registers when calling _dl_call_pltexit. Add
PRESERVE_BND_REGS_PREFIX before return.
* sysdeps/i386/link-defines.sym (LRV_BND0_OFFSET): New.
(LRV_BND1_OFFSET): Likewise.
* sysdeps/x86/bits/link.h (La_i86_retval): Add lrv_bnd0 and
lrv_bnd1.
* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Fix
typo in bndmov encoding.
* sysdeps/x86_64/dl-trampoline.h: Properly save and restore
Intel MPX bound registers. Add PRESERVE_BND_REGS_PREFIX before
branch instructions to preserve bounds.
Some of the x86 string functions create pointers based on input strings
that may be outside of the input strings. When this happens in C code,
the compiler can potentially detect this, leading to warnings in
application code when those string functions are inlined. Perform those
operations in the assembly code instead of the C code to fix this.
Here is implementation of vectorized sin containing SSE, AVX,
AVX2 and AVX512 versions according to Vector ABI
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
* bits/libm-simd-decl-stubs.h: Added stubs for sin.
* math/bits/mathcalls.h: Added sin declaration with __MATHCALL_VEC.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added.
* sysdeps/x86/fpu/bits/math-vector.h: SIMD declaration for sin.
* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
* sysdeps/x86_64/fpu/Versions: New versions added.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core_sse4.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core_avx2.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin2_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin4_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin4_core_avx.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin8_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin_data.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin_data.h: New file.
* sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Added vector sin test.
* sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise.
* NEWS: Mention addition of x86_64 vector sin.
Here is implementation of vectorized cosf containing SSE, AVX,
AVX2 and AVX512 versions according to Vector ABI
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
* sysdeps/x86_64/fpu/Versions: New versions added.
* sysdeps/x86_64/fpu/svml_s_cosf4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core_sse4.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf8_core_avx.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core_avx2.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf16_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: New file.
* sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: New file.
* sysdeps/x86_64/fpu/svml_s_cosf_data.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf_data.h: New file.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added.
* sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration for cosf.
* NEWS: Mention addition of x86_64 vector cosf.
Here is implementation of cos containing SSE, AVX, AVX2 and AVX512
versions according to Vector ABI which had been discussed in
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
Vector math library build and ABI testing enabled by default for x86_64.
* sysdeps/x86_64/fpu/Makefile: New file.
* sysdeps/x86_64/fpu/Versions: New file.
* sysdeps/x86_64/fpu/svml_d_cos_data.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos_data.h: New file.
* sysdeps/x86_64/fpu/svml_d_cos2_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos4_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos4_core_avx.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos8_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core_sse4.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core_avx2.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: New file.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration for cos.
* math/bits/mathcalls.h: Added cos declaration with __MATHCALL_VEC.
* sysdeps/x86_64/configure.ac: Options for libmvec build.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/sysdep.h (cfi_offset_rel_rsp): New macro.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New file.
* manual/install.texi (Configuring and compiling): Document
--disable-mathvec.
* INSTALL: Regenerated.
* NEWS: Mention addition of libmvec and x86_64 vector cos.
This patch fixes bug 15319, missing underflows from atan / atan2 when
the result of atan is very close to its small argument (or that of
atan2 is very close to the ratio of its arguments, which may be an
exact division).
The usual approach of doing an underflowing computation if the
computed result is subnormal is followed. For 32-bit x86, there are
extra complications: the inline __ieee754_atan2 in bits/mathinline.h
needs to be disabled for float and double because other libm functions
using it generally rely on getting proper underflow exceptions from
it, while the out-of-line functions have to remove excess range and
precision from the underflowing result so as to return an exact 0 in
the case where errno should be set for underflow to 0. (The failures
I saw without that are similar to those Carlos reported for other
functions, where I haven't seen a response to
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>
confirming if my diagnosis is correct. Arguably all libm functions
with float and double returns should remove excess range and
precision, but that's a separate matter.)
The x86_64 long double case reported in a comment in bug 15319 is not
a bug (it's an argument of LDBL_MIN, and x86_64 is an after-rounding
architecture so the correct IEEE result is not to raise underflow in
the given rounding mode, in addition to treating the result as an
exact LDBL_MIN being within the newly clarified documentation of
accuracy goals). I'm presuming that the fpatan instruction can be
trusted to raise appropriate exceptions when the (long double) result
underflows (after rounding) and so no changes are needed for x86 /
x86_64 long double functions here; empirically this is the case for
the cases covered in the testsuite, on my system.
Tested for x86_64, x86, powerpc and mips64. Only 32-bit x86 needs
ulps updates (for the changes to inlines meaning some functions no
longer get excess precision from their __ieee754_atan2* calls).
[BZ #15319]
* sysdeps/i386/fpu/e_atan2.S (dbl_min): New object.
(MO): New macro.
(__ieee754_atan2): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/e_atan2f.S (flt_min): New object.
(MO): New macro.
(__ieee754_atan2f): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/s_atan.S (dbl_min): New object.
(MO): New macro.
(__atan): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/i386/fpu/s_atanf.S (flt_min): New object.
(MO): New macro.
(__atanf): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/ieee754/dbl-64/e_atan2.c: Include <float.h> and
<math.h>.
(__ieee754_atan2): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/dbl-64/s_atan.c: Include <float.h> and
<math_private.h>.
(atan): Force underflow exception for results with small absolute
value.
* sysdeps/ieee754/flt-32/s_atanf.c: Include <float.h>.
(__atanf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Include <float.h> and
<math.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Include <float.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/x86/fpu/bits/mathinline.h
[!__SSE2_MATH__ && !__x86_64__ && __LIBC_INTERNAL_MATH_INLINES]
(__ieee754_atan2): Only define inline for long double.
* sysdeps/x86_64/fpu/multiarch/e_atan2.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Include <math.h>.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 15319. Add more tests of atan2.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (casin_test_data): Do not mark underflow
exceptions as possibly missing for bug 15319.
(casinh_test_data): Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Various C90 and UNIX98 libm functions call feraiseexcept, which is not
in those standards. This causes linknamespace test failures - except
on x86 / x86_64, where feraiseexcept is inline (for the relevant
constant arguments) in bits/fenv.h.
This patch fixes this by making those functions call __feraiseexcept
instead. All changes are applied to all architectures rather than
considering the possibility that some might not be needed in some
cases (e.g. x86) as it seems most maintainable to keep architectures
consistent.
Where __feraiseexcept does not exist, it is added, with feraiseexcept
made a weak alias; where it is a strong alias, it is made weak.
libm_hidden_def / libm_hidden_proto are used with __feraiseexcept
(this might in some cases improve code generation for existing calls
to __feraiseexcept in some code on some architectures). Where there
are dummy feraiseexcept macros (on architectures without
floating-point exceptions support, to avoid compile errors from
references to undefined FE_* macros), corresponding dummy
__feraiseexcept macros are added. And on x86, to ensure
__feraiseexcept calls still get inlined, the inline function in
bits/fenv.h is refactored so that most of it can be reused in an
inline __feraiseexcept in a separate include/bits/fenv.h.
Calls are changed in C90/UNIX98 functions, but generally not in
functions missing from those standards. They are also changed in
libc_fe* functions (on the basis that those might be used in any libm
function), and in feupdateenv (on the same basis - may be used, via
default libc_*, in any libm function - of course feupdateenv will need
changing to __feupdateenv in a subsequent patch to make that fully
namespace-clean).
No __feraiseexcept is added corresponding to the feraiseexcept in
powerpc bits/fenvinline.h, because that macro definition is
conditional on !defined __NO_MATH_INLINES, and glibc libm is built
with -D__NO_MATH_INLINES, so changing internal calls to use
__feraiseexcept should make no difference.
Tested for x86_64 (testsuite; the only change in disassembly of
installed shared libraries is a slight code reordering in clog10, of
no apparent significance). Also tested for MIPS, where (in the
configuration tested) it eliminates math.h linknamespace failures for
n32 and n64 (some for o32 remain because of other issues).
[BZ #17723]
* include/fenv.h (__feraiseexcept): Use libm_hidden_proto.
* math/fraiseexcpt.c (__feraiseexcept): Use libm_hidden_def.
* sysdeps/aarch64/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/arm/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/hppa/fpu/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/i386/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/ia64/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/m68k/coldfire/fpu/fraiseexcpt.c (feraiseexcept):
Likewise.
* sysdeps/microblaze/math_private.h (__feraiseexcept): New macro.
* sysdeps/mips/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/powerpc/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/powerpc/nofpu/fraiseexcpt.c (__feraiseexcept): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/fraiseexcpt.c
(__feraiseexcept): Likewise.
* sysdeps/s390/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/sh/sh4/fpu/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/sparc/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/tile/math_private.h (__feraiseexcept): New macro.
* sysdeps/unix/sysv/linux/alpha/fraiseexcpt.S (__feraiseexcept):
Use libm_hidden_def.
* sysdeps/x86_64/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
(feraiseexcept): Define as weak not strong alias. Use
libm_hidden_weak.
* sysdeps/x86/fpu/bits/fenv.h (__feraiseexcept_invalid_divbyzero):
New inline function. Factored out of ...
(feraiseexcept): ... here. Use __feraiseexcept_invalid_divbyzero.
* sysdeps/x86/fpu/include/bits/fenv.h: New file.
* math/e_scalb.c (invalid_fn): Call __feraiseexcept instead of
feraiseexcept.
* math/w_acos.c (__acos): Likewise.
* math/w_asin.c (__asin): Likewise.
* math/w_ilogb.c (__ilogb): Likewise.
* math/w_j0.c (y0): Likewise.
* math/w_j1.c (y1): Likewise.
* math/w_jn.c (yn): Likewise.
* math/w_log.c (__log): Likewise.
* math/w_log10.c (__log10): Likewise.
* sysdeps/aarch64/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/aarch64/fpu/math_private.h
(libc_feupdateenv_test_aarch64): Likewise.
* sysdeps/alpha/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/arm/fenv_private.h (libc_feupdateenv_test_vfp): Likewise.
* sysdeps/arm/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/ia64/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/m68k/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/mips/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrt): Likewise.
* sysdeps/s390/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sh/sh4/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sparc/fpu/feupdateenv.c (__feupdateenv): Likewise.
tst-ld-sse-use.sh is a bash script, not a POSIX shell script, and so
needs to be run with $(BASH) not $(SHELL) to avoid errors of the form:
../sysdeps/x86/tst-ld-sse-use.sh: 41: ../sysdeps/x86/tst-ld-sse-use.sh: declare: not found
(when /bin/sh is dash). This patch makes that change.
Tested for x86_64.
* sysdeps/x86/Makefile ($(objpfx)tst-ld-sse-use.out): Run script
with $(BASH) not $(SHELL).
2d63a517e4 added support to save and
restore zmm register in the dynamic linker, but did not enhance
test-xmmymm.sh to detect accidental usage of these registers. The
patch below adds that check.
The script has also been renamed to tst-ld-sse-use.sh. To see the
minimal changes, run `git show -M`.
[BZ #16194]
* sysdeps/x86/tst-xmmymm.sh: Rename file to...
* sysdeps/x86/tst-ld-sse-use.sh: ... this. Check for zmm
register usage.
* sysdeps/x86/Makefile: Adjust.
Since:
commit 409e00bd69
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jan 29 07:51:41 2014 -0800
Disable x87 inline functions for SSE2 math
When i386 and x86-64 mathinline.h was merged into a single mathinline.h,
"gcc -m32" enables x87 inline functions on x86-64 even when -mfpmath=sse
and SSE2 is enabled. It is a regression on x86-64. We should check
__SSE2_MATH__ instead of __x86_64__ when disabling x87 inline functions.
gcc-3.2 is unable to correctly compile x86_64 routines for llrint
since it gets redefined. This is because gcc 3.2 does not set
__SSE2_MATH__ for x86_64, thus exposing the duplicate definition.
The correct fix ought to be to check for both __SSE2_MATH__ and
__x86_64__ and enable those bits only when neither are defined.
Tested fix with the reproducer for
409e00bd69 as well as with gcc-3.2.
The first argument of elision_adapt and that of ELISION_*LOCK have
different signs since __elision_rwcount is signed char * and the
argument of elision_adapt is uint8_t *. Modified elision_adapt to
accept signed char * instead of uint8_t *.