Here is implementation of vectorized sin containing SSE, AVX,
AVX2 and AVX512 versions according to Vector ABI
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
* bits/libm-simd-decl-stubs.h: Added stubs for sin.
* math/bits/mathcalls.h: Added sin declaration with __MATHCALL_VEC.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added.
* sysdeps/x86/fpu/bits/math-vector.h: SIMD declaration for sin.
* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
* sysdeps/x86_64/fpu/Versions: New versions added.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core_sse4.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core_avx2.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin2_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin4_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin4_core_avx.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin8_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin_data.S: New file.
* sysdeps/x86_64/fpu/svml_d_sin_data.h: New file.
* sysdeps/x86_64/fpu/test-double-vlen2-wrappers.c: Added vector sin test.
* sysdeps/x86_64/fpu/test-double-vlen2.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4-avx2-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4-avx2.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen4.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen8-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-double-vlen8.c: Likewise.
* NEWS: Mention addition of x86_64 vector sin.
Here is implementation of vectorized cosf containing SSE, AVX,
AVX2 and AVX512 versions according to Vector ABI
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
* sysdeps/x86_64/fpu/Versions: New versions added.
* sysdeps/x86_64/fpu/svml_s_cosf4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core_sse4.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf8_core_avx.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core_avx2.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf16_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S: New file.
* sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: New file.
* sysdeps/x86_64/fpu/svml_s_cosf_data.S: New file.
* sysdeps/x86_64/fpu/svml_s_cosf_data.h: New file.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New versions added.
* sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration for cosf.
* NEWS: Mention addition of x86_64 vector cosf.
Here is implementation of cos containing SSE, AVX, AVX2 and AVX512
versions according to Vector ABI which had been discussed in
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
Vector math library build and ABI testing enabled by default for x86_64.
* sysdeps/x86_64/fpu/Makefile: New file.
* sysdeps/x86_64/fpu/Versions: New file.
* sysdeps/x86_64/fpu/svml_d_cos_data.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos_data.h: New file.
* sysdeps/x86_64/fpu/svml_d_cos2_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos4_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos4_core_avx.S: New file.
* sysdeps/x86_64/fpu/svml_d_cos8_core.S: New file.
* sysdeps/x86_64/fpu/svml_d_wrapper_impl.h: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core_sse4.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core_avx2.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core.S: New file.
* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S: New file.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines): Added
build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/x86/fpu/bits/math-vector.h: Added SIMD declaration for cos.
* math/bits/mathcalls.h: Added cos declaration with __MATHCALL_VEC.
* sysdeps/x86_64/configure.ac: Options for libmvec build.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/sysdep.h (cfi_offset_rel_rsp): New macro.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New file.
* manual/install.texi (Configuring and compiling): Document
--disable-mathvec.
* INSTALL: Regenerated.
* NEWS: Mention addition of libmvec and x86_64 vector cos.
This patch fixes bug 15319, missing underflows from atan / atan2 when
the result of atan is very close to its small argument (or that of
atan2 is very close to the ratio of its arguments, which may be an
exact division).
The usual approach of doing an underflowing computation if the
computed result is subnormal is followed. For 32-bit x86, there are
extra complications: the inline __ieee754_atan2 in bits/mathinline.h
needs to be disabled for float and double because other libm functions
using it generally rely on getting proper underflow exceptions from
it, while the out-of-line functions have to remove excess range and
precision from the underflowing result so as to return an exact 0 in
the case where errno should be set for underflow to 0. (The failures
I saw without that are similar to those Carlos reported for other
functions, where I haven't seen a response to
<https://sourceware.org/ml/libc-alpha/2015-01/msg00485.html>
confirming if my diagnosis is correct. Arguably all libm functions
with float and double returns should remove excess range and
precision, but that's a separate matter.)
The x86_64 long double case reported in a comment in bug 15319 is not
a bug (it's an argument of LDBL_MIN, and x86_64 is an after-rounding
architecture so the correct IEEE result is not to raise underflow in
the given rounding mode, in addition to treating the result as an
exact LDBL_MIN being within the newly clarified documentation of
accuracy goals). I'm presuming that the fpatan instruction can be
trusted to raise appropriate exceptions when the (long double) result
underflows (after rounding) and so no changes are needed for x86 /
x86_64 long double functions here; empirically this is the case for
the cases covered in the testsuite, on my system.
Tested for x86_64, x86, powerpc and mips64. Only 32-bit x86 needs
ulps updates (for the changes to inlines meaning some functions no
longer get excess precision from their __ieee754_atan2* calls).
[BZ #15319]
* sysdeps/i386/fpu/e_atan2.S (dbl_min): New object.
(MO): New macro.
(__ieee754_atan2): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/e_atan2f.S (flt_min): New object.
(MO): New macro.
(__ieee754_atan2f): For results with small absolute value, force
underflow exception and remove excess range and precision from
return value.
* sysdeps/i386/fpu/s_atan.S (dbl_min): New object.
(MO): New macro.
(__atan): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/i386/fpu/s_atanf.S (flt_min): New object.
(MO): New macro.
(__atanf): For results with small absolute value, force underflow
exception and remove excess range and precision from return value.
* sysdeps/ieee754/dbl-64/e_atan2.c: Include <float.h> and
<math.h>.
(__ieee754_atan2): Force underflow exception for results with
small absolute value.
* sysdeps/ieee754/dbl-64/s_atan.c: Include <float.h> and
<math_private.h>.
(atan): Force underflow exception for results with small absolute
value.
* sysdeps/ieee754/flt-32/s_atanf.c: Include <float.h>.
(__atanf): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128/s_atanl.c: Include <float.h> and
<math.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/ieee754/ldbl-128ibm/s_atanl.c: Include <float.h>.
(__atanl): Force underflow exception for results with small
absolute value.
* sysdeps/x86/fpu/bits/mathinline.h
[!__SSE2_MATH__ && !__x86_64__ && __LIBC_INTERNAL_MATH_INLINES]
(__ieee754_atan2): Only define inline for long double.
* sysdeps/x86_64/fpu/multiarch/e_atan2.c
[HAVE_FMA4_SUPPORT || HAVE_AVX_SUPPORT]: Include <math.h>.
* math/auto-libm-test-in: Do not mark underflow exceptions as
possibly missing for bug 15319. Add more tests of atan2.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (casin_test_data): Do not mark underflow
exceptions as possibly missing for bug 15319.
(casinh_test_data): Likewise.
* sysdeps/i386/fpu/libm-test-ulps: Update.
Various C90 and UNIX98 libm functions call feraiseexcept, which is not
in those standards. This causes linknamespace test failures - except
on x86 / x86_64, where feraiseexcept is inline (for the relevant
constant arguments) in bits/fenv.h.
This patch fixes this by making those functions call __feraiseexcept
instead. All changes are applied to all architectures rather than
considering the possibility that some might not be needed in some
cases (e.g. x86) as it seems most maintainable to keep architectures
consistent.
Where __feraiseexcept does not exist, it is added, with feraiseexcept
made a weak alias; where it is a strong alias, it is made weak.
libm_hidden_def / libm_hidden_proto are used with __feraiseexcept
(this might in some cases improve code generation for existing calls
to __feraiseexcept in some code on some architectures). Where there
are dummy feraiseexcept macros (on architectures without
floating-point exceptions support, to avoid compile errors from
references to undefined FE_* macros), corresponding dummy
__feraiseexcept macros are added. And on x86, to ensure
__feraiseexcept calls still get inlined, the inline function in
bits/fenv.h is refactored so that most of it can be reused in an
inline __feraiseexcept in a separate include/bits/fenv.h.
Calls are changed in C90/UNIX98 functions, but generally not in
functions missing from those standards. They are also changed in
libc_fe* functions (on the basis that those might be used in any libm
function), and in feupdateenv (on the same basis - may be used, via
default libc_*, in any libm function - of course feupdateenv will need
changing to __feupdateenv in a subsequent patch to make that fully
namespace-clean).
No __feraiseexcept is added corresponding to the feraiseexcept in
powerpc bits/fenvinline.h, because that macro definition is
conditional on !defined __NO_MATH_INLINES, and glibc libm is built
with -D__NO_MATH_INLINES, so changing internal calls to use
__feraiseexcept should make no difference.
Tested for x86_64 (testsuite; the only change in disassembly of
installed shared libraries is a slight code reordering in clog10, of
no apparent significance). Also tested for MIPS, where (in the
configuration tested) it eliminates math.h linknamespace failures for
n32 and n64 (some for o32 remain because of other issues).
[BZ #17723]
* include/fenv.h (__feraiseexcept): Use libm_hidden_proto.
* math/fraiseexcpt.c (__feraiseexcept): Use libm_hidden_def.
* sysdeps/aarch64/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/arm/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/hppa/fpu/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/i386/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/ia64/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/m68k/coldfire/fpu/fraiseexcpt.c (feraiseexcept):
Likewise.
* sysdeps/microblaze/math_private.h (__feraiseexcept): New macro.
* sysdeps/mips/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/powerpc/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/powerpc/nofpu/fraiseexcpt.c (__feraiseexcept): Likewise.
* sysdeps/powerpc/powerpc32/e500/nofpu/fraiseexcpt.c
(__feraiseexcept): Likewise.
* sysdeps/s390/fpu/fraiseexcpt.c (feraiseexcept): Rename to
__feraiseexcept and define as weak alias of __feraiseexcept. Use
libm_hidden_weak.
* sysdeps/sh/sh4/fpu/fraiseexcpt.c (feraiseexcept): Likewise.
* sysdeps/sparc/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
* sysdeps/tile/math_private.h (__feraiseexcept): New macro.
* sysdeps/unix/sysv/linux/alpha/fraiseexcpt.S (__feraiseexcept):
Use libm_hidden_def.
* sysdeps/x86_64/fpu/fraiseexcpt.c (__feraiseexcept): Use
libm_hidden_def.
(feraiseexcept): Define as weak not strong alias. Use
libm_hidden_weak.
* sysdeps/x86/fpu/bits/fenv.h (__feraiseexcept_invalid_divbyzero):
New inline function. Factored out of ...
(feraiseexcept): ... here. Use __feraiseexcept_invalid_divbyzero.
* sysdeps/x86/fpu/include/bits/fenv.h: New file.
* math/e_scalb.c (invalid_fn): Call __feraiseexcept instead of
feraiseexcept.
* math/w_acos.c (__acos): Likewise.
* math/w_asin.c (__asin): Likewise.
* math/w_ilogb.c (__ilogb): Likewise.
* math/w_j0.c (y0): Likewise.
* math/w_j1.c (y1): Likewise.
* math/w_jn.c (yn): Likewise.
* math/w_log.c (__log): Likewise.
* math/w_log10.c (__log10): Likewise.
* sysdeps/aarch64/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/aarch64/fpu/math_private.h
(libc_feupdateenv_test_aarch64): Likewise.
* sysdeps/alpha/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/arm/fenv_private.h (libc_feupdateenv_test_vfp): Likewise.
* sysdeps/arm/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/ia64/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/m68k/fpu/feupdateenv.c (__feupdateenv): Likewise.
* sysdeps/mips/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/powerpc/fpu/e_sqrt.c (__slow_ieee754_sqrt): Likewise.
* sysdeps/s390/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sh/sh4/fpu/feupdateenv.c (feupdateenv): Likewise.
* sysdeps/sparc/fpu/feupdateenv.c (__feupdateenv): Likewise.
tst-ld-sse-use.sh is a bash script, not a POSIX shell script, and so
needs to be run with $(BASH) not $(SHELL) to avoid errors of the form:
../sysdeps/x86/tst-ld-sse-use.sh: 41: ../sysdeps/x86/tst-ld-sse-use.sh: declare: not found
(when /bin/sh is dash). This patch makes that change.
Tested for x86_64.
* sysdeps/x86/Makefile ($(objpfx)tst-ld-sse-use.out): Run script
with $(BASH) not $(SHELL).
2d63a517e4 added support to save and
restore zmm register in the dynamic linker, but did not enhance
test-xmmymm.sh to detect accidental usage of these registers. The
patch below adds that check.
The script has also been renamed to tst-ld-sse-use.sh. To see the
minimal changes, run `git show -M`.
[BZ #16194]
* sysdeps/x86/tst-xmmymm.sh: Rename file to...
* sysdeps/x86/tst-ld-sse-use.sh: ... this. Check for zmm
register usage.
* sysdeps/x86/Makefile: Adjust.
Since:
commit 409e00bd69
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Wed Jan 29 07:51:41 2014 -0800
Disable x87 inline functions for SSE2 math
When i386 and x86-64 mathinline.h was merged into a single mathinline.h,
"gcc -m32" enables x87 inline functions on x86-64 even when -mfpmath=sse
and SSE2 is enabled. It is a regression on x86-64. We should check
__SSE2_MATH__ instead of __x86_64__ when disabling x87 inline functions.
gcc-3.2 is unable to correctly compile x86_64 routines for llrint
since it gets redefined. This is because gcc 3.2 does not set
__SSE2_MATH__ for x86_64, thus exposing the duplicate definition.
The correct fix ought to be to check for both __SSE2_MATH__ and
__x86_64__ and enable those bits only when neither are defined.
Tested fix with the reproducer for
409e00bd69 as well as with gcc-3.2.
The first argument of elision_adapt and that of ELISION_*LOCK have
different signs since __elision_rwcount is signed char * and the
argument of elision_adapt is uint8_t *. Modified elision_adapt to
accept signed char * instead of uint8_t *.
This patch fixes bug 16315, bad pow handling of overflow/underflow in
non-default rounding modes. Tests of pow are duly converted to
ALL_RM_TEST to run all tests in all rounding modes.
There are two main issues here. First, various implementations
compute a negative result by negating a positive result, but this
yields inappropriate overflow / underflow values for directed
rounding, so either overflow / underflow results need recomputing in
the correct sign, or the relevant overflowing / underflowing operation
needs to be made to have a result of the correct sign. Second, the
dbl-64 implementation sets FE_TONEAREST internally; in the overflow /
underflow case, the result needs recomputing in the original rounding
mode.
Tested x86_64 and x86 and ulps updated accordingly.
[BZ #16315]
* sysdeps/i386/fpu/e_pow.S (__ieee754_pow): Ensure possibly
overflowing or underflowing operations take place with sign of
result.
* sysdeps/i386/fpu/e_powf.S (__ieee754_powf): Likewise.
* sysdeps/i386/fpu/e_powl.S (__ieee754_powl): Likewise.
* sysdeps/ieee754/dbl-64/e_pow.c: Include <math.h>.
(__ieee754_pow): Recompute overflowing and underflowing results in
original rounding mode.
* sysdeps/x86/fpu/powl_helper.c: Include <stdbool.h>.
(__powl_helper): Allow negative argument X and scale negated value
as needed. Avoid passing value outside [-1, 1] to f2xm1.
* sysdeps/x86_64/fpu/e_powl.S (__ieee754_powl): Ensure possibly
overflowing or underflowing operations take place with sign of
result.
* sysdeps/x86_64/fpu/multiarch/e_pow.c [HAVE_FMA4_SUPPORT]:
Include <math.h>.
* math/auto-libm-test-in: Add more tests of pow.
* math/auto-libm-test-out: Regenerated.
* math/libm-test.inc (pow_test): Use ALL_RM_TEST.
(pow_tonearest_test_data): Remove.
(pow_test_tonearest): Likewise.
(pow_towardzero_test_data): Likewise.
(pow_test_towardzero): Likewise.
(pow_downward_test_data): Likewise.
(pow_test_downward): Likewise.
(pow_upward_test_data): Likewise.
(pow_test_upward): Likewise.
(main): Don't call removed functions.
* sysdeps/i386/fpu/libm-test-ulps: Update.
* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
Since long is 4 bytes for x32, we should use 3 bytes for __pad1 when
a long __pad1 is replaced by a byte __rwelision and __pad1.
* sysdeps/x86/nptl/bits/pthreadtypes.h (pthread_rwlock_t): Use
3 bytes for __pad1 for x32.
(__PTHREAD_RWLOCK_ELISION_EXTRA): Likewise.
This patch relies on the C version of the rwlocks posted earlier.
With C rwlocks it is very straight forward to do adaptive elision
using TSX. It is based on the infrastructure added earlier
for mutexes, but uses its own elision macros. The macros
are fairly general purpose and could be used for other
elision purposes too.
This version is much cleaner than the earlier assembler based
version, and in particular implements adaptation which makes
it safer.
I changed the behavior slightly to not require any changes
in the test suite and fully conform to all expected
behaviors (generally at the cost of not eliding in
various situations). In particular this means the timedlock
variants are not elided. Nested trylock aborts.
This patch fixes bug 16064, i386 fenv_t not including SSE state, using
the technique suggested there of storing the state in the existing
__eip field of fenv_t to avoid needing to increase the size of fenv_t
and add new symbol versions. The included testcase, which previously
failed for i386 (but passed for x86_64), illustrates how the previous
state was buggy.
This patch causes the SSE state to be included *to the extent it is on
x86_64*. Where some state should logically be included but isn't for
x86_64 (see bug 16068), this patch does not cause it to be included
for i386 either. The idea is that any patch fixing that bug should
fix it for both x86_64 and i386 at once.
Tested i386 and x86_64. (I haven't tested the case of a CPU without
SSE2 disabling the test.)
[BZ #16064]
* sysdeps/i386/fpu/fegetenv.c: Include <unistd.h>, <ldsodefs.h>
and <dl-procinfo.h>.
(__fegetenv): Save SSE state in envp->__eip if supported.
* sysdeps/i386/fpu/feholdexcpt.c (feholdexcept): Save SSE state in
envp->__eip if supported.
* sysdeps/i386/fpu/fesetenv.c: Include <unistd.h>, <ldsodefs.h>
and <dl-procinfo.h>.
(__fesetenv): Always set __eip, __cs_selector, __opcode,
__data_offset and __data_selector in environment to 0. Set SSE
state if supported.
* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
test-fenv-sse.
[$(subdir) = math] (CFLAGS-test-fenv-sse.c): Add -msse2
-mfpmath=sse.
* sysdeps/x86/fpu/test-fenv-sse.c: New file.
__int128 was added in GCC 4.6 and __int128_t was added before x86-64
was supported. This patch replaces __int128 with __int128_t so that
the installed bits/link.h can be used with older GCC.
* sysdeps/x86/bits/link.h (La_x86_64_regs): Replace __int128
with __int128_t.
(La_x86_64_retval): Likewise.
AVX-512 ISA adds 512-bit zmm registers. This patch updates
_dl_runtime_profile to pass zmm registers to run-time audit. It also
changes _dl_x86_64_save_sse and _dl_x86_64_restore_sse to upport zmm
registers, which are called when only when RTLD_PREPARE_FOREIGN_CALL
is used. Its performance impact is minimum.
* config.h.in (HAVE_AVX512_SUPPORT): New #undef.
(HAVE_AVX512_ASM_SUPPORT): Likewise.
* sysdeps/x86_64/bits/link.h (La_x86_64_zmm): New.
(La_x86_64_vector): Add zmm.
* sysdeps/x86_64/Makefile (tests): Add tst-audit10.
(modules-names): Add tst-auditmod10a and tst-auditmod10b.
($(objpfx)tst-audit10): New target.
($(objpfx)tst-audit10.out): Likewise.
(tst-audit10-ENV): New.
(AVX512-CFLAGS): Likewise.
(CFLAGS-tst-audit10.c): Likewise.
(CFLAGS-tst-auditmod10a.c): Likewise.
(CFLAGS-tst-auditmod10b.c): Likewise.
* sysdeps/x86_64/configure.ac: Set config-cflags-avx512,
HAVE_AVX512_SUPPORT and HAVE_AVX512_ASM_SUPPORT.
* sysdeps/x86_64/configure: Regenerated.
* sysdeps/x86_64/dl-trampoline.S (_dl_runtime_profile): Add
AVX-512 zmm register support.
(_dl_x86_64_save_sse): Likewise.
(_dl_x86_64_restore_sse): Likewise.
* sysdeps/x86_64/dl-trampoline.h: Updated to support different
size vector registers.
* sysdeps/x86_64/link-defines.sym (YMM_SIZE): New.
(ZMM_SIZE): Likewise.
* sysdeps/x86_64/tst-audit10.c: New file.
* sysdeps/x86_64/tst-auditmod10a.c: Likewise.
* sysdeps/x86_64/tst-auditmod10b.c: Likewise.
This patch is a revised and updated version of
<https://sourceware.org/ml/libc-alpha/2014-01/msg00196.html>.
In order to generate overall summaries of the results of all tests in
the glibc testsuite, we need to identify and concatenate the files
with the results of individual tests.
Tomas Dohnalek's patch used $(common-objpfx)*/*.test-result for this.
However, the normal glibc approach is explicit enumeration of the
expected set of files with a given property, rather than all files
matching some pattern like that. Furthermore, we would like to be
able to mark tests as UNRESOLVED if the file with their results is for
some reason missing, and in future we would like to be able to mark
tests as UNSUPPORTED if they are disabled for a particular
configuration (rather than simply having them missing from the list of
tests as at present). Such handling of tests that were not run or did
not record results requires an explicit enumeration of tests.
For the tests following the default makefile rules, $(tests) (and
$(xtests)) provides such an enumeration. Others, however, are added
directly as dependencies of the "tests" and "xtests" makefile
targets. This patch changes the makefiles to put them in variables
tests-special and xtests-special, with appropriate dependencies on the
tests listed there then being added centrally.
Those variables are used in Rules and so need to be set before Rules
is included in a subdirectory makefile, which is often earlier in the
makefile than the dependencies were present before. We previously
discussed the question of where to include Rules; see the question at
<https://sourceware.org/ml/libc-alpha/2012-11/msg00798.html>, and a
discussion in
<https://sourceware.org/ml/libc-alpha/2013-01/msg00337.html> of why
Rules is included early rather than late in subdirectory makefiles.
It was necessary to avoid an indirection through the check-abi target
and get the check-abi-* targets for individual libraries into the
tests-special variable. The intl/ test $(objpfx)tst-gettext.out,
previously built only because of dependencies from other tests, was
also added to tests-special for the same reason.
The entries in tests-special are the full makefile targets, complete
with $(objpfx) and .out. If a future change causes tests to be named
consistently with a .out suffix, this can be changed to include just
the path relative to $(objpfx), without .out.
Tested x86_64, including that the same set of files is generated in
the build directory by a build and testsuite run both before and after
the patch (except for changes to the
elf/tst-null-argv.debug.out.<number> file name), and a build with
run-built-tests=no to verify there aren't any more obvious instances
of the issue Marcus Shawcroft reported with a previous version in
<https://sourceware.org/ml/libc-alpha/2014-01/msg00462.html>.
* Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(tests): Depend on $(tests-special).
* Makerules (check-abi-list): New variable.
(check-abi): Depend on $(check-abi-list).
[$(subdir) = elf] (tests-special): Add
$(objpfx)check-abi-libc.out.
[$(build-shared) = yes && subdir] (tests-special): Add
$(check-abi-list).
[$(build-shared) = yes && subdir] (tests): Do not depend on
check-abi.
* Rules (tests): Depend on $(tests-special).
(xtests): Depend on $(xtests-special).
* catgets/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* conform/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* elf/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* grp/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* iconv/Makefile (xtests): Change dependencies to ....
(xtests-special): ... additions to this variable.
* iconvdata/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* intl/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable. Also add
$(objpfx)tst-gettext.out.
* io/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* libio/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* malloc/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* misc/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* nptl/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* nptl_db/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* posix/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(xtests): Change dependencies to ....
(xtests-special): ... additions to this variable.
* resolv/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(xtests): Change dependencies to ....
(xtests-special): ... additions to this variable.
* stdio-common/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
(do-tst-unbputc): Remove target.
(do-tst-printf): Likewise.
* stdlib/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* string/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
* sysdeps/x86/Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
localedata:
* Makefile (tests): Change dependencies to ....
(tests-special): ... additions to this variable.
When i386 and x86-64 mathinline.h was merged into a single mathinline.h,
"gcc -m32" enables x87 inline functions on x86-64 even when -mfpmath=sse
and SSE2 is enabled. It is a regression on x86-64. We should check
__SSE2_MATH__ instead of __x86_64__ when disabling x87 inline functions.
The __builtin_bswap* functions were introduced in gcc-4.3, not gcc-4.2.
Fix the __GNUC_PREREQ tests to reflect this.
Otherwise trying to compile code with gcc-4.2 falls down:
In file included from /usr/include/endian.h:60,
from /usr/include/ctype.h:40,
/usr/include/bits/byteswap.h: In function 'unsigned int __bswap_32(unsigned int)':
/usr/include/bits/byteswap.h:46: error: '__builtin_bswap32' was not declared in this scope
/usr/include/bits/byteswap.h: In function 'long long unsigned int __bswap_64(long long unsigned int)':
/usr/include/bits/byteswap.h:110: error: '__builtin_bswap64' was not declared in this scope
Signed-off-by: Mike Frysinger <vapier@gentoo.org>