Commit Graph

1003 Commits

Author SHA1 Message Date
Joseph Myers
5ff81530dd Do not raise "inexact" from x86_64 SSE4.1 ceil, floor (bug 15479).
Continuing fixes for ceil and floor functions not to raise the
"inexact" exception, this patch fixes the x86_64 SSE4.1 versions.  The
roundss / roundsd instructions take an immediate operand that
determines the rounding mode and whether to raise "inexact"; this just
needs bit 3 set to disable "inexact", which this patch does.

Remark: we don't have an SSE4.1 version of trunc / truncf (using this
instruction with operand 11); I'd expect one to make sense, but of
course it should be benchmarked against the existing C code.  I'll
file a bug in Bugzilla for the lack of such a version.

Tested for x86_64.

	[BZ #15479]
	* sysdeps/x86_64/fpu/multiarch/s_ceil.S (__ceil_sse41): Set bit 3
	of immediate operand to rounding instruction.
	* sysdeps/x86_64/fpu/multiarch/s_ceilf.S (__ceilf_sse41):
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floor.S (__floor_sse41):
	Likewise.
	* sysdeps/x86_64/fpu/multiarch/s_floorf.S (__floorf_sse41):
	Likewise.
2016-05-24 21:11:18 +00:00
H.J. Lu
6901def689 Avoid an extra branch to PLT for -z now
When --enable-bind-now is used to configure glibc build, we can avoid
an extra branch to the PLT entry by using indirect branch via the GOT
slot instead, which is similar to the first instructuon in the PLT
entry.  Changes in the shared library sizes in text sections:

Shared library    Before (bytes)   After (bytes)
libm.so             1060813          1060797
libmvec.so           160881           160805
libpthread.so         94992            94984
librt.so              25064            25048

	* config.h.in (BIND_NOW): New.
	* configure.ac (BIND_NOW): New.  Defined for --enable-bind-now.
	* configure: Regenerated.
	* sysdeps/x86_64/sysdep.h (JUMPTARGET)[BIND_NOW]: Defined to
	indirect branch via the GOT slot.
2016-05-24 08:44:23 -07:00
H.J. Lu
eb2c88c7c8 Remove alignments on jump targets in memset
X86-64 memset-vec-unaligned-erms.S aligns many jump targets, which
increases code sizes, but not necessarily improve performance.  As
memset benchtest data of align vs no align on various Intel and AMD
processors

https://sourceware.org/bugzilla/attachment.cgi?id=9277

shows that aligning jump targets isn't necessary.

	[BZ #20115]
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S (__memset):
	Remove alignments on jump targets.
2016-05-19 08:49:55 -07:00
H.J. Lu
4facca0b0e Call init_cpu_features only if SHARED is defined
In static executable, since init_cpu_features is called early from
__libc_start_main, there is no need to call it again in dl_platform_init.

	[BZ #20072]
	* sysdeps/i386/dl-machine.h (dl_platform_init): Call
	init_cpu_features only if SHARED is defined.
	* sysdeps/x86_64/dl-machine.h (dl_platform_init): Likewise.
2016-05-13 08:29:33 -07:00
H.J. Lu
2a1f15b1a9 Remove x86 ifunc-defines.sym and rtld-global-offsets.sym
Merge x86 ifunc-defines.sym with x86 cpu-features-offsets.sym.  Remove
x86 ifunc-defines.sym and rtld-global-offsets.sym.  No code changes on
i686 and x86-64.

	* sysdeps/i386/i686/multiarch/Makefile (gen-as-const-headers):
	Remove ifunc-defines.sym.
	* sysdeps/x86_64/multiarch/Makefile (gen-as-const-headers):
	Likewise.
	* sysdeps/i386/i686/multiarch/ifunc-defines.sym: Removed.
	* sysdeps/x86/rtld-global-offsets.sym: Likewise.
	* sysdeps/x86_64/multiarch/ifunc-defines.sym: Likewise.
	* sysdeps/x86/Makefile (gen-as-const-headers): Remove
	rtld-global-offsets.sym.
	* sysdeps/x86_64/multiarch/ifunc-defines.sym: Merged with ...
	* sysdeps/x86/cpu-features-offsets.sym: This.
	* sysdeps/x86/cpu-features.h: Include <cpu-features-offsets.h>
	instead of <ifunc-defines.h> and <rtld-global-offsets.h>.
2016-05-11 05:51:39 -07:00
H.J. Lu
a9558b49b3 Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86
Move sysdeps/x86_64/cacheinfo.c to sysdeps/x86.  No code changes on x86
and x86_64.

	* sysdeps/i386/cacheinfo.c: Include <sysdeps/x86/cacheinfo.c>
	instead of <sysdeps/x86_64/cacheinfo.c>.
	* sysdeps/x86_64/cacheinfo.c: Moved to ...
	* sysdeps/x86/cacheinfo.c: Here.
2016-05-08 08:49:18 -07:00
Andreas Schwab
b4bcb3aec6 Register extra test objects
This makes sure that the extra test objects are compiled with the correct
MODULE_NAME and dependencies are tracked.
2016-04-13 17:07:13 +02:00
H.J. Lu
a057f5f8cd X86-64: Use non-temporal store in memcpy on large data
The large memcpy micro benchmark in glibc shows that there is a
regression with large data on Haswell machine.  non-temporal store in
memcpy on large data can improve performance significantly.  This
patch adds a threshold to use non temporal store which is 6 times of
shared cache size.  When size is above the threshold, non temporal
store will be used, but avoid non-temporal store if there is overlap
between destination and source since destination may be in cache when
source is loaded.

For size below 8 vector register width, we load all data into registers
and store them together.  Only forward and backward loops, which move 4
vector registers at a time, are used to support overlapping addresses.
For forward loop, we load the last 4 vector register width of data and
the first vector register width of data into vector registers before the
loop and store them after the loop.  For backward loop, we load the first
4 vector register width of data and the last vector register width of
data into vector registers before the loop and store them after the loop.

	[BZ #19928]
	* sysdeps/x86_64/cacheinfo.c (__x86_shared_non_temporal_threshold):
	New.
	(init_cacheinfo): Set __x86_shared_non_temporal_threshold to 6
	times of shared cache size.
	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S
	(VMOVNT): New.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S
	(VMOVNT): Likewise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S
	(VMOVNT): Likewise.
	(VMOVU): Changed to movups for smaller code sizes.
	(VMOVA): Changed to movaps for smaller code sizes.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S: Update
	comments.
	(PREFETCH): New.
	(PREFETCH_SIZE): Likewise.
	(PREFETCHED_LOAD_SIZE): Likewise.
	(PREFETCH_ONE_SET): Likewise.
	Rewrite to use forward and backward loops, which move 4 vector
	registers at a time, to support overlapping addresses and use
	non temporal store if size is above the threshold and there is
	no overlap between destination and source.
2016-04-12 08:10:47 -07:00
Mike Frysinger
b2d4456b33 configure: fix test == usage
POSIX defines the = operator, but not ==.  Fix the few places where we
incorrectly used ==.
2016-04-09 20:05:13 -04:00
H.J. Lu
a7d1c51482 X86-64: Prepare memmove-vec-unaligned-erms.S
Prepare memmove-vec-unaligned-erms.S to make the SSE2 version as the
default memcpy, mempcpy and memmove.

	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S
	(MEMCPY_SYMBOL): New.
	(MEMPCPY_SYMBOL): Likewise.
	(MEMMOVE_CHK_SYMBOL): Likewise.
	Replace MEMMOVE_SYMBOL with MEMMOVE_CHK_SYMBOL on __mempcpy_chk
	symbols.  Replace MEMMOVE_SYMBOL with MEMPCPY_SYMBOL on
	__mempcpy symbols.  Provide alias for __memcpy_chk in libc.a.
	Provide alias for memcpy in libc.a and ld.so.
2016-04-06 10:19:16 -07:00
H.J. Lu
4af1bb06c5 X86-64: Prepare memset-vec-unaligned-erms.S
Prepare memset-vec-unaligned-erms.S to make the SSE2 version as the
default memset.

	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S
	(MEMSET_CHK_SYMBOL): New.  Define if not defined.
	(__bzero): Check VEC_SIZE == 16 instead of USE_MULTIARCH.
	Disabled fro now.
	Replace MEMSET_SYMBOL with MEMSET_CHK_SYMBOL on __memset_chk
	symbols.  Properly check USE_MULTIARCH on __memset symbols.
2016-04-06 09:10:35 -07:00
H.J. Lu
ec0cac9a1f Force 32-bit displacement in memset-vec-unaligned-erms.S
* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: Force
	32-bit displacement to avoid long nop between instructions.
2016-04-05 05:21:19 -07:00
H.J. Lu
696ac77484 Add a comment in memset-sse2-unaligned-erms.S
* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S: Add
	a comment on VMOVU and VMOVA.
2016-04-05 05:19:18 -07:00
H.J. Lu
5cd7af016d Don't put SSE2/AVX/AVX512 memmove/memset in ld.so
Since memmove and memset in ld.so don't use IFUNC, don't put SSE2, AVX
and AVX512 memmove and memset in ld.so.

	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: Skip
	if not in libc.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
	Likewise.
2016-04-03 14:35:38 -07:00
H.J. Lu
ea2785e96f Fix memmove-vec-unaligned-erms.S
__mempcpy_erms and __memmove_erms can't be placed between __memmove_chk
and __memmove it breaks __memmove_chk.

Don't check source == destination first since it is less common.

	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
	(__mempcpy_erms, __memmove_erms): Moved before __mempcpy_chk
	with unaligned_erms.
	(__memmove_erms): Skip if source == destination.
	(__memmove_unaligned_erms): Don't check source == destination
	first.
2016-04-03 12:38:25 -07:00
H.J. Lu
830566307f Add x86-64 memset with unaligned store and rep stosb
Implement x86-64 memset with unaligned store and rep movsb.  Support
16-byte, 32-byte and 64-byte vector register sizes.  A single file
provides 2 implementations of memset, one with rep stosb and the other
without rep stosb.  They share the same codes when size is between 2
times of vector register size and REP_STOSB_THRESHOLD which defaults
to 2KB.

Key features:

1. Use overlapping store to avoid branch.
2. For size <= 4 times of vector register size, fully unroll the loop.
3. For size > 4 times of vector register size, store 4 times of vector
register size at a time.

	[BZ #19881]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memset-sse2-unaligned-erms, memset-avx2-unaligned-erms and
	memset-avx512-unaligned-erms.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test __memset_chk_sse2_unaligned,
	__memset_chk_sse2_unaligned_erms, __memset_chk_avx2_unaligned,
	__memset_chk_avx2_unaligned_erms, __memset_chk_avx512_unaligned,
	__memset_chk_avx512_unaligned_erms, __memset_sse2_unaligned,
	__memset_sse2_unaligned_erms, __memset_erms,
	__memset_avx2_unaligned, __memset_avx2_unaligned_erms,
	__memset_avx512_unaligned_erms and __memset_avx512_unaligned.
	* sysdeps/x86_64/multiarch/memset-avx2-unaligned-erms.S: New
	file.
	* sysdeps/x86_64/multiarch/memset-avx512-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-sse2-unaligned-erms.S:
	Likewise.
	* sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:
	Likewise.
2016-03-31 10:06:07 -07:00
H.J. Lu
88b57b8ed4 Add x86-64 memmove with unaligned load/store and rep movsb
Implement x86-64 memmove with unaligned load/store and rep movsb.
Support 16-byte, 32-byte and 64-byte vector register sizes.  When
size <= 8 times of vector register size, there is no check for
address overlap bewteen source and destination.  Since overhead for
overlap check is small when size > 8 times of vector register size,
memcpy is an alias of memmove.

A single file provides 2 implementations of memmove, one with rep movsb
and the other without rep movsb.  They share the same codes when size is
between 2 times of vector register size and REP_MOVSB_THRESHOLD which
is 2KB for 16-byte vector register size and scaled up by large vector
register size.

Key features:

1. Use overlapping load and store to avoid branch.
2. For size <= 8 times of vector register size, load  all sources into
registers and store them together.
3. If there is no address overlap bewteen source and destination, copy
from both ends with 4 times of vector register size at a time.
4. If address of destination > address of source, backward copy 8 times
of vector register size at a time.
5. Otherwise, forward copy 8 times of vector register size at a time.
6. Use rep movsb only for forward copy.  Avoid slow backward rep movsb
by fallbacking to backward copy 8 times of vector register size at a
time.
7. Skip when address of destination == address of source.

	[BZ #19776]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memmove-sse2-unaligned-erms, memmove-avx-unaligned-erms and
	memmove-avx512-unaligned-erms.
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list): Test
	__memmove_chk_avx512_unaligned_2,
	__memmove_chk_avx512_unaligned_erms,
	__memmove_chk_avx_unaligned_2, __memmove_chk_avx_unaligned_erms,
	__memmove_chk_sse2_unaligned_2,
	__memmove_chk_sse2_unaligned_erms, __memmove_avx_unaligned_2,
	__memmove_avx_unaligned_erms, __memmove_avx512_unaligned_2,
	__memmove_avx512_unaligned_erms, __memmove_erms,
	__memmove_sse2_unaligned_2, __memmove_sse2_unaligned_erms,
	__memcpy_chk_avx512_unaligned_2,
	__memcpy_chk_avx512_unaligned_erms,
	__memcpy_chk_avx_unaligned_2, __memcpy_chk_avx_unaligned_erms,
	__memcpy_chk_sse2_unaligned_2, __memcpy_chk_sse2_unaligned_erms,
	__memcpy_avx_unaligned_2, __memcpy_avx_unaligned_erms,
	__memcpy_avx512_unaligned_2, __memcpy_avx512_unaligned_erms,
	__memcpy_sse2_unaligned_2, __memcpy_sse2_unaligned_erms,
	__memcpy_erms, __mempcpy_chk_avx512_unaligned_2,
	__mempcpy_chk_avx512_unaligned_erms,
	__mempcpy_chk_avx_unaligned_2, __mempcpy_chk_avx_unaligned_erms,
	__mempcpy_chk_sse2_unaligned_2, __mempcpy_chk_sse2_unaligned_erms,
	__mempcpy_avx512_unaligned_2, __mempcpy_avx512_unaligned_erms,
	__mempcpy_avx_unaligned_2, __mempcpy_avx_unaligned_erms,
	__mempcpy_sse2_unaligned_2, __mempcpy_sse2_unaligned_erms and
	__mempcpy_erms.
	* sysdeps/x86_64/multiarch/memmove-avx-unaligned-erms.S: New
	file.
	* sysdeps/x86_64/multiarch/memmove-avx512-unaligned-erms.S:
	Likwise.
	* sysdeps/x86_64/multiarch/memmove-sse2-unaligned-erms.S:
	Likwise.
	* sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:
	Likwise.
2016-03-31 10:04:40 -07:00
H.J. Lu
064f01b10b Make __memcpy_avx512_no_vzeroupper an alias
Since x86-64 memcpy-avx512-no-vzeroupper.S implements memmove, make
__memcpy_avx512_no_vzeroupper an alias of __memmove_avx512_no_vzeroupper
to reduce code size of libc.so.

	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
	memcpy-avx512-no-vzeroupper.
	* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: Renamed
	to ...
	* sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: This.
	(MEMCPY): Don't define.
	(MEMCPY_CHK): Likewise.
	(MEMPCPY): Likewise.
	(MEMPCPY_CHK): Likewise.
	(MEMPCPY_CHK): Renamed to ...
	(__mempcpy_chk_avx512_no_vzeroupper): This.
	(MEMPCPY_CHK): Renamed to ...
	(__mempcpy_chk_avx512_no_vzeroupper): This.
	(MEMCPY_CHK): Renamed to ...
	(__memmove_chk_avx512_no_vzeroupper): This.
	(MEMCPY): Renamed to ...
	(__memmove_avx512_no_vzeroupper): This.
	(__memcpy_avx512_no_vzeroupper): New alias.
	(__memcpy_chk_avx512_no_vzeroupper): Likewise.
2016-03-28 13:16:22 -07:00
H.J. Lu
c365e615f7 Implement x86-64 multiarch mempcpy in memcpy
Implement x86-64 multiarch mempcpy in memcpy to share most of code.  It
reduces code size of libc.so.

	[BZ #18858]
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Remove
	mempcpy-ssse3, mempcpy-ssse3-back, mempcpy-avx-unaligned
	and mempcpy-avx512-no-vzeroupper.
	* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMPCPY_CHK):
	New.
	(MEMPCPY): Likewise.
	* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S
	(MEMPCPY_CHK): New.
	(MEMPCPY): Likewise.
	* sysdeps/x86_64/multiarch/memcpy-ssse3-back.S (MEMPCPY_CHK): New.
	(MEMPCPY): Likewise.
	* sysdeps/x86_64/multiarch/memcpy-ssse3.S (MEMPCPY_CHK): New.
	(MEMPCPY): Likewise.
	* sysdeps/x86_64/multiarch/mempcpy-avx-unaligned.S: Removed.
	* sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S:
	Likewise.
	* sysdeps/x86_64/multiarch/mempcpy-ssse3-back.S: Likewise.
	* sysdeps/x86_64/multiarch/mempcpy-ssse3.S: Likewise.
2016-03-28 13:13:51 -07:00
H.J. Lu
e41b395523 [x86] Add a feature bit: Fast_Unaligned_Copy
On AMD processors, memcpy optimized with unaligned SSE load is
slower than emcpy optimized with aligned SSSE3 while other string
functions are faster with unaligned SSE load.  A feature bit,
Fast_Unaligned_Copy, is added to select memcpy optimized with
unaligned SSE load.

	[BZ #19583]
	* sysdeps/x86/cpu-features.c (init_cpu_features): Set
	Fast_Unaligned_Copy with Fast_Unaligned_Load for Intel
	processors.  Set Fast_Copy_Backward for AMD Excavator
	processors.
	* sysdeps/x86/cpu-features.h (bit_arch_Fast_Unaligned_Copy):
	New.
	(index_arch_Fast_Unaligned_Copy): Likewise.
	* sysdeps/x86_64/multiarch/memcpy.S (__new_memcpy): Check
	Fast_Unaligned_Copy instead of Fast_Unaligned_Load.
2016-03-28 04:40:03 -07:00
Florian Weimer
f327f5b47b tst-audit10: Fix compilation on compilers without bit_AVX512F [BZ #19860]
[BZ# 19860]
	* sysdeps/x86_64/tst-audit10.c (avx512_enabled): Always return
	zero if the compiler does not provide the AVX512F bit.
2016-03-25 11:11:42 +01:00
Joseph Myers
c898991d8b Fix x86_64 / x86 powl inaccuracy for integer exponents (bug 19848).
Bug 19848 reports cases where powl on x86 / x86_64 has error
accumulation, for small integer exponents, larger than permitted by
glibc's accuracy goals, at least in some rounding modes.  This patch
further restricts the exponent range for which the
small-integer-exponent logic is used to limit the possible error
accumulation.

Tested for x86_64 and x86 and ulps updated accordingly.

	[BZ #19848]
	* sysdeps/i386/fpu/e_powl.S (p3): Rename to p2 and change value
	from 8 to 4.
	(__ieee754_powl): Compare integer exponent against 4 not 8.
	* sysdeps/x86_64/fpu/e_powl.S (p3): Rename to p2 and change value
	from 8 to 4.
	(__ieee754_powl): Compare integer exponent against 4 not 8.
	* math/auto-libm-test-in: Add more tests of pow.
	* math/auto-libm-test-out: Regenerated.
	* sysdeps/i386/i686/fpu/multiarch/libm-test-ulps: Update.
	* sysdeps/x86_64/fpu/libm-test-ulps: Likewise.
2016-03-24 01:32:52 +00:00
H.J. Lu
3c9a4cd16c Don't set %rcx twice before "rep movsb"
* sysdeps/x86_64/multiarch/memcpy-avx-unaligned.S (MEMCPY):
	Don't set %rcx twice before "rep movsb".
2016-03-22 08:36:16 -07:00
H.J. Lu
86ed888255 Use JUMPTARGET in x86-64 mathvec
When PLT may be used, JUMPTARGET should be used instead calling the
function directly.

	* sysdeps/x86_64/fpu/multiarch/svml_d_cos2_core_sse4.S
	(_ZGVbN2v_cos_sse4): Use JUMPTARGET to call cos.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos4_core_avx2.S
	(_ZGVdN4v_cos_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_cos8_core_avx512.S
	(_ZGVdN4v_cos): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp2_core_sse4.S
	(_ZGVbN2v_exp_sse4): Use JUMPTARGET to call exp.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp4_core_avx2.S
	(_ZGVdN4v_exp_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_exp8_core_avx512.S
	(_ZGVdN4v_exp): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log2_core_sse4.S
	(_ZGVbN2v_log_sse4): Use JUMPTARGET to call log.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log4_core_avx2.S
	(_ZGVdN4v_log_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_log8_core_avx512.S
	(_ZGVdN4v_log): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow2_core_sse4.S
	(_ZGVbN2vv_pow_sse4): Use JUMPTARGET to call pow.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow4_core_avx2.S
	(_ZGVdN4vv_pow_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_pow8_core_avx512.S
	(_ZGVdN4vv_pow): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin2_core_sse4.S
	(_ZGVbN2v_sin_sse4): Use JUMPTARGET to call sin.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin4_core_avx2.S
	(_ZGVdN4v_sin_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sin8_core_avx512.S
	(_ZGVdN4v_sin): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos2_core_sse4.S
	(_ZGVbN2vvv_sincos_sse4): Use JUMPTARGET to call sin and cos.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos4_core_avx2.S
	(_ZGVdN4vvv_sincos_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_d_sincos8_core_avx512.S
	(_ZGVdN4vvv_sincos): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf16_core_avx512.S
	(_ZGVdN8v_cosf): Use JUMPTARGET to call cosf.
	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf4_core_sse4.S
	(_ZGVbN4v_cosf_sse4): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_cosf8_core_avx2.S
	(_ZGVdN8v_cosf_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_expf16_core_avx512.S
	(_ZGVdN8v_expf): Use JUMPTARGET to call expf.
	* sysdeps/x86_64/fpu/multiarch/svml_s_expf4_core_sse4.S
	(_ZGVbN4v_expf_sse4): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_expf8_core_avx2.S
	(_ZGVdN8v_expf_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_logf16_core_avx512.S
	(_ZGVdN8v_logf): Use JUMPTARGET to call logf.
	* sysdeps/x86_64/fpu/multiarch/svml_s_logf4_core_sse4.S
	(_ZGVbN4v_logf_sse4): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_logf8_core_avx2.S
	(_ZGVdN8v_logf_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_powf16_core_avx512.S
	(_ZGVdN8vv_powf): Use JUMPTARGET to call powf.
	* sysdeps/x86_64/fpu/multiarch/svml_s_powf4_core_sse4.S
	(_ZGVbN4vv_powf_sse4): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_powf8_core_avx2.S
	(_ZGVdN8vv_powf_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
	(_ZGVdN8vv_powf): Use JUMPTARGET to call sinf and cosf.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
	(_ZGVbN4vvv_sincosf_sse4): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
	(_ZGVdN8vvv_sincosf_avx2): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf16_core_avx512.S
	(_ZGVdN8v_sinf): Use JUMPTARGET to call sinf.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf4_core_sse4.S
	(_ZGVbN4v_sinf_sse4): Likewise.
	* sysdeps/x86_64/fpu/multiarch/svml_s_sinf8_core_avx2.S
	(_ZGVdN8v_sinf_avx2): Likewise.
	* sysdeps/x86_64/fpu/svml_d_wrapper_impl.h (WRAPPER_IMPL_SSE2):
	Use JUMPTARGET to call callee.
	(WRAPPER_IMPL_SSE2_ff): Likewise.
	(WRAPPER_IMPL_SSE2_fFF): Likewise.
	(WRAPPER_IMPL_AVX): Likewise.
	(WRAPPER_IMPL_AVX_ff): Likewise.
	(WRAPPER_IMPL_AVX_fFF): Likewise.
	(WRAPPER_IMPL_AVX512): Likewise.
	(WRAPPER_IMPL_AVX512_ff): Likewise.
	* sysdeps/x86_64/fpu/svml_s_wrapper_impl.h (WRAPPER_IMPL_SSE2):
	Likewise.
	(WRAPPER_IMPL_SSE2_ff): Likewise.
	(WRAPPER_IMPL_SSE2_fFF): Likewise.
	(WRAPPER_IMPL_AVX): Likewise.
	(WRAPPER_IMPL_AVX_ff): Likewise.
	(WRAPPER_IMPL_AVX_fFF): Likewise.
	(WRAPPER_IMPL_AVX512): Likewise.
	(WRAPPER_IMPL_AVX512_ff): Likewise.
	(WRAPPER_IMPL_AVX512_fFF): Likewise.
2016-03-16 14:24:19 -07:00
Roland McGrath
3bd80c0de2 Fix tst-audit10 build when -mavx512f is not supported. 2016-03-08 12:32:59 -08:00
Florian Weimer
3c0f7407ee tst-audit4, tst-audit10: Compile AVX/AVX-512 code separately [BZ #19269]
This ensures that GCC will not use unsupported instructions before
the run-time check to ensure support.
2016-03-07 16:00:25 +01:00
H.J. Lu
fee9eb6200 Group AVX512 functions in .text.avx512 section
* sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S:
	Replace .text with .text.avx512.
	* sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S:
	Likewise.
2016-03-06 16:48:11 -08:00
H.J. Lu
16b23e0363 Replace PREINIT_FUNCTION@PLT with *%rax in call
Since we have loaded address of PREINIT_FUNCTION into %rax, we can
avoid extra branch to PLT slot.

	[BZ #19745]
	* sysdeps/x86_64/crti.S (_init): Replace PREINIT_FUNCTION@PLT
	with *%rax in call.
2016-03-04 16:15:41 -08:00
H.J. Lu
21683b5a7d Replace @PLT with @GOTPCREL(%rip) in call
Since __libc_start_main is called very early, lazy binding isn't relevant
here.  Use indirect branch via GOT to avoid extra branch to PLT slot.

	[BZ #19745]
	* sysdeps/x86_64/start.S (_start): __libc_start_main@PLT
	with *__libc_start_main@GOTPCREL(%rip) in call.
2016-03-04 16:15:41 -08:00
H.J. Lu
97f7112728 Add a comment in sysdeps/x86_64/Makefile
Mention recursive calls when ENTRY is used in _mcount.S.

	* sysdeps/x86_64/Makefile (sysdep_noprof): Add a comment.
2016-03-04 08:44:58 -08:00
H.J. Lu
14a1d7cc4c x86-64: Fix memcpy IFUNC selection
Chek Fast_Unaligned_Load, instead of Slow_BSF, and also check for
Fast_Copy_Backward to enable __memcpy_ssse3_back.  Existing selection
order is updated with following selection order:

1. __memcpy_avx_unaligned if AVX_Fast_Unaligned_Load bit is set.
2. __memcpy_sse2_unaligned if Fast_Unaligned_Load bit is set.
3. __memcpy_sse2 if SSSE3 isn't available.
4. __memcpy_ssse3_back if Fast_Copy_Backward bit it set.
5. __memcpy_ssse3

	[BZ #18880]
	* sysdeps/x86_64/multiarch/memcpy.S: Check Fast_Unaligned_Load,
	instead of Slow_BSF, and also check for Fast_Copy_Backward to
	enable __memcpy_ssse3_back.
2016-03-04 08:39:07 -08:00
Paul Pluzhnikov
5cdc3d9db0 2016-03-03 Paul Pluzhnikov <ppluzhnikov@google.com>
[BZ #19490]
	* sysdeps/x86_64/_mcount.S (_mcount): Add unwind descriptor.
	(__fentry__): Likewise
2016-03-03 09:53:49 -08:00
H.J. Lu
87a07a4376 Copy x86_64 _mcount.op from _mcount.o
No need to compile x86_64 _mcount.S with -pg.  We can just copy the
normal static object.

	* gmon/Makefile (noprof): Add $(sysdep_noprof).
	* sysdeps/x86_64/Makefile (sysdep_noprof): Add _mcount.
2016-03-03 06:56:22 -08:00
H.J. Lu
ec215346b9 Call x86-64 __mcount_internal/__sigjmp_save directly
Since __mcount_internal and __sigjmp_save are internal to x86-64 libc.so:

3532: 0000000000104530   289 FUNC    LOCAL  DEFAULT   13 __mcount_internal
3391: 0000000000034170    38 FUNC    LOCAL  DEFAULT   13 __sigjmp_save

they can be called directly without PLT.

	* sysdeps/x86_64/_mcount.S (C_LABEL(_mcount)): Call
	__mcount_internal directly.
	(C_LABEL(__fentry__)): Likewise.
	* sysdeps/x86_64/setjmp.S __sigsetjmp): Call __sigjmp_save
	directly.
2016-03-01 16:58:07 -08:00
H.J. Lu
8d9c92017d [x86_64] Set DL_RUNTIME_UNALIGNED_VEC_SIZE to 8
Due to GCC bug:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58066

__tls_get_addr may be called with 8-byte stack alignment.  Although
this bug has been fixed in GCC 4.9.4, 5.3 and 6, we can't assume
that stack will be always aligned at 16 bytes.  Since SSE optimized
memory/string functions with aligned SSE register load and store are
used in the dynamic linker, we must set DL_RUNTIME_UNALIGNED_VEC_SIZE
to 8 so that _dl_runtime_resolve_sse will align the stack before
calling _dl_fixup:

Dump of assembler code for function _dl_runtime_resolve_sse:
   0x00007ffff7deea90 <+0>:	push   %rbx
   0x00007ffff7deea91 <+1>:	mov    %rsp,%rbx
   0x00007ffff7deea94 <+4>:	and    $0xfffffffffffffff0,%rsp
                                ^^^^^^^^^^^ Align stack to 16 bytes
   0x00007ffff7deea98 <+8>:	sub    $0x100,%rsp
   0x00007ffff7deea9f <+15>:	mov    %rax,0xc0(%rsp)
   0x00007ffff7deeaa7 <+23>:	mov    %rcx,0xc8(%rsp)
   0x00007ffff7deeaaf <+31>:	mov    %rdx,0xd0(%rsp)
   0x00007ffff7deeab7 <+39>:	mov    %rsi,0xd8(%rsp)
   0x00007ffff7deeabf <+47>:	mov    %rdi,0xe0(%rsp)
   0x00007ffff7deeac7 <+55>:	mov    %r8,0xe8(%rsp)
   0x00007ffff7deeacf <+63>:	mov    %r9,0xf0(%rsp)
   0x00007ffff7deead7 <+71>:	movaps %xmm0,(%rsp)
   0x00007ffff7deeadb <+75>:	movaps %xmm1,0x10(%rsp)
   0x00007ffff7deeae0 <+80>:	movaps %xmm2,0x20(%rsp)
   0x00007ffff7deeae5 <+85>:	movaps %xmm3,0x30(%rsp)
   0x00007ffff7deeaea <+90>:	movaps %xmm4,0x40(%rsp)
   0x00007ffff7deeaef <+95>:	movaps %xmm5,0x50(%rsp)
   0x00007ffff7deeaf4 <+100>:	movaps %xmm6,0x60(%rsp)
   0x00007ffff7deeaf9 <+105>:	movaps %xmm7,0x70(%rsp)

	[BZ #19679]
	* sysdeps/x86_64/dl-trampoline.S (DL_RUNIME_UNALIGNED_VEC_SIZE):
	Renamed to ...
	(DL_RUNTIME_UNALIGNED_VEC_SIZE): This.  Set to 8.
	(DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ...
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): This.  Updated.
	(DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ...
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): This.
	* sysdeps/x86_64/dl-trampoline.h
	(DL_RUNIME_RESOLVE_REALIGN_STACK): Renamed to ...
	(DL_RUNTIME_RESOLVE_REALIGN_STACK): This.
2016-02-19 15:45:09 -08:00
Andrew Senkevich
a5df3210a6 Use PIC relocation in ALIAS_IMPL
Since libmvec_nonshared.a may be linked into shared objects, ALIAS_IMPL
should use PIC relocation.

	[BZ #19590]
	* sysdeps/x86_64/fpu/svml_finite_alias.S (ALIAS_IMPL): Use PIC
	relocation.
2016-02-17 14:23:32 -08:00
Paul Pluzhnikov
b274130206 2016-01-20 Paul Pluzhnikov <ppluzhnikov@google.com>
[BZ #19490]
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_broadcast.S (pthread_cond_broadcast): Use ENTRY/END
* sysdeps/unix/sysv/linux/x86_64/pthread_cond_signal.S (pthread_cond_signal): Likewise
* sysdeps/x86_64/nptl/pthread_spin_lock.S (pthread_spin_lock): Likewise
* sysdeps/x86_64/nptl/pthread_spin_trylock.S (pthread_spin_trylock): Likewise
* sysdeps/x86_64/nptl/pthread_spin_unlock.S (pthread_spin_unlock): Likewise
2016-01-20 13:39:20 -08:00
Andrew Senkevich
df782dc690 Fixed build with assembler w/o AVX-512 support.
* sysdeps/x86_64/multiarch/ifunc-impl-list.c: Fixed build with
    assembler not supporting AVX-512.
2016-01-19 14:34:53 +03:00
Andrew Senkevich
214a44f394 Fixed typos in __memcpy_chk.
* sysdeps/x86_64/multiarch/memcpy_chk.S: Fixed typos.
2016-01-16 14:42:26 +03:00
Andrew Senkevich
72276d6e88 Added memcpy/memmove family optimized with AVX512 for KNL hardware.
Added AVX512 implementations of memcpy, mempcpy, memmove, memcpy_chk,
mempcpy_chk, memmove_chk.
It shows average improvement more than 30% over AVX versions on KNL
hardware (performance results in the thread
<https://sourceware.org/ml/libc-alpha/2016-01/msg00258.html>).

    * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new files.
    * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
    * sysdeps/x86_64/multiarch/memcpy-avx512-no-vzeroupper.S: New file.
    * sysdeps/x86_64/multiarch/mempcpy-avx512-no-vzeroupper.S: Likewise.
    * sysdeps/x86_64/multiarch/memmove-avx512-no-vzeroupper.S: Likewise.
    * sysdeps/x86_64/multiarch/memcpy.S: Added new IFUNC branch.
    * sysdeps/x86_64/multiarch/memcpy_chk.S: Likewise.
    * sysdeps/x86_64/multiarch/memmove.c: Likewise.
    * sysdeps/x86_64/multiarch/memmove_chk.c: Likewise.
    * sysdeps/x86_64/multiarch/mempcpy.S: Likewise.
    * sysdeps/x86_64/multiarch/mempcpy_chk.S: Likewise.
2016-01-16 00:49:45 +03:00
Joseph Myers
f7a9f785e5 Update copyright dates with scripts/update-copyrights. 2016-01-04 16:05:18 +00:00
Andrew Senkevich
83d776f979 Added memset optimized with AVX512 for KNL hardware.
It shows improvement up to 28% over AVX2 memset (performance results
attached at <https://sourceware.org/ml/libc-alpha/2015-12/msg00052.html>).

    * sysdeps/x86_64/multiarch/memset-avx512-no-vzeroupper.S: New file.
    * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Added new file.
    * sysdeps/x86_64/multiarch/ifunc-impl-list.c: Added new tests.
    * sysdeps/x86_64/multiarch/memset.S: Added new IFUNC branch.
    * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.
    * sysdeps/x86/cpu-features.h (bit_Prefer_No_VZEROUPPER,
    index_Prefer_No_VZEROUPPER): New.
    * sysdeps/x86/cpu-features.c (init_cpu_features): Set the
    Prefer_No_VZEROUPPER for Knights Landing.
2015-12-19 02:47:28 +03:00
Andrew Senkevich
977a30801f Better workaround for aliases of *_finite symbols in vector math library.
Old workaround based on assembly aliases can lead to link fail (bug 19058).
This patch makes workaround in another way to avoid it.

    [BZ #19058]
    * math/Makefile ($(inst_libdir)/libm.so): Added libmvec_nonshared.a
    to AS_NEEDED.
    * sysdeps/x86/fpu/bits/math-vector.h: Removed code with old workaround.
    * sysdeps/x86_64/fpu/Makefile (libmvec-support,
    libmvec-static-only-routines): Added new file.
    * sysdeps/x86_64/fpu/svml_finite_alias.S: New file.
2015-11-27 16:22:26 +03:00
Joseph Myers
01189b083b Fix i386/x86_64 log* (1) zero sign for -ffinite-math-only (bug 19213).
For the -ffinite-math-only versions of various x86_64 and x86 log*
functions, a zero result from log* (1) is returned with incorrect sign
in round-downward mode.  This patch fixes this in a similar way to the
previous fixes for the non-*_finite versions of the functions.

Tested for x86_64 and x86 (including an i586 build), together with a
patch that will be applied separately to enable the main libm-test.inc
tests for the finite-math-only functions.

	[BZ #19213]
	* sysdeps/i386/fpu/e_log.S (__log_finite): Ensure +0 is always
	returned for argument 1.
	* sysdeps/i386/fpu/e_logf.S (__logf_finite): Likewise.
	* sysdeps/i386/fpu/e_logl.S (__logl_finite): Likewise.
	* sysdeps/i386/i686/fpu/e_logl.S (__logl_finite): Likewise.
	* sysdeps/x86_64/fpu/e_log10l.S (__log10l_finite): Likewise.
	* sysdeps/x86_64/fpu/e_log2l.S (__log2l_finite): Likewise.
	* sysdeps/x86_64/fpu/e_logl.S (__logl_finite): Likewise.
2015-11-05 21:56:31 +00:00
Joseph Myers
9f9f27248b Remove miscellaneous GCC >= 4.7 version conditionals.
This patch removes miscellaneous __GNUC_PREREQ (4, 7) conditionals
that are now dead.

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).

	* sysdeps/arm/atomic-machine.h
	[__GNUC_PREREQ (4, 7) && __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4]:
	Change conditional to [__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4].
	[__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 && !__GNUC_PREREQ (4, 7)]:
	Remove conditional code.
	[!__GNUC_PREREQ (4, 7) || !__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4]:
	Change conditional to [!__GCC_HAVE_SYNC_COMPARE_AND_SWAP_4].
	* sysdeps/i386/sysdep.h [__ASSEMBLER__ && __GNUC_PREREQ (4, 7)]:
	Change conditional to [__ASSEMBLER__].
	[__ASSEMBLER__ && !__GNUC_PREREQ (4, 7)]: Remove conditional code.
	[!__ASSEMBLER__ && __GNUC_PREREQ (4, 7)]: Change conditional to
	[!__ASSEMBLER__].
	[!__ASSEMBLER__ && !__GNUC_PREREQ (4, 7)]: Remove conditional
	code.
	* sysdeps/unix/sysv/linux/sh/atomic-machine.h (rNOSP): Remove
	conditional macro definitions.
	(__arch_compare_and_exchange_val_8_acq): Use "u" instead of rNOSP.
	(__arch_compare_and_exchange_val_16_acq): Likewise.
	(__arch_compare_and_exchange_val_32_acq): Likewise.
	(atomic_exchange_and_add): Likewise.
	(atomic_add): Likewise.
	(atomic_add_negative): Likewise.
	(atomic_add_zero): Likewise.
	(atomic_bit_set): Likewise.
	(atomic_bit_test_set): Likewise.
	* sysdeps/x86_64/atomic-machine.h [__GNUC_PREREQ (4, 7)]: Make
	code unconditional.
	[!__GNUC_PREREQ (4, 7)]: Remove conditional code.
2015-11-04 21:34:36 +00:00
Joseph Myers
91bcb95ad4 Remove cpuid.h configure tests.
There are configure tests for the cpuid.h header for x86 / x86_64.
GCC 4.3 and later install this header, so those tests are obsolete.
This patch removes them.

Tested for x86_64 and x86 (testsuite, and that installed shared
libraries are unchanged by the patch).

	* sysdeps/i386/configure.ac (cpuid.h): Do not test for header.
	* sysdeps/i386/configure: Regenerated.
	* sysdeps/x86_64/configure.ac (cpuid.h): Do not test for header.
	* sysdeps/x86_64/configure: Regenerated.
2015-10-29 14:43:46 +00:00
Joseph Myers
2145f97cee Handle more state in i386/x86_64 fesetenv (bug 16068).
fenv_t should include architecture-specific floating-point modes and
status flags.  i386 and x86_64 fesetenv limit which bits they use from
the x87 status and control words, when using saved state, and limit
which parts of the state they set to fixed values, when using
FE_DFL_ENV / FE_NOMASK_ENV.  The following should be included but are
excluded in at least some cases: status and masking for the "denormal
operand" exception (which isn't part of FE_ALL_EXCEPT); precision
control (explicitly mentioned in Annex F as something that counts as
part of the floating-point environment); MXCSR FZ and DAZ bits (for
FE_DFL_ENV and FE_NOMASK_ENV).  This patch arranges for this extra
state to be handled by fesetenv (and thereby by feupdateenv, which
calls fesetenv).

(Note that glibc functions using floating point are not generally
expected to work correctly with non-default values of this state,
especially precision control, but it is still logically part of the
floating-point environment and should be handled as such by fesetenv.
Changes to the state relating to subnormals ought generally to work
with libm functions when the arguments aren't subnormal and neither
are the expected results; that's a consequence of functions avoiding
spurious internal underflows.)

A question arising from this is whether FE_NOMASK_ENV should or should
not mask the "denormal operand" exception.  I decided it should mask
that exception.  This is the status quo - previously that exception
could only be unmasked by direct manipulation of control registers
(possibly via <fpu_control.h>).  In addition, it means that use of
FE_NOMASK_ENV leaves a floating-point environment the same as could be
obtained by fesetenv (FE_DFL_ENV); feenableexcept (FE_ALL_EXCEPT);,
rather than an environment in which an exception is unmasked that
could only be masked again by using fesetenv with FE_DFL_ENV (or a
previously saved environment) - this exception not being usable with
other <fenv.h> functions because it's outside FE_ALL_EXCEPT.

Tested for x86_64 and x86.

	[BZ #16068]
	* sysdeps/i386/fpu/fesetenv.c: Include <fpu_control.h>.
	(FE_ALL_EXCEPT_X86): New macro.
	(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
	FE_ALL_EXCEPT.  Ensure precision control is included in
	floating-point state.  Ensure that FE_DFL_ENV and FE_NOMASK_ENV
	handle "denormal operand exception" and clear FZ and DAZ bits.
	* sysdeps/x86_64/fpu/fesetenv.c: Include <fpu_control.h>.
	(FE_ALL_EXCEPT_X86): New macro.
	(__fesetenv): Use FE_ALL_EXCEPT_X86 in most places instead of
	FE_ALL_EXCEPT.  Ensure precision control is included in
	floating-point state.  Ensure that FE_DFL_ENV and FE_NOMASK_ENV
	handle "denormal operand exception" and clear FZ and DAZ bits.
	* sysdeps/x86/fpu/test-fenv-sse-2.c: New file.
	* sysdeps/x86/fpu/test-fenv-x87.c: Likewise.
	* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
	test-fenv-x87 and test-fenv-sse-2.
	[$(subdir) = math] (CFLAGS-test-fenv-sse-2.c): New variable.
2015-10-28 22:58:29 +00:00
Joseph Myers
0b9af583a5 Fix i386/x86_64 fesetenv SSE exception clearing (bug 19181).
The i386 and x86_64 versions of fesetenv, when called with FE_DFL_ENV
or FE_NOMASK_ENV as argument, do not clear SSE exceptions raised in
MXCSR.  These arguments should, like other fenv_t values, represent
the whole of the floating-point state, so such exceptions should be
cleared; this patch adds the required clearing.  (Discovered while
working on bug 16068.)

Tested for x86_64 and x86.

	[BZ #19181]
	* sysdeps/i386/fpu/fesetenv.c (__fesetenv): Clear already-raised
	SSE exceptions when argument is FE_DFL_ENV or FE_NOMASK_ENV.
	* sysdeps/x86_64/fpu/fesetenv.c (__fesetenv): Likewise.
	* math/test-fenv-clear-main.c: New file.
	* math/test-fenv-clear.c: Likewise.
	* math/Makefile (tests): Add test-fenv-clear.
	* sysdeps/x86/fpu/test-fenv-clear-sse.c: New file.
	* sysdeps/x86/fpu/Makefile [$(subdir) = math] (tests): Add
	test-fenv-clear-sse.
	[$(subdir) = math] (CFLAGS-test-fenv-clear-sse.c): New variable.
2015-10-28 18:50:20 +00:00
Joseph Myers
c871b9b096 Remove -mavx2 configure tests.
There are configure tests for the -mavx2 compiler option.  AVX2
support was added in GCC 4.7, so these tests are now obsolete; this
patch removes them.

Tested for x86_64 and x86 (testsuite, and that installed stripped
shared libraries are unchanged by the patch).

	* sysdeps/i386/configure.ac (libc_cv_cc_avx2): Remove configure
	test.
	* sysdeps/i386/configure: Regenerated.
	* sysdeps/x86_64/configure.ac (libc_cv_cc_avx2): Remove configure
	test.
	* sysdeps/x86_64/configure: Regenerated.
	* config.h.in (HAVE_AVX2_SUPPORT): Remove #undef.
	* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
	memset-avx2 unconditionally instead of conditionally on
	[$(config-cflags-avx2) = yes].
	* sysdeps/x86_64/multiarch/ifunc-impl-list.c
	(__libc_ifunc_impl_list) [HAVE_AVX2_SUPPORT]: Make code
	unconditional.
	* sysdeps/x86_64/multiarch/memset.S [HAVE_AVX2_SUPPORT]: Likewise.
	* sysdeps/x86_64/multiarch/memset_chk.S
	[IS_IN (libc) && SHARED && HAVE_AVX2_SUPPORT]: Change conditional
	to [IS_IN (libc) && SHARED].
2015-10-28 13:29:03 +00:00
Florian Weimer
e39edb0552 x86_64: Regenerate ulps [BZ #19168]
This comes from running “make regen-ulps” on AMD Opteron 6272 CPUs.
2015-10-26 13:15:48 +01:00