glibc

mirror of https://sourceware.org/git/glibc.git synced 2024-11-23 13:30:06 +00:00

Author	SHA1	Message	Date
Ling Ma	05f3633da4	Improve 64bit memcpy performance for Haswell CPU with AVX instruction In this patch we take advantage of HSW memory bandwidth, manage to reduce miss branch prediction by avoiding using branch instructions and force destination to be aligned with avx instruction. The CPU2006 403.gcc benchmark indicates this patch improves performance from 2% to 10%.	2014-07-30 08:02:35 -07:00
H.J. Lu	f2fef657d8	Enable AVX2 optimized memset only if -mavx2 works * config.h.in (HAVE_AVX2_SUPPORT): New #undef. * sysdeps/i386/configure.ac: Set HAVE_AVX2_SUPPORT and config-cflags-avx2. * sysdeps/x86_64/configure.ac: Likewise. * sysdeps/i386/configure: Regenerated. * sysdeps/x86_64/configure: Likewise. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-avx2 only if config-cflags-avx2 is yes. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Tests for memset_chk and memset only if HAVE_AVX2_SUPPORT is defined. * sysdeps/x86_64/multiarch/memset.S: Define multiple versions only if HAVE_AVX2_SUPPORT is defined. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.	2014-07-14 07:58:27 -07:00
H.J. Lu	d92d8f8a42	Add ifunc tests for x86_64 memset_chk and memset This patch adds ifunc tests for x86_64 memset_chk and memset. It also defines HAS_AVX2 with AVX2_Usable since AVX2 may not be usable even if processor has AVX2. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add tests for memset_chk and memset. * sysdeps/x86_64/multiarch/init-arch.h (HAS_AVX2): Defined with AVX2_Usable.	2014-06-20 14:52:29 -07:00
H.J. Lu	224c6c51c5	Remove sysdeps/x86_64/multiarch/rtld-strlen.S Since there is no sysdeps/x86_64/multiarch/strlen.S, sysdeps/x86_64/rtld-strlen.S will be used. * sysdeps/x86_64/multiarch/rtld-strlen.S: Removed.	2014-06-20 08:10:07 -07:00
Ling Ma	5c74e47cd6	Add x86_64 memset optimized for AVX2 In this patch we take advantage of HSW memory bandwidth, manage to reduce miss branch prediction by avoiding using branch instructions and force destination to be aligned with avx & avx2 instruction. The CPU2006 403.gcc benchmark indicates this patch improves performance from 26% to 59%. * sysdeps/x86_64/multiarch/Makefile: Add memset-avx2. * sysdeps/x86_64/multiarch/memset-avx2.S: New file. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.	2014-06-19 15:14:08 -07:00
Carlos O'Donell	8f1df5cf9d	Fix -Wundef warning for FEATURE_INDEX_1. Define FEATURE_INDEX_1 and FEATURE_INDEX_MAX as macros for use by both assembly and C code. This fixes the -Wundef error for cases where FEATURE_INDEX_1 was not defined but used the correct value of 0 for an undefined macro.	2014-05-03 00:25:21 -04:00
Sihai Yao	f9281df995	Detect if AVX2 is usable This patch checks and sets bit_AVX2_Usable in __cpu_features.feature. * sysdeps/x86_64/multiarch/ifunc-defines.sym (COMMON_CPUID_INDEX_7): New. * sysdeps/x86_64/multiarch/init-arch.c (__init_cpu_features): Check and set bit_AVX2_Usable. * sysdeps/x86_64/multiarch/init-arch.h (bit_AVX2_Usable): New macro. (bit_AVX2): Likewise. (index_AVX2_Usable): Likewise. (CPUID_AVX2): Likewise. (HAS_AVX2): Likewise.	2014-04-17 08:00:21 -07:00
Allan McRae	d4697bc93d	Update copyright notices with scripts/update-copyrights	2014-01-01 22:00:23 +10:00
Allan McRae	6f8e37ebf8	Update file name in x86_64 ifunc list File name update missed in commit `584b18eb`.	2013-12-16 13:00:39 +10:00
Ondřej Bílka	584b18eb4d	Add strstr with unaligned loads. Fixes bug 12100. A sse42 version of strstr used pcmpistr instruction which is quite ineffective. A faster way is look for pairs of characters which is uses sse2, is faster than pcmpistr and for real strings a pairs we look for are relatively rare. For linear time complexity we use buy or rent technique which switches to two-way algorithm when superlinear behaviour is detected.	2013-12-14 20:08:13 +01:00
Ondřej Bílka	e7044ea76b	Use p2align instead ALIGN	2013-10-08 15:46:48 +02:00
Ondřej Bílka	dc1a95c730	Faster strrchr.	2013-09-26 19:23:01 +02:00
Ondřej Bílka	5905e7b3e2	Faster strchr implementation.	2013-09-11 17:07:38 +02:00
Ondřej Bílka	8f02859f17	Add unaligned strcmp.	2013-09-03 16:27:10 +02:00
Ondřej Bílka	382466e04e	Fix typos.	2013-08-30 18:08:59 +02:00
Ondřej Bílka	0186c6e97e	Fix rawmemchr regression on bulldozer.	2013-08-30 10:14:37 +02:00
Ondřej Bílka	c0c3f78afb	Fix typos.	2013-08-21 19:48:48 +02:00
Liubov Dmitrieva	6308fd9a46	Skip SSE4.2 versions on Intel Silvermont SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.	2013-06-28 15:31:40 -07:00
Liubov Dmitrieva	11b8a0e1d7	Fix buffers overrun in x86_64 memcmp-ssse3.S	2013-06-26 12:31:51 -07:00
Liubov Dmitrieva	d086fc7ba0	Set fast unaligned load flag for new Intel microarchitecture I have small patch for new Intel Silvermont machines. http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture I checked this on my machine and see that strcpy, ... unaligned versions are faster than ssse3 versions.	2013-06-14 20:46:15 +02:00
Ondrej Bilka	2d48b41c8f	Faster memcpy on x64. We add new memcpy version that uses unaligned loads which are fast on modern processors. This allows second improvement which is avoiding computed jump which is relatively expensive operation. Tests available here: http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2	2013-05-20 08:24:41 +02:00
Ondrej Bilka	37bb363f03	Faster strlen on x64.	2013-03-18 07:39:12 +01:00
Ondrej Bilka	80f844c9d8	Remove Prefer_SSE_for_memop on x64	2013-03-11 15:39:08 +01:00
Ondrej Bilka	87bd9bc4bd	Revert " * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation" This reverts commit `b79188d717`.	2013-03-06 22:27:18 +01:00
Ondrej Bilka	b79188d717	* sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation which is faster on all x86_64 architectures. Tested on AMD, Intel Nehalem, SNB, IVB.	2013-03-06 21:54:01 +01:00
Roland McGrath	f1d70dad53	Remove lots of inline keywords.	2013-02-07 14:44:18 -08:00
H.J. Lu	afec409af9	Change __x86_64 prefix in cache size to __x86	2013-01-05 16:00:38 -08:00
H.J. Lu	5d7dd1ca84	Add HAS_RTM	2013-01-03 09:38:20 -08:00
Joseph Myers	568035b787	Update copyright notices with scripts/update-copyrights.	2013-01-02 19:05:09 +00:00
Pino Toscano	94558d30b1	test-multiarch: terminate printf output with newline	2012-11-22 11:34:03 +01:00
H.J. Lu	f62c8abcfb	Compile x86 rtld with -mno-sse -mno-mmx	2012-11-02 18:43:27 -07:00
H.J. Lu	ac49ecaf9d	Add x86-64 __libc_ifunc_impl_list	2012-10-11 16:41:12 -07:00
H.J. Lu	9a387d1f78	Use IFUNC memmove/memset in x86-64 bcopy/bzero Also add separate tests for bcopy and bzero.	2012-10-11 13:58:16 -07:00
H.J. Lu	0569936773	Define HAS_FMA with bit_FMA_Usable	2012-10-02 05:05:17 -07:00
H.J. Lu	31ed415328	Don't define x86-64 __strncmp_ssse3 in libc.a	2012-09-27 07:43:03 -07:00
Roland McGrath	7312ca90dc	Clean up x86_64/multiarch/strstr-c.c include order.	2012-08-15 11:38:57 -07:00
Roland McGrath	9a0a54864b	Clean up x86_64/multiarch/memmove.c include order.	2012-08-15 11:26:02 -07:00
H.J. Lu	f85fa27058	Avoid DWARF definition DIE on ifunc symbols	2012-08-09 16:04:37 -07:00
Carlos O'Donell	1a0994f535	BZ#14059: Fix AVX and FMA4 detection. Fix AVX and FMA4 detection by following the guidelines set out by Intel and AMD for detecting these features.	2012-05-17 06:59:28 -07:00
H.J. Lu	70bc83b910	Load pointers into RAX_LP in strcmp-sse42.S	2012-05-15 09:59:31 -07:00
H.J. Lu	9bc0b730a6	Load cache sizes into R*_LP in memcpy-ssse3.S	2012-05-15 09:58:28 -07:00
H.J. Lu	6d2850e7f5	Load cache sizes into R*_LP in memcpy-ssse3-back.S	2012-05-15 09:56:17 -07:00
H.J. Lu	8a17f34979	Load cache size into R8_LP	2012-05-15 09:35:43 -07:00
Paul Eggert	59ba27a63a	Replace FSF snail mail address with URLs.	2012-02-09 23:18:22 +00:00
Ulrich Drepper	08cf777f9e	Really fix AVX tests There is no problem with strcmp, it doesn't use the YMM registers. The math routines might since gcc perhaps generates such code. Introduce bit_YMM_USBALE and use it in the math routines.	2012-01-26 09:45:54 -05:00
Ulrich Drepper	afc5ed09cb	Reset bit_AVX in __cpu_features is OS support is missing	2012-01-26 07:45:14 -05:00
Liubov Dmitrieva	15db4de19d	Fix overrun in destination buffer	2011-12-23 12:02:15 -05:00
Ulrich Drepper	370a7d88f7	WP fixes	2011-12-17 14:41:05 -05:00
Ulrich Drepper	1d3e4b618a	Optimized wcschr and wcscpy for x86-64 and x86-32	2011-12-17 14:39:23 -05:00
Ulrich Drepper	aff2453df7	Fix more warnings	2011-12-03 21:49:35 -05:00

157 Commits