glibc

mirror of https://sourceware.org/git/glibc.git synced 2025-01-16 05:40:08 +00:00

Author	SHA1	Message	Date
Ling Ma	05f3633da4	Improve 64-bit memcpy performance for Haswell CPUs with AVX instructions. In this patch we take advantage of HSW (Haswell) memory bandwidth, reduce branch mispredictions by avoiding branch instructions, and force the destination to be aligned for AVX instructions. The CPU2006 403.gcc benchmark indicates this patch improves performance by 2% to 10%.	2014-07-30 08:02:35 -07:00
H.J. Lu	f2fef657d8	Enable AVX2 optimized memset only if -mavx2 works * config.h.in (HAVE_AVX2_SUPPORT): New #undef. * sysdeps/i386/configure.ac: Set HAVE_AVX2_SUPPORT and config-cflags-avx2. * sysdeps/x86_64/configure.ac: Likewise. * sysdeps/i386/configure: Regenerated. * sysdeps/x86_64/configure: Likewise. * sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add memset-avx2 only if config-cflags-avx2 is yes. * sysdeps/x86_64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Tests for memset_chk and memset only if HAVE_AVX2_SUPPORT is defined. * sysdeps/x86_64/multiarch/memset.S: Define multiple versions only if HAVE_AVX2_SUPPORT is defined. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise.	2014-07-14 07:58:27 -07:00
Ling Ma	5c74e47cd6	Add x86_64 memset optimized for AVX2. In this patch we take advantage of HSW (Haswell) memory bandwidth, reduce branch mispredictions by avoiding branch instructions, and force the destination to be aligned for AVX and AVX2 instructions. The CPU2006 403.gcc benchmark indicates this patch improves performance by 26% to 59%. * sysdeps/x86_64/multiarch/Makefile: Add memset-avx2. * sysdeps/x86_64/multiarch/memset-avx2.S: New file. * sysdeps/x86_64/multiarch/memset.S: Likewise. * sysdeps/x86_64/multiarch/memset_chk.S: Likewise. * sysdeps/x86_64/multiarch/rtld-memset.S: Likewise.	2014-06-19 15:14:08 -07:00
Ondřej Bílka	584b18eb4d	Add strstr with unaligned loads. Fixes bug 12100. The SSE4.2 version of strstr used the pcmpistr instruction, which is quite inefficient. A faster approach is to look for pairs of characters: this uses SSE2, is faster than pcmpistr, and in real strings the pairs we look for are relatively rare. For linear time complexity we use a buy-or-rent technique, which switches to the two-way algorithm when superlinear behaviour is detected.	2013-12-14 20:08:13 +01:00
Ondřej Bílka	dc1a95c730	Faster strrchr.	2013-09-26 19:23:01 +02:00
Ondřej Bílka	8f02859f17	Add unaligned strcmp.	2013-09-03 16:27:10 +02:00
Ondrej Bilka	2d48b41c8f	Faster memcpy on x64. We add a new memcpy version that uses unaligned loads, which are fast on modern processors. This allows a second improvement: avoiding the computed jump, which is a relatively expensive operation. Tests available here: http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2	2013-05-20 08:24:41 +02:00
Ondrej Bilka	37bb363f03	Faster strlen on x64.	2013-03-18 07:39:12 +01:00
Ondrej Bilka	80f844c9d8	Remove Prefer_SSE_for_memop on x64	2013-03-11 15:39:08 +01:00
Ondrej Bilka	87bd9bc4bd	Revert " * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation" This reverts commit `b79188d717`.	2013-03-06 22:27:18 +01:00
Ondrej Bilka	b79188d717	* sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation which is faster on all x86_64 architectures. Tested on AMD, Intel Nehalem, SNB, IVB.	2013-03-06 21:54:01 +01:00
Carlos O'Donell	1a0994f535	BZ#14059: Fix AVX and FMA4 detection. Fix AVX and FMA4 detection by following the guidelines set out by Intel and AMD for detecting these features.	2012-05-17 06:59:28 -07:00
Ulrich Drepper	1d3e4b618a	Optimized wcschr and wcscpy for x86-64 and x86-32	2011-12-17 14:39:23 -05:00
Liubov Dmitrieva	ce7dd29f28	Optimized strnlen and wcscmp for x86-64	2011-10-23 14:56:04 -04:00
Liubov Dmitrieva	be13f7bff6	Optimized memcmp and wmemcmp for x86-64 and x86-32	2011-10-15 11:10:08 -04:00
Liubov Dmitrieva	a5f524e479	Add Atom-optimized strchr and strrchr for x86-64	2011-09-05 21:34:03 -04:00
Liubov Dmitrieva	99710781cc	Improve 64 bit strcat functions with SSE2/SSSE3	2011-07-19 17:11:54 -04:00
H.J. Lu	8912479f9e	Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64	2011-06-24 15:14:22 -04:00
H.J. Lu	ff02d5280b	Use IFUNC on x86-64 memset	2010-11-08 03:41:34 -05:00
H.J. Lu	623aac7f84	Unroll x86-64 strlen	2010-08-26 22:09:34 -07:00
Roland McGrath	8b2b771538	Clean up warnings in new x86_64/multiarch code.	2010-08-25 12:13:08 -07:00
Richard Henderson	73f27d5e72	Clean up SSE variable shifts	2010-08-24 11:35:01 -07:00
Ulrich Drepper	e9f82e0d1d	Add optimized strncasecmp versions for x86-64.	2010-08-14 22:04:01 -07:00
Ulrich Drepper	73507d3ae0	Add support for SSSE3 and SSE4.2 versions of strcasecmp on x86-64.	2010-07-31 21:41:09 -07:00
Ulrich Drepper	cc9f2e47a0	Speed up SSE4.2 strcasestr by avoiding indirect function call.	2010-07-16 15:37:38 -07:00
H.J. Lu	6fb8cbcb58	Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7. This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to 4X on Core 2 and up to 2X on Core i7.	2010-06-30 08:26:11 -07:00
H.J. Lu	404a6e3201	x86-64 SSE4 optimized memcmp. This is a 64-bit SSE4-optimized memcmp. It improves memcmp by up to 3X on Intel Core i7.	2010-04-14 00:12:53 -07:00
H.J. Lu	001659f4d5	Implement SSE4.2 optimized strchr and strrchr.	2009-10-22 22:47:12 -07:00
Ulrich Drepper	0fda545d5f	Add SSSE3-optimized implementation of str{,n}cmp for x86-64.	2009-08-07 22:51:02 -07:00
H.J. Lu	7956a3d27c	Add SSE2 support to str{,n}cmp for x86-64.	2009-07-26 13:32:28 -07:00
H.J. Lu	2b7a8664fa	SSE4.2 strstr/strcasestr for x86-64. This patch implements SSE4.2 strstr/strcasestr using the Knuth-Morris-Pratt string-searching algorithm.	2009-07-20 21:06:50 -07:00
H.J. Lu	06e51c8f3d	Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.	2009-07-03 02:48:56 -07:00
H.J. Lu	ab6a873fe0	SSSE3 strcpy/stpcpy for x86-64. This patch adds SSSE3 strcpy/stpcpy. I got up to 4X speedup on Core 2 and Core i7. I disabled it on Atom since the SSSE3 version is slower for shorter (<64-byte) data.	2009-07-02 03:39:03 -07:00
H.J. Lu	772f4e6a1b	Add SSE4.2 support for strcmp and strncmp on x86-64.	2009-06-22 20:38:41 -07:00
Ulrich Drepper	3ab2d57a4d	Optimize x86-64 strlen for SSE4.2. The SSE4.2 implementation is used in the DSO only. The patch also adds some infrastructure to be used in similar code later on.	2009-06-05 11:32:00 -07:00
Ulrich Drepper	425ce2edb9	* config.h.in (USE_MULTIARCH): Define. * configure.in: Handle --enable-multi-arch. * elf/dl-runtime.c (_dl_fixup): Handle STT_GNU_IFUNC. (_dl_fixup_profile): Likewise. * elf/do-lookup.c (dl_lookup_x): Likewise. * sysdeps/x86_64/dl-machine.h: Handle STT_GNU_IFUNC. * elf/elf.h (STT_GNU_IFUNC): Define. * include/libc-symbols.h (libc_ifunc): Define. * sysdeps/x86_64/cacheinfo.c: If USE_MULTIARCH is defined, use the framework in init-arch.h to get CPUID values. * sysdeps/x86_64/multiarch/Makefile: New file. * sysdeps/x86_64/multiarch/init-arch.c: New file. * sysdeps/x86_64/multiarch/init-arch.h: New file. * sysdeps/x86_64/multiarch/sched_cpucount.c: New file. * config.make.in (experimental-malloc): Define. * configure.in: Handle --enable-experimental-malloc. * malloc/Makefile: Handle experimental-malloc flag. * malloc/malloc.c: Implement PER_THREAD and ATOMIC_FASTBINS features. * malloc/arena.c: Likewise. * malloc/hooks.c: Likewise. * malloc/malloc.h: Define M_ARENA_TEST and M_ARENA_MAX.	2009-03-13 23:53:18 +00:00

36 Commits
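Two of the memcpy commits above (2d48b41c8f and 05f3633da4) describe complementary ideas. The first uses unaligned loads so that each small size class can be handled by two possibly overlapping copies instead of a computed jump. The second aligns the destination before the bulk loop. The C sketch below illustrates both ideas, but it is not glibc's code: the real versions are assembly that uses SSE2 or AVX registers. Here 16-byte chunks stand in for vector registers, and the name sketch_memcpy is hypothetical. The sketch assumes that an optimizing compiler lowers each fixed-size memcpy call to a single unaligned load or store.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical branch-light memcpy sketch; src and dst must not overlap. */
void *sketch_memcpy(void *restrict dstv, const void *restrict srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    if (n <= 32) {
        /* Each size class uses two possibly overlapping copies (first and
           last chunk) instead of a jump table indexed by n. */
        if (n >= 16) {                      /* 16..32 */
            unsigned char a[16], b[16];
            memcpy(a, src, 16);
            memcpy(b, src + n - 16, 16);
            memcpy(dst, a, 16);
            memcpy(dst + n - 16, b, 16);
        } else if (n >= 8) {                /* 8..15 */
            uint64_t a, b;
            memcpy(&a, src, 8);
            memcpy(&b, src + n - 8, 8);
            memcpy(dst, &a, 8);
            memcpy(dst + n - 8, &b, 8);
        } else if (n >= 4) {                /* 4..7 */
            uint32_t a, b;
            memcpy(&a, src, 4);
            memcpy(&b, src + n - 4, 4);
            memcpy(dst, &a, 4);
            memcpy(dst + n - 4, &b, 4);
        } else if (n > 0) {                 /* 1..3: first, middle, last byte */
            unsigned char a = src[0], b = src[n / 2], c = src[n - 1];
            dst[0] = a;
            dst[n / 2] = b;
            dst[n - 1] = c;
        }
        return dstv;
    }

    /* n > 32: unaligned head and tail; the tail may overlap the body. */
    unsigned char head[16], tail[16];
    memcpy(head, src, 16);
    memcpy(tail, src + n - 16, 16);
    memcpy(dst, head, 16);

    /* Skip 1..16 bytes (already written by the head) so every body store
       lands on a 16-byte-aligned destination address. */
    size_t i = 16 - ((uintptr_t)dst & 15);
    for (; i < n - 16; i += 16)
        memcpy(dst + i, src + i, 16);       /* aligned store, unaligned load */

    memcpy(dst + n - 16, tail, 16);
    return dstv;
}
```

In this sketch, every size class runs straight-line code. For 9 to 15 bytes, for example, the two 8-byte copies overlap in the middle rather than branching on the exact length.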
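The AVX2 memset commit (5c74e47cd6) applies a similar approach to stores: overlapping unaligned head and tail stores, plus an aligned store loop in between. Below is a minimal C sketch of that pattern. It assumes 16-byte chunks as a portable stand-in for 32-byte AVX2 registers and uses the hypothetical name sketch_memset. It is not the assembly in memset-avx2.S.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical memset sketch: overlapping edge stores plus aligned body. */
void *sketch_memset(void *dstv, int c, size_t n)
{
    unsigned char *dst = dstv;
    unsigned char v[16];
    for (size_t k = 0; k < sizeof v; k++)   /* stands in for a byte broadcast */
        v[k] = (unsigned char)c;

    if (n < 16) {
        if (n >= 8) {                       /* 8..15: two overlapping stores */
            memcpy(dst, v, 8);
            memcpy(dst + n - 8, v, 8);
        } else if (n >= 4) {                /* 4..7 */
            memcpy(dst, v, 4);
            memcpy(dst + n - 4, v, 4);
        } else if (n > 0) {                 /* 1..3: first, middle, last byte */
            dst[0] = v[0];
            dst[n / 2] = v[0];
            dst[n - 1] = v[0];
        }
        return dstv;
    }

    /* n >= 16: unaligned head and tail stores cover both edges. */
    memcpy(dst, v, 16);
    memcpy(dst + n - 16, v, 16);

    /* Body: start at the first 16-byte boundary after dst (1..16 bytes in,
       already covered by the head) and store aligned chunks. */
    for (size_t i = 16 - ((uintptr_t)dst & 15); i < n - 16; i += 16)
        memcpy(dst + i, v, 16);
    return dstv;
}
```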
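The strstr history contains two strategies. Commit 2b7a8664fa used Knuth-Morris-Pratt (KMP) with SSE4.2. Commit 584b18eb4d replaced it with a filter on the first two needle characters, plus a buy-or-rent rule that falls back to a linear-time algorithm. The scalar sketch below shows the pair filter and the buy-or-rent switch under several assumptions:

- It tests one position at a time instead of 16 with SSE2.
- The work budget (4 * i + 256) is an arbitrary illustrative threshold.
- The fallback is KMP rather than the two-way algorithm that glibc actually switches to. KMP was chosen here only because it is shorter.

The names sketch_strstr and kmp_search are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Linear-time fallback. ASSUMPTION: glibc switches to the two-way
   algorithm; Knuth-Morris-Pratt is substituted here only for brevity. */
static const char *kmp_search(const char *h, size_t hn,
                              const char *nd, size_t nn)
{
    /* fail[k] = length of the longest proper border of nd[0..k] */
    size_t *fail = malloc(nn * sizeof *fail);
    if (fail == NULL)
        abort();                        /* sketch: no graceful OOM handling */
    fail[0] = 0;
    for (size_t i = 1, k = 0; i < nn; i++) {
        while (k > 0 && nd[i] != nd[k])
            k = fail[k - 1];
        if (nd[i] == nd[k])
            k++;
        fail[i] = k;
    }

    const char *match = NULL;
    for (size_t i = 0, k = 0; i < hn; i++) {
        while (k > 0 && h[i] != nd[k])
            k = fail[k - 1];
        if (h[i] == nd[k])
            k++;
        if (k == nn) {
            match = h + i + 1 - nn;
            break;
        }
    }
    free(fail);
    return match;
}

/* Hypothetical strstr: pair filter plus a buy-or-rent switch to KMP. */
const char *sketch_strstr(const char *hay, const char *needle)
{
    size_t nn = strlen(needle);
    if (nn == 0)
        return hay;
    if (nn == 1)
        return strchr(hay, needle[0]);

    size_t hn = strlen(hay);
    size_t work = 0;                    /* bytes spent verifying candidates */
    for (size_t i = 0; i + nn <= hn; i++) {
        /* Pair filter: the SSE2 code tests 16 positions at once. */
        if (hay[i] != needle[0] || hay[i + 1] != needle[1])
            continue;
        if (memcmp(hay + i + 2, needle + 2, nn - 2) == 0)
            return hay + i;
        work += nn;
        /* Buy or rent: once verification cost outgrows a budget linear in
           the distance scanned, switch to the linear-time algorithm. */
        if (work > 4 * i + 256)
            return kmp_search(hay + i, hn - i, needle, nn);
    }
    return NULL;
}
```

On ordinary text the pair filter rejects almost every position, so the fallback never runs. On an adversarial input such as a needle of many 'a' characters followed by 'b', searched in a long run of 'a', the budget is exceeded after a few candidates and the linear-time search takes over.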
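The AVX/FMA4 detection fix (1a0994f535) refers to the vendor guidelines. Intel's documented rule for AVX has two parts. First, CPUID leaf 1 must report both OSXSAVE (ECX bit 27) and AVX (ECX bit 28). Second, XCR0, read with XGETBV, must have bits 1 and 2 (XMM and YMM state) set, which means the operating system saves the full registers on context switch. Below is a sketch of that check using GCC/Clang's <cpuid.h> and inline assembly. The name sketch_avx_usable is hypothetical, and FMA4, which AMD reports in an extended CPUID leaf, is not covered.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* True only if the CPU implements AVX *and* the OS saves YMM state. */
bool sketch_avx_usable(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;                       /* leaf 1 not available */

    const unsigned int osxsave = 1u << 27, avx = 1u << 28;
    if ((ecx & (osxsave | avx)) != (osxsave | avx))
        return false;                       /* XGETBV is only valid with OSXSAVE */

    /* XGETBV with ECX = 0 reads XCR0: bit 1 = XMM state, bit 2 = YMM state. */
    uint32_t xcr0_lo, xcr0_hi;
    __asm__ volatile ("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    return (xcr0_lo & 0x6) == 0x6;
#else
    return false;                           /* not an x86 target */
#endif
}
```

Checking the CPUID AVX bit alone is not enough: on an OS that does not enable YMM state in XCR0, executing AVX instructions raises an invalid-opcode fault even though the CPU supports them.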
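Several commits (425ce2edb9, ff02d5280b, f2fef657d8) rely on STT_GNU_IFUNC: the dynamic linker calls a resolver once and binds the symbol to whichever implementation the resolver returns. Inside glibc this is done with the libc_ifunc macro and the init-arch CPU data. The sketch below instead shows the same mechanism from user code via GCC's ifunc attribute, assuming a GNU/Linux toolchain with glibc. The names my_memset, memset_generic, memset_fast and resolve_my_memset are hypothetical, and memset_fast merely stands in for an AVX2 version.

```c
#include <stddef.h>
#include <string.h>

/* Portable byte-at-a-time fallback (hypothetical). */
static void *memset_generic(void *d, int c, size_t n)
{
    unsigned char *p = d;
    while (n--)
        *p++ = (unsigned char)c;
    return d;
}

/* Stand-in for an AVX2-optimized version; here it just defers to libc. */
static void *memset_fast(void *d, int c, size_t n)
{
    return memset(d, c, n);
}

typedef void *memset_fn(void *, int, size_t);

/* Resolver: called once by the dynamic linker when it processes the
   IFUNC symbol, before constructors run. */
static memset_fn *resolve_my_memset(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();       /* required before cpu_supports in a resolver */
    if (__builtin_cpu_supports("avx2"))
        return memset_fast;
#endif
    return memset_generic;
}

/* Calls to my_memset go to the implementation chosen by the resolver. */
void *my_memset(void *d, int c, size_t n)
    __attribute__((ifunc("resolve_my_memset")));
```

Because the choice is made once at relocation time, later calls to my_memset go straight to the selected implementation without a per-call CPU feature test.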