Ondřej Bílka
584b18eb4d
Add strstr with unaligned loads. Fixes bug 12100.
...
A sse42 version of strstr used pcmpistr instruction which is quite
ineffective. A faster way is look for pairs of characters which is uses
sse2, is faster than pcmpistr and for real strings a pairs we look for
are relatively rare.
For linear time complexity we use buy or rent technique which switches
to two-way algorithm when superlinear behaviour is detected.
2013-12-14 20:08:13 +01:00
Ondřej Bílka
dc1a95c730
Faster strrchr.
2013-09-26 19:23:01 +02:00
Ondřej Bílka
8f02859f17
Add unaligned strcmp.
2013-09-03 16:27:10 +02:00
Ondrej Bilka
2d48b41c8f
Faster memcpy on x64.
...
We add new memcpy version that uses unaligned loads which are fast
on modern processors. This allows second improvement which is avoiding
computed jump which is relatively expensive operation.
Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
2013-05-20 08:24:41 +02:00
Ondrej Bilka
37bb363f03
Faster strlen on x64.
2013-03-18 07:39:12 +01:00
Ondrej Bilka
80f844c9d8
Remove Prefer_SSE_for_memop on x64
2013-03-11 15:39:08 +01:00
Ondrej Bilka
87bd9bc4bd
Revert " * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation"
...
This reverts commit b79188d717
.
2013-03-06 22:27:18 +01:00
Ondrej Bilka
b79188d717
* sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation
...
which is faster on all x86_64 architectures.
Tested on AMD, Intel Nehalem, SNB, IVB.
2013-03-06 21:54:01 +01:00
Carlos O'Donell
1a0994f535
BZ#14059: Fix AVX and FMA4 detection.
...
Fix AVX and FMA4 detection by following the guidelines
set out by Intel and AMD for detecting these features.
2012-05-17 06:59:28 -07:00
Ulrich Drepper
1d3e4b618a
Optimized wcschr and wcscpy for x86-64 and x86-32
2011-12-17 14:39:23 -05:00
Liubov Dmitrieva
ce7dd29f28
Optimized strnlen and wcscmp for x86-64
2011-10-23 14:56:04 -04:00
Liubov Dmitrieva
be13f7bff6
Optimized memcmp and wmemcmp for x86-64 and x86-32
2011-10-15 11:10:08 -04:00
Liubov Dmitrieva
a5f524e479
Add Atom-optimized strchr and strrchr for x86-64
2011-09-05 21:34:03 -04:00
Liubov Dmitrieva
99710781cc
Improve 64 bit strcat functions with SSE2/SSSE3
2011-07-19 17:11:54 -04:00
H.J. Lu
8912479f9e
Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64
2011-06-24 15:14:22 -04:00
H.J. Lu
ff02d5280b
Use IFUNC on x86-64 memset
2010-11-08 03:41:34 -05:00
H.J. Lu
623aac7f84
Unroll x86-64 strlen
2010-08-26 22:09:34 -07:00
Roland McGrath
8b2b771538
Clean up warnings in new x86_64/multiarch code.
2010-08-25 12:13:08 -07:00
Richard Henderson
73f27d5e72
Clean up SSE variable shifts
2010-08-24 11:35:01 -07:00
Ulrich Drepper
e9f82e0d1d
Add optimized strncasecmp versions for x86-64.
2010-08-14 22:04:01 -07:00
Ulrich Drepper
73507d3ae0
Add support for SSSE3 and SSE4.2 versions of strcasecmp on x86-64.
2010-07-31 21:41:09 -07:00
Ulrich Drepper
cc9f2e47a0
Speed up SSE4.2 strcasestr by avoiding indirect function call.
2010-07-16 15:37:38 -07:00
H.J. Lu
6fb8cbcb58
Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7
...
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7. It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7. It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
2010-06-30 08:26:11 -07:00
H.J. Lu
404a6e3201
x86-64 SSE4 optimized memcmp
...
This is 64bit SSE4 optimized memcmp. It improves memcmp by upto 3X
on Intel Core i7.
2010-04-14 00:12:53 -07:00
H.J. Lu
001659f4d5
Implement SSE4.2 optimized strchr and strrchr.
2009-10-22 22:47:12 -07:00
Ulrich Drepper
0fda545d5f
Add SSSE3-optimized implementation of str{,n}cmp for x86-64.
2009-08-07 22:51:02 -07:00
H.J. Lu
7956a3d27c
Add SSE2 support to str{,n}cmp for x86-64.
2009-07-26 13:32:28 -07:00
H.J. Lu
2b7a8664fa
SSE4.2 strstr/strcasestr for x86-64.
...
This patch implements SSE4.2 strstr/strcasestr, using Knuth-Morris-Pratt
string searching algorithm.
2009-07-20 21:06:50 -07:00
H.J. Lu
06e51c8f3d
Add SSE4.2 support for strcspn, strpbrk, and strspn on x86-64.
2009-07-03 02:48:56 -07:00
H.J. Lu
ab6a873fe0
SSSE3 strcpy/stpcpy for x86-64
...
This patch adds SSSE3 strcpy/stpcpy. I got up to 4X speed up on Core 2
and Core i7. I disabled it on Atom since SSSE3 version is slower for
shorter (<64byte) data.
2009-07-02 03:39:03 -07:00
H.J. Lu
772f4e6a1b
Add SSE4.2 support for strcmp and strncmp on x86-64.
2009-06-22 20:38:41 -07:00
Ulrich Drepper
3ab2d57a4d
Optimize x86-64 strlen for SSE4.2.
...
The SSE4.2 implementation is used in the DSO only. The patch also adds
some infrastructure to be used in similar code later one.
2009-06-05 11:32:00 -07:00
Ulrich Drepper
425ce2edb9
* config.h.in (USE_MULTIARCH): Define.
...
* configure.in: Handle --enable-multi-arch.
* elf/dl-runtime.c (_dl_fixup): Handle STT_GNU_IFUNC.
(_dl_fixup_profile): Likewise.
* elf/do-lookup.c (dl_lookup_x): Likewise.
* sysdeps/x86_64/dl-machine.h: Handle STT_GNU_IFUNC.
* elf/elf.h (STT_GNU_IFUNC): Define.
* include/libc-symbols.h (libc_ifunc): Define.
* sysdeps/x86_64/cacheinfo.c: If USE_MULTIARCH is defined, use the
framework in init-arch.h to get CPUID values.
* sysdeps/x86_64/multiarch/Makefile: New file.
* sysdeps/x86_64/multiarch/init-arch.c: New file.
* sysdeps/x86_64/multiarch/init-arch.h: New file.
* sysdeps/x86_64/multiarch/sched_cpucount.c: New file.
* config.make.in (experimental-malloc): Define.
* configure.in: Handle --enable-experimental-malloc.
* malloc/Makefile: Handle experimental-malloc flag.
* malloc/malloc.c: Implement PER_THREAD and ATOMIC_FASTBINS features.
* malloc/arena.c: Likewise.
* malloc/hooks.c: Likewise.
* malloc/malloc.h: Define M_ARENA_TEST and M_ARENA_MAX.
2009-03-13 23:53:18 +00:00