Allan McRae
d4697bc93d
Update copyright notices with scripts/update-copyrights
2014-01-01 22:00:23 +10:00
Allan McRae
6f8e37ebf8
Update file name in x86_64 ifunc list
...
File name update missed in commit 584b18eb.
2013-12-16 13:00:39 +10:00
Ondřej Bílka
584b18eb4d
Add strstr with unaligned loads. Fixes bug 12100.
...
The sse42 version of strstr used the pcmpistr instruction, which is quite
inefficient. A faster way is to look for pairs of characters: this needs
only sse2, is faster than pcmpistr, and in real strings the pairs we look
for are relatively rare.
For linear time complexity we use a buy or rent technique, which switches
to the two-way algorithm when superlinear behaviour is detected.
2013-12-14 20:08:13 +01:00
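The idea in the strstr commit above can be sketched in portable C. This is only an illustration under stated assumptions, not the SSE2 assembly the commit added: the names `pair_strstr` and `linear_fallback`, the scalar pair check, and the budget constants are all hypothetical, and the fallback simply defers to the C library's `strstr` where glibc switches to its two-way implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the linear-time fallback. The commit switches
   to the two-way algorithm; this sketch just defers to libc strstr. */
const char *linear_fallback(const char *hay, const char *needle)
{
    return strstr(hay, needle);
}

/* Scan for the first two needle characters, verify candidates, and track
   how much verification work has been wasted ("rent"). When that work
   outgrows the distance scanned, switch to the fallback ("buy"). */
const char *pair_strstr(const char *hay, const char *needle)
{
    size_t nlen = strlen(needle);
    if (nlen == 0)
        return hay;
    if (nlen == 1)
        return strchr(hay, needle[0]);

    size_t work = 0; /* bytes compared in failed verifications */
    for (const char *p = hay; p[0] != '\0' && p[1] != '\0'; p++) {
        if (p[0] != needle[0] || p[1] != needle[1])
            continue; /* pair mismatch: the common, cheap case */

        size_t i = 2;
        while (i < nlen && p[i] == needle[i])
            i++;
        if (i == nlen)
            return p; /* full match */
        if (p[i] == '\0')
            return NULL; /* haystack ended; no later position can fit */

        work += i;
        /* Assumed threshold: allow a constant factor of the scanned length. */
        if (work > 8 * (size_t)(p - hay) + 256)
            return linear_fallback(p + 1, needle);
    }
    return NULL;
}
```

Because every position before `p` has already been rejected, restarting the fallback at `p + 1` still returns the first occurrence.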
Ondřej Bílka
e7044ea76b
Use p2align instead of ALIGN
2013-10-08 15:46:48 +02:00
Ondřej Bílka
dc1a95c730
Faster strrchr.
2013-09-26 19:23:01 +02:00
Ondřej Bílka
5905e7b3e2
Faster strchr implementation.
2013-09-11 17:07:38 +02:00
Ondřej Bílka
8f02859f17
Add unaligned strcmp.
2013-09-03 16:27:10 +02:00
Ondřej Bílka
382466e04e
Fix typos.
2013-08-30 18:08:59 +02:00
Ondřej Bílka
0186c6e97e
Fix rawmemchr regression on bulldozer.
2013-08-30 10:14:37 +02:00
Ondřej Bílka
c0c3f78afb
Fix typos.
2013-08-21 19:48:48 +02:00
Liubov Dmitrieva
6308fd9a46
Skip SSE4.2 versions on Intel Silvermont
...
SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
2013-06-28 15:31:40 -07:00
Liubov Dmitrieva
11b8a0e1d7
Fix buffer overrun in x86_64 memcmp-ssse3.S
2013-06-26 12:31:51 -07:00
Liubov Dmitrieva
d086fc7ba0
Set fast unaligned load flag for new Intel microarchitecture
...
I have a small patch for new Intel Silvermont machines.
http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture
I checked this on my machine and see that the strcpy, ... unaligned
versions are faster than the ssse3 versions.
2013-06-14 20:46:15 +02:00
Ondrej Bilka
2d48b41c8f
Faster memcpy on x64.
...
We add a new memcpy version that uses unaligned loads, which are fast
on modern processors. This allows a second improvement: avoiding the
computed jump, which is a relatively expensive operation.
Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
2013-05-20 08:24:41 +02:00
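As a rough illustration of the memcpy commit above (not the assembly it added): one common way unaligned accesses remove a per-size dispatch is to copy the first and last 8 bytes of a short buffer and let the two accesses overlap. The function name `copy_8_to_16` and the 8 to 16 byte range are assumptions made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy n bytes, 8 <= n <= 16, with two possibly overlapping 8-byte moves.
   No branch or jump table on the exact size is needed: the head move covers
   bytes [0, 8) and the tail move covers bytes [n - 8, n). The fixed-size
   memcpy calls are the portable way to express unaligned loads and stores;
   compilers typically lower each one to a single move on x86-64. */
void copy_8_to_16(unsigned char *dst, const unsigned char *src, size_t n)
{
    uint64_t head, tail;
    memcpy(&head, src, 8);          /* unaligned load of the first 8 bytes */
    memcpy(&tail, src + n - 8, 8);  /* unaligned load of the last 8 bytes */
    memcpy(dst, &head, 8);
    memcpy(dst + n - 8, &tail, 8);
}
```

Both loads happen before either store, so the sketch also behaves correctly when the source and destination ranges overlap.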
Ondrej Bilka
37bb363f03
Faster strlen on x64.
2013-03-18 07:39:12 +01:00
Ondrej Bilka
80f844c9d8
Remove Prefer_SSE_for_memop on x64
2013-03-11 15:39:08 +01:00
Ondrej Bilka
87bd9bc4bd
Revert " * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation"
...
This reverts commit b79188d717.
2013-03-06 22:27:18 +01:00
Ondrej Bilka
b79188d717
* sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation
...
which is faster on all x86_64 architectures.
Tested on AMD, Intel Nehalem, SNB, IVB.
2013-03-06 21:54:01 +01:00
Roland McGrath
f1d70dad53
Remove lots of inline keywords.
2013-02-07 14:44:18 -08:00
H.J. Lu
afec409af9
Change __x86_64 prefix in cache size to __x86
2013-01-05 16:00:38 -08:00
H.J. Lu
5d7dd1ca84
Add HAS_RTM
2013-01-03 09:38:20 -08:00
Joseph Myers
568035b787
Update copyright notices with scripts/update-copyrights.
2013-01-02 19:05:09 +00:00
Pino Toscano
94558d30b1
test-multiarch: terminate printf output with newline
2012-11-22 11:34:03 +01:00
H.J. Lu
f62c8abcfb
Compile x86 rtld with -mno-sse -mno-mmx
2012-11-02 18:43:27 -07:00
H.J. Lu
ac49ecaf9d
Add x86-64 __libc_ifunc_impl_list
2012-10-11 16:41:12 -07:00
H.J. Lu
9a387d1f78
Use IFUNC memmove/memset in x86-64 bcopy/bzero
...
Also add separate tests for bcopy and bzero.
2012-10-11 13:58:16 -07:00
H.J. Lu
0569936773
Define HAS_FMA with bit_FMA_Usable
2012-10-02 05:05:17 -07:00
H.J. Lu
31ed415328
Don't define x86-64 __strncmp_ssse3 in libc.a
2012-09-27 07:43:03 -07:00
Roland McGrath
7312ca90dc
Clean up x86_64/multiarch/strstr-c.c include order.
2012-08-15 11:38:57 -07:00
Roland McGrath
9a0a54864b
Clean up x86_64/multiarch/memmove.c include order.
2012-08-15 11:26:02 -07:00
H.J. Lu
f85fa27058
Avoid DWARF definition DIE on ifunc symbols
2012-08-09 16:04:37 -07:00
Carlos O'Donell
1a0994f535
BZ#14059: Fix AVX and FMA4 detection.
...
Fix AVX and FMA4 detection by following the guidelines
set out by Intel and AMD for detecting these features.
2012-05-17 06:59:28 -07:00
H.J. Lu
70bc83b910
Load pointers into RAX_LP in strcmp-sse42.S
2012-05-15 09:59:31 -07:00
H.J. Lu
9bc0b730a6
Load cache sizes into R*_LP in memcpy-ssse3.S
2012-05-15 09:58:28 -07:00
H.J. Lu
6d2850e7f5
Load cache sizes into R*_LP in memcpy-ssse3-back.S
2012-05-15 09:56:17 -07:00
H.J. Lu
8a17f34979
Load cache size into R8_LP
2012-05-15 09:35:43 -07:00
Paul Eggert
59ba27a63a
Replace FSF snail mail address with URLs.
2012-02-09 23:18:22 +00:00
Ulrich Drepper
08cf777f9e
Really fix AVX tests
...
There is no problem with strcmp, it doesn't use the YMM registers.
The math routines might since gcc perhaps generates such code.
Introduce bit_YMM_Usable and use it in the math routines.
2012-01-26 09:45:54 -05:00
Ulrich Drepper
afc5ed09cb
Reset bit_AVX in __cpu_features if OS support is missing
2012-01-26 07:45:14 -05:00
Liubov Dmitrieva
15db4de19d
Fix overrun in destination buffer
2011-12-23 12:02:15 -05:00
Ulrich Drepper
370a7d88f7
WP fixes
2011-12-17 14:41:05 -05:00
Ulrich Drepper
1d3e4b618a
Optimized wcschr and wcscpy for x86-64 and x86-32
2011-12-17 14:39:23 -05:00
Ulrich Drepper
aff2453df7
Fix more warnings
2011-12-03 21:49:35 -05:00
Ulrich Drepper
34372fc6d3
Fix test of non-ASCII locales in x86-64 strcasecmp et al.
2011-11-01 16:46:23 -04:00
Ulrich Drepper
52e4b9eb62
More cleanups of x86-64 strstr
2011-10-28 19:01:48 -04:00
Ulrich Drepper
fd52bc6dc4
Clean up x86-64 strcasestr
...
Actually describe in the C code what is going on.
2011-10-28 18:18:04 -04:00
Ulrich Drepper
e0016b11d6
Add AVX optimized versions for some x86-64 math functions
2011-10-25 21:34:55 -04:00
Ulrich Drepper
618280a192
Optimize x86-64 SSE4.2+ strcmp a bit more
2011-10-25 14:50:31 -04:00
Ulrich Drepper
09229f3e1b
Fix WS
2011-10-23 14:57:28 -04:00
Liubov Dmitrieva
ce7dd29f28
Optimized strnlen and wcscmp for x86-64
2011-10-23 14:56:04 -04:00