Commit Graph

196 Commits

Author SHA1 Message Date
Ondřej Bílka
dc1a95c730 Faster strrchr. 2013-09-26 19:23:01 +02:00
Ondřej Bílka
5905e7b3e2 Faster strchr implementation. 2013-09-11 17:07:38 +02:00
Ondřej Bílka
8f02859f17 Add unaligned strcmp. 2013-09-03 16:27:10 +02:00
Ondřej Bílka
382466e04e Fix typos. 2013-08-30 18:08:59 +02:00
Ondřej Bílka
0186c6e97e Fix rawmemchr regression on bulldozer. 2013-08-30 10:14:37 +02:00
Ondřej Bílka
c0c3f78afb Fix typos. 2013-08-21 19:48:48 +02:00
Liubov Dmitrieva
6308fd9a46 Skip SSE4.2 versions on Intel Silvermont
SSE2/SSSE3 versions are faster than SSE4.2 versions on Intel Silvermont.
2013-06-28 15:31:40 -07:00
Liubov Dmitrieva
11b8a0e1d7 Fix buffers overrun in x86_64 memcmp-ssse3.S 2013-06-26 12:31:51 -07:00
Liubov Dmitrieva
d086fc7ba0 Set fast unaligned load flag for new Intel microarchitecture
I have small patch for new Intel Silvermont machines.

http://newsroom.intel.com/community/intel_newsroom/blog/2013/05/06/intel-launches-low-power-high-performance-silvermont-microarchitecture

I checked this on my machine and see that strcpy, ... unaligned
versions are faster than ssse3 versions.
2013-06-14 20:46:15 +02:00
Ondrej Bilka
2d48b41c8f Faster memcpy on x64.
We add new memcpy version that uses unaligned loads which are fast
on modern processors. This allows second improvement which is avoiding
computed jump which is relatively expensive operation.

Tests available here:
http://kam.mff.cuni.cz/~ondra/memcpy_profile_result27_04_13.tar.bz2
2013-05-20 08:24:41 +02:00
Ondrej Bilka
37bb363f03 Faster strlen on x64. 2013-03-18 07:39:12 +01:00
Ondrej Bilka
80f844c9d8 Remove Prefer_SSE_for_memop on x64 2013-03-11 15:39:08 +01:00
Ondrej Bilka
87bd9bc4bd Revert " * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation"
This reverts commit b79188d717.
2013-03-06 22:27:18 +01:00
Ondrej Bilka
b79188d717 * sysdeps/x86_64/strlen.S: Replace with new SSE2 based implementation
which is faster on all x86_64 architectures.
	Tested on AMD, Intel Nehalem, SNB, IVB.
2013-03-06 21:54:01 +01:00
Roland McGrath
f1d70dad53 Remove lots of inline keywords. 2013-02-07 14:44:18 -08:00
H.J. Lu
afec409af9 Change __x86_64 prefix in cache size to __x86 2013-01-05 16:00:38 -08:00
H.J. Lu
5d7dd1ca84 Add HAS_RTM 2013-01-03 09:38:20 -08:00
Joseph Myers
568035b787 Update copyright notices with scripts/update-copyrights. 2013-01-02 19:05:09 +00:00
Pino Toscano
94558d30b1 test-multiarch: terminate printf output with newline 2012-11-22 11:34:03 +01:00
H.J. Lu
f62c8abcfb Compile x86 rtld with -mno-sse -mno-mmx 2012-11-02 18:43:27 -07:00
H.J. Lu
ac49ecaf9d Add x86-64 __libc_ifunc_impl_list 2012-10-11 16:41:12 -07:00
H.J. Lu
9a387d1f78 Use IFUNC memmove/memset in x86-64 bcopy/bzero
Also add separate tests for bcopy and bzero.
2012-10-11 13:58:16 -07:00
H.J. Lu
0569936773 Define HAS_FMA with bit_FMA_Usable 2012-10-02 05:05:17 -07:00
H.J. Lu
31ed415328 Don't define x86-64 __strncmp_ssse3 in libc.a 2012-09-27 07:43:03 -07:00
Roland McGrath
7312ca90dc Clean up x86_64/multiarch/strstr-c.c include order. 2012-08-15 11:38:57 -07:00
Roland McGrath
9a0a54864b Clean up x86_64/multiarch/memmove.c include order. 2012-08-15 11:26:02 -07:00
H.J. Lu
f85fa27058 Avoid DWARF definition DIE on ifunc symbols 2012-08-09 16:04:37 -07:00
Carlos O'Donell
1a0994f535 BZ#14059: Fix AVX and FMA4 detection.
Fix AVX and FMA4 detection by following the guidelines
set out by Intel and AMD for detecting these features.
2012-05-17 06:59:28 -07:00
H.J. Lu
70bc83b910 Load pointers into RAX_LP in strcmp-sse42.S 2012-05-15 09:59:31 -07:00
H.J. Lu
9bc0b730a6 Load cache sizes into R*_LP in memcpy-ssse3.S 2012-05-15 09:58:28 -07:00
H.J. Lu
6d2850e7f5 Load cache sizes into R*_LP in memcpy-ssse3-back.S 2012-05-15 09:56:17 -07:00
H.J. Lu
8a17f34979 Load cache size into R8_LP 2012-05-15 09:35:43 -07:00
Paul Eggert
59ba27a63a Replace FSF snail mail address with URLs. 2012-02-09 23:18:22 +00:00
Ulrich Drepper
08cf777f9e Really fix AVX tests
There is no problem with strcmp, it doesn't use the YMM registers.
The math routines might since gcc perhaps generates such code.
Introduce bit_YMM_USBALE and use it in the math routines.
2012-01-26 09:45:54 -05:00
Ulrich Drepper
afc5ed09cb Reset bit_AVX in __cpu_features is OS support is missing 2012-01-26 07:45:14 -05:00
Liubov Dmitrieva
15db4de19d Fix overrun in destination buffer 2011-12-23 12:02:15 -05:00
Ulrich Drepper
370a7d88f7 WP fixes 2011-12-17 14:41:05 -05:00
Ulrich Drepper
1d3e4b618a Optimized wcschr and wcscpy for x86-64 and x86-32 2011-12-17 14:39:23 -05:00
Ulrich Drepper
aff2453df7 Fix more warnings 2011-12-03 21:49:35 -05:00
Ulrich Drepper
34372fc6d3 Fix test of non-ASCII locales in x86-64 strcasecmp et.al. 2011-11-01 16:46:23 -04:00
Ulrich Drepper
52e4b9eb62 More cleanups of x86-64 strstr 2011-10-28 19:01:48 -04:00
Ulrich Drepper
fd52bc6dc4 Clean up x86-64 strcasestr
Actually describe in the C code what is going on.
2011-10-28 18:18:04 -04:00
Ulrich Drepper
e0016b11d6 Add AVX optimized versions for some x86-64 math functions 2011-10-25 21:34:55 -04:00
Ulrich Drepper
618280a192 Optimize x86-64 SSE4.2+ strcmp a bit more 2011-10-25 14:50:31 -04:00
Ulrich Drepper
09229f3e1b Fix WS 2011-10-23 14:57:28 -04:00
Liubov Dmitrieva
ce7dd29f28 Optimized strnlen and wcscmp for x86-64 2011-10-23 14:56:04 -04:00
Ulrich Drepper
c196fed8f0 Fix compilation problems in x86-64 init-arch 2011-10-21 20:47:20 -04:00
Ulrich Drepper
ed72b6545f Check for FMA4 support and generate appropriate fma functions 2011-10-20 22:43:15 -04:00
Ulrich Drepper
8d4f46c613 Move fma routines to right place 2011-10-20 21:55:41 -04:00
Ulrich Drepper
855d156018 Optimize x86-64 rawmemchr and add test 2011-10-19 22:22:29 -04:00
Ulrich Drepper
d9a4d2ab27 Add optimized str{,n}casecmp for AVX on x86-64 2011-10-19 12:42:38 -04:00
Ulrich Drepper
2d1f3a4db6 Fix WS 2011-10-15 11:11:12 -04:00
Liubov Dmitrieva
be13f7bff6 Optimized memcmp and wmemcmp for x86-64 and x86-32 2011-10-15 11:10:08 -04:00
Liubov Dmitrieva
093ecf9299 Improve 64 bit memchr, memrchr, rawmemchr with SSE2 2011-10-07 11:49:10 -04:00
Ulrich Drepper
ceaa0c5dc3 Move Atom-optimized code out of the way and together 2011-09-06 21:53:03 -04:00
Ulrich Drepper
6d18b67f4d Fix whitespaces 2011-09-05 21:42:12 -04:00
Liubov Dmitrieva
a5f524e479 Add Atom-optimized strchr and strrchr for x86-64 2011-09-05 21:34:03 -04:00
Andreas Schwab
8c1a459f9a Fix inline strncat/strncmp on x86 2011-08-04 14:59:25 -04:00
Ulrich Drepper
21137f89c5 Fix overflow bug is optimized strncat for x86-64 2011-07-21 12:32:36 -04:00
Ulrich Drepper
8002999481 Fix whitespaces 2011-07-19 17:27:09 -04:00
Liubov Dmitrieva
99710781cc Improve 64 bit strcat functions with SSE2/SSSE3 2011-07-19 17:11:54 -04:00
H.J. Lu
8912479f9e Improved st{r,p}{,n}cpy for SSE2 and SSSE3 on x86-64 2011-06-24 15:14:22 -04:00
H.J. Lu
0b1cbaaef5 Optimized st{r,p}{,n}cpy for SSE2/SSSE3 on x86-32 2011-06-24 14:15:32 -04:00
H.J. Lu
3d29045b5e Assume Intel Core i3/i5/i7 processor if AVX is available 2011-06-03 07:01:25 -04:00
Mike Frysinger
4c559bcdf3 Fix static linking with checking x86/x86-64 memcpy. 2011-04-17 22:20:47 -04:00
H.J. Lu
0354e35501 Work around old buggy program which cannot cope with memcpy semantics. 2011-04-01 19:38:21 -04:00
H.J. Lu
c97a1282a4 Handle page boundaries in x86 SSE4.2 strncmp. 2011-03-21 05:35:38 -04:00
Harsha Jagasia
7e4ba49cd3 Enable SSE2 memset for AMD'supcoming Orochi processor.
This patch enables SSE2 memset for AMD's upcoming Orochi processor.
This patch also fixes the following bug:
For misaligned blocks larger than > 144 Bytes, memset branches into
the integer code path depending on the value of misalignment even if
the startup code chooses the SSE2 code path upfront, when multiarch
is enabled.
2011-03-04 23:30:08 -05:00
Roland McGrath
a0bf67cca2 Fix some warning nits. 2011-02-04 10:53:51 -08:00
H.J. Lu
13b695749a Support Intel processor model 6 and model 0x2. 2010-11-12 03:48:52 -05:00
H.J. Lu
8ca52c6e3b Fix one exit path in x86-64 SSE4.2 str{,n}casecmp. 2010-11-10 03:05:37 -05:00
H.J. Lu
ff02d5280b Use IFUNC on x86-64 memset 2010-11-08 03:41:34 -05:00
Richard Li
dbf3a06904 Fix x86-64 strchr propagation of search byte into all bytes of SSE register 2010-10-25 14:13:17 -04:00
Jakub Jelinek
5e908464b9 Implement accurate fma. 2010-10-13 22:27:03 -04:00
Jakub Jelinek
9ff8d36f27 Correct implementation of fmaf. 2010-10-11 09:27:05 -04:00
Ulrich Drepper
015a4c6193 Re-enable all strncasecmp versions. 2010-09-20 20:18:00 -07:00
Ulrich Drepper
8ffcee4a04 Fix limit detection in x86-64 SSE2 strncasecmp. 2010-09-20 14:02:23 -07:00
Ulrich Drepper
9ea3de11f1 Move slow Atom code to separate section. 2010-08-26 22:17:03 -07:00
H.J. Lu
623aac7f84 Unroll x86-64 strlen 2010-08-26 22:09:34 -07:00
H.J. Lu
b416a90085 Missing comma in last commit. 2010-08-26 13:18:46 -07:00
Roland McGrath
8b2b771538 Clean up warnings in new x86_64/multiarch code. 2010-08-25 12:13:08 -07:00
H.J. Lu
e73015f2d6 Unroll 32bit SSE strlen and handle slow bsf 2010-08-25 10:07:37 -07:00
Ulrich Drepper
1cdfe7242f Add missing copyright year updated and pretty printing. 2010-08-24 11:42:19 -07:00
Richard Henderson
73f27d5e72 Clean up SSE variable shifts 2010-08-24 11:35:01 -07:00
Ulrich Drepper
9da4bb316f Fix two typos in x86-64 SSE4.2 strncasecmp implementation. 2010-08-19 09:20:44 -07:00
Ulrich Drepper
1feccb6caf Fix fourth parameter of SSE4.2 strcmp for x86-64. 2010-08-15 20:46:09 -07:00
Ulrich Drepper
e9f82e0d1d Add optimized strncasecmp versions for x86-64. 2010-08-14 22:04:01 -07:00
Ulrich Drepper
ca6bb004eb Fix x86-64 build without multiarch. 2010-08-14 14:56:32 -07:00
Ulrich Drepper
73507d3ae0 Add support for SSSE3 and SSE4.2 versions of strcasecmp on x86-64. 2010-07-31 21:41:09 -07:00
Ulrich Drepper
66f6765a47 Pretty printing x86-64 SSE4.3 strcmp. 2010-07-30 12:54:37 -07:00
Ulrich Drepper
fe36dd025e Fix tolower operation in strcasestr. 2010-07-30 00:09:07 -07:00
Ulrich Drepper
880113d91e Avoid compiling unneeded file in ld.so. 2010-07-27 21:12:59 -07:00
Ulrich Drepper
8e96b93aa7 Speed up x86-64 strcasestr a bit moew.
Using the new SSE4.2 instructions is cool but not really the fastest.
Some older SSE instructions can do the trick faster.
2010-07-24 08:34:44 -07:00
Andreas Schwab
f6a31e0eb6 Add strcasestr-nonascii to i386 build 2010-07-21 07:26:18 -07:00
Ulrich Drepper
d02dc4ba08 Fix non-ASCII case of SSE4.2 strcasstr. 2010-07-16 16:00:22 -07:00
Ulrich Drepper
cc9f2e47a0 Speed up SSE4.2 strcasestr by avoiding indirect function call. 2010-07-16 15:37:38 -07:00
H.J. Lu
6fb8cbcb58 Improve 64bit memcpy/memmove for Atom, Core 2 and Core i7
This patch includes optimized 64bit memcpy/memmove for Atom, Core 2 and
Core i7.  It improves memcpy by up to 3X on Atom, up to 4X on Core 2 and
up to 1X on Core i7.  It also improves memmove by up to 3X on Atom, up to
4X on Core 2 and up to 2X on Core i7.
2010-06-30 08:26:11 -07:00
H.J. Lu
3c88fe1e3a Incorrect x86 CPU family and model check. 2010-05-27 11:14:18 -07:00
H.J. Lu
df87f54923 Check DATA_CACHE_SIZE_HALF 2010-04-14 22:18:27 -07:00
H.J. Lu
dd37cd1a12 Optimie x86-64 SSE4 memcmp for unaligned data. 2010-04-14 17:53:44 -07:00