Optimize memcpy: sysdeps/i386/i686/multiarch/memcpy-ssse3.S

I've improved the following implementation of memcpy:
"sysdeps/i386/i686/multiarch/memcpy-ssse3.S".

The patch includes some minor style fixes, but the important change is
the use of prefetch loops for the case where:

DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF, and
the src and dst pointers have unequal 16-byte alignments.
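As a rough illustration, the dispatch condition and a prefetching copy loop can be sketched in C. This is not the glibc code, which is hand-written SSSE3 assembly relying on instructions such as `palignr` to handle mismatched alignments. The struct `cache_limits`, the functions `use_prefetch_path` and `copy_with_prefetch`, the 64-byte block size and the 256-byte `PREFETCH_DISTANCE` are all hypothetical names and values chosen for the example. `__builtin_prefetch` is the GCC/Clang builtin.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for DATA_CACHE_SIZE_HALF and SHARED_CACHE_SIZE_HALF.
   glibc derives the real thresholds from the detected cache sizes; here the
   caller simply supplies them. */
struct cache_limits {
    size_t data_cache_size_half;
    size_t shared_cache_size_half;
};

/* Hypothetical look-ahead distance for prefetching, in bytes. */
#define PREFETCH_DISTANCE 256

/* Nonzero when the case targeted by the patch applies:
   DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF, and src and dst
   sit at different offsets within a 16-byte block. */
static int use_prefetch_path(const void *dst, const void *src, size_t len,
                             const struct cache_limits *lim)
{
    uintptr_t dst_off = (uintptr_t)dst & 15;
    uintptr_t src_off = (uintptr_t)src & 15;
    return len >= lim->data_cache_size_half
        && len < lim->shared_cache_size_half
        && dst_off != src_off;
}

/* Copy in 64-byte blocks, prefetching source data PREFETCH_DISTANCE bytes
   ahead of the block currently being copied. */
static void *copy_with_prefetch(void *dst, const void *src, size_t len)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len >= 64) {
        /* Only form prefetch addresses that lie inside the source buffer.
           Default hint: read access, high temporal locality. */
        if (len > PREFETCH_DISTANCE)
            __builtin_prefetch(s + PREFETCH_DISTANCE);
        memcpy(d, s, 64);
        d += 64;
        s += 64;
        len -= 64;
    }
    memcpy(d, s, len); /* remaining tail of fewer than 64 bytes */
    return dst;
}
```

A caller would test `use_prefetch_path` first and fall back to the ordinary copy path otherwise. The fixed 64-byte `memcpy` chunks only mirror the loop structure. They do not reproduce the SSSE3 realignment that makes the assembly version fast.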

This gives a 6% to 50% performance boost on the Atom machine, with a
geometric mean of about 24.73%.
Liubov Dmitrieva 2012-03-30 16:45:27 -04:00 committed by Ulrich Drepper
parent 48c41d04ee
commit 4b43400f6a
2 changed files with 1484 additions and 564 deletions


@@ -1,3 +1,10 @@
+2012-03-22  Liubov Dmitrieva  <liubov.dmitrieva@gmail.com>
+
+	* sysdeps/i386/i686/multiarch/memcpy-ssse3.S: Update.
+	Optimize memcpy with prefetch if
+	DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF and
+	src, dst pointers have unequal 16 byte alignments.
+
 2012-03-30  Siddhesh Poyarekar  <siddhesh@redhat.com>
 
 	[BZ #13928]

File diff suppressed because it is too large.