2012-05-14 Liubov Dmitrieva <liubov.dmitrieva@gmail.com>
* sysdeps/i386/i686/fpu/multiarch/Makefile: New file.
* sysdeps/i386/i686fpu/multiarch/e_expf.c: New file.
* sysdeps/i386/i686fpu/multiarch/e_expf-ia32.S: New file.
* sysdeps/i386/i686/fpu/multiarch/e_expf-sse2.S: New file.
The proper define to check "am I in a shared lib" is "SHARED", not "PIC".
The two new memset_chk functions incorrectly depend on "PIC".
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
I've improved the following implementation of memcpy:
"sysdeps/i386/i686/multiarch/memcpy-ssse3.S".
The patch includes some minor style fixes, but the important part is
just using prefetch loops for the case:
DATA_CACHE_SIZE_HALF <= len < SHARED_CACHE_SIZE_HALF and
src and dst pointers have unequal 16 byte alignments.
This gives from 6% - 50% performance boost on the atom machine, about
24,73% in geometric mean.
Wrong copy algorithm for last bytes, not thread safety.
In some particular cases it uses the destination
memory beyond the string end for
16-byte load, puts changes into that part that is relevant
to destination string and writes whole 16-byte chunk into memory.
I have a test case where the memory beyond the string end contains
malloc/free data, that appear corrupted in case free() updates
it in between the 16-byte read and 16-byte write.
32bit memset-sse2.S assumes cache size is multiple of 128 bytes. If
it isn't true, memset-sse2.S will fail. For example, a processor can
have 24576 KB L3 cache and 20 cores. That is 2516582 byte per core. Half
of it is 1258291, which isn't helpful for vector instructions. This
patch rounds cache sizes to multiple of 256 bytes and adds "raw" cache
sizes.