Implement strcpy family IFUNC selectors in C.
All internal calls within libc.so can use IFUNC on x86-64 since unlike
x86, x86-64 supports PC-relative addressing to access the GOT entry so
that it can call via PLT without using an extra register. For libc.a,
we can't use IFUNC for functions which are called before IFUNC has been
initialized. Use IFUNC internally reduces the icache footprint since
libc.so and other codes in the process use the same implementations.
This patch uses IFUNC for strcpy family functions within libc.
* sysdeps/x86_64/multiarch/Makefile (sysdep_routines): Add
strcpy-sse2 and stpcpy-sse2.
* sysdeps/x86_64/multiarch/ifunc-unaligned-ssse3.h: New file.
* sysdeps/x86_64/multiarch/stpcpy-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/stpcpy.c: Likewise.
* sysdeps/x86_64/multiarch/stpncpy.c: Likewise.
* sysdeps/x86_64/multiarch/strcpy-sse2.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.c: Likewise.
* sysdeps/x86_64/multiarch/strncpy.c: Likewise.
* sysdeps/x86_64/multiarch/stpcpy.S: Removed.
* sysdeps/x86_64/multiarch/stpncpy.S: Likewise.
* sysdeps/x86_64/multiarch/strcpy.S: Likewise.
* sysdeps/x86_64/multiarch/strncpy.S: Likewise.
* sysdeps/x86_64/multiarch/stpncpy-c.c (weak_alias): New.
(libc_hidden_def): Always defined as empty.
* sysdeps/x86_64/multiarch/strncpy-c.c (libc_hidden_builtin_def):
Always Defined as empty.
This patch adds SSSE3 strcpy/stpcpy. I got up to 4X speed up on Core 2
and Core i7. I disabled it on Atom since SSSE3 version is slower for
shorter (<64byte) data.