This version uses general register based memory instruction to load
data, because vector register based is slightly slower in emag.
Character-matching is performed on 16-byte (both size and alignment)
memory block in parallel each iteration.
* sysdeps/aarch64/memchr.S (__memchr): Rename to MEMCHR.
[!MEMCHR](MEMCHR): Set to __memchr.
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
Add memchr_generic and memchr_nosimd.
* sysdeps/aarch64/multiarch/ifunc-impl-list.c
(__libc_ifunc_impl_list): Add memchr ifuncs.
* sysdeps/aarch64/multiarch/memchr.c: New file.
* sysdeps/aarch64/multiarch/memchr_generic.S: Likewise.
* sysdeps/aarch64/multiarch/memchr_nosimd.S: Likewise.
strchr and is significantly faster than the C version.
2016-11-04 Wilco Dijkstra <wdijkstr@arm.com>
Kevin Petit <kevin.petit@arm.com>
* sysdeps/aarch64/memchr.S (__memchr): New file.