Commit Graph

3 Commits

Author SHA1 Message Date
Steve Ellcey
f0da0bcf8b Remove extra space at end of line. 2018-10-16 11:02:03 -07:00
Anton Youdkevitch
75c1aee500 aarch64: optimized memcpy implementation for thunderx2
Since aligned loads and stores are huge performance
advantage the implementation always tries to do aligned
access. Among the cases when src and dst addresses are
aligned or unaligned evenly there are cases of not evenly
unaligned src and dst. For such cases (if the length is
big enough) ext instruction is used to merge-and-shift
two memory chunks loaded from two adjacent aligned
locations and then the adjusted chunk gets stored to
aligned address.

Performance gain against the current T2 implementation:
     memcpy-large: 65K-32M: +40% - +10%
     memcpy-walk:  128-32M: +20% - +2%
2018-10-16 11:00:27 -07:00
Steve Ellcey
e9537dddc7 IFUNC for Cavium ThunderX2
* sysdeps/aarch64/multiarch/Makefile (sysdep_routines):
	Add memcpy_thunderx2.
	* sysdeps/aarch64/multiarch/ifunc-impl-list.c (MAX_IFUNC):
	Increment to 4.
	(__libc_ifunc_impl_list): Add __memcpy_thunderx2.
	* sysdeps/aarch64/multiarch/memcpy.c (libc_ifunc): Add IS_THUNDERX2
	and IS_THUNDERX2PA checks.
	* sysdeps/aarch64/multiarch/memcpy_thunderx.S (USE_THUNDERX2):
	Use macro to set name appropriately.
	(memcpy): Use USE_THUNDERX2 macro to modify prefetches.
	* sysdeps/aarch64/multiarch/memcpy_thunderx2.S: New file.
	* sysdeps/unix/sysv/linux/aarch64/cpu-features.h (IS_THUNDERX2PA):
	New macro.
	(IS_THUNDERX2): New macro.
2018-02-22 08:38:47 -08:00