The optimization is achieved by following techniques:
> data alignment [gain from aligned memory access on read/write]
> POWER7 gains performance with loop unrolling/unwinding
[gain by reduction of branch penalty].
> zero padding done by calling optimized memset