Benchmarking shows it took up to 3.5% of Qt Creator's initialization
cost. Optimize by modifying only one variable per loop iteration:
instead of updating both n and dst128, we update only one of them.
Removing the Duff's Device also improves the code, since the compiler
now updates dst128 only once per loop iteration instead of four times.
Moving the epilogue close to the prologue just makes the code a little
cleaner.
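For illustration only, here is a minimal portable sketch of the loop
shape described above. It is not the patched Qt code: widenLatin1, Chunk
and offset are hypothetical names, and the inner 16-element copy merely
stands in for one SIMD load/store pair. The point is that offset is the
only variable advanced in the main loop, and the tail is a plain loop
rather than a Duff's Device.

```cpp
#include <cstddef>

// Hypothetical helper (not the Qt function): widens Latin-1 bytes to
// UTF-16 code units. Source and destination positions are both derived
// from the single loop variable `offset`.
void widenLatin1(char16_t *dst, const unsigned char *src, std::size_t len)
{
    constexpr std::size_t Chunk = 16;   // assumed chunk size, one vector's worth
    std::size_t offset = 0;

    // Main loop: whole chunks only; `offset` is the only variable updated.
    for (; offset + Chunk <= len; offset += Chunk) {
        for (std::size_t i = 0; i < Chunk; ++i)
            dst[offset + i] = src[offset + i];
    }

    // Epilogue: the remaining 0..15 bytes, one per iteration.
    for (; offset < len; ++offset)
        dst[offset] = src[offset];
}
```

Whether a given compiler emits better code for this shape would need to
be confirmed by benchmarking, as was done for this change.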
Change-Id: I5b74e27d520ca821f380aef0533c244805f003b7
Reviewed-by: Gunnar Sletta <gunnar.sletta@jollamobile.com>