Instead of updating the pointer that we're reading, update the offset
(which is the length). The number of variables we're operating on is the
same (2), but this simplifies the calculation at the end.

    BEFORE                          AFTER
    tzcntl  %edx, %edx              tzcntl  %edx, %eax
    subq    %rdi, %rax
    sarq    %rax
    shrl    %edx                    shrq    %rax
    addq    %rdx, %rax              leaq    (%rax,%rcx), %rax
    ret                             ret

This removes one subtraction and one shift. I don't know why the
compiler chose LEA instead of ADD. The shift changed from 32-bit to
64-bit because the constant 2 (an int) was replaced with
sizeof(char16_t) (a size_t), but that has no effect on performance.
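
For illustration only, the two loop shapes can be sketched in scalar
C++. This is not the vectorized code this commit touches, and the
function names are hypothetical; it only shows why tracking an offset
avoids the pointer subtraction and the divide-by-element-size shift at
the end.

```cpp
#include <cstddef>

// Hypothetical scalar sketch; names are illustrative, not from the patch.

// BEFORE: advance the read pointer, then recover the length from the
// pointer difference. For char16_t the difference compiles to a subtract
// plus a shift by log2(sizeof(char16_t)).
constexpr std::size_t length_by_pointer(const char16_t *str)
{
    const char16_t *ptr = str;
    while (*ptr != u'\0')
        ++ptr;
    return static_cast<std::size_t>(ptr - str);
}

// AFTER: advance an offset that already is the length, so no final
// subtraction or scaling is needed.
constexpr std::size_t length_by_offset(const char16_t *str)
{
    std::size_t offset = 0;
    while (str[offset] != u'\0')
        ++offset;
    return offset;
}

// Both variants must agree on the result.
static_assert(length_by_pointer(u"Qt") == length_by_offset(u"Qt"), "");
```

Both variants keep two live values in the loop (base plus pointer, or
base plus offset), which matches the "same number of variables" remark
above.
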
Change-Id: I0e5f6bec596a4a78bd3bfffd16c9650a60289f4c
Reviewed-by: Lars Knoll <lars@knoll.priv.no>