For ARM this makes no difference: the result is bit-for-bit identical.
For Thumb it results in smaller encodings. Perhaps it ought not to, and
this is in fact an assembler bug, but I also think the result is clearer.
Factor out the sequence needed to call kuser_get_tls, as we can't
play games with subtracting into pc in Thumb mode. In preparation
for hard-tp, pull the save of LR into the macro.