For arm this makes no difference--the result is bit-for-bit identical;
for thumb this results in smaller encodings. Perhaps it ought not and
this is in fact an assembler bug, but I also think it's clearer.
Some routines are written with complex LDM/STM insns that cannot be
used in thumb mode, or are highly conditional requiring excessive
IT insns.
When a future patch goes in to enable thumb2 by default, this marker
will be used to override that default.