This feature is specifically for the C++ compiler to offload calling
thread_local object destructors on thread program exit, to glibc.
This is to overcome the possible complication of destructors of
thread_local objects getting called after the DSO in which they're
defined is unloaded by the dynamic linker. The DSO is marked as
'unloadable' if it has a constructed thread_local object and marked as
'unloadable' again when all the constructed thread_local objects
defined in it are destroyed.
That convention requires the instruction immediately preceding SYSCALL
to initialize $v0 with the syscall number. Then if a restart triggers,
$v0 will have been clobbered by the syscall interrupted, and needs to be
reinititalized. The kernel will decrement the PC by 4 before switching
back to the user mode so that $v0 has been reloaded before SYSCALL is
executed again. This implies the place $v0 is loaded from must be
preserved across a syscall, e.g. an immediate, static register, stack
slot, etc.
The restriction was lifted with Linux 2.6.36 kernel release and no
special requirements are placed around the SYSCALL instruction anymore,
however we still support older kernel binaries.