Ref gcc.gnu.org/bugzilla/show_bug.cgi?id=52839#c10
Release barriers are needed to ensure that any memory written by
init_routine is seen by other threads before *once_control changes.
In the case of clear_once_control we need to flush any partially
written state.
In some cases, the compiler would optimize out the call to
allocate_and_test and thus result in a false positive for the test
case. Another problem was the fact that the compiler could in some
cases generate additional shifting of the stack pointer, resulting in
alloca moving the stack pointer beyond what is allowed by the
rlimit. Hence, accessing the stackaddr returned by pthread_getattr_np
is safer than relying on the alloca'd result.
Another problem is when RLIMIT may be very large, which may result in
violation of other resource limits. Hence we cap the max stack size to
8M for this test.
When rlimit is small enough to be used as the stacksize to be returned
in pthread_getattr_np, cases where a stack is made executable due to a
DSO load get stack size that is larger than what the kernel
allows. This is because in such a case the stack size does not account
for the pages that have auxv and program arguments.
Additionally, the stacksize for the process derived from this should
be truncated to align to page size to avoid going beyond rlimit.
When a stack is marked executable due to loading a DSO that requires
an executable stack, the logic tends to leave out a portion of stack
after the first frame, thus causing a difference in the value returned
by pthread_getattr_np before and after the stack is marked
executable. It ought to be possible to fix this by marking the rest of
the stack as executable too, but in the interest of marking as less of
the stack as executable as possible, the path this fix takes is to
make pthread_getattr_np also look at the first frame as the underflow
end of the stack and compute size and stack top accordingly.
The above happens only for the main process stack. NPTL thread stacks
are not affected by this change.