This feature is specifically for the C++ compiler to offload to glibc
the calling of thread_local object destructors on thread or program
exit. This is to overcome the possible complication of destructors of
thread_local objects getting called after the DSO in which they're
defined has been unloaded by the dynamic linker. The DSO is marked as
not unloadable once it has a constructed thread_local object, and
marked as unloadable again when all the constructed thread_local
objects defined in it have been destroyed.
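The following is a minimal C model of the bookkeeping described above. It assumes a per-thread LIFO list of destructors and a per-DSO count of live thread_local objects. All names here (struct model_dso, model_thread_atexit, model_run_thread_dtors, example_dtor) are hypothetical. glibc's compiler-facing entry point is __cxa_thread_atexit_impl, and its internals are not reproduced.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a loaded DSO: it may be unloaded only while
   none of the thread_local objects it defines is still constructed. */
struct model_dso {
    atomic_int live_tls_objects;
};

static int model_dso_unloadable(struct model_dso *dso) {
    return atomic_load(&dso->live_tls_objects) == 0;
}

/* One registered destructor in the calling thread's LIFO list. */
struct dtor_entry {
    void (*func)(void *);
    void *obj;
    struct model_dso *dso;
    struct dtor_entry *next;
};

static _Thread_local struct dtor_entry *tls_dtor_list;

/* Conceptually called right after a thread_local object is constructed:
   record its destructor and pin the DSO that defines it. */
static int model_thread_atexit(void (*func)(void *), void *obj,
                               struct model_dso *dso) {
    struct dtor_entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->func = func;
    e->obj = obj;
    e->dso = dso;
    e->next = tls_dtor_list;
    tls_dtor_list = e;
    atomic_fetch_add(&dso->live_tls_objects, 1);  /* now not unloadable */
    return 0;
}

/* Run at thread or program exit: destroy objects in reverse order of
   construction and unpin each DSO as its objects go away. */
static void model_run_thread_dtors(void) {
    while (tls_dtor_list != NULL) {
        struct dtor_entry *e = tls_dtor_list;
        tls_dtor_list = e->next;
        e->func(e->obj);
        /* The DSO becomes unloadable again when this count reaches zero. */
        int before = atomic_fetch_sub(&e->dso->live_tls_objects, 1);
        assert(before > 0);
        (void)before;
        free(e);
    }
}

/* Example destructor for the usage below: counts its invocations. */
static void example_dtor(void *counter) { ++*(int *)counter; }
```

The counter is atomic because several threads may construct thread_local objects defined in the same DSO. The destructor list itself is per-thread and needs no locking.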
* sysdeps/unix/sysv/linux/i386/i486/pthread_cond_timedwait.S
(__pthread_cond_timedwait): If possible use FUTEX_WAIT_BITSET to
directly use absolute timeout.
Since the FUTEX_WAIT operation takes a relative timeout, the
pthread_cond_timedwait and other timed function implementations have
to compute a relative timeout from the absolute timeout parameter they
receive before making the futex syscall. The kernel then converts this
value back into an absolute timeout. This round trip is wasteful, and
the FUTEX_WAIT_BITSET operation (OR'd with FUTEX_CLOCK_REALTIME to make
the kernel use the realtime clock instead of the default monotonic
clock) avoids it by accepting the absolute timeout directly. This had
been implemented only in the x86 and sh assembly code and not in the C
code. This patch implements support for FUTEX_WAIT_BITSET whenever
available (since linux-2.6.29) for s390 and powerpc.
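The following Linux-only sketch contrasts the two approaches. It uses the real futex(2) constants from <linux/futex.h>. The helper names (abs_to_rel, futex_wait_abs_realtime, deadline_after_ms) are hypothetical, and this is not glibc's implementation.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/* Old approach: turn an absolute CLOCK_REALTIME deadline into the
   relative timeout that plain FUTEX_WAIT expects.  Returns -1 if the
   deadline has already passed. */
static int abs_to_rel(const struct timespec *abstime, struct timespec *rel) {
    struct timespec now;
    clock_gettime(CLOCK_REALTIME, &now);
    rel->tv_sec = abstime->tv_sec - now.tv_sec;
    rel->tv_nsec = abstime->tv_nsec - now.tv_nsec;
    if (rel->tv_nsec < 0) {
        rel->tv_nsec += 1000000000L;
        rel->tv_sec--;
    }
    return rel->tv_sec < 0 ? -1 : 0;
}

/* New approach: hand the absolute deadline straight to the kernel,
   measured against the realtime clock. */
static long futex_wait_abs_realtime(uint32_t *uaddr, uint32_t expected,
                                    const struct timespec *abstime) {
    return syscall(SYS_futex, uaddr,
                   FUTEX_WAIT_BITSET | FUTEX_PRIVATE_FLAG | FUTEX_CLOCK_REALTIME,
                   expected, abstime, NULL, FUTEX_BITSET_MATCH_ANY);
}

/* Helper for the example: a CLOCK_REALTIME deadline `ms` ms from now. */
static struct timespec deadline_after_ms(long ms) {
    struct timespec t;
    clock_gettime(CLOCK_REALTIME, &t);
    t.tv_sec += ms / 1000;
    t.tv_nsec += (ms % 1000) * 1000000L;
    if (t.tv_nsec >= 1000000000L) {
        t.tv_nsec -= 1000000000L;
        t.tv_sec++;
    }
    return t;
}
```

As with FUTEX_WAIT, the kernel refuses to sleep (EAGAIN) when the futex word no longer holds the expected value. Otherwise it waits until the absolute deadline and fails with ETIMEDOUT.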
nptl/
* sysdeps/unix/sysv/linux/sparc/lowlevellock.h (BUSY_WAIT_NOP):
Define when we have v9 instructions available.
* sysdeps/unix/sysv/linux/sparc/sparc64/cpu_relax.S: New file.
* sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/cpu_relax.S: New
file.
* sysdeps/unix/sysv/linux/sparc/sparc32/sparcv9/Makefile: New
file.
* sysdeps/unix/sysv/linux/sparc/sparc64/Makefile: Add cpu_relax
to libpthread-routines.
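BUSY_WAIT_NOP is the hint placed inside spin loops, for example when an adaptive mutex retries a trylock before falling back to blocking. The sketch below assumes C11 atomics. The x86 PAUSE mapping, the compiler-barrier fallback and the function name spin_then_give_up are illustrative assumptions. The sparc definition added by this patch is not reproduced.

```c
#include <stdatomic.h>

/* Illustrative stand-in for BUSY_WAIT_NOP: a CPU "relax" hint between
   spin iterations.  x86 has PAUSE; elsewhere fall back to a compiler
   barrier so the loop still re-reads memory. */
#if defined(__x86_64__) || defined(__i386__)
# define BUSY_WAIT_NOP __builtin_ia32_pause()
#else
# define BUSY_WAIT_NOP atomic_signal_fence(memory_order_seq_cst)
#endif

/* Try to take a simple 0/1 lock, retrying at most max_spins times.
   Returns 0 on success, or -1 when the caller should fall back to a
   blocking lock, as an adaptive mutex would. */
static int spin_then_give_up(atomic_int *lock, int max_spins) {
    for (int i = 0; ; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(lock, &expected, 1,
                memory_order_acquire, memory_order_relaxed))
            return 0;
        if (i >= max_spins)
            return -1;
        BUSY_WAIT_NOP;
    }
}
```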
The macro pthread_cleanup_push_defer_np in pthread.h has a misaligned
line continuation marker. This marker was previously aligned, but
recent changes have moved it out of alignment. This change realigns
the marker. This also reduces the diff against the hppa version of
pthread.h where the marker is aligned.
[BZ #14652]
When a thread waiting in pthread_cond_wait with a PI mutex is
cancelled after it has returned successfully from the futex syscall
but just before async cancellation is disabled, it enters its
cancellation handler with the mutex already held, so simply calling
mutex_lock again would deadlock. Hence, the handler must check whether
the thread owns the mutex and lock it only if it does not.
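The following sketch shows the ownership check. It assumes the Linux PI-futex convention that the low bits of the futex word hold the owner's TID, with the mask value taken from FUTEX_TID_MASK. struct model_pi_mutex, model_lock and model_cleanup_relock are hypothetical names. model_lock spins where a real PI mutex would block in the kernel.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Same value as FUTEX_TID_MASK in <linux/futex.h>: the low 30 bits of a
   PI futex word hold the owner's TID. */
#define MODEL_TID_MASK 0x3fffffffu

/* Hypothetical PI-mutex model: 0 means unlocked, otherwise owner TID
   (possibly with flag bits such as FUTEX_WAITERS set above the mask). */
struct model_pi_mutex {
    _Atomic uint32_t word;
};

static void model_lock(struct model_pi_mutex *m, uint32_t tid) {
    uint32_t expected = 0;
    /* Real code would block with FUTEX_LOCK_PI; this model just spins. */
    while (!atomic_compare_exchange_weak(&m->word, &expected, tid))
        expected = 0;
}

/* Cancellation-handler step: reacquire the mutex only if this thread
   does not already own it.  Returns 1 if it locked, 0 if already held. */
static int model_cleanup_relock(struct model_pi_mutex *m, uint32_t tid) {
    if ((atomic_load(&m->word) & MODEL_TID_MASK) == tid)
        return 0;  /* already the owner: locking again would deadlock */
    model_lock(m, tid);
    return 1;
}
```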
[BZ #14568]
* sysdeps/sparc/tls.h (DB_THREAD_SELF_INCLUDE): Delete.
(DB_THREAD_SELF): Use constants for the register offsets. Correct
the case of a 64-bit debugger with a 32-bit inferior.
[BZ #14417]
A futex call with FUTEX_WAIT_REQUEUE_PI returns with the mutex locked
on success. If such a successful thread is beaten to the cond_lock by
another, spuriously woken waiter, it could be sent back to wait on the
futex with the mutex still held, thus causing a deadlock. So the
thread must release the mutex before going back to sleep.
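The following is a single-threaded simulation of the wait loop. It assumes a scripted sequence of wakeups in place of real futex calls. struct sim, sim_sleep and sim_cond_wait are hypothetical names. The simulation only counts how often a waiter goes back to sleep while still owning the mutex, with and without the fix.

```c
#include <stdbool.h>

/* Hypothetical simulation state for one condvar waiter. */
struct sim {
    bool mutex_held;        /* does this waiter own the user mutex? */
    int sleeps_with_mutex;  /* times it slept while owning it (deadlock risk) */
};

/* Stand-in for blocking in FUTEX_WAIT_REQUEUE_PI. */
static void sim_sleep(struct sim *s) {
    if (s->mutex_held)
        s->sleeps_with_mutex++;
}

/* wakeups[i] describes the i-th return from the futex call, which on
   success always comes back with the mutex locked:
     true  - this waiter really consumed a signal and returns;
     false - another, spuriously woken waiter won the race for cond_lock,
             so this waiter has to wait again.
   Returns the number of times the waiter slept. */
static int sim_cond_wait(struct sim *s, const bool *wakeups, int n,
                         bool release_before_resleep) {
    int waits = 0;
    for (int i = 0; i < n; i++) {
        sim_sleep(s);
        waits++;
        s->mutex_held = true;
        if (wakeups[i])
            return waits;
        if (release_before_resleep)
            s->mutex_held = false;  /* the fix: unlock before sleeping */
    }
    return waits;
}
```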
[BZ #14477]
Add an entry to the exception table that jumps to
__condvar_w_cleanup2 instead of __condvar_w_cleanup for PI mutexes
when %ebx contains the address of the futex instead of the condition
variable.
Ref gcc.gnu.org/bugzilla/show_bug.cgi?id=52839#c10
Release barriers are needed to ensure that any memory written by
init_routine is seen by other threads before *once_control changes.
In the case of clear_once_control we need to flush any partially
written state.
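The following is a simplified once-control built on C11 atomics that shows where the release and acquire orderings go. The three-state encoding, the sched_yield wait and all model_* and example_* names are assumptions for illustration. glibc's pthread_once uses its own state encoding and futex waits.

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical once-control states (glibc encodes these differently). */
enum { ONCE_NEW = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

typedef struct { atomic_int state; } model_once_t;

static void model_once(model_once_t *once, void (*init_routine)(void)) {
    /* Fast path: this acquire load pairs with the release store below,
       so a thread that sees DONE also sees init_routine's writes. */
    if (atomic_load_explicit(&once->state, memory_order_acquire) == ONCE_DONE)
        return;
    for (;;) {
        int expected = ONCE_NEW;
        if (atomic_compare_exchange_strong_explicit(
                &once->state, &expected, ONCE_RUNNING,
                memory_order_acquire, memory_order_acquire)) {
            init_routine();
            /* Release: init_routine's writes become visible before DONE. */
            atomic_store_explicit(&once->state, ONCE_DONE,
                                  memory_order_release);
            return;
        }
        if (expected == ONCE_DONE)
            return;
        sched_yield();  /* another thread is still running init_routine */
    }
}

/* Cancellation path (clear_once_control): go back to NEW with a release
   store so any partially written state is published before a retry. */
static void model_clear_once(model_once_t *once) {
    atomic_store_explicit(&once->state, ONCE_NEW, memory_order_release);
}

/* Example client used below. */
static model_once_t example_once;  /* zero-initialized: ONCE_NEW */
static int init_calls;

static void example_init(void) { init_calls++; }

static void *example_worker(void *arg) {
    (void)arg;
    model_once(&example_once, example_init);
    return NULL;
}
```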
In some cases, the compiler would optimize out the call to
allocate_and_test and thus result in a false positive for the test
case. Another problem was the fact that the compiler could in some
cases generate additional shifting of the stack pointer, resulting in
alloca moving the stack pointer beyond what is allowed by the
rlimit. Hence, accessing the stackaddr returned by pthread_getattr_np
is safer than relying on the alloca'd result.
Another problem arises when RLIMIT_STACK is very large, which may
result in violation of other resource limits. Hence we cap the maximum
stack size to 8M for this test.
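The following sketch shows the two test adjustments. It caps the soft RLIMIT_STACK at 8M and reads the stack bounds through pthread_getattr_np (a GNU extension) plus pthread_attr_getstack. The helper names are hypothetical. The 8M figure comes from the text.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/resource.h>

#define MAX_STACK_RLIMIT ((rlim_t) 8 * 1024 * 1024)  /* the 8M cap */

/* Lower the soft RLIMIT_STACK to at most 8M; returns the limit now in
   effect, or 0 on failure. */
static rlim_t cap_stack_rlimit(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return 0;
    if (lim.rlim_cur == RLIM_INFINITY || lim.rlim_cur > MAX_STACK_RLIMIT) {
        lim.rlim_cur = MAX_STACK_RLIMIT;
        if (setrlimit(RLIMIT_STACK, &lim) != 0)
            return 0;
    }
    return lim.rlim_cur;
}

/* Ask pthread_getattr_np where the calling thread's stack lives,
   instead of trusting an alloca'd pointer the compiler may move. */
static int current_stack_bounds(void **lowest, size_t *size) {
    pthread_attr_t attr;
    int rc = pthread_getattr_np(pthread_self(), &attr);
    if (rc != 0)
        return rc;
    rc = pthread_attr_getstack(&attr, lowest, size);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Is p inside the calling thread's stack as reported above? */
static int addr_on_current_stack(const void *p) {
    void *lowest;
    size_t size;
    if (current_stack_bounds(&lowest, &size) != 0)
        return 0;
    uintptr_t lo = (uintptr_t) lowest, a = (uintptr_t) p;
    return a >= lo && a - lo < size;
}
```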
When the rlimit is small enough to be used as the stack size returned
by pthread_getattr_np, a stack that has been made executable due to a
DSO load gets a reported size larger than the kernel allows. This is
because in such a case the stack size does not account for the pages
that hold auxv and the program arguments.
Additionally, the stack size derived from this for the process should
be truncated to a multiple of the page size to avoid going beyond the
rlimit.
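The following pure-arithmetic sketch of the size calculation is modeled on our reading of the fix. Parameter and function names are hypothetical, and the stack is assumed to grow down. The early return for a tiny rlimit is a guard added for the sketch.

```c
#include <stdint.h>

/* Hypothetical main-thread stack size calculation (bytes):
     stack_end: page-aligned top of the usable stack (first frame)
     map_top:   end of the stack mapping, above the auxv/argv/envp pages
     last_to:   end of the next mapping below the stack
     rlimit:    soft RLIMIT_STACK
     pagesize:  a power of two */
static uintptr_t main_stack_size(uintptr_t stack_end, uintptr_t map_top,
                                 uintptr_t last_to, uintptr_t rlimit,
                                 uintptr_t pagesize) {
    uintptr_t above = map_top - stack_end;  /* auxv, args, env pages */
    if (rlimit <= above)
        return 0;                           /* guard added for the sketch */
    /* Charge the pages above the first frame against the rlimit ... */
    uintptr_t size = rlimit - above;
    /* ... and round down so a kernel-rounded extension stays in bounds. */
    size &= ~(pagesize - 1);
    /* Never claim memory that belongs to the mapping below. */
    if (size > stack_end - last_to)
        size = stack_end - last_to;
    return size;
}
```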
When a stack is marked executable due to loading a DSO that requires
an executable stack, the logic tends to leave out a portion of the
stack after the first frame, thus causing a difference in the value
returned by pthread_getattr_np before and after the stack is marked
executable. It ought to be possible to fix this by marking the rest of
the stack as executable too, but in the interest of marking as little
of the stack executable as possible, this fix instead makes
pthread_getattr_np treat the first frame as the underflow end of the
stack and compute the size and stack top accordingly.
The above happens only for the main process stack. NPTL thread stacks
are not affected by this change.
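The following sketch shows how the stack top is derived from the first frame. It assumes the stack grows down and pagesize is a power of two. glibc records the first frame's address in __libc_stack_end. The function names here are hypothetical.

```c
#include <stdint.h>

/* Treat the page containing the first frame as the top of the main
   stack, rather than the end of the stack mapping. */
static uintptr_t stack_top_from_first_frame(uintptr_t first_frame,
                                            uintptr_t pagesize) {
    return (first_frame & ~(pagesize - 1)) + pagesize;
}

/* Does addr fall inside the reported stack [top - size, top)?  Pages
   above top (auxv, program arguments) are deliberately excluded. */
static int in_reported_stack(uintptr_t addr, uintptr_t top, uintptr_t size) {
    return addr < top && addr >= top - size;
}
```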