This patch relies on the C version of the rwlocks posted earlier.
With C rwlocks it is very straight forward to do adaptive elision
using TSX. It is based on the infrastructure added earlier
for mutexes, but uses its own elision macros. The macros
are fairly general purpose and could be used for other
elision purposes too.
This version is much cleaner than the earlier assembler based
version, and in particular implements adaptation which makes
it safer.
I changed the behavior slightly to not require any changes
in the test suite and fully conform to all expected
behaviors (generally at the cost of not eliding in
various situations). In particular this means the timedlock
variants are not elided. Nested trylock aborts.
Added support for TX lock elision of pthread mutexes on s390 and
s390x. This may improve lock scaling of existing programs on TX
capable systems. The lock elision code is only built with
--enable-lock-elision=yes and then requires a GCC version supporting
the TX builtins. With lock elision default mutexes are elided via
__builtin_tbegin, if the cpu supports transactions. By default lock
elision is not enabled and the elision code is not built.
This patch removes the arch specific powerpc implementation and instead
uses the linux default one. Although the current powerpc implementation
already constains the required memory barriers for correct
initialization, the default implementation shows a better performance on
newer chips.
[BZ #15215] This unifies various pthread_once architecture-specific
implementations which were using the same algorithm with slightly different
implementations. It also adds missing memory barriers that are required for
correctness.
This patch moves the __PTHREAD_SPINS definition to arch specific header
since pthread_mutex_t layout is also arch specific. This leads to no
need to defining __PTHREAD_MUTEX_HAVE_ELISION and thus removing of the
undefined compiler warning.
When profiling programs with lock problems with perf record -g dwarf,
libunwind can currently not backtrace through the futex and unlock
functions in pthread. This is because they use out of line sections,
and those are not correctly described in dwarf2 (I believe needs
dwarf3 or 4).
This patch first removes the out of line sections. They only save a
single jump, but cause a lot of pain. Then it converts the now inline
lock code to use the now standard gas .cfi_* commands.
With these changes libunwind/perf can backtrace through the futex
functions now.
Longer term it would be likely better to just use C futex() functions
on x86 like all the other architectures. This would clean the code up
even more.