Lock elision using TSX is a technique to optimize lock scaling
It allows to run locks in parallel using hardware support for
a transactional execution mode in 4th generation Intel Core CPUs.
See http://www.intel.com/software/tsx for more Information.
This patch implements a simple adaptive lock elision algorithm based
on RTM. It enables elision for the pthread mutexes and rwlocks.
The algorithm keeps track whether a mutex successfully elides or not,
and stops eliding for some time when it is not.
When the CPU supports RTM the elision path is automatically tried,
otherwise any elision is disabled.
The adaptation algorithm and its tuning is currently preliminary.
The code adds some checks to the lock fast paths. Micro-benchmarks
show little to no difference without RTM.
This patch implements the low level "lll_" code for lock elision.
Followon patches hook this into the pthread implementation
Changes with the RTM mutexes:
-----------------------------
Lock elision in pthreads is generally compatible with existing programs.
There are some obscure exceptions, which are expected to be uncommon.
See the manual for more details.
- A broken program that unlocks a free lock will crash.
There are ways around this with some tradeoffs (more code in hot paths)
I'm still undecided on what approach to take here; have to wait for testing reports.
- pthread_mutex_destroy of a lock mutex will not return EBUSY but 0.
- There's also a similar situation with trylock outside the mutex,
"knowing" that the mutex must be held due to some other condition.
In this case an assert failure cannot be recovered. This situation is
usually an existing bug in the program.
- Same applies to the rwlocks. Some of the return values changes
(for example there is no EDEADLK for an elided lock, unless it aborts.
However when elided it will also never deadlock of course)
- Timing changes, so broken programs that make assumptions about specific timing
may expose already existing latent problems. Note that these broken programs will
break in other situations too (loaded system, new faster hardware, compiler
optimizations etc.)
- Programs with non recursive mutexes that take them recursively in a thread and
which would always deadlock without elision may not always see a deadlock.
The deadlock will only happen on an early or delayed abort (which typically
happens at some point)
This only happens for mutexes not explicitely set to PTHREAD_MUTEX_NORMAL
or PTHREAD_MUTEX_ADAPTIVE_NP. PTHREAD_MUTEX_NORMAL mutexes do not elide.
The elision default can be set at configure time.
This patch implements the basic infrastructure for elision.
It is very very possible that the futex syscall returns an
error and that the caller of lll_futex_wake may want to
look at that error and propagate the failure.
This patch allows a caller to see the syscall error.
There are no users of the syscall error at present, but
future cleanups are now be able to check for the error.
--
nplt/
2013-06-10 Carlos O'Donell <carlos@redhat.com>
* sysdeps/unix/sysv/linux/i386/lowlevellock.h
(lll_futex_wake): Return syscall error.
* sysdeps/unix/sysv/linux/x86_64/lowlevellock.h
(lll_futex_wake): Return syscall error.
Fixes BZ #15337.
Static builds fail with the following warning:
/home/tools/glibc/glibc/nptl/../nptl/sysdeps/unix/sysv/linux/x86_64/cancellation.S:80:
undefined reference to `__GI___pthread_unwind'
When the source is configured with --disable-hidden-plt. This is
because the preprocessor conditional in cancellation.S only checks if
the build is for SHARED, whereas hidden_def is defined appropriately
only for a SHARED build that will have symbol versioning *and* hidden
defs are enabled. The last case is false here.
[BZ #14652]
When a thread waiting in pthread_cond_wait with a PI mutex is
cancelled after it has returned successfully from the futex syscall
but just before async cancellation is disabled, it enters its
cancellation handler with the mutex held and simply calling a
mutex_lock again will result in a deadlock. Hence, it is necessary to
see if the thread owns the lock and try to lock it only if it doesn't.
[BZ #14417]
A futex call with FUTEX_WAIT_REQUEUE_PI returns with the mutex locked
on success. If such a successful thread is pipped to the cond_lock by
another spuriously woken waiter, it could be sent back to wait on the
futex with the mutex lock held, thus causing a deadlock. So it is
necessary that the thread relinquishes the mutex before going back to
sleep.