QMutex & QReadWriteLock: do a memory read before CAS

The testAndSet operation is expensive if the lock is contended:
attempting to CAS that lock will cause the cacheline containing the lock
to be brought to the current CPU's most local cache in exclusive mode,
which in turn causes the CPU that has the lock to stall when it attempts
to release it. That cost is not justified when we are only attempting an
untimed tryLock*, which can simply fail instead.
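
As a rough illustration of the read-before-CAS idea, here is a minimal
sketch using std::atomic rather than Qt's atomics. TinyLock and its member
names are hypothetical and are not part of this patch; the lock word is
reduced to 0 (free) and 1 (held).

```cpp
#include <atomic>

// Hypothetical stand-in for a lock word; this is not QBasicMutex.
struct TinyLock {
    std::atomic<int> state{0};   // 0 = free, 1 = held

    // CAS only: always issues the atomic read-modify-write,
    // even when the lock is visibly held by someone else.
    bool tryLockCasOnly() noexcept
    {
        int expected = 0;
        return state.compare_exchange_strong(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed);
    }

    // Read first: a held lock is rejected by a plain relaxed load,
    // and the CAS is attempted only when the lock looks free.
    bool tryLockReadFirst() noexcept
    {
        if (state.load(std::memory_order_relaxed) != 0)
            return false;
        return tryLockCasOnly();
    }

    void unlock() noexcept { state.store(0, std::memory_order_release); }
};
```

Both variants return the same result; the difference is only in which
memory operations are issued when the lock is already held.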

In the case of timed, contended tryLocks or unconditional locks, we
still need to perform an atomic operation to indicate we're about to
wait. For that case, this patch reduces the minimum number of atomic
operations from 2 to 1, which is a gain even in the case where no other
thread has changed the lock status at all. If the status has changed,
either because more threads attempted to lock or because the owner
unlocked it, this avoids the cacheline bouncing between CPUs in between
those two atomic operations. For QMutex, that second atomic is a
fetchAndStore, not a testAndSet.
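
To make the "from 2 to 1" count concrete, here is a small sketch that models
only the entry to a blocking lock and counts the atomic read-modify-write
operations it issues. It assumes a hypothetical three-state lock word in the
style of a futex-based mutex; enterLock, EntryResult and the state names are
illustrative and do not match QMutex's real internals.

```cpp
#include <atomic>

// Hypothetical lock-word states (not Qt's actual values).
enum : int { Unlocked = 0, Locked = 1, LockedWithWaiters = 2 };

struct EntryResult {
    bool acquired;   // true if this call ended up owning the lock
    int rmwCount;    // atomic read-modify-write operations issued
};

// Models the fast path plus the exchange that announces a waiter.
// The actual wait (futex or similar) is deliberately left out.
inline EntryResult enterLock(std::atomic<int> &state, bool readFirst) noexcept
{
    int rmw = 0;
    // With readFirst, the CAS is skipped when the lock is visibly held.
    if (!readFirst || state.load(std::memory_order_relaxed) == Unlocked) {
        int expected = Unlocked;
        ++rmw;
        if (state.compare_exchange_strong(expected, Locked,
                                          std::memory_order_acquire,
                                          std::memory_order_relaxed))
            return {true, rmw};
    }
    // Contended: announce that we are about to wait.
    ++rmw;
    int old = state.exchange(LockedWithWaiters, std::memory_order_acquire);
    // If the owner unlocked in the meantime, the exchange took the lock.
    return {old == Unlocked, rmw};
}
```

With the lock already held, calling enterLock with readFirst set to false
issues two read-modify-write operations (the failed CAS and the exchange),
while readFirst set to true issues only the exchange.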

The above explanation is valid for architectures with Compare-And-Swap
instructions, such as x86 and ARMv8.1. For architectures using Load
Linked/Store Conditional instructions, the explanation does not apply,
but the benefit should still hold because we avoid the expense of the
LL.

See similar change to pthread_mutex_lock in
https://sourceware.org/git/?p=glibc.git;a=commit;h=d672a98a1af106bd68deb15576710cd61363f7a6

Change-Id: I3d728c4197df49169066fffd1756dcc26b2cf5f3
Reviewed-by: Marc Mutz <marc.mutz@qt.io>
Thiago Macieira 2023-04-17 16:30:38 -07:00
parent 8c085c5722
commit 8598e84c5f
2 changed files with 8 additions and 5 deletions

@@ -102,7 +102,10 @@ public:
     bool try_lock() noexcept { return tryLock(); }
 private:
-    inline bool fastTryLock() noexcept {
+    inline bool fastTryLock() noexcept
+    {
+        if (d_ptr.loadRelaxed() != nullptr)
+            return false;
         return d_ptr.testAndSetAcquire(nullptr, dummyLocked());
     }
     inline bool fastTryUnlock() noexcept {

@@ -191,8 +191,8 @@ bool QReadWriteLock::tryLockForRead()
 bool QReadWriteLock::tryLockForRead(int timeout)
 {
     // Fast case: non contended:
-    QReadWriteLockPrivate *d;
-    if (d_ptr.testAndSetAcquire(nullptr, dummyLockedForRead, d))
+    QReadWriteLockPrivate *d = d_ptr.loadRelaxed();
+    if (d == nullptr && d_ptr.testAndSetAcquire(nullptr, dummyLockedForRead, d))
         return true;
     while (true) {
@@ -305,8 +305,8 @@ bool QReadWriteLock::tryLockForWrite()
 bool QReadWriteLock::tryLockForWrite(int timeout)
 {
     // Fast case: non contended:
-    QReadWriteLockPrivate *d;
-    if (d_ptr.testAndSetAcquire(nullptr, dummyLockedForWrite, d))
+    QReadWriteLockPrivate *d = d_ptr.loadRelaxed();
+    if (d == nullptr && d_ptr.testAndSetAcquire(nullptr, dummyLockedForWrite, d))
         return true;
     while (true) {