/***
    Copyright (C) 2023 J Reece Wilson (a/k/a "Reece"). All rights reserved.

    File: WakeOnAddress.hpp
    Date: 2023-3-11
    Author: Reece
    Note:
    This API can be configured to run in one of two modes - Emulation Mode and Wrapper Mode.

    In Emulation Mode:
     1: Wakes occur in FIFO order, so long as the thread is in the kernel
     2: uWordSize can be any length not exceeding 32 bytes
    otherwise, in Wrapper Mode:
     1: Wakes are orderless
     2: uWordSize must be less than or equal to 8 bytes (todo: no?)
     3: only the least significant 32 bits are guaranteed to be used as wake signals
     4: The special EWaitMethod variants will suffer a performance hit
    In either mode:
     1: WaitOnAddress[...] can wake at any time the wakeup method is successful
     2: WaitOnAddress[...] can drop any wakeup if the wakeup method would fail
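
    For illustration, a minimal wait/wake sketch under those semantics (hypothetical uFlag word; the
    waiter loops and re-checks the value because a wake may arrive early or be dropped entirely):

        AuUInt32 uFlag {};                                  // shared word, published to by another thread
        const AuUInt32 uUnset { 0 };

        // waiter: default eNotEqual semantics - sleep while *pTargetAddress still equals *pCompareAddress
        while (uFlag == uUnset)
        {
            WaitOnAddress(&uFlag, &uUnset, sizeof(uFlag), 0 /* 0 = indefinite */);
        }

        // signaller: publish the new value, then wake every sleeper on that address
        uFlag = 1;                                          // in real code, prefer an AuAtomicXXX store
        WakeAllOnAddress(&uFlag);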

    By default:
    Windows XP - Windows 7 => Emulation Mode
    Windows 10+            => Wrapper Mode
    Linux                  => Emulation Mode; however, Wrapper Mode is available
    **************************************************************************************
    All platforms : ThreadingConfig::bPreferEmulatedWakeOnAddress = !AuBuild::kIsNtDerived
    **************************************************************************************

    Also note: Alongside Wrapper Mode, there is an internal set of APIs that allow for 32-bit word WoA support for
    AuThread primitives. These are only used if the operating system has a futex interface available at
    runtime. MacOS, iOS, and <= Windows 7 support requires these paths to be disabled. In other cases,
    the internal wrapper and Wrapper Mode should use this path to quickly yield to the kernel.

    Generally speaking, AuThreadPrimitives will use the futex layer or some OS specific mechanism to
    bail out into the kernel's thread scheduler as quickly as possible.
    In any mode, AuThreadPrimitives will go from: Primitive -> kernel/platform; or
                                                  Primitive -> WoA Internal Wrapper -> kernel/platform
    In ThreadingConfig::bPreferEmulatedWakeOnAddress mode, AuThreading::WaitOnAddress -> Emulation Mode.
    In !ThreadingConfig::bPreferEmulatedWakeOnAddress mode, AuThreading::WaitOnAddress -> Wrapper Mode -> [...]
    [...] -> Internal Wrapper -> kernel/platform
    In any mode, the futex reference primitives, including AuBarrier, AuInitOnce, AuFutexMutex, etc,
    will always go from: inlined header template definition -> relinked symbol -> AuThreading::WaitOnAddress
    -> [...].

    Note that some edge case platforms can follow AuThreadPrimitives *.Generic -> Internal Wrapper -> [...]
    [...] -> AuThreading::WaitOnAddress -> Emulation Mode.
    This is only the case when we lack OS specific wait paths for our primitives and lack a native
    wait on address interface to develop the internal wrapper. Fortunately, only more esoteric UNIX machines
    require these. Further platform support can be added with this; only a semaphore or condition-variable/mutex
    pair is required to bootstrap this path.

    Memory note: Weakly ordered memory architecture is an alien concept. AuAtomicXXX operations ensure all previous stores
                 are visible across all cores (useful for semaphore increment and mutex-unlock operations), and that loads
                 are evaluated in order. For all intents and purposes, you should treat the au ecosystem like any
                 other strongly ordered processor and program pair. For memeworthy lockless algorithms, you can use
                 spec-of-the-year atomic word containers and related methods; we don't care about optimizing some midwit's
                 weakly-ordered CAS-spinning, ABA-hell container that's genuinely believed to be the best thing ever.
                 Sincerely, you are doing something wrong if you're write-locking a container for any notable length of
                 time, and more often than not, lock-free algorithms are bloated to all hell just to end up losing to
                 read/write mutex guarded algorithms in most real world use cases - using an atomic pointer over lock bits
                 makes no difference besides the number of bugs you can expect to creep into your less flexible code.

    tldr: Don't worry about memory ordering or ABA. Use the provided locks, AuAtomic ops, and thread primitives as expected.
          (you'll be fine. trust me bro.)

    Configuration reminder:
    NT 6.2+ platforms may be optimized for the expected de facto case of EWaitMethod::eNotEqual / no "-Special".
    If you're implementing special primitives or using AuFutexSemaphore with timeline acquisitions, remember to
    set ThreadingConfig::bPreferEmulatedWakeOnAddress = true at Aurora::RuntimeStart.
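
    A hedged sketch of that reminder (ThreadingConfig::bPreferEmulatedWakeOnAddress and Aurora::RuntimeStart
    come from this note; how the ThreadingConfig structure is constructed and handed to the runtime is an
    assumption and may differ):

        ThreadingConfig threading {};
        threading.bPreferEmulatedWakeOnAddress = true;  // force Emulation Mode for -Special / timeline-heavy users
        // ...pass `threading` along with the rest of the configuration consumed by Aurora::RuntimeStart(...)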

    Compilation / WOA_STRICTER_FIFO:
    Stricter FIFO guarantees are available when AuWakeOnAddress.cpp is compiled with WOA_STRICTER_FIFO.
    Note that this will disable TryWaitOnAddress-like APIs and worsen the expected average case.

    You will never be First In, First Out 100% of the time under this flawed API design:
    Due to the nature of an atomic comparison within locked signal/wait paths - what amounts to a hidden
    condvar pattern - the target address must be read under an internal lock to verify the initial sleep condition.
    Every futex-like API does this. You cannot meet the API contract otherwise - it's the inherent nature of futex/
    waitonaddress-like APIs as they exist in Windows, Linux, BSD-likes, and NaCl - they're just storageless condition
    variables. You simply cannot have signal/waits where there's no ordering whatsoever between signal and wake,
    and where the pCompare condition isn't tested before each yield. You *need* atomicity or a lock to ensure wakes
    are paired with signals; and for each sleep, you *need* to test what amounts to the condition you'd find under
    a traditional condition-mutex while loop. Should that condition pass early, every futex impl bails out early.
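
    To illustrate, a sketch of the hidden pattern every such wait path reduces to (hypothetical helper
    names; not the actual implementation):

        bool WaitSketch(const void *pAddress, const void *pCompare, EWaitMethod eMethod)
        {
            auto &bucket = GetBucket(pAddress);                 // hypothetical per-address wait bucket
            bucket.Lock();
            if (!SleepConditionHolds(pAddress, pCompare, eMethod))
            {
                bucket.Unlock();                                // early bail: the wake condition already holds,
                return true;                                    // so no queue position is ever taken
            }
            bucket.Enqueue(GetCurrentWaiter());
            bucket.Unlock();
            return BlockUntilSignaledOrTimeout();               // a later thread may bail above while we sleep,
        }                                                       // effectively jumping the queue
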
    Regardless of what Linux and NaCl developers will tell you, no futex API is ever truly first in, first out, due
    to this inherent design flaw for real time programming. Anybody who claims otherwise is selling you a toll bridge.
    The only valid workaround for this is to develop your own competing API that does away with the comparison and
    relies solely on bucketed semaphores of per-address ownership, under an incompatible set of APIs as follows:
    { Wait(pAddress, uTimeout); Signal(pAddress); and Release(pAddress); }
    Emphasis on the release operation and the missing comparison parameter.
    Requires: (1) an IOU counter under Signal, (2) Wait to stall for acks of in-order head dequeues, and (3) a
    no-fast-path mandate.
    (1) needs to be paired with a Release in order to not leak active semaphores.

***/
#pragma once

namespace Aurora::Threading
{
    // Specifies when to break a thread context yield: once (volatile pTargetAddress) [... EWaitMethod operation ...] (constant pCompareAddress) holds
    AUE_DEFINE(EWaitMethod, (
        eNotEqual, eEqual, eLessThanCompare, eGreaterThanCompare, eLessThanOrEqualsCompare, eGreaterThanOrEqualsCompare, eAnd, eNotAnd
    ))
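
    // For instance, the -Special variants declared below break the yield once (*pTargetAddress) [eMethod] (*pCompareAddress)
    // holds. A hedged sketch of waiting on a hypothetical state word until it equals 2:
    //
    //      AuUInt32 uState {};
    //      const AuUInt32 uWanted { 2 };
    //      while (uState != uWanted)
    //      {
    //          WaitOnAddressSpecial(EWaitMethod::eEqual, &uState, &uWanted, sizeof(uState), 0 /* indefinite */);
    //      }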

    AUKN_SYM void WakeAllOnAddress(const void *pTargetAddress);

    AUKN_SYM void WakeOnAddress(const void *pTargetAddress);

    // WakeAllOnAddress with a uNMaximumThreads limit, which may or may not be respected
    AUKN_SYM void WakeNOnAddress(const void *pTargetAddress,
                                 AuUInt8 uNMaximumThreads);

    // On systems with processors of shared execution pipelines, these try-series of operations will spin (eg: mm_pause) for a configurable
    // amount of time, or enter a low power mode, so long as the process-wide state isn't overly contested. This means you can use these
    // arbitrarily without worrying about an accidental thundering mm_pause herd. If you wish to call WaitOnAddress[...] afterwards, you should
    // report that you already spun via optAlreadySpun. If the application is configured to spin later on, this hint may be used to prevent a double spin.
    AUKN_SYM bool TryWaitOnAddress(const void *pTargetAddress,
                                   const void *pCompareAddress,
                                   AuUInt8 uWordSize);

    AUKN_SYM bool TryWaitOnAddressSpecial(EWaitMethod eMethod,
                                          const void *pTargetAddress,
                                          const void *pCompareAddress,
                                          AuUInt8 uWordSize);

    // On systems with processors of shared execution pipelines, these try-series of operations will spin (eg: mm_pause) for a configurable
    // amount of time, or enter a low power mode, so long as the process-wide state isn't overly contested. This means you can use these
    // arbitrarily without worrying about an accidental thundering mm_pause herd. If you wish to call WaitOnAddress[...] afterwards, you should
    // report that you already spun via optAlreadySpun. If the application is configured to spin later on, this hint may be used to prevent a double spin.
    // In the case of a pTargetAddress != pCompareAddress condition, the optional check parameter is used to verify the wake condition.
    // Otherwise, spinning will continue.
    AUKN_SYM bool TryWaitOnAddressEx(const void *pTargetAddress,
                                     const void *pCompareAddress,
                                     AuUInt8 uWordSize,
                                     const AuFunction<bool(const void *, const void *, AuUInt8)> &check);
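
    // A hedged example of the check callback above (hypothetical 32-bit counter; assumes AuFunction accepts a
    // lambda of the matching signature). Per the comment above, the callback is consulted once the words no
    // longer compare equal; returning false keeps spinning:
    //
    //      AuUInt32 uCounter {};
    //      const AuUInt32 uOld { 0 };
    //      bool bSatisfied = TryWaitOnAddressEx(&uCounter, &uOld, sizeof(uCounter),
    //                                           [](const void *pTarget, const void *pCompare, AuUInt8 uSize) -> bool
    //                                           {
    //                                               // accept the wake only once the counter has moved past the compared value
    //                                               return *(const AuUInt32 *)pTarget > *(const AuUInt32 *)pCompare;
    //                                           });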

    // See: TryWaitOnAddressEx
    AUKN_SYM bool TryWaitOnAddressSpecialEx(EWaitMethod eMethod,
                                            const void *pTargetAddress,
                                            const void *pCompareAddress,
                                            AuUInt8 uWordSize,
                                            const AuFunction<bool(const void *, const void *, AuUInt8)> &check);

    // Relative timeout variant of nanosecond resolution eNotEqual WoA. 0 = indefinite.
    // In Wrapper Mode, it is possible to bypass the WoA implementation and bail straight into the kernel.
    // For improved ordering and EWaitMethod support, do not use Wrapper Mode.
    AUKN_SYM bool WaitOnAddress(const void *pTargetAddress,
                                const void *pCompareAddress,
                                AuUInt8 uWordSize,
                                AuUInt64 qwNanoseconds,
                                AuOptional<bool> optAlreadySpun = {} /*hint: do not spin before switching. subject to global config.*/);
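
    // A hedged sketch of the spin-then-wait pattern these APIs pair up for (hypothetical uFutex word; assumes a
    // true return from TryWaitOnAddress means the wake condition was observed while spinning):
    //
    //      AuUInt32 uFutex {};
    //      const AuUInt32 uExpected { 0 };
    //      while (uFutex == uExpected)
    //      {
    //          if (TryWaitOnAddress(&uFutex, &uExpected, sizeof(uFutex)))
    //          {
    //              break;                                                       // value changed while spinning
    //          }
    //          WaitOnAddress(&uFutex, &uExpected, sizeof(uFutex), 0, true /* optAlreadySpun: skip the second spin */);
    //      }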

    // Relative timeout variant of nanosecond resolution WoA. 0 = indefinite
    // Emulation Mode over Wrapper Mode is recommended for applications that heavily depend on these wait functions.
    AUKN_SYM bool WaitOnAddressSpecial(EWaitMethod eMethod,
                                       const void *pTargetAddress,
                                       const void *pCompareAddress,
                                       AuUInt8 uWordSize,
                                       AuUInt64 qwNanoseconds,
                                       AuOptional<bool> optAlreadySpun = {} /*hint: do not spin before switching. subject to global config.*/);

    // Absolute timeout variant of nanosecond resolution eNotEqual WoA. Nanoseconds are in steady clock time. 0 = indefinite
    // In Wrapper Mode, it is possible to bypass the WoA implementation and bail straight into the kernel.
    // For improved ordering and EWaitMethod support, do not use Wrapper Mode.
    AUKN_SYM bool WaitOnAddressSteady(const void *pTargetAddress,
                                      const void *pCompareAddress,
                                      AuUInt8 uWordSize,
                                      AuUInt64 qwNanoseconds,
                                      AuOptional<bool> optAlreadySpun = {} /*hint: do not spin before switching. subject to global config.*/);

    // Absolute timeout variant of nanosecond resolution WoA. Nanoseconds are in steady clock time. 0 = indefinite
    // Emulation Mode over Wrapper Mode is recommended for applications that heavily depend on these wait functions.
    AUKN_SYM bool WaitOnAddressSpecialSteady(EWaitMethod eMethod,
                                             const void *pTargetAddress,
                                             const void *pCompareAddress,
                                             AuUInt8 uWordSize,
                                             AuUInt64 qwNanoseconds,
                                             AuOptional<bool> optAlreadySpun = {} /*hint: do not spin before switching. subject to global config.*/);

    // C++ doesn't allow volatile-qualified pointers to be implicitly cast down to their non-volatile counterparts.
    // The following stubs unify the above APIs for volatile and non-volatile marked atomic containers.
    // Whether the underlying data of "pTargetAddress" is thread-locally-volatile or not is up to the chosen compiler intrinsic used to load/store and/or whether you upcast to volatile later on.
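
    // For example (a sketch, assuming a volatile-qualified shared word; the overloads below absorb the cast):
    //
    //      volatile AuUInt32 uWord {};
    //      // ...
    //      uWord = 1;                      // or an AuAtomicXXX store, per the memory note above
    //      WakeAllOnAddress(&uWord);       // resolves to the (const volatile void *) overload below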

    inline void WakeAllOnAddress(const volatile void *pTargetAddress)
    {
        return WakeAllOnAddress((const void *)pTargetAddress);
    }

    inline void WakeOnAddress(const volatile void *pTargetAddress)
    {
        return WakeOnAddress((const void *)pTargetAddress);
    }

    inline void WakeNOnAddress(const volatile void *pTargetAddress,
                               AuUInt8 uNMaximumThreads)
    {
        return WakeNOnAddress((const void *)pTargetAddress, uNMaximumThreads);
    }

    inline bool TryWaitOnAddress(const volatile void *pTargetAddress,
                                 const void *pCompareAddress,
                                 AuUInt8 uWordSize)
    {
        return TryWaitOnAddress((const void *)pTargetAddress, pCompareAddress, uWordSize);
    }

    inline bool TryWaitOnAddressSpecial(EWaitMethod eMethod,
                                        const volatile void *pTargetAddress,
                                        const void *pCompareAddress,
                                        AuUInt8 uWordSize)
    {
        return TryWaitOnAddressSpecial(eMethod, (const void *)pTargetAddress, pCompareAddress, uWordSize);
    }

    inline bool TryWaitOnAddressEx(const volatile void *pTargetAddress,
                                   const void *pCompareAddress,
                                   AuUInt8 uWordSize,
                                   const AuFunction<bool(const void *, const void *, AuUInt8)> &check)
    {
        return TryWaitOnAddressEx((const void *)pTargetAddress, pCompareAddress, uWordSize, check);
    }

    inline bool TryWaitOnAddressSpecialEx(EWaitMethod eMethod,
                                          const volatile void *pTargetAddress,
                                          const void *pCompareAddress,
                                          AuUInt8 uWordSize,
                                          const AuFunction<bool(const void *, const void *, AuUInt8)> &check)
    {
        return TryWaitOnAddressSpecialEx(eMethod, (const void *)pTargetAddress, pCompareAddress, uWordSize, check);
    }

    inline bool WaitOnAddress(const volatile void *pTargetAddress,
                              const void *pCompareAddress,
                              AuUInt8 uWordSize,
                              AuUInt64 qwNanoseconds,
                              AuOptional<bool> optAlreadySpun = {})
    {
        return WaitOnAddress((const void *)pTargetAddress, pCompareAddress, uWordSize, qwNanoseconds, optAlreadySpun);
    }

    inline bool WaitOnAddressSpecial(EWaitMethod eMethod,
                                     const volatile void *pTargetAddress,
                                     const void *pCompareAddress,
                                     AuUInt8 uWordSize,
                                     AuUInt64 qwNanoseconds,
                                     AuOptional<bool> optAlreadySpun = {})
    {
        return WaitOnAddressSpecial(eMethod, (const void *)pTargetAddress, pCompareAddress, uWordSize, qwNanoseconds, optAlreadySpun);
    }

    inline bool WaitOnAddressSteady(const volatile void *pTargetAddress,
                                    const void *pCompareAddress,
                                    AuUInt8 uWordSize,
                                    AuUInt64 qwNanoseconds,
                                    AuOptional<bool> optAlreadySpun = {})
    {
        return WaitOnAddressSteady((const void *)pTargetAddress, pCompareAddress, uWordSize, qwNanoseconds, optAlreadySpun);
    }

    inline bool WaitOnAddressSpecialSteady(EWaitMethod eMethod,
                                           const volatile void *pTargetAddress,
                                           const void *pCompareAddress,
                                           AuUInt8 uWordSize,
                                           AuUInt64 qwNanoseconds,
                                           AuOptional<bool> optAlreadySpun = {})
    {
        return WaitOnAddressSpecialSteady(eMethod, (const void *)pTargetAddress, pCompareAddress, uWordSize, qwNanoseconds, optAlreadySpun);
    }
}