[*] Update media txt files

Reece Wilson 2024-05-28 18:56:20 +01:00
parent 85cf0a793a
commit 22136a8ae2
3 changed files with 26 additions and 29 deletions


@@ -6,18 +6,11 @@ When true
When false When false
Readers can starve pending write waiters Readers can starve pending write waiters
Much like other Aurora primitives, there is no guarantee of order. Reader and writer threads are kept separate in separate queues. No wake operation for many-readers will result in a spurious wake of a writer thread, and vice versa.
Reader and writer threads are generally kept separate. No wake operation for many-readers will result in a spurious wake of a writer thread, and vice versa.
Waits/yields of write-upgrades, write-locks, and then read-locks are grouped together by identical wake characteristics defined by platform specific quirks, sorted by that respective order for the logical intentions of the abstract RWLock primitive. Waits/yields of write-upgrades, write-locks, and then read-locks are grouped together by identical wake characteristics defined by platform specific quirks, sorted by that respective order for the logical intentions of the abstract RWLock primitive.
To reiterate, only WakeOnAddress in emulation mode is guaranteed to have FIFO Wait/an order of thread wake-ups - that's to say tickets or prio-wake-queue aren't used to enforce access order.
Condvar and other reodering of thread primitives.txt implies tickets are a waste of cycles, serving no point but to thrash contexts. They serve no logical high level value, nor provide low level performance gains (*). It's just a spinlock algorithm a bunch of braindead normies thought was cute.
*: you are telling the os to resched or perform a SMT yield because your wake wasn't good enough. why would you context switch again? oh right, you'd do it to sit on the exact same physical core (0-12? 0-16? out of hundreds/thousands of contexts?), most likely thrashing cache in a switch, to dequeue a high level context to perform the exact same queued up work - but with different TLS indices!!! "./Condvar and other reodering of thread primitives.txt" implies tickets are a waste of cycles, serving no point but to thrash contexts. They serve no logical high level value, nor provide low level performance gains.
you would be hard-pressed to make a convincing argument against this perspective... It's just a spinlock algorithm a bunch of braindead normie boomers thought was cute.
...that's unless you're stuck in the bygone-era of native fibers, async io routines, and the likes, oh shit we're going back in time with meme-langs and runtimes again aren't we? You're telling the os to resched or perform a SMT yield because your wake wasn't good enough.
They're more akin to a hack to support baby's first "while (!bSocketError) { accept(); CreateThread(); }" server than an answer to any real problem. Why would you context switch again? Oh right, you'd do it to sit on the exact same physical core (0-12? 0-16? out of hundreds/thousands of contexts?), most likely thrashing cache in a switch, to dequeue a high level context to perform the exact same queued up work - but with different TLS indices!!!
A third/fourth level of indirection of a thread context (physical core -> (hyperthread ->) software/kernel switchable context -> abstract userland schedulers' context) makes very little sense, past making these instances of shit-code suck less. ...something something im abusing condvars and mutexes as timeline barriers for garbage code. okay...
Arguably, fibers and async routines are at best a hack for naive developers who don't think about what their program is doing... or at best a hack for a single-threaded interpreter program like Python or Lua.
Arguably, optimizing thread primitives with tickets serves no purpose other than to optimize fiber-iq code, where half-assed FIFO and monte carlo guarantees aren't good enough to ensure fiber #45345 doesn't idle or spin a bit more than fiber #346.
Arguably, if you're trying to "optimize" thread primitives with tickets and in-process scheduling, and not high level descriptions of what the synchronization primitive is doing in the context of its use case and the platform it runs on, you're a fucking idiot.
Any other scheduling leg-work should be the responsibility of the kernel and perhaps a chipset vendor driver being aware of system-wide thread priorities+tasks; you are not faster or smarter than a generic interface and the kernel/os doing one of its rudimentary responsibilities.


@@ -12,11 +12,11 @@
* Under mutex: * Under mutex:
* Sleep: [A, B, C] * Sleep: [A, B, C]
* *
* Under or out of mutex (it doesn't matter so long as: the write itself was under lock *or* the write condition was fenced against the condition's sleep count update via a later call to mutex::lock() ): * Under or out of mutex (it doesn't matter so long as you barrier the mutex after the state change):
* ~awake all? shutdown condition? who knows~ * ~awake all? shutdown condition? who knows~
* Broadcast * Broadcast
* *
* Out of mutex (!!!): * Out of mutex (Bad Use Case !!!):
* if (~missed/incorrect if !work available check before sleep~) * if (~missed/incorrect if !work available check before sleep~)
* Sleep: [D] * Sleep: [D]
* // given that WaitForSignal forces you to unlock and relock a mutex, this illogical branch should never happen * // given that WaitForSignal forces you to unlock and relock a mutex, this illogical branch should never happen
@@ -31,7 +31,7 @@
* Under mutex: * Under mutex:
* Sleep: [A, B, C] * Sleep: [A, B, C]
* *
* Not under mutex: * Not under mutex (it doesn't matter so long as you barrier the mutex after the state change):
* Signal * Signal
* *
* Under mutex: * Under mutex:
@@ -43,22 +43,26 @@
* *
*--------------------------------------------- *---------------------------------------------
* Cause: * Cause:
* The abstract condition accounts for the number of threads sleeping accurately, not the order. * The condition variables account for the number of threads sleeping accurately, not the order.
* This is usually a good thing because ordering under a spinloop generally does not happen in time and/or does not matter. * This is usually a good thing because ordering under a spinloop generally does not happen in time and/or does not matter.
* The lowest common denominator of kernel thread scheduling is fundamentally that of a semaphore scheduled with respect to buckets of integer thread priority levels, and nothing more complex than that.
* To implement ordering, is to implement cache-thrashing and increased context-switching for an abstract idea of "correctness" that doesn't apply to real code or performance goals. * To implement ordering, is to implement cache-thrashing and increased context-switching for an abstract idea of "correctness" that doesn't apply to real code or performance goals.
* (spoilers: your work pool of uniform priorities couldn't care less which thread wakes up first, nor does a single waiter pattern, but your end product will certainly bleed performance with yield thrashing) * (spoilers: using a condvar, your work pool of uniform priorities couldn't care less which thread wakes up first, nor does a single waiter pattern; but your end product will certainly bleed performance with ticket yield thrashing or suboptimal spinning )
* ( : the same can be said for semaphores; what part of waiting while an available work count is zero needs ordering?) * ( : the same can be said for semaphores; what part of waiting while an available work count is not zero needs ordering? )
* ( : yield thrashing, that might i add, serves no purpose other than to get the right thread local context and decoupled-from-parent thread id of a context on a given physical core of a highly limited set) * ( : yield thrashing, that might i add, serves no purpose other than to get the right thread local context and decoupled-from-parent thread id of a context on a given physical core of a highly limited set )
* * ( : the only valid use case for ordered lock types is in the instance of RWLock reads exhausting writers, and that's easily accounted for by separating the read and write wake queues )
*
*--------------------------------------------- *---------------------------------------------
* The fix[es]: * The fix[es] / Mitigations:
* * Ensure to properly check the correctness of the sleep condition, and that the mutex is properly locked, before calling Aurora condition primitives * * Ensure to check the correctness of the sleep condition, and that the mutex is properly locked, before calling any Aurora condition primitives' sleep routines
* (why the fuck would you be sleeping on a variable state observer without checking its state, causing an unwanted deficit? this is counter to the purpose of using a condvar.) * (why the fuck would you be sleeping on a variable state observer without checking its state, causing an unwanted defect? this is counter to the purpose of using a condvar.)
* * Increase the size of the condition variable to account for a counter and implement inefficient rescheduling, to fix buggy code *
* * Increase the size of the condition variable to account for a counter and implement inefficient rescheduling, to fix fundamentally flawed user code
* (no thanks. implement ticket primitives yourself, see: the hacky workaround.) * (no thanks. implement ticket primitives yourself, see: the hacky workaround.)
* *
* * "Problematic signals:" I know what you're trying to do, and you're being stupid for attempting to force condvars to act as barriers this way. Instead, just use the actual timeline-capable semaphore of AuFutexSemaphore::LockUntilAtleastAbsNS and bPreferEmulatedWakeOnAddress = true.
*--------------------------------------------- *---------------------------------------------
* The hacky workaround: * The hacky workaround:
* * If you can somehow justify this, i doubt it, but if you can, you can force the slow-path order condvar+mutex+semaphore by using AuFutex[Mutex/Semaphore/Cond] with ThreadingConfig::bPreferEmulatedWakeOnAddress = true. * * If you can somehow justify this, i doubt it, but if you can, you can aim for the slow-path ordered sleep of/by using AuFutex[Mutex/Semaphore/Cond] with ThreadingConfig::bPreferEmulatedWakeOnAddress = true.
* You can further eliminate the fast paths to remove fast-path reordering; but really, if you care that much, you should be implementing your own ticket primitives over AuThreading WakeOnAddress with bPreferEmulatedWakeOnAddress = true for the guarantee of a FIFO WoA interface. * * Noting that all futex paths can still take a fast path to bypass ordering.
*/ */
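The first fix listed above (check the sleep condition under the mutex before sleeping, and change the state behind the mutex before signalling) can be sketched with the standard C++ primitives. This is an illustration under assumptions, not Aurora Runtime code: `WorkQueue`, `Take`, `Post`, `Shutdown`, and `RunCondvarDemo` are hypothetical names, and `std::condition_variable` stands in for the Aurora condition primitives.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical work counter guarded by a mutex + condvar (illustration only).
struct WorkQueue
{
    std::mutex stateMutex;
    std::condition_variable stateCond;
    int iAvailable = 0;      // the state the sleepers are observing
    bool bShutdown = false;

    // Sleeps only after the condition was checked under the mutex, and
    // re-checks it after every wake; which sleeper wakes first is irrelevant.
    bool Take()
    {
        std::unique_lock<std::mutex> lock(stateMutex);
        stateCond.wait(lock, [this] { return iAvailable > 0 || bShutdown; });
        if (iAvailable == 0)
        {
            return false; // shutdown requested and nothing left to do
        }
        --iAvailable;
        return true;
    }

    // The state change happens under the mutex; the wake itself is issued
    // out of the mutex, which is fine because the write was already fenced.
    void Post(int iCount)
    {
        {
            std::lock_guard<std::mutex> lock(stateMutex);
            iAvailable += iCount;
        }
        stateCond.notify_all();
    }

    void Shutdown()
    {
        {
            std::lock_guard<std::mutex> lock(stateMutex);
            bShutdown = true;
        }
        stateCond.notify_all();
    }
};

// Three uniform-priority workers drain six work items, then exit on shutdown.
int RunCondvarDemo()
{
    WorkQueue queue;
    std::atomic<int> iTaken { 0 };
    std::vector<std::thread> workers;

    for (int i = 0; i < 3; ++i)
    {
        workers.emplace_back([&] { while (queue.Take()) { ++iTaken; } });
    }

    queue.Post(5);
    queue.Post(1);
    queue.Shutdown();

    for (auto &worker : workers)
    {
        worker.join();
    }

    assert(queue.iAvailable == 0);
    return iTaken.load();
}
```

Because every sleeper re-validates the state under the mutex, no work item is lost regardless of wake order, so nothing in this pattern needs tickets or FIFO wake-ups.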


@@ -1,4 +1,4 @@
1) Without a kernel driver, it is impossible for AuProcess to preallocate address space. Memory maps cannot be placed deterministically on stock Windows 7. Although, if existing Windows patching features are used to introduce a VirtualAlloc2 dynamically, it is possible for support to be added unofficially; Aurora Runtime will just assume it's a valid Win10 RSx+ install. 1) Without a kernel driver, it is impossible for AuProcess to preallocate address space. Memory maps cannot be placed deterministically on stock Windows 7. Although, if existing Windows patching features are used to introduce a VirtualAlloc2 dynamically, it is possible for support to be added unofficially; Aurora Runtime will just assume it's a valid Win10 RSx+ install. Ok, technically I'm lying, it is possible to do on an unpatched kernel, but by god would it be annoying and be liable to data loss. You'd basically have to emulate POSIX's mmap exactly like how DOS extenders do it: with exception handlers, a thread to handle write back cache flushing, Kernel32!MapUserPhysicalPagesScatter, and group policy adjustments. You're better off writing a driver if you need fine grained mapping (also the same for Win10).
2 - soft defect; emulation performance) The time to wake metric across AuThreading::[Wake/Wait]OnAddress starts off at 1.5 - 2.2x modern Windows with RtlWaitOnAddress; although, the best case of basic primitives will still be faster than SRW Locks of Windows 7 through 11. Era correct internal NT APIs are used across XP - Windows 11 targets. Note, WaitOnAddress emulation is not required for basic thread primitives; such scheduler indirection would only hurt performance. Performance should otherwise be exactly what you would expect once you remove Microsoft's regressing CRT and lackluster stock thread primitives from the equation. 2 - soft defect; emulation performance) The time to wake metric across AuThreading::[Wake/Wait]OnAddress starts off at 1.5 - 2.2x modern Windows with RtlWaitOnAddress; although, the best case of basic primitives will still be faster than SRW Locks of Windows 7 through 11. Era correct internal NT APIs are used across XP - Windows 11 targets. Note, WaitOnAddress emulation is not required for basic thread primitives; such scheduler indirection would only hurt performance. Performance should otherwise be exactly what you would expect once you remove Microsoft's regressing CRT and lackluster stock thread primitives from the equation.