This is not necessary, as we must emit an invalidating store before we
potentially consume an invalid expression. In fact, we're a bit
conservative here in this case for example:
int tmp = variable;
if (...)
{
variable = 10;
}
else
{
// Consuming tmp here is fine, but it was
// invalidated while emitting other branch.
// Technically, we need to study if there is an invalidating store
// in the CFG between the loading block and this block, and the other
// branch will not be a part of that analysis.
int tmp2 = tmp * tmp;
}
Fixing this case means complex CFG traversal *everywhere*, and it feels like overkill.
Fixing this exposed a bug with access chains, so fix a bug where expression dependencies were not
inherited properly in access chains. Access chains are now considered forwarded if there
is at least one dependency which is also forwarded.
This subtle bug removed any expression validation for trivially swizzled
variables. Make usage suppression a more explicit concept rather than
just hacking off forwarded_temporaries.
There is some fallout here with loop generation since our expression
invalidation is currently a bit too naive to handle loops properly.
The forwarding bug masked this problem until now.
If part of the loop condition is also used in the body, we end up
reading an invalid expression, which in turn forces a temporary to be
generated in the condition block, not good. We'll need to be smarter
here ...
If this is computed *before* a `demote`, but used *after*, forwarding it
will produce the wrong value. This does make for uglier shaders, but
it's necessary right now to ensure correctness.
I needed to use an assembly shader to produce the test for this.
`spirv-opt` is not smart enough (or too smart?) to eliminate the
variable that would be used in GLSL to express this.
This extension provides a new operation which causes a fragment to be
discarded without terminating the fragment shader invocation. The
invocation for the discarded fragment becomes a helper invocation, so
that derivatives will remain defined. The old `HelperInvocation` builtin
becomes undefined when this occurs, so a second new instruction queries
the current helper invocation status.
This is only fully supported for GLSL. HLSL doesn't support the
`IsHelperInvocation` operation and MSL doesn't support the
`DemoteToHelperInvocation` op.
Fixes#1052.
The only piece added by this extension is the `DeviceIndex` builtin,
which tells the shader which device in a grouped logical device it is
running on.
Metal's pipeline state objects are owned by the `MTLDevice` that created
them. Since Metal doesn't support logical grouping of devices the way
Vulkan does, we'll thus have to create a pipeline state for each device
in a grouped logical device. The upcoming peer group support in Metal 3
will not change this. For this reason, for Metal, the device index is
supplied as a constant at pipeline compile time.
There's an interaction between `VK_KHR_device_group` and
`VK_KHR_multiview` in the
`VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT`, which defines the
view index to be the same as the device index. The new
`view_index_from_device_index` MSL option supports this functionality.
Using the `PostDepthCoverage` mode specifies that the `gl_SampleMaskIn`
variable is to contain the computed coverage mask following the early
fragment tests, which this mode requires and implicitly enables.
Note that unlike Vulkan and OpenGL, Metal places this on the sample mask
input itself, and furthermore does *not* implicitly enable early
fragment testing. If it isn't enabled explicitly with an
`[[early_fragment_tests]]` attribute, the compiler will error out. So we
have to enable that mode explicitly if `PostDepthCoverage` is enabled
but `EarlyFragmentTests` isn't.
For Metal, only iOS supports this; for some reason, Apple has yet to
implement it on macOS, even though many desktop cards support it.
There is a case where we can deduce a for/while loop, but the continue
block is actually very painful to deal with, so handle that case as
well. Removes an exceptional case.
This decoration might only be present for the very last ID which is
consumed by a sampling or Load/Store instruction. To make sure our
access chains are emitted correctly, we have to back-propagate this
decoration.
This is quite complex since we cannot flush Phi inside the case labels,
we have to do it outside by emitting a lot of manual branches ourselves.
This should be extremely rare, but we need to handle this case.
If we compile multiple times due to forced_recompile, we had
deferred_declaration = true while emitting function prototypes which
broke an assumption. Fix this by clearing out stale state before leaving
a function.