ESSL does not support `GL_ARB_post_depth_coverage`. There, we must use
`GL_EXT_post_depth_coverage`. I've added this as a fallback for desktop
as well.
Note that `GL_EXT_post_depth_coverage` also requires the fragment shader
to set `early_fragment_tests` explicitly, while
`GL_ARB_post_depth_coverage` does not. It doesn't really matter either
way, since `SPV_KHR_post_depth_coverage` *also* requires both execution
modes to be explicitly set.
This is not necessary, as we must emit an invalidating store before we
potentially consume an invalid expression. In fact, we're a bit
conservative here in this case for example:
int tmp = variable;
if (...)
{
variable = 10;
}
else
{
// Consuming tmp here is fine, but it was
// invalidated while emitting other branch.
// Technically, we need to study if there is an invalidating store
// in the CFG between the loading block and this block, and the other
// branch will not be a part of that analysis.
int tmp2 = tmp * tmp;
}
Fixing this case means complex CFG traversal *everywhere*, and it feels like overkill.
Fixing this exposed a bug with access chains, so fix a bug where expression dependencies were not
inherited properly in access chains. Access chains are now considered forwarded if there
is at least one dependency which is also forwarded.
If this is computed *before* a `demote`, but used *after*, forwarding it
will produce the wrong value. This does make for uglier shaders, but
it's necessary right now to ensure correctness.
I needed to use an assembly shader to produce the test for this.
`spirv-opt` is not smart enough (or too smart?) to eliminate the
variable that would be used in GLSL to express this.
This extension provides a new operation which causes a fragment to be
discarded without terminating the fragment shader invocation. The
invocation for the discarded fragment becomes a helper invocation, so
that derivatives will remain defined. The old `HelperInvocation` builtin
becomes undefined when this occurs, so a second new instruction queries
the current helper invocation status.
This is only fully supported for GLSL. HLSL doesn't support the
`IsHelperInvocation` operation and MSL doesn't support the
`DemoteToHelperInvocation` op.
Fixes#1052.
Fix fallout from changes.
There's a bug in glslang that prevents `float16_t`, `[u]int16_t`, and
`[u]int8_t` constants from adding the corresponding SPIR-V capabilities.
SPIRV-Tools, meanwhile, tightened validation so that these constants are
only valid if the corresponding `Float16`, `Int16`, and `Int8` caps are
on. This affects the `16bit-constants.frag` test for GLSL and MSL.
The only piece added by this extension is the `DeviceIndex` builtin,
which tells the shader which device in a grouped logical device it is
running on.
Metal's pipeline state objects are owned by the `MTLDevice` that created
them. Since Metal doesn't support logical grouping of devices the way
Vulkan does, we'll thus have to create a pipeline state for each device
in a grouped logical device. The upcoming peer group support in Metal 3
will not change this. For this reason, for Metal, the device index is
supplied as a constant at pipeline compile time.
There's an interaction between `VK_KHR_device_group` and
`VK_KHR_multiview` in the
`VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT`, which defines the
view index to be the same as the device index. The new
`view_index_from_device_index` MSL option supports this functionality.
Using the `PostDepthCoverage` mode specifies that the `gl_SampleMaskIn`
variable is to contain the computed coverage mask following the early
fragment tests, which this mode requires and implicitly enables.
Note that unlike Vulkan and OpenGL, Metal places this on the sample mask
input itself, and furthermore does *not* implicitly enable early
fragment testing. If it isn't enabled explicitly with an
`[[early_fragment_tests]]` attribute, the compiler will error out. So we
have to enable that mode explicitly if `PostDepthCoverage` is enabled
but `EarlyFragmentTests` isn't.
For Metal, only iOS supports this; for some reason, Apple has yet to
implement it on macOS, even though many desktop cards support it.
There is a case where we can deduce a for/while loop, but the continue
block is actually very painful to deal with, so handle that case as
well. Removes an exceptional case.
There is a risk that we try to preserve a loop variable through multiple
iterations, even though the dominating block is inside a loop.
Fix this by analyzing if a block starts off by writing to a variable. In
that case, there cannot be any preservation going on. If we don't, pretend the
loop header is reading the variable, which moves the variable to an
appropriate scope.
Buffer objects can contain arbitrary pointers to blocks.
We can also implement ConvertPtrToU and ConvertUToPtr.
The latter can cast a uint64_t to any type as it pleases,
so we will need to generate fake buffer reference blocks to be able to
cast the type.
We had a bug where error conditions in DoWhileLoop emit path would not
detect that statements were being emitted due to the masking behavior
which happens when force_recompile is true. Fix this.
Also, refactor force_recompile into member functions so we can properly
break on any situation where this is set, without having to rely on
watchpoints in debuggers.
We have an edge case where the array is declared with a concrete size,
but in GLSL we must emit an unsized array, which breaks array copies.
Deal explicitly with this.
When we force recompile, the old var.self name we used as a fallback
name might have been disturbed, so we should recover certain names back
to their original form in case we are forced to take a recompile to make
the naming algorithm more deterministic.