Metal writes to the depth/stencil attachment before fragment shader
execution if the shader does not modify the depth value. Vulkan,
however, expects the write to happen after fragment shader execution.
To work around this, we add a simple depth passthrough to the fragment
shader when the user opts in. This is only required when the
depth/stencil attachment is simultaneously used as an input
attachment; Metal does not appear to detect that dependency correctly.
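A minimal sketch of the passthrough idea in MSL (illustrative, not the
exact generated code; the struct and function names are ours):
declaring a depth output and forwarding the incoming fragment's z makes
Metal treat the shader as writing depth, deferring the depth/stencil
write until after fragment execution, as Vulkan expects.

```cpp
#include <metal_stdlib>
using namespace metal;

struct FragOut
{
    float depth [[depth(any)]]; // declaring a depth output defers the write
};

fragment FragOut frag_passthrough(float4 pos [[position]])
{
    FragOut out;
    out.depth = pos.z; // pass the incoming depth through unchanged
    return out;
}
```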
Under certain circumstances, Metal will prematurely discard fragments
that perform operations with side effects. The conditions are the
following:
- The fragment is always discarded after the side-effect operation.
- The pre-fragment depth test fails.
- The fragment shader sets the depth to a constant value, and that
  constant also fails the depth test.
In this case, Metal discards the fragment even though it performs
operations with side effects before the point of discard.
Vulkan requires the graphics pipeline to execute in the following
order:
- Pre-fragment depth test (which cannot discard here, because the
  fragment shader modifies the depth value)
- Fragment shader (where the depth is modified and the fragment
  discarded)
- Post-fragment depth test
Therefore, in such cases we must enforce fragment shader execution and
not let Metal discard the fragment beforehand. This change adds an
option to provide that.
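A sketch in MSL of the kind of shader affected (the shape is
illustrative; the buffer binding and names are assumptions): the atomic
increment is a side effect that Vulkan requires to execute, even though
the constant depth output ultimately fails the depth test.

```cpp
#include <metal_stdlib>
using namespace metal;

struct FragOut
{
    float depth [[depth(any)]];
};

fragment FragOut frag_side_effect(device atomic_uint &counter [[buffer(0)]])
{
    // Side effect that must execute before the post-fragment depth test.
    atomic_fetch_add_explicit(&counter, 1u, memory_order_relaxed);

    FragOut out;
    out.depth = 1.0f; // constant depth that will also fail the depth test
    return out;
}
```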
To date, all released Apple Silicon GPUs incorrectly interpret the
gradient vectors when sampling a cube texture. Specifically, they ignore
one of the three partial derivatives in each gradient depending on the
selected major axis, and they expect the remaining derivatives to be
partially transformed.
h/t @lexaknyazev for the code used in the `spvGradientCube()` function.
Fixes 8 tests under `dEQP-VK.glsl.texture_functions.texturegrad.*`.
Metal 3.1 introduced a regression that causes an infinite-recursion
crash during Metal's analysis of an entry-point input structure that
itself contains internal recursion. This patch works around the problem
by replacing the recursive input declaration with an alternate variable
of type void*, then casting to the correct type at the top of the
entry-point function.
- Add CompilerMSL::Options::replace_recursive_inputs to enable
replacing recursive inputs.
- Add Compiler::type_contains_recursion() to determine if a struct
contains internal recursion, and add custom Decorations to mark
such structs, to short-cut future similar checks.
- Replace recursive input struct declarations with void*,
and emit a recast to correct type at top of entry function.
- Add unit test.
- Compiler::type_is_top_level_block(): remove hardcoded reference to
the spirv_cross namespace, as it interferes with configurable
namespaces (unrelated).
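A minimal sketch of enabling the workaround through the C++ API; the
surrounding setup is ordinary SPIRV-Cross usage, and only the
`replace_recursive_inputs` flag is specific to this change:

```cpp
#include "spirv_msl.hpp"

#include <cstdint>
#include <string>
#include <utility>
#include <vector>

std::string compile_with_recursion_workaround(std::vector<uint32_t> spirv)
{
    spirv_cross::CompilerMSL compiler(std::move(spirv));

    spirv_cross::CompilerMSL::Options opts = compiler.get_msl_options();
    opts.set_msl_version(3, 1);           // the regression appeared in Metal 3.1
    opts.replace_recursive_inputs = true; // swap recursive inputs for void* + recast
    compiler.set_msl_options(opts);

    return compiler.compile();
}
```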
Some Metal devices have a bug with depth array textures using comparison
with explicit LoD, where the LoD given will be biased by some amount.
For these devices, we can use a gradient instead, which does not exhibit
this problem. As with the fragment demote workaround, this is only
expected to be needed until the bug is fixed in Metal.
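A sketch of the substitution in MSL (illustrative; the helper name is
ours and the emitted code differs): since the implicit LoD is computed
as roughly log2 of the gradient magnitude in texels, a gradient of
exp2(lod) texels on each axis reproduces the requested level without
going through the biased explicit-LoD path.

```cpp
#include <metal_stdlib>
using namespace metal;

static float sample_cmp_lod_as_grad(depth2d_array<float> tex, sampler s,
                                    float2 coord, uint layer, float dref, float lod)
{
    float2 ext = float2(tex.get_width(), tex.get_height());
    float2 dx = float2(exp2(lod), 0.0f) / ext; // exp2(lod) texels along x
    float2 dy = float2(0.0f, exp2(lod)) / ext; // exp2(lod) texels along y
    return tex.sample_compare(s, coord, layer, dref, gradient2d(dx, dy));
}
```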
- Add CompilerMSL::Options::argument_buffers_tier as an enumeration to
allow the calling app to specify platform argument buffer tier
capabilities.
- Support iOS writable images in Tier2 argument buffers when specified.
Tier capabilities based on recommendations from Apple engineering.
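Sketched usage, assuming the calling app has already queried the
device's tier (e.g. from MTLDevice); the `Tier2` enumerator name is
taken to match the option described above:

```cpp
#include "spirv_msl.hpp"

void enable_tier2_argument_buffers(spirv_cross::CompilerMSL &compiler)
{
    spirv_cross::CompilerMSL::Options opts = compiler.get_msl_options();
    opts.argument_buffers = true;
    opts.argument_buffers_tier =
        spirv_cross::CompilerMSL::Options::ArgumentBuffersTier::Tier2;
    compiler.set_msl_options(opts);
}
```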
Some Metal devices have a bug where storage resources can still be
written to even if the fragment is discarded. This is obviously a bug in
Metal, but bothering Apple to fix it will only fix it for newer
versions; therefore, a workaround is needed for older versions. I have
made this an option so that, in case the bug is ever fixed, the
workaround can be disabled.
This workaround is simple: if a fragment shader may discard its fragment
and writes to a storage resource, a variable representing the
`HelperInvocation` built-in is created and passed to all functions. The
flag is checked on all resource writes; writes do not occur when
`HelperInvocation` is `true`. This relies on the earlier workaround to
update `HelperInvocation` when the fragment is discarded.
Fixes at least 3 failures in the CTS.
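The shape of a guarded write, sketched in MSL (names illustrative; the
real emitted code threads the flag through every function that writes a
resource, as described above):

```cpp
#include <metal_stdlib>
using namespace metal;

// The flag stands in for the variable backing the HelperInvocation builtin.
static void write_result(device uint *storage_out, uint value,
                         thread const bool &gl_HelperInvocation)
{
    if (!gl_HelperInvocation) // no write once the fragment has been discarded
        storage_out[0] = value;
}
```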
Some Metal devices have a bug where `simd_is_helper_thread()` won't
return true after a fragment has been discarded. We can work around this
by manually setting `gl_HelperInvocation` upon discarding a fragment.
This is fairly unintrusive, so it is enabled by default. I've made it an
option so that, when the bug is fixed, we can disable it.
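Sketch of the effect at a discard site (the variable name mirrors the
GLSL builtin; the emitted code may differ):

```cpp
#include <metal_stdlib>
using namespace metal;

static void discard_and_flag(thread bool &gl_HelperInvocation)
{
    // Set the flag ourselves, since simd_is_helper_thread() may keep
    // returning false on affected devices after the discard.
    gl_HelperInvocation = true;
    discard_fragment();
}
```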
Flattening doesn't play well with dynamic indices. In this case, it's
better to leave it as an array of structs.
(I wanted to do this for named blocks generally. Trouble is, the builtin
`gl_out` block is *also* a named block...)
Fixes six more CTS tests, under
`dEQP-VK.tessellation.user_defined_io.per_patch_block_array.*`.
Using vertex-style stage input is complex, and it doesn't support
nesting of structures or arrays. By using raw buffer input instead, we
get this support "for free," and everything becomes much simpler.
Arguably, this is the way I should've done this in the first place.
Eventually, I'd like to make this the default, and then remove the
option altogether. (And I still need to do that with
`multi_patch_workgroup`...)
Should help fix 66 tests in the Vulkan CTS, under the following trees:
- `dEQP-VK.pipeline.*.interface_matching.*`
- `dEQP-VK.tessellation.user_defined_io.*`
- `dEQP-VK.clipping.user_defined.*`
This is analogous to the existing support for fixing up shader inputs.
It is intended to be used with tessellation to add implicit builtins
that are read from a later stage, despite not being written in an
earlier stage. (Believe it or not, this is in fact legal in Vulkan.)
Helps fix 8 CTS tests under `dEQP-VK.pipeline.*.no_position`. (Eight
other tests work solely by accident without this change.)
MSL backend supports emitting custom names, and there's no reason for
HLSL not to support that as well, but we have to make it an option so
as not to break existing users.
Makes codegen from typical D3D emulation SPIR-V more readable.
Also makes cross compilation with NotEqual more sensible; it's very
rare to actually need the strict NaN checks in practice. glslang also
now seems to emit UnordNotEqual by default, so give up trying to
assume OrdNotEqual and harmonize on UnordNotEqual as the sane default.
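A minimal plain-C++ illustration of the difference (function names are
ours): UnordNotEqual is true when the operands differ or either is NaN,
while OrdNotEqual additionally requires both operands to be non-NaN.

```cpp
#include <cassert>
#include <cmath>

// OpFUnordNotEqual: unequal, or at least one operand is NaN.
static bool unord_not_equal(float a, float b) { return !(a == b); }

// OpFOrdNotEqual: unequal, and both operands are ordered (non-NaN).
static bool ord_not_equal(float a, float b) { return a < b || a > b; }

int main()
{
    float nan = std::nanf("");
    assert(unord_not_equal(nan, 1.0f)); // NaN compares unordered: "not equal"
    assert(!ord_not_equal(nan, 1.0f));  // the ordered variant rejects NaN
    assert(ord_not_equal(1.0f, 2.0f) && unord_not_equal(1.0f, 2.0f));
}
```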
Clang added -Wunqualified-std-cast-call in
https://reviews.llvm.org/D119670, which warns on unqualified std::move
and std::forward calls. This change qualifies these calls so the
project builds with -Werror on Clang HEAD.
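The fix is mechanical; for example (our own illustration, not a hunk
from the change):

```cpp
#include <string>
#include <utility>
#include <vector>

void append(std::vector<std::string> &out, std::string s)
{
    // out.push_back(move(s));   // unqualified: warns under -Wunqualified-std-cast-call
    out.push_back(std::move(s)); // qualified, as done throughout the project
}
```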
In Metal, the `[[position]]` input to a fragment shader remains at
fragment center, even at sample rate, like OpenGL and Direct3D. In
Vulkan, however, when the fragment shader runs at sample rate, the
`FragCoord` builtin moves to the sample position in the framebuffer,
instead of the fragment center. To account for this difference, adjust
the `FragCoord`, if present, by the sample position. The -0.5 offset is
because the fragment center is at (0.5, 0.5).
Also, add an option to force sample-rate shading in a fragment shader.
Since Metal has no explicit control for this, this is done by adding a
dummy `[[sample_id]]` which is otherwise unused, if none is already
present. This is intended to be used from e.g. MoltenVK when a
pipeline's `minSampleShading` value is nonzero.
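Sketched in MSL; `spvSamplePos()` is a hypothetical stand-in for
however the emitted code obtains the sample position, and its
placeholder body is not real:

```cpp
#include <metal_stdlib>
using namespace metal;

// Placeholder only: returns the sample's position within the pixel in [0, 1)^2.
static float2 spvSamplePos(uint) { return float2(0.5f); }

fragment float4 frag_main(float4 gl_FragCoord [[position]],
                          uint gl_SampleID [[sample_id]]) // dummy input forces sample-rate shading
{
    // Move from the fragment center (0.5, 0.5) to the sample position.
    gl_FragCoord.xy += spvSamplePos(gl_SampleID) - 0.5f;
    return float4(gl_FragCoord.xy, 0.0f, 1.0f);
}
```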
Instead of checking if any `Input` variables have `Sample`
interpolation, I've elected to check that the `SampleRateShading`
capability is present. Since `SampleId`, `SamplePosition`, and the
`Sample` interpolation decoration require this cap, this should be
equivalent for any valid SPIR-V module. If this isn't acceptable, let me
know.
Add support for declaring a fixed subgroup size. Metal, like Vulkan with
`VK_EXT_subgroup_size_control`, allows the thread execution width to
vary depending on factors such as register usage. Unfortunately, this
breaks several tests that depend on the subgroup size being what the
device says it is. So we'll fix the subgroup size at the size the device
declares. The extra invocations in the subgroup will appear to be
inactive. Because of this, the ballot mask builtins are now ANDed with
the active subgroup mask.
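The ballot fixup amounts to masking; a plain C++ sketch with
illustrative names:

```cpp
#include <cstdint>

// Invocations padding the subgroup out to the fixed size must read as
// inactive, so raw ballot results are ANDed with the active mask.
uint64_t masked_ballot(uint64_t raw_ballot, uint64_t active_subgroup_mask)
{
    return raw_ballot & active_subgroup_mask;
}
```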
Add support for emulating a subgroup of size 1. This is intended to be
used by Vulkan Portability implementations (e.g. MoltenVK) when the
hardware/software combo provides insufficient support for subgroups.
Luckily for us, Vulkan 1.1 only requires that the subgroup size be at
least 1.
Add support for quadgroup and SIMD-group functions which were added to
iOS in Metal 2.2 and 2.3. This will allow clients to take advantage of
expanded quadgroup and SIMD-group support in recent Metal versions and
on recent Apple GPUs (families 6 and 7).
Gut emulation of subgroup builtins in fragment shaders. It turns out
codegen for the SIMD-group functions in fragment shaders wasn't
implemented for AMD on Mojave; it's a safe bet that it wasn't
implemented for the other drivers either. Subgroup support in fragment
shaders now requires Metal 2.2.