Metal 3.1 introduced a regression that causes an infinite-recursion
crash when Metal analyzes an entry point input structure that itself
contains internal recursion. This patch works around this by replacing the
recursive input declaration with an alternate variable of type void*, and
then casting to the correct type at the top of the entry point function.
- Add CompilerMSL::Options::replace_recursive_inputs to enable
replacing recursive inputs.
- Add Compiler::type_contains_recursion() to determine if a struct
contains internal recursion, and add custom Decorations to mark
such structs, to short-cut future similar checks.
- Replace recursive input struct declarations with void*,
and emit a recast to the correct type at the top of the entry function.
- Add unit test.
- In Compiler::type_is_top_level_block(), remove the hardcoded reference to the
spirv_cross namespace, as it interferes with configurable namespaces (unrelated).
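The recursion check described above amounts to cycle detection over the graph of struct member types. Below is a minimal sketch under simplified assumptions: `TypeInfo`, `RecursionChecker`, and the member-ID lists are hypothetical stand-ins, not SPIRV-Cross's real type representation, and the cache only loosely plays the role of the custom Decorations. Since a SPIR-V struct can't contain itself by value, the edges here would come from struct types reached through pointer members.

```cpp
#include <cstdint>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical, simplified type model: for each struct type, the IDs of the
// struct types its members refer to (directly or through pointers).
struct TypeInfo {
    std::vector<uint32_t> member_struct_ids;
};

class RecursionChecker {
public:
    explicit RecursionChecker(const std::unordered_map<uint32_t, TypeInfo>& types)
        : types_(types) {}

    // True if following member types from type_id ever revisits a type
    // already on the current path, i.e. a cycle is reachable.
    bool type_contains_recursion(uint32_t type_id) {
        std::unordered_set<uint32_t> on_path;
        return visit(type_id, on_path);
    }

private:
    bool visit(uint32_t id, std::unordered_set<uint32_t>& on_path) {
        // Short-cut: reuse an earlier answer for this type.
        auto cached = cache_.find(id);
        if (cached != cache_.end())
            return cached->second;

        // Back edge: this type is already being expanded higher up.
        if (!on_path.insert(id).second)
            return true;

        bool recursive = false;
        auto it = types_.find(id);
        if (it != types_.end()) {
            for (uint32_t member : it->second.member_struct_ids) {
                if (visit(member, on_path)) {
                    recursive = true;
                    break;
                }
            }
        }

        on_path.erase(id);
        cache_[id] = recursive;
        return recursive;
    }

    const std::unordered_map<uint32_t, TypeInfo>& types_;
    std::unordered_map<uint32_t, bool> cache_;
};
```

Caching is safe in this sketch because a `false` result is only recorded once every type reachable from that type has been explored without finding a cycle.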
Some Metal devices have a bug with depth array textures using comparison
with explicit LoD, where the LoD given will be biased by some amount.
For these devices, we can use a gradient instead, which does not exhibit
this problem. As with the fragment demote workaround, this is only
expected to be needed until the bug is fixed in Metal.
- Add CompilerMSL::Options::argument_buffers_tier as an enumeration to
allow the calling app to specify platform argument buffer tier capabilities.
- Support iOS writable images in Tier2 argument buffers when specified.
Tier capabilities are based on recommendations from Apple engineering.
Some Metal devices have a bug where storage resources can still be
written to even if the fragment is discarded. This is obviously a bug in
Metal, but getting Apple to fix it would only fix it for newer
versions; therefore, a workaround is needed for older versions. I have
made this an option so that, in case the bug is ever fixed, the
workaround can be disabled.
This workaround is simple: if a fragment shader may discard its fragment
and writes to a storage resource, a variable representing the
`HelperInvocation` built-in is created and passed to all functions. The
flag is checked on all resource writes; writes do not occur when
`HelperInvocation` is `true`. This relies on the earlier workaround to
update `HelperInvocation` when the fragment is discarded.
Fixes at least 3 failures in the CTS.
Some Metal devices have a bug where `simd_is_helper_thread()` won't
return true after a fragment has been discarded. We can work around this
by manually setting `gl_HelperInvocation` upon discarding a fragment.
This is fairly unintrusive, so it is enabled by default. I've made it an
option so that, when the bug is fixed, we can disable it.
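The two helper-invocation workarounds above fit together roughly as follows. This is a hypothetical CPU-side model in C++, not the MSL that SPIRV-Cross emits: `FragmentState`, `emulate_discard()`, and `guarded_store()` are illustrative names, and the `helper` flag stands in for the `HelperInvocation` variable that gets passed to every function.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical model of the per-fragment state; `helper` stands in for the
// HelperInvocation variable the workaround creates and threads through every
// function. In the shader it would start out as simd_is_helper_thread().
struct FragmentState {
    bool helper = false;
};

// Discard workaround: set the flag by hand, since simd_is_helper_thread() may
// not return true after a discard on affected devices. (The real shader would
// also discard the fragment here.)
void emulate_discard(FragmentState& state) {
    state.helper = true;
}

// Write workaround: storage writes are skipped for helper invocations,
// including fragments that have already been discarded.
void guarded_store(const FragmentState& state, std::vector<int>& buffer,
                   std::size_t index, int value) {
    if (!state.helper)
        buffer[index] = value;
}
```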
Flattening doesn't play well with dynamic indices. In this case, it's
better to leave it as an array of structs.
(I wanted to do this for named blocks generally. Trouble is, the builtin
`gl_out` block is *also* a named block...)
Fixes six more CTS tests, under
`dEQP-VK.tessellation.user_defined_io.per_patch_block_array.*`.
Using vertex-style stage input is complex, and it doesn't support
nesting of structures or arrays. By using raw buffer input instead, we
get this support "for free," and everything becomes much simpler.
Arguably, this is the way I should've done this in the first place.
Eventually, I'd like to make this the default, and then remove the
option altogether. (And I still need to do that with
`multi_patch_workgroup`...)
Should help fix 66 tests in the Vulkan CTS, under the following trees:
- `dEQP-VK.pipeline.*.interface_matching.*`
- `dEQP-VK.tessellation.user_defined_io.*`
- `dEQP-VK.clipping.user_defined.*`
This is analogous to the existing support for fixing up shader inputs.
It is intended to be used with tessellation to add implicit builtins
that are read from a later stage, despite not being written in an
earlier stage. (Believe it or not, this is in fact legal in Vulkan.)
Helps fix 8 CTS tests under `dEQP-VK.pipeline.*.no_position`. (Eight
other tests work solely by accident without this change.)
The MSL backend supports emitting custom names, and there's no reason for
HLSL not to support that as well, but we have to make it an option to
avoid breaking existing users.
Makes codegen from typical D3D emulation SPIR-V more readable.
Also makes cross compilation with NotEqual more sensible.
It's very rare to actually need the strict NaN-checks in practice.
Also, glslang now seems to emit UnordNotEqual by default, so give up
trying to assume OrdNotEqual. Harmonize on UnordNotEqual as the sane
default.
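The difference between the two comparisons only shows up with NaN operands. This small C++ sketch spells out the SPIR-V semantics of `OpFOrdNotEqual` and `OpFUnordNotEqual` (the function names are illustrative). Under IEEE 754 rules the unordered form is exactly what a plain `!=` computes, which is why it cross-compiles to simpler code than the ordered form.

```cpp
#include <cmath>

// OpFOrdNotEqual: true only if neither operand is NaN and a != b.
bool ord_not_equal(float a, float b) {
    return !std::isnan(a) && !std::isnan(b) && a != b;
}

// OpFUnordNotEqual: true if either operand is NaN, or a != b.
// In IEEE 754 arithmetic, a != b is already true when either side is NaN,
// so this is equivalent to a plain a != b.
bool unord_not_equal(float a, float b) {
    return std::isnan(a) || std::isnan(b) || a != b;
}
```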
Clang added -Wunqualified-std-cast-call in
https://reviews.llvm.org/D119670, which warns on unqualified calls to
std::move and std::forward. This change qualifies these calls so that the
project builds with HEAD Clang and -Werror.
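For illustration, this is the qualified form the warning asks for; `append_moved()` and `forward_into()` are hypothetical helpers, not code from the project.

```cpp
#include <string>
#include <utility>
#include <vector>

// Writing std::move rather than an unqualified move(...) (which would be
// found via ADL on std::string) keeps -Wunqualified-std-cast-call quiet.
void append_moved(std::vector<std::string>& out, std::string value) {
    out.push_back(std::move(value));
}

// The same applies to std::forward in perfect-forwarding templates.
template <typename T>
void forward_into(std::vector<std::string>& out, T&& value) {
    out.emplace_back(std::forward<T>(value));
}
```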
In Metal, the `[[position]]` input to a fragment shader remains at
fragment center, even at sample rate, like OpenGL and Direct3D. In
Vulkan, however, when the fragment shader runs at sample rate, the
`FragCoord` builtin moves to the sample position in the framebuffer,
instead of the fragment center. To account for this difference, adjust
the `FragCoord`, if present, by the sample position. The -0.5 offset
accounts for the fragment center being at (0.5, 0.5).
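The adjustment is simple arithmetic. Here is a minimal C++ sketch, assuming the sample position is given in pixel-relative units in [0, 1), as with SPIR-V's `SamplePosition` built-in; `adjust_frag_coord()` and the `float2` alias are hypothetical.

```cpp
#include <array>

using float2 = std::array<float, 2>;

// frag_center: Metal's [[position]].xy, always the pixel center, e.g. (3.5, 7.5).
// sample_pos:  the sample's position within the pixel, in [0, 1).
// Shifting by (sample_pos - 0.5) moves the coordinate from the center to the
// sample, which is what Vulkan's FragCoord reports at sample rate.
float2 adjust_frag_coord(float2 frag_center, float2 sample_pos) {
    return {frag_center[0] + sample_pos[0] - 0.5f,
            frag_center[1] + sample_pos[1] - 0.5f};
}
```

A sample exactly at the center, (0.5, 0.5), leaves the coordinate unchanged, matching the non-sample-rate case.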
Also, add an option to force sample-rate shading in a fragment shader.
Since Metal has no explicit control for this, this is done by adding a
dummy `[[sample_id]]` which is otherwise unused, if none is already
present. This is intended to be used from e.g. MoltenVK when a
pipeline's `minSampleShading` value is nonzero.
Instead of checking if any `Input` variables have `Sample`
interpolation, I've elected to check that the `SampleRateShading`
capability is present. Since `SampleId`, `SamplePosition`, and the
`Sample` interpolation decoration require this cap, this should be
equivalent for any valid SPIR-V module. If this isn't acceptable, let me
know.
Add support for declaring a fixed subgroup size. Metal, like Vulkan with
`VK_EXT_subgroup_size_control`, allows the thread execution width to
vary depending on factors such as register usage. Unfortunately, this
breaks several tests that depend on the subgroup size being what the
device says it is. So we'll fix the subgroup size at the size the device
declares. The extra invocations in the subgroup will appear to be
inactive. Because of this, the ballot mask builtins are now ANDed with
the active subgroup mask.
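A minimal sketch of the masking, assuming at most 64 lanes so a mask fits in a `uint64_t` (real subgroup masks are `uvec4`). `lanes_below()` and `ge_mask()` are hypothetical helpers modeling a `GeMask`-style builtin: without the final AND, the mask would claim lanes between the actual execution width and the declared size.

```cpp
#include <cstdint>

// Mask with bits [0, count) set; count is clamped to 64 lanes.
uint64_t lanes_below(uint32_t count) {
    return count >= 64 ? ~uint64_t(0) : (uint64_t(1) << count) - 1;
}

// GeMask-style builtin for a subgroup whose size is fixed at declared_size:
// every lane index >= lane within the declared size, then ANDed with the
// mask of lanes that are actually active, so lanes past the real thread
// execution width never appear.
uint64_t ge_mask(uint32_t lane, uint32_t declared_size, uint64_t active) {
    uint64_t ge = lanes_below(declared_size) & ~lanes_below(lane);
    return ge & active;
}
```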
Add support for emulating a subgroup of size 1. This is intended to be
used by Vulkan Portability implementations (e.g. MoltenVK) when the
hardware/software combo provides insufficient support for subgroups.
Luckily for us, Vulkan 1.1 only requires that the subgroup size be at
least 1.
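With a subgroup of size 1, the subgroup builtins collapse to trivial forms. The following hypothetical C++ sketch shows a few of them (names are illustrative, and the ballot is simplified to a 64-bit mask): the sole invocation is lane 0, it is always elected, and broadcasts and inclusive reductions return their input.

```cpp
#include <cstdint>

// Hypothetical sketch of subgroup builtins when a subgroup is emulated as a
// single invocation.
constexpr uint32_t kSubgroupSize = 1;

// The only invocation is lane 0...
constexpr uint32_t subgroup_invocation_id() { return 0; }
// ...so it is always the elected one...
constexpr bool subgroup_elect() { return true; }
// ...a ballot can set at most bit 0...
constexpr uint64_t subgroup_ballot(bool predicate) { return predicate ? 1u : 0u; }
// ...and broadcasts and inclusive reductions just return the input.
template <typename T>
constexpr T subgroup_broadcast_first(T value) { return value; }
template <typename T>
constexpr T subgroup_add(T value) { return value; }
```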
Add support for quadgroup and SIMD-group functions which were added to
iOS in Metal 2.2 and 2.3. This will allow clients to take advantage of
expanded quadgroup and SIMD-group support in recent Metal versions and
on recent Apple GPUs (families 6 and 7).
Gut emulation of subgroup builtins in fragment shaders. It turns out
codegen for the SIMD-group functions in fragment wasn't implemented for
AMD on Mojave; it's a safe bet that it wasn't implemented for the other
drivers either. Subgroup support in fragment shaders now requires Metal
2.2.
Fix reversed coordinates: `y` should be used to calculate the row
address. Align row address to the row stride.
I've made the row alignment a function constant; this makes it possible
to override it at pipeline compile time.
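The corrected addressing can be sketched as follows. This assumes a byte-addressed linear buffer and a power-of-two alignment; `align_up()`, `texel_byte_offset()`, and the parameter names are hypothetical, with `row_alignment` playing the part of the function constant.

```cpp
#include <cassert>
#include <cstdint>

// Round value up to a multiple of alignment (which must be a power of two).
uint32_t align_up(uint32_t value, uint32_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Byte offset of texel (x, y) in a linear buffer backing a 2D image.
uint32_t texel_byte_offset(uint32_t x, uint32_t y, uint32_t width,
                           uint32_t bytes_per_texel, uint32_t row_alignment) {
    // Each row is padded so that every row starts on an aligned address.
    uint32_t row_stride = align_up(width * bytes_per_texel, row_alignment);
    // y selects the row; x selects the texel within that row.
    return y * row_stride + x * bytes_per_texel;
}
```

With `width = 10`, 4-byte texels and 16-byte alignment, each row is padded from 40 to 48 bytes, so texel (2, 3) lands at 3 * 48 + 2 * 4 = 152; swapping the coordinates would give 108 instead.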
Honestly, I don't know how this worked at all for Epic. It definitely
didn't work in the CTS prior to this.