Previously we only considered invalid names, and overwrote the alias for
the function. The correct fix is to replace illegal names early, do the
reserved-name fixup, then copy the alias back to the entry point name.
This is necessary to avoid invalid output because of how implicit
dependencies on builtins work.
For example, the fixup for `BuiltInSubgroupEqMask` initializes the
variable based on `builtin_subgroup_invocation_id_id`, a field storing
the ID for a variable with decoration `BuiltInSubgroupLocalInvocationId`.
This could be either a variable that already exists in the input
(spirv_msl.cpp:300) or, if necessary, a newly created one
(spirv_msl.cpp:621). In both cases, though,
`builtin_subgroup_invocation_id_id` is only set under the condition
`need_subgroup_mask || needs_subgroup_invocation_id`.
`need_subgroup_mask` is true if any of the `BuiltInSubgroupXXMask` are
set in `active_input_builtins`.
Normally, if the program contains `BuiltInSubgroupEqMask`,
`Compiler::ActiveBuiltinHandler` will set it in `active_input_builtins`.
But this only happens if the variable is actually used, whereas
`fix_up_shader_inputs_outputs` loops over all variables in the program
regardless of whether they're used.
If `BuiltInSubgroupEqMask` is not used,
`builtin_subgroup_invocation_id_id` is never set, but before this patch
the fixup hook would try to use it anyway, producing MSL that references
a nonexistent variable named `_0`.
Avoid this by changing `fix_up_shader_inputs_outputs` to skip builtins
that are not set in `active_input_builtins` or
`active_output_builtins`. Also add a test case.
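The filtering rule described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual SPIRV-Cross code: `BuiltinVariable`, `ActiveBuiltins`, and `select_fixup_candidates` are hypothetical names, and the two bitsets merely stand in for `active_input_builtins` and `active_output_builtins`.

```cpp
#include <bitset>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for compiler state; not the real SPIRV-Cross types.
enum class StorageClass { Input, Output };

struct BuiltinVariable
{
    uint32_t builtin;     // numeric builtin ID (illustrative values only)
    StorageClass storage; // which interface the variable belongs to
};

struct ActiveBuiltins
{
    std::bitset<64> input;  // stands in for active_input_builtins
    std::bitset<64> output; // stands in for active_output_builtins
};

// Returns only the variables whose builtin bit is set in the matching
// active set. Declared-but-unused builtins are skipped, so no fixup hook
// can end up referencing an ID that was never assigned.
std::vector<BuiltinVariable> select_fixup_candidates(const std::vector<BuiltinVariable> &vars,
                                                     const ActiveBuiltins &active)
{
    std::vector<BuiltinVariable> result;
    for (const auto &var : vars)
    {
        const auto &set = var.storage == StorageClass::Input ? active.input : active.output;
        if (var.builtin >= set.size() || !set.test(var.builtin))
            continue; // not active: emit no fixup for this variable
        result.push_back(var);
    }
    return result;
}
```

The point of the sketch is only the ordering: the activity check happens before any hook is registered, rather than inside the hook.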
In Metal, the `[[position]]` input to a fragment shader remains at
fragment center, even at sample rate, like OpenGL and Direct3D. In
Vulkan, however, when the fragment shader runs at sample rate, the
`FragCoord` builtin moves to the sample position in the framebuffer,
instead of the fragment center. To account for this difference, adjust
the `FragCoord`, if present, by the sample position. The -0.5 offset is
needed because the fragment center is at (0.5, 0.5).
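As a worked illustration of the arithmetic, here is a small host-side sketch. It assumes the sample position is expressed in pixel units with (0.5, 0.5) at the fragment center, which is what the -0.5 offset implies; `Float2` and `adjust_frag_coord` are hypothetical names, not part of the generated MSL.

```cpp
// Hypothetical two-component vector, standing in for MSL's float2.
struct Float2
{
    float x;
    float y;
};

// Moves a fragment-center coordinate to the sample position.
// Assumption: sample_pos is in pixel units, with (0.5, 0.5) at the center,
// so a centered sample leaves the coordinate unchanged.
Float2 adjust_frag_coord(Float2 frag_center, Float2 sample_pos)
{
    return { frag_center.x + sample_pos.x - 0.5f,
             frag_center.y + sample_pos.y - 0.5f };
}
```

For a fragment centered at (10.5, 20.5) and a sample at (0.25, 0.75), this yields (10.25, 20.75).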
Also, add an option to force sample-rate shading in a fragment shader.
Since Metal has no explicit control for this, this is done by adding a
dummy `[[sample_id]]` which is otherwise unused, if none is already
present. This is intended to be used from e.g. MoltenVK when a
pipeline's `minSampleShading` value is nonzero.
Instead of checking if any `Input` variables have `Sample`
interpolation, I've elected to check that the `SampleRateShading`
capability is present. Since `SampleId`, `SamplePosition`, and the
`Sample` interpolation decoration require this cap, this should be
equivalent for any valid SPIR-V module. If this isn't acceptable, let me
know.
We have been using the `spv` and `SPIRV_Cross_` prefixes interchangeably
for a while, which causes weirdness because we don't explicitly ban
`SPIRV_Cross` identifiers: they are generally used for interface variable
workarounds.
Add support for declaring a fixed subgroup size. Metal, like Vulkan with
`VK_EXT_subgroup_size_control`, allows the thread execution width to
vary depending on factors such as register usage. Unfortunately, this
breaks several tests that depend on the subgroup size being what the
device says it is. So we'll fix the subgroup size at the size the device
declares. The extra invocations in the subgroup will appear to be
inactive. Because of this, the ballot mask builtins are now ANDed with
the active subgroup mask.
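To illustrate why the AND is needed, here is a simplified sketch using a single 64-bit mask (the real builtins are four-component vectors, and the generated MSL differs). `low_bits` and `subgroup_ge_mask` are hypothetical helper names; `fixed_size` is the declared subgroup size and `active_width` is the number of invocations that are actually running.

```cpp
#include <cstdint>

// Mask with the low `count` bits set; count may be anywhere in [0, 64].
uint64_t low_bits(unsigned count)
{
    return count >= 64 ? ~uint64_t(0) : ((uint64_t(1) << count) - 1);
}

// "Ge" mask for one invocation: all invocations with index >= invocation_id
// within the fixed subgroup size, clipped to the invocations that are
// actually active. Without the final AND, the mask would include the extra
// invocations that only exist because the size was fixed.
uint64_t subgroup_ge_mask(unsigned invocation_id, unsigned fixed_size, unsigned active_width)
{
    uint64_t ge = low_bits(fixed_size) & ~low_bits(invocation_id);
    uint64_t active = low_bits(active_width);
    return ge & active;
}
```

With a fixed size of 64 but only 32 live invocations, invocation 30 gets a mask with just bits 30 and 31 set.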
Add support for emulating a subgroup of size 1. This is intended to be
used by Vulkan Portability implementations (e.g. MoltenVK) when the
hardware/software combo provides insufficient support for subgroups.
Luckily for us, Vulkan 1.1 only requires that the subgroup size be at
least 1.
Add support for quadgroup and SIMD-group functions which were added to
iOS in Metal 2.2 and 2.3. This will allow clients to take advantage of
expanded quadgroup and SIMD-group support in recent Metal versions and
on recent Apple GPUs (families 6 and 7).
Gut emulation of subgroup builtins in fragment shaders. It turns out
codegen for the SIMD-group functions in fragment wasn't implemented for
AMD on Mojave; it's a safe bet that it wasn't implemented for the other
drivers either. Subgroup support in fragment shaders now requires Metal
2.2.
New in MSL 2.3 is a template that can be used in the place of a scalar
type in a stage-in struct. This template has methods which interpolate
the varying at the given points. Curiously, you can't set interpolation
attributes on such a varying; perspective-correctness is encoded in the
type, while interpolation must be done using one of the methods. This
makes using this somewhat awkward from SPIRV-Cross, requiring us to jump
through a bunch of hoops to make this all work.
Using varyings from functions in particular is a pain point, requiring
us to pass the stage-in struct itself around. An alternative is to pass
references to the interpolants; except this will fall over badly with
composite types, which naturally must be flattened. As with
tessellation, dynamic indexing isn't supported with pull-model
interpolation. This is because of the need to reference the original
struct member in order to call one of the pull-model interpolation
methods on it. Also, this is done at the variable level; this means that
if one varying in a struct is used with the pull-model functions, then
the entire struct is emitted as pull-model interpolants.
For some reason, this was not documented in the MSL spec, though there
is a property on `MTLDevice`, `supportsPullModelInterpolation`,
indicating support for this, which *is* documented. This does not appear
to be implemented yet for AMD: it returns `NO` from
`supportsPullModelInterpolation`, and pipelines with shaders using the
templates fail to compile. It *is* implemented for Intel. It's probably
also implemented for Apple GPUs: on Apple Silicon, OpenGL calls down to
Metal, and it wouldn't be possible to use the interpolation functions
without this implemented in Metal.
Based on my testing, where SPIR-V and GLSL have the offset relative to
the pixel center, in Metal it appears to be relative to the pixel's
upper-left corner, as in HLSL. Therefore, I've added an offset of 0.4375,
i.e. one half minus one sixteenth, to all arguments to
`interpolate_at_offset()`.
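The constant and its application can be written out as a short sketch. `Float2`, `kOffsetBias`, and `to_metal_interpolation_offset` are hypothetical names for illustration; the value itself is the empirically chosen one described above.

```cpp
// Hypothetical two-component vector, standing in for MSL's float2.
struct Float2
{
    float x;
    float y;
};

// One half minus one sixteenth, i.e. 0.4375 (exactly representable in float).
constexpr float kOffsetBias = 0.5f - 1.0f / 16.0f;

// Converts a center-relative offset (SPIR-V/GLSL convention) into the value
// passed to interpolate_at_offset(), by adding the bias to each component.
Float2 to_metal_interpolation_offset(Float2 center_relative)
{
    return { center_relative.x + kOffsetBias, center_relative.y + kOffsetBias };
}
```

A zero offset in SPIR-V therefore becomes (0.4375, 0.4375) on the Metal side.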
This also fixes a long-standing bug: if a pull-model interpolation
function is used on a varying, make sure that varying is declared. We
were already doing this, but only for the AMD pull-model function,
`interpolateAtVertexAMD()`; for reasons which are completely beyond me,
we weren't doing this for the base interpolation functions. I also note
that there are no tests for the interpolation functions for GLSL or
HLSL.
I kept the code to replace constant zero arguments, because `Bias` and
`Grad` still have some problems on desktop GPUs.
`Bias` works on AMD GPUs. `Grad` does not. Both work on Intel. Still
needs testing on NV. It will definitely work with Apple GPUs.
Metal doesn't support broadcasting or shuffling boolean values, but we
can work around that by casting them to `ushort`, then casting the result
back to `bool`. I used `ushort` instead of `uint` because 16-bit values give
better throughput on Apple GPUs.
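The cast-around pattern can be modelled on the host as below. This is only an emulation for illustration: the `broadcast` template is a hypothetical stand-in for a SIMD-group broadcast that rejects `bool`, and a `std::vector` plays the role of the lanes.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for a SIMD-group broadcast: returns the value held
// by lane `lane`. Like the Metal functions, it refuses bool operands.
template <typename T>
T broadcast(const std::vector<T> &lanes, std::size_t lane)
{
    static_assert(!std::is_same<T, bool>::value, "bool is not supported directly");
    return lanes.at(lane);
}

// Workaround: widen each bool to a 16-bit unsigned value, broadcast that,
// then convert the result back to bool.
bool broadcast_bool(const std::vector<bool> &lanes, std::size_t lane)
{
    std::vector<uint16_t> widened(lanes.begin(), lanes.end()); // bool -> ushort
    return broadcast(widened, lane) != 0;                      // ushort -> bool
}
```

The round trip is lossless because `true` and `false` widen to 1 and 0 respectively.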