Subsequent stages can legally attempt to read from these variables,
which causes compilation failure.
Always make sure we emit user outputs in vertex shaders if they are
active in the entry point.
We only considered invalid names, and overwrote the alias for the
function. The correct fix is to replace illegal names early, do the
reserved fixup, then copy back alias to entry point name.
This is necessary to avoid invalid output because of how implicit
dependencies on builtins work.
For example, the fixup for `BuiltInSubgroupEqMask` initializes the
variable based on `builtin_subgroup_invocation_id_id`, a field storing
the ID for a variable with decoration `BuiltInSubgroupLocalInvocationId`.
This could be either a variable that already exists in the input
(spirv_msl.cpp:300) or, if necessary, a newly created one
(spirv_msl.cpp:621). In both cases, though,
`builtin_subgroup_invocation_id_id` is only set under the condition
`need_subgroup_mask || needs_subgroup_invocation_id`.
`need_subgroup_mask` is true if any of the `BuiltInSubgroupXXMask` are
set in `active_input_builtins`.
Normally, if the program contains `BuiltInSubgroupEqMask`,
`Compiler::ActiveBuiltinHandler` will set it in `active_input_builtins`.
But this only happens if the variable is actually used, whereas
`fix_up_shader_inputs_outputs` loops over all variables in the program
regardless of whether they're used.
If `BuiltInSubgroupEqMask` is not used,
`builtin_subgroup_invocation_id_id` is never set, but before this patch
the fixup hook would try to use it anyway, producing MSL that references
a nonexistent variable named `_0`.
Avoid this by changing `fix_up_shader_inputs_outputs` to skip builtins
which are not set in `active_input_builtins` or
`active_output_builtins`. And add a test case.
Add support for declaring a fixed subgroup size. Metal, like Vulkan with
`VK_EXT_subgroup_size_control`, allows the thread execution width to
vary depending on factors such as register usage. Unfortunately, this
breaks several tests that depend on the subgroup size being what the
device says it is. So we'll fix the subgroup size at the size the device
declares. The extra invocations in the subgroup will appear to be
inactive. Because of this, the ballot mask builtins are now ANDed with
the active subgroup mask.
Add support for emulating a subgroup of size 1. This is intended to be
used by Vulkan Portability implementations (e.g. MoltenVK) when the
hardware/software combo provides insufficient support for subgroups.
Luckily for us, Vulkan 1.1 only requires that the subgroup size be at
least 1.
Add support for quadgroup and SIMD-group functions which were added to
iOS in Metal 2.2 and 2.3. This will allow clients to take advantage of
expanded quadgroup and SIMD-group support in recent Metal versions and
on recent Apple GPUs (families 6 and 7).
Gut emulation of subgroup builtins in fragment shaders. It turns out
codegen for the SIMD-group functions in fragment wasn't implemented for
AMD on Mojave; it's a safe bet that it wasn't implemented for the other
drivers either. Subgroup support in fragment shaders now requires Metal
2.2.
Metal doesn't support broadcasting or shuffling boolean values, but we
can work around that by casting it to `ushort`, then casting it back to
`bool`. I used `ushort` instead of `uint` because 16-bit values give
better throughput on Apple GPUs.
Only the least *n* bits are significant, where *n* is the subgroup size.
The Vulkan CTS actually checks this.
The `FindLSB` tests weren't actually failing, but I masked that anyway,
in case there's some corner case the CTS is missing.
`SubgroupEqMask` had a fencepost error that gave wrong values for
invocation ID 32.
For `SubgroupGeMask` and `SubgroupGtMask`, I forgot to shift the values
from `extract_bits()` up so that the mask is in the correct position.
Using `insert_bits()` instead should fold these two operations into one.
`SubgroupLtMask` and `SubgroupLeMask` were already correct.
These need to use arrayed texture types, or Metal will complain when
binding the resource. The target layer is addressed relative to the
Layer output by the vertex pipeline, or to the ViewIndex if in a
multiview pipeline. Unlike with the s/t coordinates, Vulkan does not
forbid non-zero layer coordinates here, though this cannot be expressed
in Vulkan GLSL.
Supporting 3D textures will require additional work. Part of the problem
is that Metal does not allow texture views to subset a 3D texture, so we
need some way to pass the base depth to the shader.
This should hopefully reduce underutilization of the GPU, especially on
GPUs where the thread execution width is greater than the number of
control points.
This also simplifies initialization by reading the buffer directly
instead of using Metal's vertex-attribute-in-compute support. It turns
out the only way in which shader stages are allowed to differ in their
interfaces is in the number of components per vector; the base type must
be the same. Since we are using the raw buffer instead of attributes, we
can now also emit arrays and matrices directly into the buffer, instead
of flattening them and then unpacking them. Structs are still flattened,
however; this is due to the need to handle vectors with fewer components
than were output, and I think handling this while also directly emitting
structs could get ugly.
Another advantage of this scheme is that the extra invocations needed to
read the attributes when there were more input than output points are
now no more. The number of threads per workgroup is now lcm(SIMD-size,
output control points). This should ensure we always process a whole
number of patches per workgroup.
To avoid complexity handling indices in the tessellation control shader,
I've also changed the way vertex shaders for tessellation are handled.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader. This also fixes a long-standing
issue where if an index were greater than the number of vertices to
draw, the vertex shader would wind up writing outside the buffer, and
the vertex would be lost.
This is a breaking change, and I know SPIRV-Cross has other clients, so
I've hidden this behind an option for now. In the future, I want to
remove this option and make it the default.
On MSL, the compiler refuses to allow access chains into a normal vector type.
What happens in practice instead is a read-modify-write where a vector type is
loaded, modified and written back.
The workaround is to convert a vector into a pointer-to-scalar before
the access chain continues to add the scalar index.
If a buffer rewrites its Offsets, all member references to that struct
are invalidated, and must be redirected, do so in to_member_reference,
but there might be other places where this is needed. Fix as required.
SPIR-V code relying on this is somewhat questionable, but seems to be
in-spec.
Do not attempt to defer declaration. It would happen to work in most
cases, but the edge case is where the first thing that happens to a
variable is being OpStore'd into.
DX may emit ArrayStride and MatrixStride of 16, but the size of the
object does not align with that and expect to pack other members inside
its last member.
The workaround is to emit array size/col/row one less than we expect and
rely on padding to carve out a "dead zone" for the last member.
- Fixes issue with clip_distance flattening in MSL where member to
flatten from would come from to_member_name, where it should have used
the builtin name directly. This member name was modified by this patch
and broke clip distance test shaders.
- Some cleanups with ir.meta, use ir.find_meta instead to not create
unnecessary hashmap nodes.
This avoids a lot of huge code changes.
Arrays generally cannot be copied in and out of buffers, at least no
compiler frontend seems to do it.
Also avoids a lot of issues surrounding packed vectors and matrices.
It is possible for a shader to declare two plain struct types which
simply share the same OpName without there being an implicit
value/buffer alias relationship.
For to_member_name(), make sure to use the type alias master when
resolving member names. The member name may be different in a type alias
master if the SPIR-V is being intentionally difficult.
Rolled the hashes used for glslang, SPIRV-Tools, and SPIRV-Headers to
HEAD, which includes the update to 1.5.
Added passing '--amb' to glslang, so I didn't have to explicitly set
bindings in a large number of test shaders that currently don't, and
now glslang considers them invalid.
Marked all shaders that no longer pass spirv-val as .invalid.
This change introduces functions and in one case, a class, to support
the `VK_KHR_sampler_ycbcr_conversion` extension. Except in the case of
GBGR8 and BGRG8 formats, for which Metal natively supports implicit
chroma reconstruction, we're on our own here. We have to do everything
ourselves. Much of the complexity comes from the need to support
multiple planes, which must now be passed to functions that use the
corresponding combined image-samplers. The rest is from the actual
Y'CbCr conversion itself, which requires additional post-processing of
the sample retrieved from the image.
Passing sampled images to a function was a particular problem. To
support this, I've added a new class which is emitted to MSL shaders
that pass sampled images with Y'CbCr conversions attached around. It
can handle sampled images with or without Y'CbCr conversion. This is an
awful abomination that should not exist, but I'm worried that there's
some shader out there which does this. This support requires Metal 2.0
to work properly, because it uses default-constructed texture objects,
which were only added in MSL 2. I'm not even going to get into arrays of
combined image-samplers--that's a whole other can of worms. They are
deliberately unsupported in this change.
I've taken the liberty of refactoring the support for texture swizzling
while I'm at it. It's now treated as a post-processing step similar to
Y'CbCr conversion. I'd like to think this is cleaner than having
everything in `to_function_name()`/`to_function_args()`. It still looks
really hairy, though. I did, however, get rid of the explicit type
arguments to `spvGatherSwizzle()`/`spvGatherCompareSwizzle()`.
Update the C API. In addition to supporting this new functionality, add
some compiler options that I added in previous changes, but for which I
neglected to update the C API.
This is not necessary, as we must emit an invalidating store before we
potentially consume an invalid expression. In fact, we're a bit
conservative here in this case for example:
int tmp = variable;
if (...)
{
variable = 10;
}
else
{
// Consuming tmp here is fine, but it was
// invalidated while emitting other branch.
// Technically, we need to study if there is an invalidating store
// in the CFG between the loading block and this block, and the other
// branch will not be a part of that analysis.
int tmp2 = tmp * tmp;
}
Fixing this case means complex CFG traversal *everywhere*, and it feels like overkill.
Fixing this exposed a bug with access chains, so fix a bug where expression dependencies were not
inherited properly in access chains. Access chains are now considered forwarded if there
is at least one dependency which is also forwarded.