There is a risk that we try to preserve a loop variable through multiple
iterations, even though the dominating block is inside a loop.
Fix this by analyzing if a block starts off by writing to a variable. In
that case, there cannot be any preservation going on. If we don't, pretend the
loop header is reading the variable, which moves the variable to an
appropriate scope.
In multiple-entry-point modules, we declared builtin inputs which were
not supposed to be used for that entry point.
Fix this, by being more strict when checking which builtins to emit.
This gets rather complicated because MSL does not support OpArrayLength
natively. We need to pass down a buffer which contains buffer sizes, and
we compute the array length on-demand.
Support both discrete descriptors as well as argument buffers.
MSL generally emits the aliases, which means we cannot always place the
master type first, unlike GLSL and HLSL. The logic fix is just to
reorder after we have tagged types with packing information, rather than
doing it in the parser fixup.
Change aux buffer to swizzle buffer.
There is no good reason to expand the aux buffer, so name it
appropriately.
Make the code cleaner by emitting a straight pointer to uint rather than
a dummy struct which only contains a single unsized array member anyways.
This will also end up being very similar to how we implement swizzle
buffers for argument buffers.
Do not use implied binding if it overflows int32_t.
Some support for subgroups is present starting in Metal 2.0 on both iOS
and macOS. macOS gains more complete support in 10.14 (Metal 2.1).
Some restrictions are present. On iOS and on macOS 10.13, the
implementation of `OpGroupNonUniformElect` is incorrect: if thread 0 has
already terminated or is not executing a conditional branch, the first
thread that *is* will falsely believe itself not to be. Unfortunately,
this operation is part of the "basic" feature set; without it, subgroups
cannot be supported at all.
The `SubgroupSize` and `SubgroupLocalInvocationId` builtins are only
available in compute shaders (and, by extension, tessellation control
shaders), despite SPIR-V making them available in all stages. This
limits the usefulness of some of the subgroup operations in fragment
shaders.
Although Metal on macOS supports some clustered, inclusive, and
exclusive operations, it does not support them all. In particular,
inclusive and exclusive min, max, and, or, and xor; as well as cluster
sizes other than 4 are not supported. If this becomes a problem, they
could be emulated, but at a significant performance cost due to the need
for non-uniform operations.
MSL does not seem to have a qualifier for this, but HLSL SM 5.1 does.
glslangValidator for HLSL does not support this, so skip any validation,
but it passes in FXC.
Atomics are not supported on images or texture_buffers in MSL.
Properly throw an error if OpImageTexelPointer is used (since it can
only be used for atomic operations anyways).
Avoids ugly warnings on nearly every compute shader.
We could do analysis to detect whether we need to emit this constant,
but it's a bit tedious to figure out if an OpConstantComponent is
actually used by opcodes, so just make it simple.
Return after loading the input control point array if there are more
input points than output points, and this was one of the helper
invocations spun off to load the input points. I was hesitant to do this
initially, since the MSL spec has this to say about barriers:
> The `threadgroup_barrier` (or `simdgroup_barrier`) function must be
> encountered by all threads in a threadgroup (or SIMD-group) executing
> the kernel.
That is, if any thread executes the barrier, then all threads must
execute it, or the barrier'd invocations will hang. But, the key words
here seem to be "executing the kernel;" inactive invocations, those that
have already returned, need not encounter the barrier to prevent hangs.
Indeed, I've encountered no problems from doing this, at least on my
hardware. This also fixes a few CTS tests that were failing due to
execution ordering; apparently, my assumption that the later, invalid
data written by the helpers would get overwritten was wrong.
The tessellation levels in Metal are stored as a densely-packed array of
half-precision floating point values. But, stage-in attributes in Metal
have to have offsets and strides aligned to a multiple of four, so we
can't add them individually. Luckily for us, the arrays have lengths
less than 4. So, let's use vectors for them!
Triangles get a single attribute with a `float4`, where the outer levels
are in `.xyz` and the inner levels are in `.w`. The arrays are unpacked
as though we had added the elements individually. Quads get two: a
`float4` with the outer levels and a `float2` with the inner levels.
Further, since vectors can be indexed as arrays, there's no need to
unpack them in this case.
This also saves on precious vertex attributes. Before, we were using up
to 6 of them. Now we need two at most.
In SPIR-V, there are always two inner levels and four outer levels, even
if the input patch isn't a quad patch. But in MSL, due to requirements
imposed by Metal, only one inner level and three outer levels exist when
the input patch is a triangle patch. We must explicitly ignore any write
to the nonexistent second inner and fourth outer levels in this case.
This is intended to be used to support `VK_KHR_maintenance2`'s
tessellation domain origin feature. If `tess_domain_origin_lower_left`
is `true`, the `v` coordinate will be inverted with respect to the
domain. Additionally, in `Triangles` mode, the `v` and `w` coordinates
will be swapped. This is because the winding order is interpreted
differently in lower-left mode.
These are mapped to Metal's post-tessellation vertex functions. The
semantic difference is much less here, so this change should be simpler
than the previous one. There are still some hairy parts, though.
In MSL, the array of control point data is represented by a special
type, `patch_control_point<T>`, where `T` is a valid stage-input type.
This object must be embedded inside the patch-level stage input. For
this reason, I've added a new type to the type system to represent this.
On Mac, the number of input control points to the function must be
specified in the `patch()` attribute. This is optional on iOS.
SPIRV-Cross takes this from the `OutputVertices` execution mode; the
intent is that if it's not set in the shader itself, MoltenVK will set
it from the tessellation control shader. If you're translating these
offline, you'll have to update the control point count manually, since
this number must match the number that is passed to the
`drawPatches:...` family of methods.
Fixes#120.
This should fix a whole host of issues related to structs in the `Input`
class in a tessellation control shader.
Also, use pointer arithmetic instead of dereferencing the `ops` array.
This is critical in case we wind up stepping beyond the bounds of the
array.
There's no need to do so, since these are not stage-out structs being
returned, but regular structures being written to a buffer. This also
neatly avoids issues writing to composite (e.g. arrayed) per-patch
outputs from a tessellation control shader.
These are transpiled to kernel functions that write the output of the
shader to three buffers: one for per-vertex varyings, one for per-patch
varyings, and one for the tessellation levels. This structure is
mandated by the way Metal works, where the tessellation factors are
supplied to the draw method in their own buffer, while the per-patch and
per-vertex varyings are supplied as though they were vertex attributes;
since they have different step rates, they must be in separate buffers.
The kernel is expected to be run in a workgroup whose size is the
greater of the number of input or output control points. It uses Metal's
support for vertex-style stage input to a compute shader to get the
input values; therefore, at least one instance must run per input point.
Meanwhile, Vulkan mandates that it run at least once per output point.
Overrunning the output array is a concern, but any values written should
either be discarded or overwritten by subsequent patches. I'm probably
going to put some slop space in the buffer when I integrate this into
MoltenVK to be on the safe side.