When inside a loop, treat any read of outer expressions to happen
multiple times, forcing a temporary of said outer expressions.
This avoids the problem where we can end up relying on loop-invariant code motion to happen in the
compiler when converting optimized shaders.
When we see a switch block which only contains one default block, emit a
do {} while(false) statement instead, which is far more idiomatic and
readable anyways.
Metal is picky about interface matching. If the types don't match
exactly, down to the number of vector components, Metal fails pipline
compilation. To support pipelines where the number of components
consumed by the fragment shader is less than that produced by the vertex
shader, we have to fix up the fragment shader to accept all the
components produced.
DX may emit ArrayStride and MatrixStride of 16, but the size of the
object does not align with that and expect to pack other members inside
its last member.
The workaround is to emit array size/col/row one less than we expect and
rely on padding to carve out a "dead zone" for the last member.
DXVK emits SPIR-V where fragment shader builtins have names derived from
DXBC assembly, e.g. `oDepth` for `FragDepth`. When we declared the
disabled output, we used this name, but when referencing it, we
continued to use the GLSL name. This breaks compilation.
Like with `point_size` when not rendering points, Metal complains when
writing to a variable using the `[[depth]]` qualifier when no depth
buffer be attached. In that case, we must avoid emitting `FragDepth`,
just like with `PointSize`.
I assume it will also complain if there be no stencil attachment and the
shader write to `[[stencil]]`, or it write to `[[color(n)]]` but there
be no color attachment at n.
Limit inline blocks to one per descriptor set.
This should avoid the need for complicated code to calculate the
argument buffer ID stride of an inline uniform block. If there's demand
for more inline blocks, we can revisit this.
Here, the inline uniform block is explicit: we instantiate the buffer
block itself in the argument buffer, instead of a pointer to the buffer.
I just hope this will work with the `MTLArgumentDescriptor` API...
Note that Metal recursively assigns individual members of embedded
structs IDs. This means for automatic assignment that we have to
calculate the binding stride for a given buffer block. For MoltenVK,
we'll simply increment the ID by the size of the inline uniform block.
Then the later IDs will never conflict with the inline uniform block. We
can get away with this because Metal doesn't require that IDs be
contiguous, only monotonically increasing.
This is implied in both GL and GLES. Emitting memoryBarrierShared() was
based on earlier confusion in the spec which has since been fixed and
clarified.