`half` cannot be bitcasted to `float`, because the two types are not the
same size. Use an expanding cast instead.
We were already doing this for stores to the tessellation levels; why I
didn't also do this for loads is beyond me.
Fix reversed coordinates: `y` should be used to calculate the row
address. Align row address to the row stride.
I've made the row alignment a function constant; this makes it possible
to override it at pipeline compile time.
Honestly, I don't know how this worked at all for Epic. It definitely
didn't work in the CTS prior to this.
MSL 2.3 has everything needed to support this extension on all
platforms. The existing `discard_fragment()` function was given demote
semantics, similar to Direct3D, and the `simd_is_helper_thread()`
function was finally added to iOS.
I've left the old test alone. Should I remove it in favor of these?
These need to use arrayed texture types, or Metal will complain when
binding the resource. The target layer is addressed relative to the
Layer output by the vertex pipeline, or to the ViewIndex if in a
multiview pipeline. Unlike with the s/t coordinates, Vulkan does not
forbid non-zero layer coordinates here, though this cannot be expressed
in Vulkan GLSL.
Supporting 3D textures will require additional work. Part of the problem
is that Metal does not allow texture views to subset a 3D texture, so we
need some way to pass the base depth to the shader.
Some older iOS devices don't support layered rendering. In that case,
don't set `[[render_target_array_index]]`, because the compiler will
reject the shader in that case. The client will then have to unroll the
render pass manually.
Account for a non-zero base instance when calculating the view index and
the "real" instance index. Before, it was likely broken with a non-zero
base instance, since the calculated instance index could be less than
the base instance.
- Do not silently drop reserved identifiers in the parser. This makes it
possible to reflect identifiers which are reserved by the
cross-compiler module.
- Instead of dropping the name, emit _RESERVED_IDENTIFIER_FIXUP in the
source to make it clear that a name has been rewritten.
- Document what is reserved and not.
Prior to this point, we were treating them as flattened, as they are in
old-style tessellation control shaders, and still are for structs in
new-style shaders. This is not true for outputs; output composites are
not flattened at all. This semantic mismatch broke a Vulkan CTS test.
It should now pass.
To facilitate an improved linking-by-name use case for older GL,
we will be more aggressive about merging struct definitions, even for
rather unrelated cases where we don't strictly need to use type aliases.
In Metal render pipelines don't have an option to set a sampleMask
parameter, the only way to get that functionality is to set the
sample_mask output of the fragment shader to this value directly.
We also need to take care to combine the fixed sample mask with the
one that the shader might possibly output.
This should hopefully reduce underutilization of the GPU, especially on
GPUs where the thread execution width is greater than the number of
control points.
This also simplifies initialization by reading the buffer directly
instead of using Metal's vertex-attribute-in-compute support. It turns
out the only way in which shader stages are allowed to differ in their
interfaces is in the number of components per vector; the base type must
be the same. Since we are using the raw buffer instead of attributes, we
can now also emit arrays and matrices directly into the buffer, instead
of flattening them and then unpacking them. Structs are still flattened,
however; this is due to the need to handle vectors with fewer components
than were output, and I think handling this while also directly emitting
structs could get ugly.
Another advantage of this scheme is that the extra invocations needed to
read the attributes when there were more input than output points are
now no more. The number of threads per workgroup is now lcm(SIMD-size,
output control points). This should ensure we always process a whole
number of patches per workgroup.
To avoid complexity handling indices in the tessellation control shader,
I've also changed the way vertex shaders for tessellation are handled.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader. This also fixes a long-standing
issue where if an index were greater than the number of vertices to
draw, the vertex shader would wind up writing outside the buffer, and
the vertex would be lost.
This is a breaking change, and I know SPIRV-Cross has other clients, so
I've hidden this behind an option for now. In the future, I want to
remove this option and make it the default.
On MSL, the compiler refuses to allow access chains into a normal vector type.
What happens in practice instead is a read-modify-write where a vector type is
loaded, modified and written back.
The workaround is to convert a vector into a pointer-to-scalar before
the access chain continues to add the scalar index.
When inside a loop, treat any read of outer expressions to happen
multiple times, forcing a temporary of said outer expressions.
This avoids the problem where we can end up relying on loop-invariant code motion to happen in the
compiler when converting optimized shaders.
When we see a switch block which only contains one default block, emit a
do {} while(false) statement instead, which is far more idiomatic and
readable anyways.
Metal is picky about interface matching. If the types don't match
exactly, down to the number of vector components, Metal fails pipline
compilation. To support pipelines where the number of components
consumed by the fragment shader is less than that produced by the vertex
shader, we have to fix up the fragment shader to accept all the
components produced.
DX may emit ArrayStride and MatrixStride of 16, but the size of the
object does not align with that and expect to pack other members inside
its last member.
The workaround is to emit array size/col/row one less than we expect and
rely on padding to carve out a "dead zone" for the last member.
DXVK emits SPIR-V where fragment shader builtins have names derived from
DXBC assembly, e.g. `oDepth` for `FragDepth`. When we declared the
disabled output, we used this name, but when referencing it, we
continued to use the GLSL name. This breaks compilation.
Like with `point_size` when not rendering points, Metal complains when
writing to a variable using the `[[depth]]` qualifier when no depth
buffer be attached. In that case, we must avoid emitting `FragDepth`,
just like with `PointSize`.
I assume it will also complain if there be no stencil attachment and the
shader write to `[[stencil]]`, or it write to `[[color(n)]]` but there
be no color attachment at n.
Limit inline blocks to one per descriptor set.
This should avoid the need for complicated code to calculate the
argument buffer ID stride of an inline uniform block. If there's demand
for more inline blocks, we can revisit this.
Here, the inline uniform block is explicit: we instantiate the buffer
block itself in the argument buffer, instead of a pointer to the buffer.
I just hope this will work with the `MTLArgumentDescriptor` API...
Note that Metal recursively assigns individual members of embedded
structs IDs. This means for automatic assignment that we have to
calculate the binding stride for a given buffer block. For MoltenVK,
we'll simply increment the ID by the size of the inline uniform block.
Then the later IDs will never conflict with the inline uniform block. We
can get away with this because Metal doesn't require that IDs be
contiguous, only monotonically increasing.
This is implied in both GL and GLES. Emitting memoryBarrierShared() was
based on earlier confusion in the spec which has since been fixed and
clarified.
MSL does not support this, so we have to emulate it by passing it around
as a varying between stages. We use a special "user(clipN)" attribute
for this rather than locN which is used for user varyings.
This CL updates the test runner to only run spirv-opt if the generated
SPIR-V is valid. If validation is skipped it's possible to hit aborts
and other memory errors in the optimizer as it assumes the SPIR-V is
valid.
There was a hack to workaround a bug in DXC where control point -> patch
constant phase was passed in Function storage, but we have to use
Workgroup here. We will not support these kinds of hacks for invalid
SPIR-V, so hack the reference files to use the "proper" fix and remove
the hack for time being.
Need to make output 100% exact for MSVC and GCC libc testing, but they are 1 ULP
off when converting fp32 to string in some weird corner cases.
Roundtrip should be correct though, but that outside the scope of
SPIRV-Cross.
We had output dependent on complex_continue being set, but setting that
flag was dependent on unordered_set declaration order. Make it invariant
to ordering and change the implementation so it knows about the new
temporary hoisting for access chains.
To support loading array of array properly in tessellation, we need a
rewrite of how tessellation access chains are handled.
The major change is to remove the implicit unflatten step inside
access_chain which does not take into account the case where you load
directly from a control point array variable.
We defer unflatten step until OpLoad time instead.
This fixes cases where we load array of {array,matrix,struct}.
Removes the hacky path for MSL access chain index workaround.
Non-patch arrays of IO variables in tesc/tese have their array index
stripped, and access chains are specially handled, we shouldn't attempt
to create "normal" arrays of these.