Since `bool` is a logical type, it cannot be used in uniform or storage
buffers. Therefore, replacing it in structures should not change the
shader interface.
We leave it alone for builtins. (FIXME: Should we also leave it for
I/O varyings?)
Fixes 24 CTS tests under `dEQP-VK.memory_model.shared`.
- Assign ulongn physical type to buffer pointers in short arrays
when array stride is larger than pointer size.
- Support GL_EXT_buffer_reference_uvec2, casting buffer reference
pointers to and from uvec2 values (see the sketch after this list).
- When packing structs, include structs inside physical buffers.
- Update mechanism for traversing pointer arrays when calculating type sizes.
- Add unit test shaders.
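As a rough illustration of the uvec2 casting (names hypothetical; this
assumes MSL tolerates pointer/integer reinterpretation in the device
address space, and is not the exact emitted code):

    #include <metal_stdlib>
    using namespace metal;

    struct MyStruct { float4 v; };

    // Pack a buffer reference pointer into a uint2 ("uvec2") and back.
    // ulong and uint2 are both 8 bytes, so as_type<> can bridge them.
    uint2 spvPackPointer(device MyStruct* ptr)
    {
        return as_type<uint2>(reinterpret_cast<ulong>(ptr));
    }

    device MyStruct* spvUnpackPointer(uint2 handle)
    {
        return reinterpret_cast<device MyStruct*>(as_type<ulong>(handle));
    }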
Fixes numerous CTS tests of the form
dEQP-VK.pipeline.interface_matching.vector_length.member_of_*,
which pass complex nested structs between stages as stage I/O.
- Make add_composite_member_variable_to_interface_block() recursive to allow
struct members to contain nested structs, building up member names and access
chains recursively, and only add the resulting flattened leaf members to the
synthetic input and output interface blocks (see the sketch after this list).
- Recursively generate individual location numbers for the flattened members
of the input/output block.
- Replace to_qualified_member_name() with append_member_name().
- Update add_variable_to_interface_block() to support arrays as struct members,
adding a member to input and output interface blocks for each element of the array.
- Pass name qualifiers to add_plain_member_variable_to_interface_block() to allow
struct members to be arrays of structs, building up member names and access chains,
and adding multiple distinct flattened leaf members to the synthetic input and
output interface blocks.
- Generate individual location numbers for each array member
of the input/output block.
- SPIRVCrossDecorationInterfaceMemberIndex references the index of a member
of a variable that is a struct type. The value is relative to the variable,
and for structs nested within that top-level struct, the index value needs
to take into consideration the members within those nested structs.
- Pass var_mbr_idx to add_plain_member_variable_to_interface_block() and
add_composite_member_variable_to_interface_block(), start at zero for each
variable, and increment for each member or nested member within that variable.
- Add unit test shaders-msl/vert/out-block-with-nested-struct-array.vert
- Add unit test shaders-msl/vert/out-block-with-struct-array.vert
- Add unit test shaders-msl/tese/in-block-with-nested-struct.tese
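For illustration (struct and member names hypothetical), a vertex output
block containing a nested struct might be flattened like this:

    #include <metal_stdlib>
    using namespace metal;

    // GLSL-side (illustrative):
    //   out VSOut { Inner inner; float b; } outBlock;  // Inner { vec4 a; }
    //
    // Flattened MSL output block: only leaf members survive, with names
    // and locations built up recursively.
    struct main0_out
    {
        float4 outBlock_inner_a [[user(locn0)]];
        float outBlock_b [[user(locn1)]];
        float4 gl_Position [[position]];
    };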
We were passing arrays by value, which the compiler fails to optimize,
causing abysmal performance. To fix this, we need to consider that
descriptors can be in constant or const device address spaces.
Also, lone descriptors are passed by value, so we explicitly remove address
space qualifiers.
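A rough sketch of the intent (the signature is illustrative, not the
exact emitted code):

    #include <metal_stdlib>
    using namespace metal;

    // An array of buffer pointers living in an argument buffer is passed
    // on to a helper by reference, in its real address space, instead of
    // copying the whole array by value. A lone descriptor would still be
    // passed by value, with no address space qualifier.
    float4 sum_first_two(const device float4* constant (&buffers)[2])
    {
        return *buffers[0] + *buffers[1];
    }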
One failure case is when a shader passes a texture/sampler array as an
argument. It's all UniformConstant in SPIR-V, but in MSL it might be
thread, const device or constant, so that won't work ...
Global variable use works fine though, and that should cover 99.9999999%
of use cases.
Fragment shaders that require explicit early fragment tests are incompatible
with specifying depth and stencil values within the shader. If explicit early
fragment tests are specified, remove the depth and stencil outputs from the
output structure, and replace them with dummy local variables.
Add a CompilerMSL::uses_explicit_early_fragment_test() function to consolidate
testing for whether early fragment tests are required.
Add two unit tests for depth-out with, and without, early fragment tests.
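A minimal sketch of the result (names illustrative):

    #include <metal_stdlib>
    using namespace metal;

    struct main0_out
    {
        float4 color [[color(0)]];
    };

    // The [[depth(any)]] output is removed from the struct; depth writes
    // land in a dummy local instead.
    [[early_fragment_tests]]
    fragment main0_out main0()
    {
        main0_out out = {};
        float gl_FragDepth = 0.0; // dummy local variable
        out.color = float4(gl_FragDepth);
        return out;
    }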
Promote to short instead, and do simple casts on load/store.
Not 100% complete fix since structs can contain booleans, but this is
getting into pretty ridiculously complicated territory.
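Roughly (names illustrative):

    #include <metal_stdlib>
    using namespace metal;

    // The buffer member is declared as short, since bool is not valid in
    // a buffer; loads and stores cast between bool and short.
    struct SSBO { short flag; };

    bool load_flag(const device SSBO& ssbo) { return bool(ssbo.flag); }
    void store_flag(device SSBO& ssbo, bool b) { ssbo.flag = short(b); }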
Add spvQuantizeToF16() family of synthetic functions to convert
from float to half and back again, and add function attribute
[[clang::optnone]] to honor infinities during conversions.
Adjust SPIRV-Cross unit test reference shaders to accommodate these changes.
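A simplified sketch of the scalar case:

    #include <metal_stdlib>
    using namespace metal;

    // Round-tripping through half quantizes the value; [[clang::optnone]]
    // keeps the compiler from folding the conversion away, so infinities
    // are honored.
    [[clang::optnone]]
    float spvQuantizeToF16(float val)
    {
        return float(half(val));
    }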
Matching output/input struct member types between shader stages could fail if
a location is shared between members, each using different components of that
location, because the member vecsize was only stored once for the location.
Add MSLShaderInput::component member.
Use LocationComponentPair to key inputs_by_location, instead of just location.
Have ensure_correct_input_type() pass the component value as well as the location.
Vulkan specifies that the Sample Mask Test occurs before fragment shading.
This means gl_SampleMaskIn should be influenced by both sample-shading and
VkPipelineMultisampleStateCreateInfo::pSampleMask.
CTS tests dEQP-VK.pipeline.multisample_shader_builtin.* bear this out.
For sample-shading, gl_SampleMaskIn should have only a single bit set.
Since Metal does not filter for this, apply a bitmask based on gl_SampleID.
For a fixed sample mask, since Metal is unaware of
VkPipelineMultisampleStateCreateInfo::pSampleMask, we need to ensure that
we apply it to both gl_SampleMaskIn and gl_SampleMask. This has the side
effect of a redundant application of pSampleMask if the shader already
includes gl_SampleMaskIn when setting gl_SampleMask, but I don't see an
easy way around this.
Also, simplify the logic for including the fixed sample mask in gl_SampleMask,
and print the fixed sample mask as a hex value so the bits are easier to read.
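Conceptually (the raw-value name is hypothetical, and 0x000000ff stands
in for a fixed pSampleMask):

    #include <metal_stdlib>
    using namespace metal;

    // gl_SampleMaskIn = raw mask & fixed pSampleMask & current-sample bit.
    uint compute_sample_mask_in(uint raw_mask, uint gl_SampleID)
    {
        return raw_mask & 0x000000ffu & (1u << gl_SampleID);
    }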
We'll need to force a temporary and mark it as precise.
MSL is a little weird here, but we can piggyback on top of the invariant
float math option to force fma() operations everywhere.
Firstly, never flatten inputs or outputs in multi-patch mode.
The main scenario where we do need to care is Block IO.
In this case, we should only flatten the top-level member, and after
that we use access chains as normal.
Using structs in Input storage class is now possible as well. We don't
need to consider per-location fixups at all here. In Vulkan, IO structs
must match exactly. Only plain vectors can have smaller vector sizes as
a special case.
For buffers, support all MSLResourceBinding::basetype pointers, not just void*.
Rename MSLResourceBinding::base_type to basetype for consistent use in other structs.
Add lookup from argument buffer argument index to resource binding for efficiency.
Fix error in advancing padding counts with combined image samplers.
Run clang-format.
If CompilerMSL::Options::pad_argument_buffer_resources enabled, Metal argument buffer
struct members are positionally aligned to their argument indexes by adding synthetic
padding members when needed. The types and sizes of these synthetic members are
identified in the resource_bindings vector provided through the API.
Add CompilerMSL::Options::pad_argument_buffer_resources to enable padding
Metal argument buffer structs to positionally match members to argument indexes.
Add MSLResourceBinding::base_type to identify resource type through API.
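For example (types and argument indexes illustrative), a set whose
declared resources skip an argument index gets a synthetic member at
that position:

    #include <metal_stdlib>
    using namespace metal;

    struct UBO { float4 data; };

    struct spvDescriptorSetBuffer0
    {
        texture2d<float> tex [[id(0)]];
        sampler _m1_pad [[id(1)]]; // synthetic padding member
        constant UBO* ubo [[id(2)]];
    };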
We only considered invalid names, and overwrote the alias for the
function. The correct fix is to replace illegal names early, do the
reserved fixup, then copy the alias back to the entry point name.
In Metal, the `[[position]]` input to a fragment shader remains at
fragment center, even at sample rate, like OpenGL and Direct3D. In
Vulkan, however, when the fragment shader runs at sample rate, the
`FragCoord` builtin moves to the sample position in the framebuffer,
instead of the fragment center. To account for this difference, adjust
the `FragCoord`, if present, by the sample position. The -0.5 offset is
because the fragment center is at (0.5, 0.5).
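In MSL terms, the adjustment looks roughly like this (applied in the
shader prologue when FragCoord is used):

    // Move FragCoord from the fragment center to the sample position;
    // the -0.5 cancels the (0.5, 0.5) fragment-center offset.
    gl_FragCoord.xy += get_sample_position(gl_SampleID) - 0.5;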
Also, add an option to force sample-rate shading in a fragment shader.
Since Metal has no explicit control for this, this is done by adding a
dummy `[[sample_id]]` which is otherwise unused, if none is already
present. This is intended to be used from e.g. MoltenVK when a
pipeline's `minSampleShading` value is nonzero.
Instead of checking if any `Input` variables have `Sample`
interpolation, I've elected to check that the `SampleRateShading`
capability is present. Since `SampleId`, `SamplePosition`, and the
`Sample` interpolation decoration require this cap, this should be
equivalent for any valid SPIR-V module. If this isn't acceptable, let me
know.
Add support for declaring a fixed subgroup size. Metal, like Vulkan with
`VK_EXT_subgroup_size_control`, allows the thread execution width to
vary depending on factors such as register usage. Unfortunately, this
breaks several tests that depend on the subgroup size being what the
device says it is. So we'll fix the subgroup size at the size the device
declares. The extra invocations in the subgroup will appear to be
inactive. Because of this, the ballot mask builtins are now ANDed with
the active subgroup mask.
Add support for emulating a subgroup of size 1. This is intended to be
used by Vulkan Portability implementations (e.g. MoltenVK) when the
hardware/software combo provides insufficient support for subgroups.
Luckily for us, Vulkan 1.1 only requires that the subgroup size be at
least 1.
Add support for quadgroup and SIMD-group functions which were added to
iOS in Metal 2.2 and 2.3. This will allow clients to take advantage of
expanded quadgroup and SIMD-group support in recent Metal versions and
on recent Apple GPUs (families 6 and 7).
Gut emulation of subgroup builtins in fragment shaders. It turns out
codegen for the SIMD-group functions in fragment wasn't implemented for
AMD on Mojave; it's a safe bet that it wasn't implemented for the other
drivers either. Subgroup support in fragment shaders now requires Metal
2.2.
New in MSL 2.3 is a template that can be used in the place of a scalar
type in a stage-in struct. This template has methods which interpolate
the varying at the given points. Curiously, you can't set interpolation
attributes on such a varying; perspective-correctness is encoded in the
type, while interpolation must be done using one of the methods. This
makes using this somewhat awkward from SPIRV-Cross, requiring us to jump
through a bunch of hoops to make this all work.
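A sketch of what using the template looks like (names illustrative):

    #include <metal_stdlib>
    using namespace metal;

    struct main0_in
    {
        // Perspective-correctness is encoded in the type; no
        // interpolation attributes are allowed on this member.
        interpolant<float4, interpolation::perspective> vColor [[user(locn0)]];
    };

    fragment float4 main0(main0_in in [[stage_in]])
    {
        // The varying can only be read through an interpolation method.
        return in.vColor.interpolate_at_centroid();
    }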
Using varyings from functions in particular is a pain point, requiring
us to pass the stage-in struct itself around. An alternative is to pass
references to the interpolants; except this will fall over badly with
composite types, which naturally must be flattened. As with
tessellation, dynamic indexing isn't supported with pull-model
interpolation. This is because of the need to reference the original
struct member in order to call one of the pull-model interpolation
methods on it. Also, this is done at the variable level; this means that
if one varying in a struct is used with the pull-model functions, then
the entire struct is emitted as pull-model interpolants.
For some reason, this was not documented in the MSL spec, though there
is a property on `MTLDevice`, `supportsPullModelInterpolation`,
indicating support for this, which *is* documented. This does not appear
to be implemented yet for AMD: it returns `NO` from
`supportsPullModelInterpolation`, and pipelines with shaders using the
templates fail to compile. It *is* implemented for Intel. It's probably
also implemented for Apple GPUs: on Apple Silicon, OpenGL calls down to
Metal, and it wouldn't be possible to use the interpolation functions
without this implemented in Metal.
Based on my testing, where SPIR-V and GLSL have the offset relative to
the pixel center, in Metal it appears to be relative to the pixel's
upper-left corner, as in HLSL. Therefore, I've added an offset of 0.4375,
i.e. one half minus one sixteenth, to all arguments to
`interpolate_at_offset()`.
This also fixes a long-standing bug: if a pull-model interpolation
function is used on a varying, make sure that varying is declared. We
were already doing this only for the AMD pull-model function,
`interpolateAtVertexAMD()`; for reasons which are completely beyond me,
we weren't doing this for the base interpolation functions. I also note
that there are no tests for the interpolation functions for GLSL or
HLSL.
Support unsized arrays of images and samplers
(GL_EXT_nonuniform_qualifier/SPV_EXT_descriptor_indexing).
MSLResourceBinding includes the array size through the API, and substitutes
that size if the image or sampler array is not explicitly sized.
OpCopyObject supports SPIRCombinedImageSampler type in MSL.
Metal doesn't support broadcasting or shuffling boolean values, but we
can work around that by casting it to `ushort`, then casting it back to
`bool`. I used `ushort` instead of `uint` because 16-bit values give
better throughput on Apple GPUs.
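A sketch of the workaround (helper name hypothetical):

    #include <metal_stdlib>
    using namespace metal;

    // Broadcast a bool by round-tripping through ushort; 16-bit values
    // give better throughput than uint on Apple GPUs.
    bool spvSubgroupBroadcastBool(bool value, ushort lane)
    {
        return bool(simd_broadcast((ushort)value, lane));
    }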
Only the low *n* bits are significant, where *n* is the subgroup size.
The Vulkan CTS actually checks this.
The `FindLSB` tests weren't actually failing, but I masked that anyway,
in case there's some corner case the CTS is missing.
`half` cannot be bitcast to `float`, because the two types are not the
same size. Use an expanding cast instead.
We were already doing this for stores to the tessellation levels; why I
didn't also do this for loads is beyond me.
Fix reversed coordinates: `y` should be used to calculate the row
address. Align the row address to the row stride.
I've made the row alignment a function constant; this makes it possible
to override it at pipeline compile time.
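A sketch of the addressing (names and constant index hypothetical):

    #include <metal_stdlib>
    using namespace metal;

    // Row alignment as a function constant, overridable at pipeline
    // compile time.
    constant uint spvLinearTextureAlignment [[function_constant(0)]];

    uint spvRowAddress(uint y, uint row_stride)
    {
        // Round the stride up to the alignment; y picks the row.
        uint aligned = (row_stride + spvLinearTextureAlignment - 1) &
                       ~(spvLinearTextureAlignment - 1);
        return y * aligned;
    }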
Honestly, I don't know how this worked at all for Epic. It definitely
didn't work in the CTS prior to this.
These need to use arrayed texture types, or Metal will complain when
binding the resource. The target layer is addressed relative to the
Layer output by the vertex pipeline, or to the ViewIndex if in a
multiview pipeline. Unlike with the s/t coordinates, Vulkan does not
forbid non-zero layer coordinates here, though this cannot be expressed
in Vulkan GLSL.
Supporting 3D textures will require additional work. Part of the problem
is that Metal does not allow texture views to subset a 3D texture, so we
need some way to pass the base depth to the shader.
Some older iOS devices don't support layered rendering. In that case,
don't set `[[render_target_array_index]]`, because the compiler will
reject the shader in that case. The client will then have to unroll the
render pass manually.
In Metal, render pipelines don't have an option to set a sampleMask
parameter; the only way to get that functionality is to set the
sample_mask output of the fragment shader to this value directly.
We also need to take care to combine the fixed sample mask with the
one that the shader might possibly output.
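A minimal sketch (the 0x000000ff fixed mask is illustrative):

    #include <metal_stdlib>
    using namespace metal;

    struct main0_out
    {
        float4 color [[color(0)]];
        uint gl_SampleMask [[sample_mask]];
    };

    fragment main0_out main0()
    {
        main0_out out = {};
        out.color = float4(1.0);
        // The shader's own mask combined with the fixed sample mask.
        out.gl_SampleMask = 0xffffffffu & 0x000000ffu;
        return out;
    }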
This should hopefully reduce underutilization of the GPU, especially on
GPUs where the thread execution width is greater than the number of
control points.
This also simplifies initialization by reading the buffer directly
instead of using Metal's vertex-attribute-in-compute support. It turns
out the only way in which shader stages are allowed to differ in their
interfaces is in the number of components per vector; the base type must
be the same. Since we are using the raw buffer instead of attributes, we
can now also emit arrays and matrices directly into the buffer, instead
of flattening them and then unpacking them. Structs are still flattened,
however; this is due to the need to handle vectors with fewer components
than were output, and I think handling this while also directly emitting
structs could get ugly.
Another advantage of this scheme is that the extra invocations needed to
read the attributes when there were more input than output points are no
longer needed. The number of threads per workgroup is now lcm(SIMD-size,
output control points); for example, with a SIMD width of 32 and 4 output
control points, each workgroup runs lcm(32, 4) = 32 threads and processes
8 whole patches. This should ensure we always process a whole number of
patches per workgroup.
To avoid complexity handling indices in the tessellation control shader,
I've also changed the way vertex shaders for tessellation are handled.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader. This also fixes a long-standing
issue where if an index were greater than the number of vertices to
draw, the vertex shader would wind up writing outside the buffer, and
the vertex would be lost.
This is a breaking change, and I know SPIRV-Cross has other clients, so
I've hidden this behind an option for now. In the future, I want to
remove this option and make it the default.
In MSL, the compiler refuses to allow access chains into a normal vector type.
What happens in practice instead is a read-modify-write where a vector type is
loaded, modified and written back.
The workaround is to convert a vector into a pointer-to-scalar before
the access chain continues to add the scalar index.
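A sketch of the workaround (names illustrative):

    #include <metal_stdlib>
    using namespace metal;

    // Instead of a load/modify/store of the whole vector, reinterpret the
    // vector as a pointer-to-scalar and index that.
    void write_component(device float4& v, uint idx, float x)
    {
        ((device float*)&v)[idx] = x;
    }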
Metal is picky about interface matching. If the types don't match
exactly, down to the number of vector components, Metal fails pipeline
compilation. To support pipelines where the number of components
consumed by the fragment shader is less than that produced by the vertex
shader, we have to fix up the fragment shader to accept all the
components produced.
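For example (names illustrative), if the vertex shader writes a float4
at a location where the fragment shader only consumed a float3, the
fragment input is widened:

    #include <metal_stdlib>
    using namespace metal;

    struct main0_in
    {
        // Declared float4 to match the vertex output exactly; the shader
        // body simply ignores the extra component.
        float4 vColor [[user(locn0)]];
    };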
Like with `point_size` when not rendering points, Metal complains when
writing to a variable with the `[[depth]]` qualifier when no depth
buffer is attached. In that case, we must avoid emitting `FragDepth`,
just like with `PointSize`.
I assume it will also complain if there is no stencil attachment and the
shader writes to `[[stencil]]`, or if it writes to `[[color(n)]]` but
there is no color attachment at n.
Limit inline blocks to one per descriptor set.
This should avoid the need for complicated code to calculate the
argument buffer ID stride of an inline uniform block. If there's demand
for more inline blocks, we can revisit this.
Here, the inline uniform block is explicit: we instantiate the buffer
block itself in the argument buffer, instead of a pointer to the buffer.
I just hope this will work with the `MTLArgumentDescriptor` API...
Note that Metal recursively assigns IDs to the individual members of
embedded structs. This means that for automatic assignment we have to
calculate the binding stride for a given buffer block. For MoltenVK,
we'll simply increment the ID by the size of the inline uniform block.
Then the later IDs will never conflict with the inline uniform block. We
can get away with this because Metal doesn't require that IDs be
contiguous, only monotonically increasing.
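A sketch of the layout (IDs and sizes illustrative):

    #include <metal_stdlib>
    using namespace metal;

    struct InlineBlock
    {
        float4 a; // consumes [[id(0)]]
        float4 b; // consumes [[id(1)]]
    };

    struct spvDescriptorSetBuffer0
    {
        // The block itself is embedded, not a pointer to it.
        InlineBlock inline_block [[id(0)]];
        // The next ID is bumped past the inline block; IDs need only be
        // monotonically increasing, not contiguous.
        texture2d<float> tex [[id(32)]];
    };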