This should hopefully reduce underutilization of the GPU, especially on
GPUs where the thread execution width is greater than the number of
control points.
This also simplifies initialization by reading the buffer directly
instead of using Metal's vertex-attribute-in-compute support. It turns
out the only way in which shader stages are allowed to differ in their
interfaces is in the number of components per vector; the base type must
be the same. Since we are using the raw buffer instead of attributes, we
can now also emit arrays and matrices directly into the buffer, instead
of flattening them and then unpacking them. Structs are still flattened,
however; this is due to the need to handle vectors with fewer components
than were output, and I think handling this while also directly emitting
structs could get ugly.
Another advantage of this scheme is that the extra invocations needed to
read the attributes when there were more input than output points are
now no more. The number of threads per workgroup is now lcm(SIMD-size,
output control points). This should ensure we always process a whole
number of patches per workgroup.
To avoid complexity handling indices in the tessellation control shader,
I've also changed the way vertex shaders for tessellation are handled.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader. This also fixes a long-standing
issue where if an index were greater than the number of vertices to
draw, the vertex shader would wind up writing outside the buffer, and
the vertex would be lost.
This is a breaking change, and I know SPIRV-Cross has other clients, so
I've hidden this behind an option for now. In the future, I want to
remove this option and make it the default.
Metal is picky about interface matching. If the types don't match
exactly, down to the number of vector components, Metal fails pipline
compilation. To support pipelines where the number of components
consumed by the fragment shader is less than that produced by the vertex
shader, we have to fix up the fragment shader to accept all the
components produced.
Not exactly sure what is going on, internet suggests there's a pipe that
fills up. Calling close() before we start waiting for results seems to
do the trick.
Like with `point_size` when not rendering points, Metal complains when
writing to a variable using the `[[depth]]` qualifier when no depth
buffer be attached. In that case, we must avoid emitting `FragDepth`,
just like with `PointSize`.
I assume it will also complain if there be no stencil attachment and the
shader write to `[[stencil]]`, or it write to `[[color(n)]]` but there
be no color attachment at n.
Here, the inline uniform block is explicit: we instantiate the buffer
block itself in the argument buffer, instead of a pointer to the buffer.
I just hope this will work with the `MTLArgumentDescriptor` API...
Note that Metal recursively assigns individual members of embedded
structs IDs. This means for automatic assignment that we have to
calculate the binding stride for a given buffer block. For MoltenVK,
we'll simply increment the ID by the size of the inline uniform block.
Then the later IDs will never conflict with the inline uniform block. We
can get away with this because Metal doesn't require that IDs be
contiguous, only monotonically increasing.
This CL updates the test runner to only run spirv-opt if the generated
SPIR-V is valid. If validation is skipped it's possible to hit aborts
and other memory errors in the optimizer as it assumes the SPIR-V is
valid.
If there are enough members in an IAB, we cannot use the constant
address space as MSL compiler complains about there being too many
members. Support emitting the device address space instead.
Rolled the hashes used for glslang, SPIRV-Tools, and SPIRV-Headers to
HEAD, which includes the update to 1.5.
Added passing '--amb' to glslang, so I didn't have to explicitly set
bindings in a large number of test shaders that currently don't, and
now glslang considers them invalid.
Marked all shaders that no longer pass spirv-val as .invalid.
Vulkan has two types of buffer descriptors,
`VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` and
`VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC`, which allow the client to
offset the buffers by an amount given when the descriptor set is bound
to a pipeline. Metal provides no direct support for this when the buffer
in question is in an argument buffer, so once again we're on our own.
These offsets cannot be stored or associated in any way with the
argument buffer itself, because they are set at bind time. Different
pipelines may have different offsets set. Therefore, we must use a
separate buffer, not in any argument buffer, to hold these offsets. Then
the shader must manually offset the buffer pointer.
This change fully supports arrays, including arrays of arrays, even
though Vulkan forbids them. It does not, however, support runtime
arrays. Perhaps later.
This command allows the caller to set the base value of
`BuiltInWorkgroupId`, and thus of `BuiltInGlobalInvocationId`. Metal
provides no direct support for this... but it does provide a builtin,
`[[grid_origin]]`, normally used to pass the base values for the stage
input region, which we will now abuse to pass the dispatch base and
avoid burning a buffer binding.
`[[grid_origin]]`, as part of Metal's support for compute stage input,
requires MSL 1.2. For 1.0 and 1.1, we're forced to provide a buffer.
(Curiously, this builtin was undocumented until the MSL 2.2 release. Go
figure.)
Make sure to test everything with scalar as well to catch any weird edge
cases.
Not all opcodes are covered here, just the arithmetic ones. FP64 packing
is also ignored.
The only piece added by this extension is the `DeviceIndex` builtin,
which tells the shader which device in a grouped logical device it is
running on.
Metal's pipeline state objects are owned by the `MTLDevice` that created
them. Since Metal doesn't support logical grouping of devices the way
Vulkan does, we'll thus have to create a pipeline state for each device
in a grouped logical device. The upcoming peer group support in Metal 3
will not change this. For this reason, for Metal, the device index is
supplied as a constant at pipeline compile time.
There's an interaction between `VK_KHR_device_group` and
`VK_KHR_multiview` in the
`VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT`, which defines the
view index to be the same as the device index. The new
`view_index_from_device_index` MSL option supports this functionality.
This is needed to support `VK_KHR_multiview`, which is in turn needed
for Vulkan 1.1 support. Unfortunately, Metal provides no native support
for this, and Apple is once again less than forthcoming, so we have to
implement it all ourselves.
Tessellation and geometry shaders are deliberately unsupported for now.
The problem is that the current implementation encodes the `ViewIndex`
as part of the `InstanceIndex`, which in the SPIR-V environment at least
only exists in the vertex shader. So we need to work out a way to pass
the view index along to the later stages.
This implementation runs vertex shaders for all views up to the highest
bit set in the view mask, even those whose bits are clear. The fragments
for the inactive views are then discarded. Avoiding this is difficult:
calculating the view indices becomes far more complicated if we can only
run for those views which are set in the mask.
MSL does not seem to have a qualifier for this, but HLSL SM 5.1 does.
glslangValidator for HLSL does not support this, so skip any validation,
but it passes in FXC.
Atomics are not supported on images or texture_buffers in MSL.
Properly throw an error if OpImageTexelPointer is used (since it can
only be used for atomic operations anyways).
In the bizarre case where the ID of a loaded opaque type aliased with a
literal which was used as part of another texturing instruction, we
could end up with a case where domination analysis assumed the loaded
opaque type needed to be moved to a different scope.
Fix the issue by never doing dominance analysis for opaque temporaries,
and be more robust when analyzing texturing instructions.
Also make sure reflection output is deterministic.
This patch slightly alterered output for some unknown reason, but it came from an
unordered_map, so it's fine.
This is intended to be used to support `VK_KHR_maintenance2`'s
tessellation domain origin feature. If `tess_domain_origin_lower_left`
is `true`, the `v` coordinate will be inverted with respect to the
domain. Additionally, in `Triangles` mode, the `v` and `w` coordinates
will be swapped. This is because the winding order is interpreted
differently in lower-left mode.
If not enough components are provided in the shader,
the shader MSL compiler throws an error rather than make components
undefined. This hurts portability, so we need to add explicit padding
here.
- Add new Windows support
- Use CMake/CTest instead of Make + shell scripts
- Use --parallel in CTest
- Fix CTest on Windows
- Cleanups in test_shaders.py
- Force specific commit for SPIRV-Headers
- Fix Inf/NaN odd-ball case by moving to ASM
A lot of changes in spirv-opt output.
Some new invalid SPIR-V was found but most of them were not significant
for SPIRV-Cross, so just marked them as invalid.
Even as of Metal 2.1, MSL still doesn't support arrays of buffers
directly. Therefore, we must manually expand them. In the prologue, we
define arrays holding the argument pointers; these arrays are what the
transpiled code ends up referencing. We might be able to do similar
things for textures and samplers prior to MSL 2.0.
Speaking of which, also enable texture arrays on iOS MSL 1.2.
It's intended to be used with MoltenVK to support arbitrary
`VkComponentMapping` settings. The idea is that MoltenVK will pass a
buffer (which it set to some buffer index that isn't being used)
containing packed versions of the `VkComponentMapping` struct, one for
each sampled image.
Yes, this is horribly ugly. It is unfortunately necessary. Much of the
ugliness is to support swizzling gather operations, where we need to
alter the component that the gather operates on--something complicated
by the `gather()` method requiring the passed-in component to be a
constant expression. It doesn't even support swizzling gathers on depth
textures, though I could add that if it turns out we need it.
Update SPIRV-Tools/glslang commits.
Use vulkan1.1 environment for testing.
Found new "errors" in SPIRV-Tools, so disable validation on those shaders
for now.
Allow function calls to include globals as arguments.
Allow function calls to include built-ins as arguments.
Include all meta info when creating function args from globals.
Do not manufacture a sampler for Buffer-type sampled images.
Add code option to test_shaders.py to preserve SPIR-V code for interactive debugging.
Remove unsupported sampler1DShadow from shaders-msl/frag/texture-proj-shadow.frag.
Improve error message response from unsupported depth texture formats.
Fix several integer cast warnings in unrelated code.
Run ./format_all.sh on unrelated files.
CompilerMSL add emit_custom_functions() function.
CompilerMSL restrict use of as_type<> cast to necessary conditions.
CompilerMSL refactor get_declared_struct_member_size() and
get_declared_struct_member_alignment() functions, and remove
unnecessary get_declared_type_size() functions.
Add test shaders-msl/vulkan/frag/spec-constant.vk.frag.
Add bool members is_read and is_written to SPIRType::Image.
Output correct texture read/write access by marking whether textures
are read from and written to by the shader.
Override bitcast_glsl_op() to use Metal as_type<type> functions.
Add implementations of SPIR-V functions inverse(), degrees() & radians().
Map inverseSqrt() to rsqrt().
Map roundEven() to rint().
GLSL functions imageSize() and textureSize() map to equivalent
expression using MSL get_width() & get_height() functions.
Map several SPIR-V integer bitfield functions to MSL equivalents.
Map SPIR-V atomic functions to MSL equivalents.
Map texture packing and unpacking functions to MSL equivalents.
Refactor existing, and add new, image query functions.
Reorganize header lines into includes and pragmas.
Simplify type_to_glsl() logic.
Add MSL test case vert/functions.vert for added function implementations.
Add MSL test case comp/atomic.comp for added function implementations.
test_shaders.py use macOS compilation for MSL shader compilation validations.