This provides a few functions normally available in OpenCL to the SPIR-V
shader environment. These functions happen to be available in Metal as
well.
No GLSL, unfortunately. Intel has yet to publish a
`GL_INTEL_shader_integer_functions2` spec.
Fix fallout from changes.
There's a bug in glslang that prevents `float16_t`, `[u]int16_t`, and
`[u]int8_t` constants from adding the corresponding SPIR-V capabilities.
SPIRV-Tools, meanwhile, tightened validation so that these constants are
only valid if the corresponding `Float16`, `Int16`, and `Int8` caps are
on. This affects the `16bit-constants.frag` test for GLSL and MSL.
The only piece added by this extension is the `DeviceIndex` builtin,
which tells the shader which device in a grouped logical device it is
running on.
Metal's pipeline state objects are owned by the `MTLDevice` that created
them. Since Metal doesn't support logical grouping of devices the way
Vulkan does, we'll thus have to create a pipeline state for each device
in a grouped logical device. The upcoming peer group support in Metal 3
will not change this. For this reason, for Metal, the device index is
supplied as a constant at pipeline compile time.
There's an interaction between `VK_KHR_device_group` and
`VK_KHR_multiview` in the
`VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT`, which defines the
view index to be the same as the device index. The new
`view_index_from_device_index` MSL option supports this functionality.
Using the `PostDepthCoverage` mode specifies that the `gl_SampleMaskIn`
variable is to contain the computed coverage mask following the early
fragment tests, which this mode requires and implicitly enables.
Note that unlike Vulkan and OpenGL, Metal places this on the sample mask
input itself, and furthermore does *not* implicitly enable early
fragment testing. If it isn't enabled explicitly with an
`[[early_fragment_tests]]` attribute, the compiler will error out. So we
have to enable that mode explicitly if `PostDepthCoverage` is enabled
but `EarlyFragmentTests` isn't.
For Metal, only iOS supports this; for some reason, Apple has yet to
implement it on macOS, even though many desktop cards support it.
The old method of using a different unpacked matrix type doesn't work
for scalar alignment. It certainly wouldn't have any effect for a square
matrix, since the number of columns and rows are the same. So now we'll
store them as arrays of packed vectors.
Relaxed block layout relaxed the restrictions on vector alignment,
allowing them to be aligned on scalar boundaries. Scalar block layout
relaxes this further, allowing *any* member to be aligned on a scalar
boundary. The requirement that a vector not improperly straddle a
16-byte boundary is also relaxed.
I've also added a test showing that `std430` layout works with UBOs.
I'm troubled by the dual meaning of the `Packed` extended decoration. In
some instances (struct, `float[]`, and `vec2[]` members), it actually
means the exact opposite, that the member needs extra padding. This is
especially problematic for `vec2[]`, because now we need to distinguish
the two cases by checking the array stride. I wonder if this should
actually be split into two decorations.
There is a case where we can deduce a for/while loop, but the continue
block is actually very painful to deal with, so handle that case as
well. Removes an exceptional case.
MSL prior to 2.2 doesn't support these natively in any stage but
compute. But, we can (assuming no threads were terminated prematurely)
get their values with some creative uses of the
`simd_prefix_exclusive_sum()` and `simd_sum()` functions.
Also, fix a missing `to_expression()` with `BuiltInSubgroupEqMask`.
For KhronosGroup/MoltenVK#629.
This is needed to support `VK_KHR_multiview`, which is in turn needed
for Vulkan 1.1 support. Unfortunately, Metal provides no native support
for this, and Apple is once again less than forthcoming, so we have to
implement it all ourselves.
Tessellation and geometry shaders are deliberately unsupported for now.
The problem is that the current implementation encodes the `ViewIndex`
as part of the `InstanceIndex`, which in the SPIR-V environment at least
only exists in the vertex shader. So we need to work out a way to pass
the view index along to the later stages.
This implementation runs vertex shaders for all views up to the highest
bit set in the view mask, even those whose bits are clear. The fragments
for the inactive views are then discarded. Avoiding this is difficult:
calculating the view indices becomes far more complicated if we can only
run for those views which are set in the mask.
If we compile multiple times due to forced_recompile, we had
deferred_declaration = true while emitting function prototypes which
broke an assumption. Fix this by clearing out stale state before leaving
a function.
In multiple-entry-point modules, we declared builtin inputs which were
not supposed to be used for that entry point.
Fix this, by being more strict when checking which builtins to emit.
This gets rather complicated because MSL does not support OpArrayLength
natively. We need to pass down a buffer which contains buffer sizes, and
we compute the array length on-demand.
Support both discrete descriptors as well as argument buffers.
Change aux buffer to swizzle buffer.
There is no good reason to expand the aux buffer, so name it
appropriately.
Make the code cleaner by emitting a straight pointer to uint rather than
a dummy struct which only contains a single unsized array member anyways.
This will also end up being very similar to how we implement swizzle
buffers for argument buffers.
Do not use implied binding if it overflows int32_t.
Some support for subgroups is present starting in Metal 2.0 on both iOS
and macOS. macOS gains more complete support in 10.14 (Metal 2.1).
Some restrictions are present. On iOS and on macOS 10.13, the
implementation of `OpGroupNonUniformElect` is incorrect: if thread 0 has
already terminated or is not executing a conditional branch, the first
thread that *is* will falsely believe itself not to be. Unfortunately,
this operation is part of the "basic" feature set; without it, subgroups
cannot be supported at all.
The `SubgroupSize` and `SubgroupLocalInvocationId` builtins are only
available in compute shaders (and, by extension, tessellation control
shaders), despite SPIR-V making them available in all stages. This
limits the usefulness of some of the subgroup operations in fragment
shaders.
Although Metal on macOS supports some clustered, inclusive, and
exclusive operations, it does not support them all. In particular,
inclusive and exclusive min, max, and, or, and xor; as well as cluster
sizes other than 4 are not supported. If this becomes a problem, they
could be emulated, but at a significant performance cost due to the need
for non-uniform operations.
MSL does not seem to have a qualifier for this, but HLSL SM 5.1 does.
glslangValidator for HLSL does not support this, so skip any validation,
but it passes in FXC.
Atomics are not supported on images or texture_buffers in MSL.
Properly throw an error if OpImageTexelPointer is used (since it can
only be used for atomic operations anyways).
The tessellation levels in Metal are stored as a densely-packed array of
half-precision floating point values. But, stage-in attributes in Metal
have to have offsets and strides aligned to a multiple of four, so we
can't add them individually. Luckily for us, the arrays have lengths
less than 4. So, let's use vectors for them!
Triangles get a single attribute with a `float4`, where the outer levels
are in `.xyz` and the inner levels are in `.w`. The arrays are unpacked
as though we had added the elements individually. Quads get two: a
`float4` with the outer levels and a `float2` with the inner levels.
Further, since vectors can be indexed as arrays, there's no need to
unpack them in this case.
This also saves on precious vertex attributes. Before, we were using up
to 6 of them. Now we need two at most.
In SPIR-V, there are always two inner levels and four outer levels, even
if the input patch isn't a quad patch. But in MSL, due to requirements
imposed by Metal, only one inner level and three outer levels exist when
the input patch is a triangle patch. We must explicitly ignore any write
to the nonexistent second inner and fourth outer levels in this case.
This is intended to be used to support `VK_KHR_maintenance2`'s
tessellation domain origin feature. If `tess_domain_origin_lower_left`
is `true`, the `v` coordinate will be inverted with respect to the
domain. Additionally, in `Triangles` mode, the `v` and `w` coordinates
will be swapped. This is because the winding order is interpreted
differently in lower-left mode.
These are mapped to Metal's post-tessellation vertex functions. The
semantic difference is much less here, so this change should be simpler
than the previous one. There are still some hairy parts, though.
In MSL, the array of control point data is represented by a special
type, `patch_control_point<T>`, where `T` is a valid stage-input type.
This object must be embedded inside the patch-level stage input. For
this reason, I've added a new type to the type system to represent this.
On Mac, the number of input control points to the function must be
specified in the `patch()` attribute. This is optional on iOS.
SPIRV-Cross takes this from the `OutputVertices` execution mode; the
intent is that if it's not set in the shader itself, MoltenVK will set
it from the tessellation control shader. If you're translating these
offline, you'll have to update the control point count manually, since
this number must match the number that is passed to the
`drawPatches:...` family of methods.
Fixes#120.
This should fix a whole host of issues related to structs in the `Input`
class in a tessellation control shader.
Also, use pointer arithmetic instead of dereferencing the `ops` array.
This is critical in case we wind up stepping beyond the bounds of the
array.
There's no need to do so, since these are not stage-out structs being
returned, but regular structures being written to a buffer. This also
neatly avoids issues writing to composite (e.g. arrayed) per-patch
outputs from a tessellation control shader.
These are transpiled to kernel functions that write the output of the
shader to three buffers: one for per-vertex varyings, one for per-patch
varyings, and one for the tessellation levels. This structure is
mandated by the way Metal works, where the tessellation factors are
supplied to the draw method in their own buffer, while the per-patch and
per-vertex varyings are supplied as though they were vertex attributes;
since they have different step rates, they must be in separate buffers.
The kernel is expected to be run in a workgroup whose size is the
greater of the number of input or output control points. It uses Metal's
support for vertex-style stage input to a compute shader to get the
input values; therefore, at least one instance must run per input point.
Meanwhile, Vulkan mandates that it run at least once per output point.
Overrunning the output array is a concern, but any values written should
either be discarded or overwritten by subsequent patches. I'm probably
going to put some slop space in the buffer when I integrate this into
MoltenVK to be on the safe side.
Storage was in place already, so mostly just dealing with bitcasts and
constants.
Simplies some of the bitcasting logic, and this exposed some bugs in the
implementation. Refactor to use correct width integers with explicit bitcast opcodes.
Apparently we didn't use those yet. MSL seems to be able to alias struct
types and variable types to a degree, so that's why it has escaped
testing until now.
If not enough components are provided in the shader,
the shader MSL compiler throws an error rather than make components
undefined. This hurts portability, so we need to add explicit padding
here.
This allows shaders to declare and use pointer-type variables. Pointers
may be loaded and stored, be the result of an `OpSelect`, be passed to
and returned from functions, and even be passed as inputs to the `OpPhi`
instruction. All types of pointers may be used as variable pointers.
Variable pointers to storage buffers and workgroup memory may even be
loaded from and stored to, as though they were ordinary variables. In
addition, this enables using an interior pointer to an array as though
it were an array pointer itself using the `OpPtrAccessChain`
instruction.
This is a rather large and involved change, mostly because this is
somewhat complicated with a lot of moving parts. It's a wonder
SPIRV-Cross's output is largely unchanged. Indeed, many of these changes
are to accomplish exactly that! Perhaps the largest source of changes
was the violation of the assumption that, when emitting types, the
pointer type didn't matter.
One of the test cases added by the change doesn't optimize very well;
the output of `spirv-opt` here is invalid SPIR-V. I need to file a bug
with SPIRV-Tools about this.
I wanted to test that variable pointers to images worked too, but I
couldn't figure out how to propagate the access qualifier properly--in
MSL, it's part of the type, so getting this right is important. I've
punted on that for now.
Just like OpAccessChain we need to make use of the meta information
available to use from access_chain_internal as we can extract a packed
vector or transposed vector from a composite, not just memory load.
A block name cannot alias with any name in its own scope,
and it cannot alias with any other "global" name.
To solve this, we need to complicate the name cache updates a little bit
where we have a "primary" namespace and "secondary" namespace.
This is required to avoid relying on complex sub-expression elimination
in compilers, and generates cleaner code.
The problem case is if a complex expression is used in an access chain,
like:
Composite comp = buffer[texture(...)];
vec4 a = comp.a + comp.b + comp.c;
Before, we did not have common subexpression tracking for
OpLoad/OpAccessChain, so we easily ended up with code like:
vec4 a = buffer[texture(...)].a + buffer[texture(...)].b + buffer[texture(...)].c;
A good compiler will optimize this, but we should not rely on it, and
forcing texture(...) to a temporary also looks better.
The solution is to add a vector "implied_expression_reads", which works
similarly to expression_dependencies. We also need an extra mechanism in
to_expression which lets us skip expression read checking and do it
later. E.g. for expr -> access chain -> load, we should only trigger
a read of expr when using the loaded expression.
In GLSL, 8-bit types require GL_EXT_shader_8bit_storage. 16-bit types
can use either GL_AMD_gpu_shader_int16/GL_AMD_gpu_shader_half_float or
GL_EXT_shader_16bit_storage.