This is not necessary, as we must emit an invalidating store before we
potentially consume an invalid expression. In fact, we're a bit
conservative here in this case for example:
int tmp = variable;
if (...)
{
variable = 10;
}
else
{
// Consuming tmp here is fine, but it was
// invalidated while emitting other branch.
// Technically, we need to study if there is an invalidating store
// in the CFG between the loading block and this block, and the other
// branch will not be a part of that analysis.
int tmp2 = tmp * tmp;
}
Fixing this case means complex CFG traversal *everywhere*, and it feels like overkill.
Fixing this exposed a bug with access chains, so fix a bug where expression dependencies were not
inherited properly in access chains. Access chains are now considered forwarded if there
is at least one dependency which is also forwarded.
This subtle bug removed any expression validation for trivially swizzled
variables. Make usage suppression a more explicit concept rather than
just hacking off forwarded_temporaries.
There is some fallout here with loop generation since our expression
invalidation is currently a bit too naive to handle loops properly.
The forwarding bug masked this problem until now.
If part of the loop condition is also used in the body, we end up
reading an invalid expression, which in turn forces a temporary to be
generated in the condition block, not good. We'll need to be smarter
here ...
This maps them to their MSL equivalents. I've mapped `Coherent` to
`volatile` since MSL doesn't have anything weaker than `volatile` but
stronger than nothing.
As part of this, I had to remove the implicit `volatile` added for
atomic operation casts. If the buffer is already `coherent` or
`volatile`, then we would add a second `volatile`, which would be
redundant. I think this is OK even when the buffer *doesn't* have
`coherent`: `T *` is implicitly convertible to `volatile T *`, but not
vice-versa. It seems to compile OK at any rate. (Note that the
non-`volatile` overloads of the atomic functions documented in the spec
aren't present in the MSL 2.2 stdlib headers.)
`restrict` is tricky, because in MSL, as in C++, it needs to go *after*
the asterisk or ampersand for the pointer type it's modifying.
Another issue is that, in the `Simple`, `GLSL450`, and `Vulkan` memory
models, `Restrict` is the default (i.e. does not need to be specified);
but MSL likely follows the `OpenCL` model where `Aliased` is the
default. We probably need to implicitly set either `Restrict` or
`Aliased` depending on the module's declared memory model.
We used to use the Binding decoration for this, but this method is
hopelessly broken. If no explicit MSL resource remapping exists, we
remap automatically in a manner which should always "just work".
In multiple-entry-point modules, we declared builtin inputs which were
not supposed to be used for that entry point.
Fix this, by being more strict when checking which builtins to emit.
MSL generally emits the aliases, which means we cannot always place the
master type first, unlike GLSL and HLSL. The logic fix is just to
reorder after we have tagged types with packing information, rather than
doing it in the parser fixup.
Avoids ugly warnings on nearly every compute shader.
We could do analysis to detect whether we need to emit this constant,
but it's a bit tedious to figure out if an OpConstantComponent is
actually used by opcodes, so just make it simple.
Return after loading the input control point array if there are more
input points than output points, and this was one of the helper
invocations spun off to load the input points. I was hesitant to do this
initially, since the MSL spec has this to say about barriers:
> The `threadgroup_barrier` (or `simdgroup_barrier`) function must be
> encountered by all threads in a threadgroup (or SIMD-group) executing
> the kernel.
That is, if any thread executes the barrier, then all threads must
execute it, or the barrier'd invocations will hang. But, the key words
here seem to be "executing the kernel;" inactive invocations, those that
have already returned, need not encounter the barrier to prevent hangs.
Indeed, I've encountered no problems from doing this, at least on my
hardware. This also fixes a few CTS tests that were failing due to
execution ordering; apparently, my assumption that the later, invalid
data written by the helpers would get overwritten was wrong.
The tessellation levels in Metal are stored as a densely-packed array of
half-precision floating point values. But, stage-in attributes in Metal
have to have offsets and strides aligned to a multiple of four, so we
can't add them individually. Luckily for us, the arrays have lengths
less than 4. So, let's use vectors for them!
Triangles get a single attribute with a `float4`, where the outer levels
are in `.xyz` and the inner levels are in `.w`. The arrays are unpacked
as though we had added the elements individually. Quads get two: a
`float4` with the outer levels and a `float2` with the inner levels.
Further, since vectors can be indexed as arrays, there's no need to
unpack them in this case.
This also saves on precious vertex attributes. Before, we were using up
to 6 of them. Now we need two at most.
In SPIR-V, there are always two inner levels and four outer levels, even
if the input patch isn't a quad patch. But in MSL, due to requirements
imposed by Metal, only one inner level and three outer levels exist when
the input patch is a triangle patch. We must explicitly ignore any write
to the nonexistent second inner and fourth outer levels in this case.
These are transpiled to kernel functions that write the output of the
shader to three buffers: one for per-vertex varyings, one for per-patch
varyings, and one for the tessellation levels. This structure is
mandated by the way Metal works, where the tessellation factors are
supplied to the draw method in their own buffer, while the per-patch and
per-vertex varyings are supplied as though they were vertex attributes;
since they have different step rates, they must be in separate buffers.
The kernel is expected to be run in a workgroup whose size is the
greater of the number of input or output control points. It uses Metal's
support for vertex-style stage input to a compute shader to get the
input values; therefore, at least one instance must run per input point.
Meanwhile, Vulkan mandates that it run at least once per output point.
Overrunning the output array is a concern, but any values written should
either be discarded or overwritten by subsequent patches. I'm probably
going to put some slop space in the buffer when I integrate this into
MoltenVK to be on the safe side.
Structs are aligned as you would expect in MSL (maximum member
alignment), and it is not minimum 16 bytes like in std140.
Also rename the dummy "pad" members to a reserved naming scheme.
Apparently we didn't use those yet. MSL seems to be able to alias struct
types and variable types to a degree, so that's why it has escaped
testing until now.
This is a fairly fundamental change on how IDs are handled.
It serves many purposes:
- Improve performance. We only need to iterate over IDs which are
relevant at any one time.
- Makes sure we iterate through IDs in SPIR-V module declaration order
rather than ID space. IDs don't have to be monotonically increasing,
which was an assumption SPIRV-Cross used to have. It has apparently
never been a problem until now.
- Support LUTs of structs. We do this by interleaving declaration of
constants and struct types in SPIR-V module order.
To support this, the ParsedIR interface needed to change slightly.
Before setting any ID with variant_set<T> we let ParsedIR know
that an ID with a specific type has been added. The surface for change
should be minimal.
ParsedIR will maintain a per-type list of IDs which the cross-compiler
will need to consider for later.
Instead of looping over ir.ids[] (which can be extremely large), we loop
over types now, using:
ir.for_each_typed_id<SPIRVariable>([&](uint32_t id, SPIRVariable &var) {
handle_variable(var);
});
Now we make sure that we're never looking at irrelevant types.
This allows shaders to declare and use pointer-type variables. Pointers
may be loaded and stored, be the result of an `OpSelect`, be passed to
and returned from functions, and even be passed as inputs to the `OpPhi`
instruction. All types of pointers may be used as variable pointers.
Variable pointers to storage buffers and workgroup memory may even be
loaded from and stored to, as though they were ordinary variables. In
addition, this enables using an interior pointer to an array as though
it were an array pointer itself using the `OpPtrAccessChain`
instruction.
This is a rather large and involved change, mostly because this is
somewhat complicated with a lot of moving parts. It's a wonder
SPIRV-Cross's output is largely unchanged. Indeed, many of these changes
are to accomplish exactly that! Perhaps the largest source of changes
was the violation of the assumption that, when emitting types, the
pointer type didn't matter.
One of the test cases added by the change doesn't optimize very well;
the output of `spirv-opt` here is invalid SPIR-V. I need to file a bug
with SPIRV-Tools about this.
I wanted to test that variable pointers to images worked too, but I
couldn't figure out how to propagate the access qualifier properly--in
MSL, it's part of the type, so getting this right is important. I've
punted on that for now.
Just like OpAccessChain we need to make use of the meta information
available to use from access_chain_internal as we can extract a packed
vector or transposed vector from a composite, not just memory load.
A block name cannot alias with any name in its own scope,
and it cannot alias with any other "global" name.
To solve this, we need to complicate the name cache updates a little bit
where we have a "primary" namespace and "secondary" namespace.
- Add new Windows support
- Use CMake/CTest instead of Make + shell scripts
- Use --parallel in CTest
- Fix CTest on Windows
- Cleanups in test_shaders.py
- Force specific commit for SPIRV-Headers
- Fix Inf/NaN odd-ball case by moving to ASM