Buffer objects can contain arbitrary pointers to blocks.
We can also implement ConvertPtrToU and ConvertUToPtr.
The latter can cast a uint64_t to any type as it pleases,
so we will need to generate fake buffer reference blocks to be able to
cast the type.
We made the mistake of registering a dependency on the atomic variable
even if the atomic result was forced to a temporary. There is no need to
register reads from atomic variables like this as we always force atomic
results to a temporary and argument read/writes do not need to be
tracked.
Atomics are not supported on images or texture_buffers in MSL.
Properly throw an error if OpImageTexelPointer is used (since it can
only be used for atomic operations anyways).
* origin/master:
Support running {,update_}test_shader.sh with CMake builds.
Don't apply vertex attribute remapping other non-vertex or non-input interface blocks
Force complex loop in certain rare access chain scenarios.
Fix guard around [[noreturn]].
Deal with mismatched signs in S/U/F conversion opcodes.
Workaround lack of lvalue/rvalue operator overload on MSVC 2013.
Support direct conversions to std::vector from SmallVector.
Fix some minor copy constructor issues in Variant.
Make sure ids_for_types are moved correctly in move operator.
Run format_all.sh.
Refactor out error handling and containers to new headers.
Do not use SmallVector as input type in public interfaces.
Fix various bugs found in testing.
Explicitly implement move operators for ParsedIR.
Try another MSVC 2013 workaround.
Implement edge cases in insert/end and add a simple test case.
Fix GCC 4.x warnings.
Workaround lack of alignas on MSVC 2013.
Reduce pressure on global allocation.
CLI: Make --iterations more useful.
If we generate an access chain in a loop body, and it is consumed in the
loop continue block, we have a problem because we cannot emit a
temporary here holding the access chain reference. Force a complex loop
body to workaround this exceptionally rare case.
We cannot deduce if OpLoad needs ArrayCopy templates early since it's
heavily context dependent, and we might only know on 3rd iteration of
the compile loop.
We had a bug where error conditions in DoWhileLoop emit path would not
detect that statements were being emitted due to the masking behavior
which happens when force_recompile is true. Fix this.
Also, refactor force_recompile into member functions so we can properly
break on any situation where this is set, without having to rely on
watchpoints in debuggers.
Avoids ugly warnings on nearly every compute shader.
We could do analysis to detect whether we need to emit this constant,
but it's a bit tedious to figure out if an OpConstantComponent is
actually used by opcodes, so just make it simple.
-1 (0xffffffff) literal means the component should be undefined.
Since we cannot express undefined directly, just use a 0 literal in the
appropriate type.
We have an edge case where the array is declared with a concrete size,
but in GLSL we must emit an unsized array, which breaks array copies.
Deal explicitly with this.
Return after loading the input control point array if there are more
input points than output points, and this was one of the helper
invocations spun off to load the input points. I was hesitant to do this
initially, since the MSL spec has this to say about barriers:
> The `threadgroup_barrier` (or `simdgroup_barrier`) function must be
> encountered by all threads in a threadgroup (or SIMD-group) executing
> the kernel.
That is, if any thread executes the barrier, then all threads must
execute it, or the barrier'd invocations will hang. But, the key words
here seem to be "executing the kernel;" inactive invocations, those that
have already returned, need not encounter the barrier to prevent hangs.
Indeed, I've encountered no problems from doing this, at least on my
hardware. This also fixes a few CTS tests that were failing due to
execution ordering; apparently, my assumption that the later, invalid
data written by the helpers would get overwritten was wrong.
The tessellation levels in Metal are stored as a densely-packed array of
half-precision floating point values. But, stage-in attributes in Metal
have to have offsets and strides aligned to a multiple of four, so we
can't add them individually. Luckily for us, the arrays have lengths
less than 4. So, let's use vectors for them!
Triangles get a single attribute with a `float4`, where the outer levels
are in `.xyz` and the inner levels are in `.w`. The arrays are unpacked
as though we had added the elements individually. Quads get two: a
`float4` with the outer levels and a `float2` with the inner levels.
Further, since vectors can be indexed as arrays, there's no need to
unpack them in this case.
This also saves on precious vertex attributes. Before, we were using up
to 6 of them. Now we need two at most.