Buffer objects can contain arbitrary pointers to blocks.
We can also implement ConvertPtrToU and ConvertUToPtr.
The latter can cast a uint64_t to any type as it pleases,
so we will need to generate fake buffer reference blocks to be able to
cast the type.
Atomics are not supported on images or texture_buffers in MSL.
Properly throw an error if OpImageTexelPointer is used (since it can
only be used for atomic operations anyways).
* origin/master:
Support running {,update_}test_shader.sh with CMake builds.
Don't apply vertex attribute remapping other non-vertex or non-input interface blocks
Force complex loop in certain rare access chain scenarios.
Fix guard around [[noreturn]].
Deal with mismatched signs in S/U/F conversion opcodes.
Workaround lack of lvalue/rvalue operator overload on MSVC 2013.
Support direct conversions to std::vector from SmallVector.
Fix some minor copy constructor issues in Variant.
Make sure ids_for_types are moved correctly in move operator.
Run format_all.sh.
Refactor out error handling and containers to new headers.
Do not use SmallVector as input type in public interfaces.
Fix various bugs found in testing.
Explicitly implement move operators for ParsedIR.
Try another MSVC 2013 workaround.
Implement edge cases in insert/end and add a simple test case.
Fix GCC 4.x warnings.
Workaround lack of alignas on MSVC 2013.
Reduce pressure on global allocation.
CLI: Make --iterations more useful.
- Replace ostringstream with custom implementation.
~30% performance uplift on vector-shuffle-oom test.
Allocations are measurably reduced in Valgrind.
- Replace std::vector with SmallVector.
Classic malloc optimization, small vectors are backed by inline data.
~ 7-8% gain on vector-shuffle-oom on GCC 8 on Linux.
- Use an object pool for IVariant type.
We generally allocate a lot of SPIR* objects. We can amortize these
allocations neatly by pooling them.
- ~15% overall uplift on ./test_shaders.py --iterations 10000 shaders/.
We cannot deduce if OpLoad needs ArrayCopy templates early since it's
heavily context dependent, and we might only know on 3rd iteration of
the compile loop.
We had a bug where error conditions in DoWhileLoop emit path would not
detect that statements were being emitted due to the masking behavior
which happens when force_recompile is true. Fix this.
Also, refactor force_recompile into member functions so we can properly
break on any situation where this is set, without having to rely on
watchpoints in debuggers.
This is a pragmatic trick to avoid symbol collision where a project
links against SPIRV-Cross statically, while linking to other projects
which also use SPIRV-Cross statically. We can end up with very awkward
symbol collisions which can resolve themselves silently because
SPIRV-Cross is pulled in as necessary. To fix this, we must use
different symbols and embed two copies of SPIRV-Cross in this scenario,
now with different namespaces, which in turn leads to different symbols.
The design of backend.int16_t_literal_suffix and backend.uint16_t_literal_suffix
allows them to be set to null, but that was not always tested for.
I have removed the expectation that they can be null and set
backend.int16_t_literal_suffix to "" when no suffix is needed.
That has the same effect, and seemed to be a more usable and defensive approach.
Avoids ugly warnings on nearly every compute shader.
We could do analysis to detect whether we need to emit this constant,
but it's a bit tedious to figure out if an OpConstantComponent is
actually used by opcodes, so just make it simple.
This adds a new C API for SPIRV-Cross which is intended to be stable,
both API and ABI wise.
The C++ API has been refactored a bit to make the C wrapper easier and
cleaner to write. Especially the vertex attribute / resource interfaces
for MSL has been rewritten to avoid taking mutable pointers into the
interface. This would be very annoying to wrap and it didn't fit well
with the rest of the C++ API to begin with. While doing this, I went
ahead and removed all the old deprecated interfaces.
The CMake build system has also seen an overhaul.
It is now possible to build static/shared/CLI separately with -D
options.
The shared library only exposes the C API, as it is the only ABI-stable
API. pkg-configs as well as CMake modules are exported and installed for
the shared library configuration.
We were using std::locale::global() to force a C locale which is not
safe when SPIRV-Cross is used in a multi-threaded environment.
To fix this, we could tap into various per-platform specific locale
handling to get safe thread-local locales, but since locales only affect
the decimal point in floats, we simply query the locale instead and do
the necessary radix replacement ourselves, without touching the locale.
This should be much safer and cleaner than the alternative.
Return after loading the input control point array if there are more
input points than output points, and this was one of the helper
invocations spun off to load the input points. I was hesitant to do this
initially, since the MSL spec has this to say about barriers:
> The `threadgroup_barrier` (or `simdgroup_barrier`) function must be
> encountered by all threads in a threadgroup (or SIMD-group) executing
> the kernel.
That is, if any thread executes the barrier, then all threads must
execute it, or the barrier'd invocations will hang. But, the key words
here seem to be "executing the kernel;" inactive invocations, those that
have already returned, need not encounter the barrier to prevent hangs.
Indeed, I've encountered no problems from doing this, at least on my
hardware. This also fixes a few CTS tests that were failing due to
execution ordering; apparently, my assumption that the later, invalid
data written by the helpers would get overwritten was wrong.
If a stage takes the position as both an input and an output (i.e. a
tessellation shader or a geometry shader), then we could wind up fixing
up the input position by mistake. Ensure that doesn't happen, by only
setting the `qual_pos_var_name` variable from the output position.
The tessellation levels in Metal are stored as a densely-packed array of
half-precision floating point values. But, stage-in attributes in Metal
have to have offsets and strides aligned to a multiple of four, so we
can't add them individually. Luckily for us, the arrays have lengths
less than 4. So, let's use vectors for them!
Triangles get a single attribute with a `float4`, where the outer levels
are in `.xyz` and the inner levels are in `.w`. The arrays are unpacked
as though we had added the elements individually. Quads get two: a
`float4` with the outer levels and a `float2` with the inner levels.
Further, since vectors can be indexed as arrays, there's no need to
unpack them in this case.
This also saves on precious vertex attributes. Before, we were using up
to 6 of them. Now we need two at most.
These are often arrayed builtins, which MSL maps to more than one
attribute. SPIRV-Cross automatically assigns succeeding addresses to
arrayed attributes, so we really only need the first one. This of course
assumes that the inputs are sorted by location.
Builtin attributes in SPIR-V aren't linked by location, but by their
built-in-ness. This poses a problem for MSL, since builtin inputs in
the vertex pipeline are just regular attributes. We must then assign
them locations so that they can be matched up to the attributes in the
stage input descriptor--and also to avoid duplicate attribute numbers in
tessellation evaluation shaders, where there are two different
stage-in structs, so the member index therein is no longer unique!
In SPIR-V, there are always two inner levels and four outer levels, even
if the input patch isn't a quad patch. But in MSL, due to requirements
imposed by Metal, only one inner level and three outer levels exist when
the input patch is a triangle patch. We must explicitly ignore any write
to the nonexistent second inner and fourth outer levels in this case.