This should hopefully reduce underutilization of the GPU, especially on
GPUs where the thread execution width is greater than the number of
control points.
This also simplifies initialization by reading the buffer directly
instead of using Metal's vertex-attribute-in-compute support. It turns
out the only way in which shader stages are allowed to differ in their
interfaces is in the number of components per vector; the base type must
be the same. Since we are using the raw buffer instead of attributes, we
can now also emit arrays and matrices directly into the buffer, instead
of flattening them and then unpacking them. Structs are still flattened,
however; this is due to the need to handle vectors with fewer components
than were output, and I think handling this while also directly emitting
structs could get ugly.
Another advantage of this scheme is that the extra invocations needed to
read the attributes when there were more input than output points are
now no more. The number of threads per workgroup is now lcm(SIMD-size,
output control points). This should ensure we always process a whole
number of patches per workgroup.
To avoid complexity handling indices in the tessellation control shader,
I've also changed the way vertex shaders for tessellation are handled.
They are now compute kernels using Metal's support for vertex-style
stage input. This lets us always emit vertices into the buffer in order
of vertex shader execution. Now we no longer have to deal with indexing
in the tessellation control shader. This also fixes a long-standing
issue where if an index were greater than the number of vertices to
draw, the vertex shader would wind up writing outside the buffer, and
the vertex would be lost.
This is a breaking change, and I know SPIRV-Cross has other clients, so
I've hidden this behind an option for now. In the future, I want to
remove this option and make it the default.
When inside a loop, treat any read of outer expressions to happen
multiple times, forcing a temporary of said outer expressions.
This avoids the problem where we can end up relying on loop-invariant code motion to happen in the
compiler when converting optimized shaders.
If a buffer rewrites its Offsets, all member references to that struct
are invalidated, and must be redirected, do so in to_member_reference,
but there might be other places where this is needed. Fix as required.
SPIR-V code relying on this is somewhat questionable, but seems to be
in-spec.
This is triggered when building projects that depend on SPIRV-Cross with
clang-cl on Windows with `-pedantic`
../../third_party/spirv-cross/spirv_common.hpp(781,3): error: enumerator
value is not representable in the underlying type 'int'
[-Werror,-Wmicrosoft-enum-value]
NoDominator = 0xffffffffu
Some fallout where internal functions are using stronger types.
Overkill to move everything over to strong types right now, but perhaps
move over to it slowly over time.
This change introduces functions and in one case, a class, to support
the `VK_KHR_sampler_ycbcr_conversion` extension. Except in the case of
GBGR8 and BGRG8 formats, for which Metal natively supports implicit
chroma reconstruction, we're on our own here. We have to do everything
ourselves. Much of the complexity comes from the need to support
multiple planes, which must now be passed to functions that use the
corresponding combined image-samplers. The rest is from the actual
Y'CbCr conversion itself, which requires additional post-processing of
the sample retrieved from the image.
Passing sampled images to a function was a particular problem. To
support this, I've added a new class which is emitted to MSL shaders
that pass sampled images with Y'CbCr conversions attached around. It
can handle sampled images with or without Y'CbCr conversion. This is an
awful abomination that should not exist, but I'm worried that there's
some shader out there which does this. This support requires Metal 2.0
to work properly, because it uses default-constructed texture objects,
which were only added in MSL 2. I'm not even going to get into arrays of
combined image-samplers--that's a whole other can of worms. They are
deliberately unsupported in this change.
I've taken the liberty of refactoring the support for texture swizzling
while I'm at it. It's now treated as a post-processing step similar to
Y'CbCr conversion. I'd like to think this is cleaner than having
everything in `to_function_name()`/`to_function_args()`. It still looks
really hairy, though. I did, however, get rid of the explicit type
arguments to `spvGatherSwizzle()`/`spvGatherCompareSwizzle()`.
Update the C API. In addition to supporting this new functionality, add
some compiler options that I added in previous changes, but for which I
neglected to update the C API.
This command allows the caller to set the base value of
`BuiltInWorkgroupId`, and thus of `BuiltInGlobalInvocationId`. Metal
provides no direct support for this... but it does provide a builtin,
`[[grid_origin]]`, normally used to pass the base values for the stage
input region, which we will now abuse to pass the dispatch base and
avoid burning a buffer binding.
`[[grid_origin]]`, as part of Metal's support for compute stage input,
requires MSL 1.2. For 1.0 and 1.1, we're forced to provide a buffer.
(Curiously, this builtin was undocumented until the MSL 2.2 release. Go
figure.)
This is quite complex since we cannot flush Phi inside the case labels,
we have to do it outside by emitting a lot of manual branches ourselves.
This should be extremely rare, but we need to handle this case.
We used to use the Binding decoration for this, but this method is
hopelessly broken. If no explicit MSL resource remapping exists, we
remap automatically in a manner which should always "just work".
- Replace ostringstream with custom implementation.
~30% performance uplift on vector-shuffle-oom test.
Allocations are measurably reduced in Valgrind.
- Replace std::vector with SmallVector.
Classic malloc optimization, small vectors are backed by inline data.
~ 7-8% gain on vector-shuffle-oom on GCC 8 on Linux.
- Use an object pool for IVariant type.
We generally allocate a lot of SPIR* objects. We can amortize these
allocations neatly by pooling them.
- ~15% overall uplift on ./test_shaders.py --iterations 10000 shaders/.
This is a pragmatic trick to avoid symbol collision where a project
links against SPIRV-Cross statically, while linking to other projects
which also use SPIRV-Cross statically. We can end up with very awkward
symbol collisions which can resolve themselves silently because
SPIRV-Cross is pulled in as necessary. To fix this, we must use
different symbols and embed two copies of SPIRV-Cross in this scenario,
now with different namespaces, which in turn leads to different symbols.
This adds a new C API for SPIRV-Cross which is intended to be stable,
both API and ABI wise.
The C++ API has been refactored a bit to make the C wrapper easier and
cleaner to write. Especially the vertex attribute / resource interfaces
for MSL has been rewritten to avoid taking mutable pointers into the
interface. This would be very annoying to wrap and it didn't fit well
with the rest of the C++ API to begin with. While doing this, I went
ahead and removed all the old deprecated interfaces.
The CMake build system has also seen an overhaul.
It is now possible to build static/shared/CLI separately with -D
options.
The shared library only exposes the C API, as it is the only ABI-stable
API. pkg-configs as well as CMake modules are exported and installed for
the shared library configuration.