Using vertex-style stage input is complex, and it doesn't support
nesting of structures or arrays. By using raw buffer input instead, we
get this support "for free," and everything becomes much simpler.
Arguably, this is the way I should've done this in the first place.
Eventually, I'd like to make this the default, and then remove the
option altogether. (And I still need to do that with
`multi_patch_workgroup`...)
Should help fix 66 tests in the Vulkan CTS, under the following trees:
- `dEQP-VK.pipeline.*.interface_matching.*`
- `dEQP-VK.tessellation.user_defined_io.*`
- `dEQP-VK.clipping.user_defined.*`
- Add CompilerMSL::emit_binary_ptr_op() and to_ptr_expression()
to emit binary pointer ops. Compare matrix addresses without the automatic
transpose() conversion, to avoid errors from taking the address of a temporary copy.
- Add Compiler::add_active_interface_variable() to also track active
interface variables in the entry point for SPIR-V 1.4 and above.
- For an OpPtrAccessChain that ends in an array element, apply Element
as an offset to the existing index; otherwise it would access an
array dimension that doesn't exist.
- Dereference pointer function call arguments. Ultimately, this
dereferencing is actually backwards, and in the future we should aim
to properly support passing pointer variables between functions,
but such a refactoring was beyond the scope of this change.
- Use [] to declare arrays of pointers, since array<T*> is not supported in MSL.
- Add unit test shaders.
Clang added -Wunqualified-std-cast-call in
https://reviews.llvm.org/D119670, which warns on unqualified std::move
and std::forward calls. This change qualifies those calls so the project
builds with -Werror on Clang HEAD.
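For illustration, a minimal example (not taken from the codebase) of the kind of call the warning flags, and the qualified form this change switches to:

```cpp
#include <string>
#include <utility>
#include <vector>

// Unqualified call: found through ADL and flagged by
// -Wunqualified-std-cast-call on recent Clang releases.
void append_unqualified(std::vector<std::string> &out, std::string value)
{
	out.push_back(move(value));
}

// Qualified call: the form this change switches to.
void append_qualified(std::vector<std::string> &out, std::string value)
{
	out.push_back(std::move(value));
}
```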
Introduces the idea of a recompilation making forward progress.
There are some extreme edge cases where we need more than 3 loops, but we
only allow this in specific circumstances where we can reason about
forward progress being made.
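A minimal sketch of the policy, with hypothetical names (the real loop lives inside the compile entry points, and "forward progress" is whatever the pass reports, e.g. a newly forced temporary):

```cpp
#include <cstdint>
#include <functional>

// Hypothetical sketch: cap the recompilation loop at 3 iterations, but allow
// more as long as the previous pass reports that it made forward progress.
static bool compile_with_retries(const std::function<void()> &compile,
                                 const std::function<bool()> &needs_recompile,
                                 const std::function<bool()> &made_forward_progress)
{
	uint32_t loops = 0;
	do
	{
		// Beyond three attempts, only keep going if something actually changed.
		if (++loops > 3 && !made_forward_progress())
			return false;
		compile();
	} while (needs_recompile());
	return true;
}
```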
We don't need to keep track of them, because when block.condition is
either a SPIRConstant or a SPIRVariable, we can get the type directly in
get_case_list().
Signed-off-by: Sebastián Aedo <saedo@codeweavers.com>
block.cases_32bit has now been added as requested, and we only parse if the
number of remaining ops is a multiple of 2. Neither case list is mutable,
because we return a reference to one of them depending on the width of
op.condition.
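A rough sketch of the shape of this; the type and field names below are placeholders, not the actual SPIRV-Cross declarations:

```cpp
#include <cstdint>
#include <vector>

// Placeholder types standing in for the real IR structures.
struct Case { uint64_t value; uint32_t block; };

struct Block
{
	std::vector<Case> cases_32bit;
	std::vector<Case> cases_64bit;
};

// Hand out the case list matching the width of the OpSwitch selector,
// as an immutable reference so callers cannot modify either list.
static const std::vector<Case> &get_case_list(const Block &block, uint32_t condition_width)
{
	return condition_width > 32 ? block.cases_64bit : block.cases_32bit;
}
```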
Signed-off-by: Sebastián Aedo <saedo@codeweavers.com>
SPIR-V allows an image to be marked as a depth image, but with a non-depth
format. Such images should be read or sampled as vectors instead of scalars,
except when they are subject to compare operations.
Don't mark an OpSampledImage as using a compare operation just because the
image contains a depth marker. Instead, require that a compare operation
is actually used on that image.
Compiler::image_is_comparison() was really testing whether an image is a
depth image, since it incorporates the depth marker. Rename that function
to is_depth_image(), to clarify what it is really testing.
In Compiler::is_depth_image(), do not treat an image as a depth image
if it has been explicitly marked with a color format, unless the image
is subject to compare operations.
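In pseudocode, the rule boils down to something like the sketch below; the field names are illustrative, not the actual declarations in Compiler:

```cpp
// Illustrative stand-in for the image metadata consulted here.
struct ImageInfo
{
	bool depth_marker;          // declared with Depth = 1 in SPIR-V
	bool explicit_color_format; // declared with a non-depth image format
};

// The depth marker only wins when the format doesn't contradict it,
// and actual compare usage always forces depth handling.
static bool is_depth_image(const ImageInfo &image, bool used_with_compare_op)
{
	return (image.depth_marker && !image.explicit_color_format) || used_with_compare_op;
}
```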
In CompilerMSL::to_function_name(), test for compare operations
specifically, rather than assuming them from the depth-image marker.
CompilerGLSL and CompilerMSL still contain a number of internal tests that
use is_depth_image() both for testing for a depth image, and for testing
whether compare operations are being used. I've left these as they are
for now, but these should be cleaned up at some point.
Add unit tests for fetch/sample depth images with color formats and no compare ops.
This is somewhat awkward to support, but the best we can do here
is to analyze various Load/Store opcodes and deduce the ideal overall
alignment based on this. This is not a 100% perfect solution, but should
be correct for any reasonable use case.
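The deduction can be sketched roughly as below; the types are hypothetical, and the real analysis walks the SPIR-V Load/Store opcodes rather than a flat list of accesses:

```cpp
#include <cstdint>
#include <vector>

// Hypothetical record of one observed Load/Store through a
// physical-storage-buffer pointer.
struct Access { uint32_t offset; uint32_t size; };

// Pick the largest power-of-two alignment that every observed access
// satisfies; this is the "ideal overall alignment" mentioned above.
static uint32_t deduce_alignment(const std::vector<Access> &accesses)
{
	uint32_t alignment = 16;
	for (auto &access : accesses)
		while (alignment > 1 && (access.offset % alignment != 0 || access.size % alignment != 0))
			alignment /= 2;
	return alignment;
}
```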
Also fix various nitpicks with BDA support while I'm at it.
We don't need to keep track of the type itself, only its width since the
type check of the OpSwitch can be done at runtime. This also avoids
creating a dangling reference.
Signed-off-by: Sebastián Aedo <saedo@codeweavers.com>
Move the logic out of the parser as requested, since it's important to keep
the parsing process as simple as possible.
To do that, load_types are now tracked in ParsedIR, which can be
accessed from the Compiler struct. The switch cases are fixed up in the CFG
stage, since that's the point where the nullptr was being dereferenced.
Signed-off-by: Sebastián Aedo <saedo@codeweavers.com>
Certain shaders where functions have a *ton* of merging control flow
will end up with exponential time complexity when figuring out parameter
preservation semantics.
The trivial fix to bring this back under control is to terminate the
recursive traversal early if we've seen the path before. Simple oversight :(
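The fix amounts to the classic visited-set early-out, roughly as in this sketch (placeholder CFG representation, not the actual traversal code):

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Placeholder CFG: each block is an index with a list of successor indices.
struct CFG
{
	std::vector<std::vector<uint32_t>> successors;
};

static void traverse(const CFG &cfg, uint32_t block, std::unordered_set<uint32_t> &seen)
{
	// Without this early-out, heavily merging control flow re-walks the same
	// paths over and over, which is where the exponential blow-up came from.
	if (!seen.insert(block).second)
		return;
	for (uint32_t successor : cfg.successors[block])
		traverse(cfg, successor, seen);
}
```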
Fairly minor differences, so we can keep them side by side without too much
effort. NV support is effectively deprecated now, however.
- Add OpConvertUToAccelerationStructureKHR
- Ignore/Terminate ray is now a terminator in KHR, but a call in NV.
- Fix some bugs with reportIntersection.
Subsequent stages can legally attempt to read from these variables,
which causes compilation failure.
Always make sure we emit user outputs in vertex shaders if they are
active in the entry point.
New in MSL 2.3 is a template that can be used in the place of a scalar
type in a stage-in struct. This template has methods which interpolate
the varying at the given points. Curiously, you can't set interpolation
attributes on such a varying; perspective-correctness is encoded in the
type, while interpolation must be done using one of the methods. This
makes the feature somewhat awkward to use from SPIRV-Cross, requiring us to
jump through a bunch of hoops to make it all work.
Using varyings from functions in particular is a pain point, requiring
us to pass the stage-in struct itself around. An alternative is to pass
references to the interpolants; except this will fall over badly with
composite types, which naturally must be flattened. As with
tessellation, dynamic indexing isn't supported with pull-model
interpolation. This is because of the need to reference the original
struct member in order to call one of the pull-model interpolation
methods on it. Also, this is done at the variable level; this means that
if one varying in a struct is used with the pull-model functions, then
the entire struct is emitted as pull-model interpolants.
For some reason, this was not documented in the MSL spec, though there
is a property on `MTLDevice`, `supportsPullModelInterpolation`,
indicating support for this, which *is* documented. This does not appear
to be implemented yet for AMD: it returns `NO` from
`supportsPullModelInterpolation`, and pipelines with shaders using the
templates fail to compile. It *is* implemented for Intel. It's probably
also implemented for Apple GPUs: on Apple Silicon, OpenGL calls down to
Metal, and it wouldn't be possible to use the interpolation functions
without this implemented in Metal.
Based on my testing, while SPIR-V and GLSL take the offset relative to
the pixel center, in Metal it appears to be relative to the pixel's
upper-left corner, as in HLSL. Therefore, I've added an offset of 0.4375,
i.e. one half minus one sixteenth, to all arguments to
`interpolate_at_offset()`.
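As a sketch of what this means for the generated call, the helper below is hypothetical (the real emission happens in CompilerMSL's expression handling), but it shows the shape of the emitted expression:

```cpp
#include <string>

// Hypothetical helper: the SPIR-V/GLSL offset is measured from the pixel
// center, so shift it by 0.4375 (one half minus one sixteenth) before handing
// it to Metal, which measures from the pixel's upper-left corner.
static std::string emit_interpolate_at_offset(const std::string &interpolant_expr,
                                              const std::string &offset_expr)
{
	return interpolant_expr + ".interpolate_at_offset(" + offset_expr + " + 0.4375)";
}
```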
This also fixes a long-standing bug: if a pull-model interpolation
function is used on a varying, make sure that varying is declared. We
were already doing this only for the AMD pull-model function,
`interpolateAtVertexAMD()`; for reasons which are completely beyond me,
we weren't doing this for the base interpolation functions. I also note
that there are no tests for the interpolation functions for GLSL or
HLSL.
In some cases, we need to get a literal value from a spec constant op.
Mostly relevant when emitting buffers, so implement a 32-bit integer
scalar subset of the evaluator. Can be extended as needed to support
evaluating any specialization constant operation.
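The subset is essentially a constant folder over 32-bit integer scalars; a sketch of the idea follows. The opcode names mirror SPIR-V, but this is not the actual SPIRV-Cross API:

```cpp
#include <cstdint>
#include <stdexcept>

enum class SpecOp { IAdd, ISub, IMul, ShiftLeftLogical, ShiftRightLogical };

// Fold a single binary spec constant op on 32-bit unsigned scalars.
static uint32_t evaluate_spec_op(SpecOp op, uint32_t a, uint32_t b)
{
	switch (op)
	{
	case SpecOp::IAdd: return a + b;
	case SpecOp::ISub: return a - b;
	case SpecOp::IMul: return a * b;
	case SpecOp::ShiftLeftLogical: return a << (b & 31u);
	case SpecOp::ShiftRightLogical: return a >> (b & 31u);
	}
	throw std::runtime_error("Unsupported spec constant op.");
}
```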
When inside a loop, treat any read of outer expressions as happening
multiple times, forcing a temporary for said outer expressions.
This avoids the problem where we end up relying on the downstream compiler
performing loop-invariant code motion when converting optimized shaders.
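A plain C++ analogue of the hazard (the shader case is analogous): if the outer expression is not forced into a temporary, the converted output recomputes it every iteration and leans on the downstream compiler hoisting it out of the loop.

```cpp
// Forcing a temporary for the outer expression keeps the generated code from
// re-evaluating it on every loop iteration.
static float accumulate(float a, float b, int count)
{
	float outer = a * b; // forced temporary for the loop-invariant expression
	float sum = 0.0f;
	for (int i = 0; i < count; i++)
		sum += outer;
	return sum;
}
```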