There is no direct way to express this, so invert boolean results to
force any NaN -> true. glslang emits Ordered compare instructions
everywhere, and the GLSL spec is not clear on this, so assume this is
fine.
It is possible for a shader to declare two plain struct types which
simply share the same OpName without there being an implicit
value/buffer alias relationship.
For to_member_name(), make sure to use the type alias master when
resolving member names. The member name may be different in a type alias
master if the SPIR-V is being intentionally difficult.
First, when generating from HLSL before invoking the code that comes from the HLSL patch-function a control-flow and full memory-barrier are required to ensure that all the temporary values in thread-local storage for the patch are available.
Second, the inputs to control and evaluation shaders must be properly forwarded from the global variables in SPIRV to the member variables in the relevant input structure.
Finally when arrays of interpolators are used for input or output we need to add an extra level of array indirection because Metal works at a different granularity than SPIRV.
Five parts.
1. Fix tessellation patch function processing.
2. Fix loads from tessellation control inputs not being forwarded to the gl_in structure array.
3. Fix loads from tessellation evaluation inputs not being forwarded to the stage_in structure array.
4. Workaround SPIRV losing an array indirection in tessellation shaders - not the best solution but enough to keep things progressing.
5. Apparently gl_TessLevelInner/Outer is special and needs to not be placed into the input array.
Some fallout where internal functions are using stronger types.
Overkill to move everything over to strong types right now, but perhaps
move over to it slowly over time.
Vulkan has two types of buffer descriptors,
`VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` and
`VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC`, which allow the client to
offset the buffers by an amount given when the descriptor set is bound
to a pipeline. Metal provides no direct support for this when the buffer
in question is in an argument buffer, so once again we're on our own.
These offsets cannot be stored or associated in any way with the
argument buffer itself, because they are set at bind time. Different
pipelines may have different offsets set. Therefore, we must use a
separate buffer, not in any argument buffer, to hold these offsets. Then
the shader must manually offset the buffer pointer.
This change fully supports arrays, including arrays of arrays, even
though Vulkan forbids them. It does not, however, support runtime
arrays. Perhaps later.
This was straightforward to implement in GLSL. The
`ShadingRateInterlockOrderedEXT` and `ShadingRateInterlockUnorderedEXT`
modes aren't implemented yet, because we don't support
`SPV_NV_shading_rate` or `SPV_EXT_fragment_invocation_density` yet.
HLSL and MSL were more interesting. They don't support this directly,
but they do support marking resources as "rasterizer ordered," which
does roughly the same thing. So this implementation scans all accesses
inside the critical section and marks all storage resources found
therein as rasterizer ordered. They also don't support the fine-grained
controls on pixel- vs. sample-level interlock and disabling ordering
guarantees that GLSL and SPIR-V do, but that's OK. "Unordered" here
merely means the order is undefined; that it just so happens to be the
same as rasterizer order is immaterial. As for pixel- vs. sample-level
interlock, Vulkan explicitly states:
> With sample shading enabled, [the `PixelInterlockOrderedEXT` and
> `PixelInterlockUnorderedEXT`] execution modes are treated like
> `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`
> respectively.
and:
> If [the `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`]
> execution modes are used in single-sample mode they are treated like
> `PixelInterlockOrderedEXT` or `PixelInterlockUnorderedEXT`
> respectively.
So this will DTRT for MoltenVK and gfx-rs, at least.
MSL additionally supports multiple raster order groups; resources that
are not accessed together can be placed in different ROGs to allow them
to be synchronized separately. A more sophisticated analysis might be
able to place resources optimally, but that's outside the scope of this
change. For now, we assign all resources to group 0, which should do for
our purposes.
`glslang` doesn't support the `RasterizerOrdered` UAVs this
implementation produces for HLSL, so the test case needs `fxc.exe`.
It also insists on GLSL 4.50 for `GL_ARB_fragment_shader_interlock`,
even though the spec says it needs either 4.20 or
`GL_ARB_shader_image_load_store`; and it doesn't support the
`GL_NV_fragment_shader_interlock` extension at all. So I haven't been
able to test those code paths.
Fixes#1002.
This change introduces functions and in one case, a class, to support
the `VK_KHR_sampler_ycbcr_conversion` extension. Except in the case of
GBGR8 and BGRG8 formats, for which Metal natively supports implicit
chroma reconstruction, we're on our own here. We have to do everything
ourselves. Much of the complexity comes from the need to support
multiple planes, which must now be passed to functions that use the
corresponding combined image-samplers. The rest is from the actual
Y'CbCr conversion itself, which requires additional post-processing of
the sample retrieved from the image.
Passing sampled images to a function was a particular problem. To
support this, I've added a new class which is emitted to MSL shaders
that pass sampled images with Y'CbCr conversions attached around. It
can handle sampled images with or without Y'CbCr conversion. This is an
awful abomination that should not exist, but I'm worried that there's
some shader out there which does this. This support requires Metal 2.0
to work properly, because it uses default-constructed texture objects,
which were only added in MSL 2. I'm not even going to get into arrays of
combined image-samplers--that's a whole other can of worms. They are
deliberately unsupported in this change.
I've taken the liberty of refactoring the support for texture swizzling
while I'm at it. It's now treated as a post-processing step similar to
Y'CbCr conversion. I'd like to think this is cleaner than having
everything in `to_function_name()`/`to_function_args()`. It still looks
really hairy, though. I did, however, get rid of the explicit type
arguments to `spvGatherSwizzle()`/`spvGatherCompareSwizzle()`.
Update the C API. In addition to supporting this new functionality, add
some compiler options that I added in previous changes, but for which I
neglected to update the C API.
ESSL does not support `GL_ARB_post_depth_coverage`. There, we must use
`GL_EXT_post_depth_coverage`. I've added this as a fallback for desktop
as well.
Note that `GL_EXT_post_depth_coverage` also requires the fragment shader
to set `early_fragment_tests` explicitly, while
`GL_ARB_post_depth_coverage` does not. It doesn't really matter either
way, since `SPV_KHR_post_depth_coverage` *also* requires both execution
modes to be explicitly set.
We were going down a tree of expressions multiple times and this caused
an exponential explosion in time, which was not caught until recently.
Fix this by blocking any traversal going through an ID more than one
time.
This fix overall improves performance by almost an order of magnitude on a
particular test shader rather than slowing it down by ~75x.