Fragment shaders that require explicit early fragment tests are incompatible
with specifying depth and stencil values within the shader. If explicit early
fragment tests is specified, remove the depth and stencil outputs from the
output structure, and replace them with dummy local variables.
Add CompilerMSL:uses_explicit_early_fragment_test() function to consolidate
testing for whether early fragment tests are required.
Add two unit tests for depth-out with, and without, early fragment tests.
SPIR-V allows an image to be marked as a depth image, but with a non-depth
format. Such images should be read or sampled as vectors instead of scalars,
except when they are subject to compare operations.
Don't mark an OpSampledImage as using a compare operation just because the
image contains a depth marker. Instead, require that a compare operation
is actually used on that image.
Compiler::image_is_comparison() was really testing whether an image is a
depth image, since it incorporates the depth marker. Rename that function
to is_depth_image(), to clarify what it is really testing.
In Compiler::is_depth_image(), do not treat an image as a depth image
if it has been explicitly marked with a color format, unless the image
is subject to compare operations.
In CompilerMSL::to_function_name(), test for compare operations
specifically, rather than assuming them from the depth-image marker.
CompilerGLSL and CompilerMSL still contain a number of internal tests that
use is_depth_image() both for testing for a depth image, and for testing
whether compare operations are being used. I've left these as they are
for now, but these should be cleaned up at some point.
Add unit tests for fetch/sample depth images with color formats and no compare ops.
This is somewhat awkward to support, but the best effort we can do here
is to analyze various Load/Store opcodes and deduce the ideal overall
alignment based on this. This is not a 100% perfect solution, but should
be correct for any reasonable use case.
Also fix various nitpicks with BDA support while I'm at it.
Apparently, it's legal to use a selection construct where both paths
branch to same location, but a different merge point is used.
This breaks many assumptions the variable scope analyzer makes.
The only logical way to generate code for this scenario is to treat the
selection construct as a trivial switch construct with only a default
case.
Previous test for SPIRVCrossDecorationPhysicalTypePacked on parent struct
when unpacking member struct was too restrictive, and not needed as long
as padding compensates.
Populate member_type_index_redirection as reverse lookup, not forward lookup.
Move use of member_type_index_redirection from CompilerMSL::to_member_reference()
to CompilerGLSL::access_chain_internal() to access all redirected type info,
not just name.
Promote to short instead and do simple casts on load/store instead.
Not 100% complete fix since structs can contain booleans, but this is
getting into pretty ridiculously complicated territory.
Emit synthetic functions before function constants.
Support use of spvQuantizeToF16() in function constants for numerical
behavior consistency with the op code.
Ensure subnormal results from OpQuantizeToF16 are flushed to zero per SPIR-V spec.
Adjust SPIRV-Cross unit test reference shaders to accommodate these changes.
Any MSL reference shader that inclues a synthetic function is affected,
since the location it is emitted has changed.
Add spvQuantizeToF16() family of synthetic functions to convert
from float to half and back again, and add function attribute
[[clang::optnone]] to honor infinities during conversions.
Adjust SPIRV-Cross unit test reference shaders to accommodate these changes.
Add [[clang::optnone]] attribute to spvF*() functions used for handling
floating point operations decorated with DecorationNoContraction.
Just using precise::fma() did not work.
Adjust SPIRV-Cross unit test reference shaders to accommodate these changes.
Based on CTS testing, math optimizations between MSL and Vulkan are inconsistent.
In some cases, enabling MSL's fast-math compilation option matches Vulkan's math
results. In other cases, disabling it does. Broadly enabling or disabling fast-math
across all shaders results in some CTS test failures either way.
To fix this, selectively enable/disable fast-math optimizations in the MSL code,
using metal::fast and metal::precise function namespaces, where supported, and
the [[clang::optnone]] function attribute otherwise.
Adjust SPIRV-Cross unit test reference shaders to accommodate these changes.
Add test shader for new functionality.
Add legacy test reference shader for unrelated buffer-bitcast
test, that doesn't seem to have been added previously.
When gl_Position is defined by SPIR-V, but neither used nor initialized,
it appeared twice in the MSL output, as gl_Position and glPosition_1.
The existing tests for whether an output is active check only that it is
used by an op, or initialized. Adding the implicit gl_Position also marked
the existing gl_Position as active, duplicating the output variable.
Fix is that when checking for the need to add an implicit gl_Position
output, also check if the var is already defined in the shader,
and just needs to be marked as active.
Add test shader.
Vulkan specifies that the Sample Mask Test occurs before fragment shading.
This means gl_SampleMaskIn should be influenced by both sample-shading and
VkPipelineMultisampleStateCreateInfo::pSampleMask.
CTS tests dEQP-VK.pipeline.multisample_shader_builtin.* bear this out.
For sample-shading, gl_SampleMaskIn should only have a single bit set,
Since Metal does not filter for this, apply a bitmask based on gl_SampleID.
For a fixed sample mask, since Metal is unaware of
VkPipelineMultisampleStateCreateInfo::pSampleMask, we need to ensure that
we apply it to both gl_SampleMaskIn and gl_SampleMask. This has the side
effect of a redundant application of pSampleMask if the shader already
includes gl_SampleMaskIn when setting gl_SampleMask, but I don't see an
easy way around this.
Also, simplify the logic for including the fixed sample mask in gl_ShaderMask,
and print the fixed sample mask as a hex value for readability of bits.
Emit block members directly in the IO structs and sort them.
Ensures we can get some kind of stable order between stages.
To complete the story, we'll need to be able to inject unused inputs /
builtins, or eliminate unused outputs (probably easiest solution).