In Metal, the `[[position]]` input to a fragment shader remains at
fragment center, even at sample rate, like OpenGL and Direct3D. In
Vulkan, however, when the fragment shader runs at sample rate, the
`FragCoord` builtin moves to the sample position in the framebuffer,
instead of the fragment center. To account for this difference, adjust
the `FragCoord`, if present, by the sample position. The -0.5 offset is
because the fragment center is at (0.5, 0.5).
Also, add an option to force sample-rate shading in a fragment shader.
Since Metal has no explicit control for this, this is done by adding a
dummy `[[sample_id]]` which is otherwise unused, if none is already
present. This is intended to be used from e.g. MoltenVK when a
pipeline's `minSampleShading` value is nonzero.
Instead of checking if any `Input` variables have `Sample`
interpolation, I've elected to check that the `SampleRateShading`
capability is present. Since `SampleId`, `SamplePosition`, and the
`Sample` interpolation decoration require this cap, this should be
equivalent for any valid SPIR-V module. If this isn't acceptable, let me
know.
I kept the code to replace constant zero arguments, because `Bias` and
`Grad` still have some problems on desktop GPUs.
`Bias` works on AMD GPUs. `Grad` does not. Both work on Intel. Still
needs testing on NV. It will definitely work with Apple GPUs.
These need to use arrayed texture types, or Metal will complain when
binding the resource. The target layer is addressed relative to the
Layer output by the vertex pipeline, or to the ViewIndex if in a
multiview pipeline. Unlike with the s/t coordinates, Vulkan does not
forbid non-zero layer coordinates here, though this cannot be expressed
in Vulkan GLSL.
Supporting 3D textures will require additional work. Part of the problem
is that Metal does not allow texture views to subset a 3D texture, so we
need some way to pass the base depth to the shader.
In Metal render pipelines don't have an option to set a sampleMask
parameter, the only way to get that functionality is to set the
sample_mask output of the fragment shader to this value directly.
We also need to take care to combine the fixed sample mask with the
one that the shader might possibly output.
Metal is picky about interface matching. If the types don't match
exactly, down to the number of vector components, Metal fails pipline
compilation. To support pipelines where the number of components
consumed by the fragment shader is less than that produced by the vertex
shader, we have to fix up the fragment shader to accept all the
components produced.
Like with `point_size` when not rendering points, Metal complains when
writing to a variable using the `[[depth]]` qualifier when no depth
buffer be attached. In that case, we must avoid emitting `FragDepth`,
just like with `PointSize`.
I assume it will also complain if there be no stencil attachment and the
shader write to `[[stencil]]`, or it write to `[[color(n)]]` but there
be no color attachment at n.
MSL does not support this, so we have to emulate it by passing it around
as a varying between stages. We use a special "user(clipN)" attribute
for this rather than locN which is used for user varyings.
If there are enough members in an IAB, we cannot use the constant
address space as MSL compiler complains about there being too many
members. Support emitting the device address space instead.
This was straightforward to implement in GLSL. The
`ShadingRateInterlockOrderedEXT` and `ShadingRateInterlockUnorderedEXT`
modes aren't implemented yet, because we don't support
`SPV_NV_shading_rate` or `SPV_EXT_fragment_invocation_density` yet.
HLSL and MSL were more interesting. They don't support this directly,
but they do support marking resources as "rasterizer ordered," which
does roughly the same thing. So this implementation scans all accesses
inside the critical section and marks all storage resources found
therein as rasterizer ordered. They also don't support the fine-grained
controls on pixel- vs. sample-level interlock and disabling ordering
guarantees that GLSL and SPIR-V do, but that's OK. "Unordered" here
merely means the order is undefined; that it just so happens to be the
same as rasterizer order is immaterial. As for pixel- vs. sample-level
interlock, Vulkan explicitly states:
> With sample shading enabled, [the `PixelInterlockOrderedEXT` and
> `PixelInterlockUnorderedEXT`] execution modes are treated like
> `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`
> respectively.
and:
> If [the `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`]
> execution modes are used in single-sample mode they are treated like
> `PixelInterlockOrderedEXT` or `PixelInterlockUnorderedEXT`
> respectively.
So this will DTRT for MoltenVK and gfx-rs, at least.
MSL additionally supports multiple raster order groups; resources that
are not accessed together can be placed in different ROGs to allow them
to be synchronized separately. A more sophisticated analysis might be
able to place resources optimally, but that's outside the scope of this
change. For now, we assign all resources to group 0, which should do for
our purposes.
`glslang` doesn't support the `RasterizerOrdered` UAVs this
implementation produces for HLSL, so the test case needs `fxc.exe`.
It also insists on GLSL 4.50 for `GL_ARB_fragment_shader_interlock`,
even though the spec says it needs either 4.20 or
`GL_ARB_shader_image_load_store`; and it doesn't support the
`GL_NV_fragment_shader_interlock` extension at all. So I haven't been
able to test those code paths.
Fixes#1002.
Fix fallout from changes.
There's a bug in glslang that prevents `float16_t`, `[u]int16_t`, and
`[u]int8_t` constants from adding the corresponding SPIR-V capabilities.
SPIRV-Tools, meanwhile, tightened validation so that these constants are
only valid if the corresponding `Float16`, `Int16`, and `Int8` caps are
on. This affects the `16bit-constants.frag` test for GLSL and MSL.
Using the `PostDepthCoverage` mode specifies that the `gl_SampleMaskIn`
variable is to contain the computed coverage mask following the early
fragment tests, which this mode requires and implicitly enables.
Note that unlike Vulkan and OpenGL, Metal places this on the sample mask
input itself, and furthermore does *not* implicitly enable early
fragment testing. If it isn't enabled explicitly with an
`[[early_fragment_tests]]` attribute, the compiler will error out. So we
have to enable that mode explicitly if `PostDepthCoverage` is enabled
but `EarlyFragmentTests` isn't.
For Metal, only iOS supports this; for some reason, Apple has yet to
implement it on macOS, even though many desktop cards support it.
There is a case where we can deduce a for/while loop, but the continue
block is actually very painful to deal with, so handle that case as
well. Removes an exceptional case.
If we compile multiple times due to forced_recompile, we had
deferred_declaration = true while emitting function prototypes which
broke an assumption. Fix this by clearing out stale state before leaving
a function.
Change aux buffer to swizzle buffer.
There is no good reason to expand the aux buffer, so name it
appropriately.
Make the code cleaner by emitting a straight pointer to uint rather than
a dummy struct which only contains a single unsized array member anyways.
This will also end up being very similar to how we implement swizzle
buffers for argument buffers.
Do not use implied binding if it overflows int32_t.
MSL does not seem to have a qualifier for this, but HLSL SM 5.1 does.
glslangValidator for HLSL does not support this, so skip any validation,
but it passes in FXC.
Atomics are not supported on images or texture_buffers in MSL.
Properly throw an error if OpImageTexelPointer is used (since it can
only be used for atomic operations anyways).
Storage was in place already, so mostly just dealing with bitcasts and
constants.
Simplies some of the bitcasting logic, and this exposed some bugs in the
implementation. Refactor to use correct width integers with explicit bitcast opcodes.
If not enough components are provided in the shader,
the shader MSL compiler throws an error rather than make components
undefined. This hurts portability, so we need to add explicit padding
here.
This is required to avoid relying on complex sub-expression elimination
in compilers, and generates cleaner code.
The problem case is if a complex expression is used in an access chain,
like:
Composite comp = buffer[texture(...)];
vec4 a = comp.a + comp.b + comp.c;
Before, we did not have common subexpression tracking for
OpLoad/OpAccessChain, so we easily ended up with code like:
vec4 a = buffer[texture(...)].a + buffer[texture(...)].b + buffer[texture(...)].c;
A good compiler will optimize this, but we should not rely on it, and
forcing texture(...) to a temporary also looks better.
The solution is to add a vector "implied_expression_reads", which works
similarly to expression_dependencies. We also need an extra mechanism in
to_expression which lets us skip expression read checking and do it
later. E.g. for expr -> access chain -> load, we should only trigger
a read of expr when using the loaded expression.
- Add new Windows support
- Use CMake/CTest instead of Make + shell scripts
- Use --parallel in CTest
- Fix CTest on Windows
- Cleanups in test_shaders.py
- Force specific commit for SPIRV-Headers
- Fix Inf/NaN odd-ball case by moving to ASM