This is implied in both GL and GLES. Emitting memoryBarrierShared() was
based on earlier confusion in the spec which has since been fixed and
clarified.
We had a case where loops were marked complex in a cascading fashion
where each loop iteration would discover one new complex loop. This was
a problem with three nested loops.
This will be awkward to report in GLSL where we check multiple packing
standards, but for HLSL it should be easy since there's only CBuffer
packing standard to worry about.
There is no direct way to express this, so invert boolean results to
force any NaN -> true. glslang emits Ordered compare instructions
everywhere, and the GLSL spec is not clear on this, so assume this is
fine.
It is possible for a shader to declare two plain struct types which
simply share the same OpName without there being an implicit
value/buffer alias relationship.
For to_member_name(), make sure to use the type alias master when
resolving member names. The member name may be different in a type alias
master if the SPIR-V is being intentionally difficult.
First, when generating from HLSL before invoking the code that comes from the HLSL patch-function a control-flow and full memory-barrier are required to ensure that all the temporary values in thread-local storage for the patch are available.
Second, the inputs to control and evaluation shaders must be properly forwarded from the global variables in SPIRV to the member variables in the relevant input structure.
Finally when arrays of interpolators are used for input or output we need to add an extra level of array indirection because Metal works at a different granularity than SPIRV.
Five parts.
1. Fix tessellation patch function processing.
2. Fix loads from tessellation control inputs not being forwarded to the gl_in structure array.
3. Fix loads from tessellation evaluation inputs not being forwarded to the stage_in structure array.
4. Workaround SPIRV losing an array indirection in tessellation shaders - not the best solution but enough to keep things progressing.
5. Apparently gl_TessLevelInner/Outer is special and needs to not be placed into the input array.
Some fallout where internal functions are using stronger types.
Overkill to move everything over to strong types right now, but perhaps
move over to it slowly over time.
Vulkan has two types of buffer descriptors,
`VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC` and
`VK_DESCRIPTOR_TYPE_STORAGE_BUFFER_DYNAMIC`, which allow the client to
offset the buffers by an amount given when the descriptor set is bound
to a pipeline. Metal provides no direct support for this when the buffer
in question is in an argument buffer, so once again we're on our own.
These offsets cannot be stored or associated in any way with the
argument buffer itself, because they are set at bind time. Different
pipelines may have different offsets set. Therefore, we must use a
separate buffer, not in any argument buffer, to hold these offsets. Then
the shader must manually offset the buffer pointer.
This change fully supports arrays, including arrays of arrays, even
though Vulkan forbids them. It does not, however, support runtime
arrays. Perhaps later.
This was straightforward to implement in GLSL. The
`ShadingRateInterlockOrderedEXT` and `ShadingRateInterlockUnorderedEXT`
modes aren't implemented yet, because we don't support
`SPV_NV_shading_rate` or `SPV_EXT_fragment_invocation_density` yet.
HLSL and MSL were more interesting. They don't support this directly,
but they do support marking resources as "rasterizer ordered," which
does roughly the same thing. So this implementation scans all accesses
inside the critical section and marks all storage resources found
therein as rasterizer ordered. They also don't support the fine-grained
controls on pixel- vs. sample-level interlock and disabling ordering
guarantees that GLSL and SPIR-V do, but that's OK. "Unordered" here
merely means the order is undefined; that it just so happens to be the
same as rasterizer order is immaterial. As for pixel- vs. sample-level
interlock, Vulkan explicitly states:
> With sample shading enabled, [the `PixelInterlockOrderedEXT` and
> `PixelInterlockUnorderedEXT`] execution modes are treated like
> `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`
> respectively.
and:
> If [the `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`]
> execution modes are used in single-sample mode they are treated like
> `PixelInterlockOrderedEXT` or `PixelInterlockUnorderedEXT`
> respectively.
So this will DTRT for MoltenVK and gfx-rs, at least.
MSL additionally supports multiple raster order groups; resources that
are not accessed together can be placed in different ROGs to allow them
to be synchronized separately. A more sophisticated analysis might be
able to place resources optimally, but that's outside the scope of this
change. For now, we assign all resources to group 0, which should do for
our purposes.
`glslang` doesn't support the `RasterizerOrdered` UAVs this
implementation produces for HLSL, so the test case needs `fxc.exe`.
It also insists on GLSL 4.50 for `GL_ARB_fragment_shader_interlock`,
even though the spec says it needs either 4.20 or
`GL_ARB_shader_image_load_store`; and it doesn't support the
`GL_NV_fragment_shader_interlock` extension at all. So I haven't been
able to test those code paths.
Fixes#1002.
This change introduces functions and in one case, a class, to support
the `VK_KHR_sampler_ycbcr_conversion` extension. Except in the case of
GBGR8 and BGRG8 formats, for which Metal natively supports implicit
chroma reconstruction, we're on our own here. We have to do everything
ourselves. Much of the complexity comes from the need to support
multiple planes, which must now be passed to functions that use the
corresponding combined image-samplers. The rest is from the actual
Y'CbCr conversion itself, which requires additional post-processing of
the sample retrieved from the image.
Passing sampled images to a function was a particular problem. To
support this, I've added a new class which is emitted to MSL shaders
that pass sampled images with Y'CbCr conversions attached around. It
can handle sampled images with or without Y'CbCr conversion. This is an
awful abomination that should not exist, but I'm worried that there's
some shader out there which does this. This support requires Metal 2.0
to work properly, because it uses default-constructed texture objects,
which were only added in MSL 2. I'm not even going to get into arrays of
combined image-samplers--that's a whole other can of worms. They are
deliberately unsupported in this change.
I've taken the liberty of refactoring the support for texture swizzling
while I'm at it. It's now treated as a post-processing step similar to
Y'CbCr conversion. I'd like to think this is cleaner than having
everything in `to_function_name()`/`to_function_args()`. It still looks
really hairy, though. I did, however, get rid of the explicit type
arguments to `spvGatherSwizzle()`/`spvGatherCompareSwizzle()`.
Update the C API. In addition to supporting this new functionality, add
some compiler options that I added in previous changes, but for which I
neglected to update the C API.
ESSL does not support `GL_ARB_post_depth_coverage`. There, we must use
`GL_EXT_post_depth_coverage`. I've added this as a fallback for desktop
as well.
Note that `GL_EXT_post_depth_coverage` also requires the fragment shader
to set `early_fragment_tests` explicitly, while
`GL_ARB_post_depth_coverage` does not. It doesn't really matter either
way, since `SPV_KHR_post_depth_coverage` *also* requires both execution
modes to be explicitly set.
We were going down a tree of expressions multiple times and this caused
an exponential explosion in time, which was not caught until recently.
Fix this by blocking any traversal going through an ID more than one
time.
This fix overall improves performance by almost an order of magnitude on a
particular test shader rather than slowing it down by ~75x.
This is not necessary, as we must emit an invalidating store before we
potentially consume an invalid expression. In fact, we're a bit
conservative here in this case for example:
int tmp = variable;
if (...)
{
variable = 10;
}
else
{
// Consuming tmp here is fine, but it was
// invalidated while emitting other branch.
// Technically, we need to study if there is an invalidating store
// in the CFG between the loading block and this block, and the other
// branch will not be a part of that analysis.
int tmp2 = tmp * tmp;
}
Fixing this case means complex CFG traversal *everywhere*, and it feels like overkill.
Fixing this exposed a bug with access chains, so fix a bug where expression dependencies were not
inherited properly in access chains. Access chains are now considered forwarded if there
is at least one dependency which is also forwarded.
This subtle bug removed any expression validation for trivially swizzled
variables. Make usage suppression a more explicit concept rather than
just hacking off forwarded_temporaries.
There is some fallout here with loop generation since our expression
invalidation is currently a bit too naive to handle loops properly.
The forwarding bug masked this problem until now.
If part of the loop condition is also used in the body, we end up
reading an invalid expression, which in turn forces a temporary to be
generated in the condition block, not good. We'll need to be smarter
here ...
If this is computed *before* a `demote`, but used *after*, forwarding it
will produce the wrong value. This does make for uglier shaders, but
it's necessary right now to ensure correctness.
I needed to use an assembly shader to produce the test for this.
`spirv-opt` is not smart enough (or too smart?) to eliminate the
variable that would be used in GLSL to express this.
This extension provides a new operation which causes a fragment to be
discarded without terminating the fragment shader invocation. The
invocation for the discarded fragment becomes a helper invocation, so
that derivatives will remain defined. The old `HelperInvocation` builtin
becomes undefined when this occurs, so a second new instruction queries
the current helper invocation status.
This is only fully supported for GLSL. HLSL doesn't support the
`IsHelperInvocation` operation and MSL doesn't support the
`DemoteToHelperInvocation` op.
Fixes#1052.
The only piece added by this extension is the `DeviceIndex` builtin,
which tells the shader which device in a grouped logical device it is
running on.
Metal's pipeline state objects are owned by the `MTLDevice` that created
them. Since Metal doesn't support logical grouping of devices the way
Vulkan does, we'll thus have to create a pipeline state for each device
in a grouped logical device. The upcoming peer group support in Metal 3
will not change this. For this reason, for Metal, the device index is
supplied as a constant at pipeline compile time.
There's an interaction between `VK_KHR_device_group` and
`VK_KHR_multiview` in the
`VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT`, which defines the
view index to be the same as the device index. The new
`view_index_from_device_index` MSL option supports this functionality.
Using the `PostDepthCoverage` mode specifies that the `gl_SampleMaskIn`
variable is to contain the computed coverage mask following the early
fragment tests, which this mode requires and implicitly enables.
Note that unlike Vulkan and OpenGL, Metal places this on the sample mask
input itself, and furthermore does *not* implicitly enable early
fragment testing. If it isn't enabled explicitly with an
`[[early_fragment_tests]]` attribute, the compiler will error out. So we
have to enable that mode explicitly if `PostDepthCoverage` is enabled
but `EarlyFragmentTests` isn't.
For Metal, only iOS supports this; for some reason, Apple has yet to
implement it on macOS, even though many desktop cards support it.
There is a case where we can deduce a for/while loop, but the continue
block is actually very painful to deal with, so handle that case as
well. Removes an exceptional case.
This decoration might only be present for the very last ID which is
consumed by a sampling or Load/Store instruction. To make sure our
access chains are emitted correctly, we have to back-propagate this
decoration.
This is quite complex since we cannot flush Phi inside the case labels,
we have to do it outside by emitting a lot of manual branches ourselves.
This should be extremely rare, but we need to handle this case.
If we compile multiple times due to forced_recompile, we had
deferred_declaration = true while emitting function prototypes which
broke an assumption. Fix this by clearing out stale state before leaving
a function.
We seen to have to deal with a case where a block is used multiple times
without any "proper" structured control flow, so we risk losing deferred
declaration state.
This gets rather complicated because MSL does not support OpArrayLength
natively. We need to pass down a buffer which contains buffer sizes, and
we compute the array length on-demand.
Support both discrete descriptors as well as argument buffers.
MSL generally emits the aliases, which means we cannot always place the
master type first, unlike GLSL and HLSL. The logic fix is just to
reorder after we have tagged types with packing information, rather than
doing it in the parser fixup.
MSL does not seem to have a qualifier for this, but HLSL SM 5.1 does.
glslangValidator for HLSL does not support this, so skip any validation,
but it passes in FXC.
Buffer objects can contain arbitrary pointers to blocks.
We can also implement ConvertPtrToU and ConvertUToPtr.
The latter can cast a uint64_t to any type as it pleases,
so we will need to generate fake buffer reference blocks to be able to
cast the type.
We made the mistake of registering a dependency on the atomic variable
even if the atomic result was forced to a temporary. There is no need to
register reads from atomic variables like this as we always force atomic
results to a temporary and argument read/writes do not need to be
tracked.
If we generate an access chain in a loop body, and it is consumed in the
loop continue block, we have a problem because we cannot emit a
temporary here holding the access chain reference. Force a complex loop
body to workaround this exceptionally rare case.
- Replace ostringstream with custom implementation.
~30% performance uplift on vector-shuffle-oom test.
Allocations are measurably reduced in Valgrind.
- Replace std::vector with SmallVector.
Classic malloc optimization, small vectors are backed by inline data.
~ 7-8% gain on vector-shuffle-oom on GCC 8 on Linux.
- Use an object pool for IVariant type.
We generally allocate a lot of SPIR* objects. We can amortize these
allocations neatly by pooling them.
- ~15% overall uplift on ./test_shaders.py --iterations 10000 shaders/.
We had a bug where error conditions in DoWhileLoop emit path would not
detect that statements were being emitted due to the masking behavior
which happens when force_recompile is true. Fix this.
Also, refactor force_recompile into member functions so we can properly
break on any situation where this is set, without having to rely on
watchpoints in debuggers.
This is a pragmatic trick to avoid symbol collision where a project
links against SPIRV-Cross statically, while linking to other projects
which also use SPIRV-Cross statically. We can end up with very awkward
symbol collisions which can resolve themselves silently because
SPIRV-Cross is pulled in as necessary. To fix this, we must use
different symbols and embed two copies of SPIRV-Cross in this scenario,
now with different namespaces, which in turn leads to different symbols.
The design of backend.int16_t_literal_suffix and backend.uint16_t_literal_suffix
allows them to be set to null, but that was not always tested for.
I have removed the expectation that they can be null and set
backend.int16_t_literal_suffix to "" when no suffix is needed.
That has the same effect, and seemed to be a more usable and defensive approach.
-1 (0xffffffff) literal means the component should be undefined.
Since we cannot express undefined directly, just use a 0 literal in the
appropriate type.
We have an edge case where the array is declared with a concrete size,
but in GLSL we must emit an unsized array, which breaks array copies.
Deal explicitly with this.
We can trivially deal with cases where the loop tests are simply
inverted. We can also deal with cases where the condition block branches
to the merge block via other noop blocks.
This makes SPIR-V codegen easier when targeting SPIRV-Cross.
We were using std::locale::global() to force a C locale which is not
safe when SPIRV-Cross is used in a multi-threaded environment.
To fix this, we could tap into various per-platform specific locale
handling to get safe thread-local locales, but since locales only affect
the decimal point in floats, we simply query the locale instead and do
the necessary radix replacement ourselves, without touching the locale.
This should be much safer and cleaner than the alternative.
These are mapped to Metal's post-tessellation vertex functions. The
semantic difference is much less here, so this change should be simpler
than the previous one. There are still some hairy parts, though.
In MSL, the array of control point data is represented by a special
type, `patch_control_point<T>`, where `T` is a valid stage-input type.
This object must be embedded inside the patch-level stage input. For
this reason, I've added a new type to the type system to represent this.
On Mac, the number of input control points to the function must be
specified in the `patch()` attribute. This is optional on iOS.
SPIRV-Cross takes this from the `OutputVertices` execution mode; the
intent is that if it's not set in the shader itself, MoltenVK will set
it from the tessellation control shader. If you're translating these
offline, you'll have to update the control point count manually, since
this number must match the number that is passed to the
`drawPatches:...` family of methods.
Fixes#120.
When we force recompile, the old var.self name we used as a fallback
name might have been disturbed, so we should recover certain names back
to their original form in case we are forced to take a recompile to make
the naming algorithm more deterministic.
`bitcast_glsl_op()` is sometimes called for `Boolean` types, e.g. for
specialization constants. We don't want the assert to trip if this is
going to be a no-op anyway.
Storage was in place already, so mostly just dealing with bitcasts and
constants.
Simplies some of the bitcasting logic, and this exposed some bugs in the
implementation. Refactor to use correct width integers with explicit bitcast opcodes.
Structs are aligned as you would expect in MSL (maximum member
alignment), and it is not minimum 16 bytes like in std140.
Also rename the dummy "pad" members to a reserved naming scheme.
MSL does not support value semantics for arrays (sigh), so we need to
force constant references and deal with copies if we have a different
address space than what we end up guessing.
A flat array was consuming way too much memory and was far too slow to
initialize properly with a very large ID bound (8 million IDs, showed up as #1 hotspot in perf).
Meta struct does not have to be in-order as we never iterate over it in
a meaningful way, so using a hashmap here is reasonable. Very few IDs
should need decorations or meta-data, so this should also be a quite
decent memory save.
For the pathological case, a 6x uplift was observed.
This is a fairly fundamental change on how IDs are handled.
It serves many purposes:
- Improve performance. We only need to iterate over IDs which are
relevant at any one time.
- Makes sure we iterate through IDs in SPIR-V module declaration order
rather than ID space. IDs don't have to be monotonically increasing,
which was an assumption SPIRV-Cross used to have. It has apparently
never been a problem until now.
- Support LUTs of structs. We do this by interleaving declaration of
constants and struct types in SPIR-V module order.
To support this, the ParsedIR interface needed to change slightly.
Before setting any ID with variant_set<T> we let ParsedIR know
that an ID with a specific type has been added. The surface for change
should be minimal.
ParsedIR will maintain a per-type list of IDs which the cross-compiler
will need to consider for later.
Instead of looping over ir.ids[] (which can be extremely large), we loop
over types now, using:
ir.for_each_typed_id<SPIRVariable>([&](uint32_t id, SPIRVariable &var) {
handle_variable(var);
});
Now we make sure that we're never looking at irrelevant types.
This allows shaders to declare and use pointer-type variables. Pointers
may be loaded and stored, be the result of an `OpSelect`, be passed to
and returned from functions, and even be passed as inputs to the `OpPhi`
instruction. All types of pointers may be used as variable pointers.
Variable pointers to storage buffers and workgroup memory may even be
loaded from and stored to, as though they were ordinary variables. In
addition, this enables using an interior pointer to an array as though
it were an array pointer itself using the `OpPtrAccessChain`
instruction.
This is a rather large and involved change, mostly because this is
somewhat complicated with a lot of moving parts. It's a wonder
SPIRV-Cross's output is largely unchanged. Indeed, many of these changes
are to accomplish exactly that! Perhaps the largest source of changes
was the violation of the assumption that, when emitting types, the
pointer type didn't matter.
One of the test cases added by the change doesn't optimize very well;
the output of `spirv-opt` here is invalid SPIR-V. I need to file a bug
with SPIRV-Tools about this.
I wanted to test that variable pointers to images worked too, but I
couldn't figure out how to propagate the access qualifier properly--in
MSL, it's part of the type, so getting this right is important. I've
punted on that for now.
Just like OpAccessChain we need to make use of the meta information
available to use from access_chain_internal as we can extract a packed
vector or transposed vector from a composite, not just memory load.
A block name cannot alias with any name in its own scope,
and it cannot alias with any other "global" name.
To solve this, we need to complicate the name cache updates a little bit
where we have a "primary" namespace and "secondary" namespace.
This is required to avoid relying on complex sub-expression elimination
in compilers, and generates cleaner code.
The problem case is if a complex expression is used in an access chain,
like:
Composite comp = buffer[texture(...)];
vec4 a = comp.a + comp.b + comp.c;
Before, we did not have common subexpression tracking for
OpLoad/OpAccessChain, so we easily ended up with code like:
vec4 a = buffer[texture(...)].a + buffer[texture(...)].b + buffer[texture(...)].c;
A good compiler will optimize this, but we should not rely on it, and
forcing texture(...) to a temporary also looks better.
The solution is to add a vector "implied_expression_reads", which works
similarly to expression_dependencies. We also need an extra mechanism in
to_expression which lets us skip expression read checking and do it
later. E.g. for expr -> access chain -> load, we should only trigger
a read of expr when using the loaded expression.
Avoids certain cases of variance between translation units by forcing
every dependent expression of a store to be temporary.
Should avoid the major failure cases where invariance matters.
In GLSL, 8-bit types require GL_EXT_shader_8bit_storage. 16-bit types
can use either GL_AMD_gpu_shader_int16/GL_AMD_gpu_shader_half_float or
GL_EXT_shader_16bit_storage.
When trying to validate buffer sizes, we usually need to bail out when
using SpecConstantOps, but for some very specific cases where we allow
unsized arrays currently, we can safely allow "unknown" sized arrays as
well.
This is probably the best we can do, when we have even more difficult
cases than this, we throw a more sensible error message.
Previously, when generating non-Vulkan GLSL, each use of a spec constant
would be subsituted for its default value and the declaration of the constant
itself would be omitted completely.
This change slightly alters this behavior. The uses of the constant are kept,
as well as the declaration, although the latter is stripped of the layout
qualifier. The declaration is also prepended with the following code:
#ifndef <constant name>_value
#define <constant name> <default constant value>
#endif
and the constant itself now looks like
const <constant type> <constant name> = <constant name>_value;
The rationale for this change is that it gives the user a way to provide
custom values for specialization constants even when the target does not
support them.
Setting force_temporary to true produces invalid GLSL because sampler
variables are copied:
highp sampler2D _377 = DiffuseMapTexture;
This change fixes the problem by always forwarding forwardable
variables. I also took an opportunity to restructure the code to make
it easier to read and add extra conditions to in the future.
- Add new Windows support
- Use CMake/CTest instead of Make + shell scripts
- Use --parallel in CTest
- Fix CTest on Windows
- Cleanups in test_shaders.py
- Force specific commit for SPIRV-Headers
- Fix Inf/NaN odd-ball case by moving to ASM
This is a large refactor which splits out the SPIR-V parser from
Compiler and moves it into its more appropriately named Parser module.
The Parser is responsible for building a ParsedIR structure which is
then consumed by one or more compilers.
Compiler can take a ParsedIR by value or move reference. This should
allow for optimal case for both multiple compilations and single
compilation scenarios.
Even as of Metal 2.1, MSL still doesn't support arrays of buffers
directly. Therefore, we must manually expand them. In the prologue, we
define arrays holding the argument pointers; these arrays are what the
transpiled code ends up referencing. We might be able to do similar
things for textures and samplers prior to MSL 2.0.
Speaking of which, also enable texture arrays on iOS MSL 1.2.
Need some pretty hideous ladder variable system, but high level
languages do not support breaking out of a loop. break in switch blocks
and break in loops alias each other.
Implement this by flattening outputs and unflattening inputs explicitly.
This allows us to pass down a single struct instead of dealing with the
insanity that would be passing down each flattened member separately.
Remove stage_uniforms_var_id.
Seems to be dead code. Naked uniforms do not exist in SPIR-V for Vulkan,
which this seems to have been intended for. It was also unused elsewhere.
When the name of an alias global variable collides with a global
declaration, MSL would emit inconsistent names, sometimes with the
naming fix, sometimes without, because names were being tracked in two
separate meta blocks. Fix this by always redirecting parameter naming to
the original base variable as necessary.