This is not necessary, as we must emit an invalidating store before we
potentially consume an invalid expression. In fact, we're a bit
conservative here in this case for example:
int tmp = variable;
if (...)
{
variable = 10;
}
else
{
// Consuming tmp here is fine, but it was
// invalidated while emitting other branch.
// Technically, we need to study if there is an invalidating store
// in the CFG between the loading block and this block, and the other
// branch will not be a part of that analysis.
int tmp2 = tmp * tmp;
}
Fixing this case means complex CFG traversal *everywhere*, and it feels like overkill.
Fixing this exposed a bug with access chains, so fix a bug where expression dependencies were not
inherited properly in access chains. Access chains are now considered forwarded if there
is at least one dependency which is also forwarded.
This subtle bug removed any expression validation for trivially swizzled
variables. Make usage suppression a more explicit concept rather than
just hacking off forwarded_temporaries.
There is some fallout here with loop generation since our expression
invalidation is currently a bit too naive to handle loops properly.
The forwarding bug masked this problem until now.
If part of the loop condition is also used in the body, we end up
reading an invalid expression, which in turn forces a temporary to be
generated in the condition block, not good. We'll need to be smarter
here ...
If this is computed *before* a `demote`, but used *after*, forwarding it
will produce the wrong value. This does make for uglier shaders, but
it's necessary right now to ensure correctness.
I needed to use an assembly shader to produce the test for this.
`spirv-opt` is not smart enough (or too smart?) to eliminate the
variable that would be used in GLSL to express this.
This extension provides a new operation which causes a fragment to be
discarded without terminating the fragment shader invocation. The
invocation for the discarded fragment becomes a helper invocation, so
that derivatives will remain defined. The old `HelperInvocation` builtin
becomes undefined when this occurs, so a second new instruction queries
the current helper invocation status.
This is only fully supported for GLSL. HLSL doesn't support the
`IsHelperInvocation` operation and MSL doesn't support the
`DemoteToHelperInvocation` op.
Fixes#1052.
Fix fallout from changes.
There's a bug in glslang that prevents `float16_t`, `[u]int16_t`, and
`[u]int8_t` constants from adding the corresponding SPIR-V capabilities.
SPIRV-Tools, meanwhile, tightened validation so that these constants are
only valid if the corresponding `Float16`, `Int16`, and `Int8` caps are
on. This affects the `16bit-constants.frag` test for GLSL and MSL.
The only piece added by this extension is the `DeviceIndex` builtin,
which tells the shader which device in a grouped logical device it is
running on.
Metal's pipeline state objects are owned by the `MTLDevice` that created
them. Since Metal doesn't support logical grouping of devices the way
Vulkan does, we'll thus have to create a pipeline state for each device
in a grouped logical device. The upcoming peer group support in Metal 3
will not change this. For this reason, for Metal, the device index is
supplied as a constant at pipeline compile time.
There's an interaction between `VK_KHR_device_group` and
`VK_KHR_multiview` in the
`VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT`, which defines the
view index to be the same as the device index. The new
`view_index_from_device_index` MSL option supports this functionality.
Using the `PostDepthCoverage` mode specifies that the `gl_SampleMaskIn`
variable is to contain the computed coverage mask following the early
fragment tests, which this mode requires and implicitly enables.
Note that unlike Vulkan and OpenGL, Metal places this on the sample mask
input itself, and furthermore does *not* implicitly enable early
fragment testing. If it isn't enabled explicitly with an
`[[early_fragment_tests]]` attribute, the compiler will error out. So we
have to enable that mode explicitly if `PostDepthCoverage` is enabled
but `EarlyFragmentTests` isn't.
For Metal, only iOS supports this; for some reason, Apple has yet to
implement it on macOS, even though many desktop cards support it.
There is a case where we can deduce a for/while loop, but the continue
block is actually very painful to deal with, so handle that case as
well. Removes an exceptional case.
There is a risk that we try to preserve a loop variable through multiple
iterations, even though the dominating block is inside a loop.
Fix this by analyzing if a block starts off by writing to a variable. In
that case, there cannot be any preservation going on. If we don't, pretend the
loop header is reading the variable, which moves the variable to an
appropriate scope.
Buffer objects can contain arbitrary pointers to blocks.
We can also implement ConvertPtrToU and ConvertUToPtr.
The latter can cast a uint64_t to any type as it pleases,
so we will need to generate fake buffer reference blocks to be able to
cast the type.
We had a bug where error conditions in DoWhileLoop emit path would not
detect that statements were being emitted due to the masking behavior
which happens when force_recompile is true. Fix this.
Also, refactor force_recompile into member functions so we can properly
break on any situation where this is set, without having to rely on
watchpoints in debuggers.
We have an edge case where the array is declared with a concrete size,
but in GLSL we must emit an unsized array, which breaks array copies.
Deal explicitly with this.
When we force recompile, the old var.self name we used as a fallback
name might have been disturbed, so we should recover certain names back
to their original form in case we are forced to take a recompile to make
the naming algorithm more deterministic.
Storage was in place already, so mostly just dealing with bitcasts and
constants.
Simplies some of the bitcasting logic, and this exposed some bugs in the
implementation. Refactor to use correct width integers with explicit bitcast opcodes.
This is a fairly fundamental change on how IDs are handled.
It serves many purposes:
- Improve performance. We only need to iterate over IDs which are
relevant at any one time.
- Makes sure we iterate through IDs in SPIR-V module declaration order
rather than ID space. IDs don't have to be monotonically increasing,
which was an assumption SPIRV-Cross used to have. It has apparently
never been a problem until now.
- Support LUTs of structs. We do this by interleaving declaration of
constants and struct types in SPIR-V module order.
To support this, the ParsedIR interface needed to change slightly.
Before setting any ID with variant_set<T> we let ParsedIR know
that an ID with a specific type has been added. The surface for change
should be minimal.
ParsedIR will maintain a per-type list of IDs which the cross-compiler
will need to consider for later.
Instead of looping over ir.ids[] (which can be extremely large), we loop
over types now, using:
ir.for_each_typed_id<SPIRVariable>([&](uint32_t id, SPIRVariable &var) {
handle_variable(var);
});
Now we make sure that we're never looking at irrelevant types.
A block name cannot alias with any name in its own scope,
and it cannot alias with any other "global" name.
To solve this, we need to complicate the name cache updates a little bit
where we have a "primary" namespace and "secondary" namespace.
This is required to avoid relying on complex sub-expression elimination
in compilers, and generates cleaner code.
The problem case is if a complex expression is used in an access chain,
like:
Composite comp = buffer[texture(...)];
vec4 a = comp.a + comp.b + comp.c;
Before, we did not have common subexpression tracking for
OpLoad/OpAccessChain, so we easily ended up with code like:
vec4 a = buffer[texture(...)].a + buffer[texture(...)].b + buffer[texture(...)].c;
A good compiler will optimize this, but we should not rely on it, and
forcing texture(...) to a temporary also looks better.
The solution is to add a vector "implied_expression_reads", which works
similarly to expression_dependencies. We also need an extra mechanism in
to_expression which lets us skip expression read checking and do it
later. E.g. for expr -> access chain -> load, we should only trigger
a read of expr when using the loaded expression.
Avoids certain cases of variance between translation units by forcing
every dependent expression of a store to be temporary.
Should avoid the major failure cases where invariance matters.
In GLSL, 8-bit types require GL_EXT_shader_8bit_storage. 16-bit types
can use either GL_AMD_gpu_shader_int16/GL_AMD_gpu_shader_half_float or
GL_EXT_shader_16bit_storage.
When trying to validate buffer sizes, we usually need to bail out when
using SpecConstantOps, but for some very specific cases where we allow
unsized arrays currently, we can safely allow "unknown" sized arrays as
well.
This is probably the best we can do, when we have even more difficult
cases than this, we throw a more sensible error message.
Previously, when generating non-Vulkan GLSL, each use of a spec constant
would be subsituted for its default value and the declaration of the constant
itself would be omitted completely.
This change slightly alters this behavior. The uses of the constant are kept,
as well as the declaration, although the latter is stripped of the layout
qualifier. The declaration is also prepended with the following code:
#ifndef <constant name>_value
#define <constant name> <default constant value>
#endif
and the constant itself now looks like
const <constant type> <constant name> = <constant name>_value;
The rationale for this change is that it gives the user a way to provide
custom values for specialization constants even when the target does not
support them.
- Add new Windows support
- Use CMake/CTest instead of Make + shell scripts
- Use --parallel in CTest
- Fix CTest on Windows
- Cleanups in test_shaders.py
- Force specific commit for SPIRV-Headers
- Fix Inf/NaN odd-ball case by moving to ASM
A lot of changes in spirv-opt output.
Some new invalid SPIR-V was found but most of them were not significant
for SPIRV-Cross, so just marked them as invalid.
Need some pretty hideous ladder variable system, but high level
languages do not support breaking out of a loop. break in switch blocks
and break in loops alias each other.
When the name of an alias global variable collides with a global
declaration, MSL would emit inconsistent names, sometimes with the
naming fix, sometimes without, because names were being tracked in two
separate meta blocks. Fix this by always redirecting parameter naming to
the original base variable as necessary.
MSL would force thread const& which would not work if the input argument
came from a different storage class.
Emit proper non-reference arguments for such values.
- The HLSL compiler now has its own list of keywords in addition to
the ones from GLSL.
- Added "buffer", "precise", and "shared" to the GLSL keywords.
Deal with various query functions which require dummy sampler.
In SPIR-V, separate images are used, but GLSL (even Vulkan GLSL)
requires combined sampler images ...
Update SPIRV-Tools/glslang commits.
Use vulkan1.1 environment for testing.
Found new "errors" in SPIRV-Tools, so disable validation on those shaders
for now.
Certain patterns with OpVectorShuffle (and probably others) will cascade
to so large, that they can cause OOM. After we have observed
force_recompile, don't spend unnecessary memory emitting code which will
never be used.
Normally, temporary declaration must dominate any use of it,
so we generally did not need to analyze the CFG for these variables,
but there is an edge case where you have an inliner doing:
do {
create_temporary;
break;
} while(0);
use_temporary;
The inside of the loop dominates the outer scope, but we cannot emit
code like this in GLSL, so make sure we hoist these temporaries outside
the "loop".