When doing reflection users care about the names of the variable, the
name of the type, and the name of the members. Remove duplicates breaks
this because it removes the names one of the types when merging.
To fix this we have to keep the different types around for each
resource. This commit adds code to remove duplicates to look for the
types uses to describe resources, and make sure they do not get merged.
However, allow merging of a type used in a resource with something not
used in a resource. Was done when the non resource type came second.
This could have a negative effect on compile time, but it was not
expected to be much.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1372.
Fixes#937
Stop std140/430 validation when runtime array is encountered.
Check for standard uniform/storage buffer layout instead of std140/430.
Added validator command line switch to skip block layout checking.
Validate structs decorated as Block/BufferBlock only when they
are used as variable with storage class of uniform or push
constant.
Expose --relax-block-layout to command line.
dneto0 modification:
- Use integer arithmetic instead of floor.
Add SPV_ENV_WEBGPU_0 for work-in-progress WebGPU.
val: Disallow OpUndef in WebGPU env
Silence unused variable warnings when !defined(SPIRV_EFFCE)
Limit visibility of validate_instruction.cpp's symbols
Only InstructionPass needs to be visible so all other functions are put
in an anonymous namespace inside the libspirv namespace.
Also add a corresponding check for capabilities in the validator.
Update previously existing test cases where an instruction used to fail
assembling because of a version check, but now they succeed because the
instruction is also guarded by a capability. Now it should assemble.
Add tests to ensure that capabilities are checked appropriately.
The explicitly reserved instructions OpImageSparseSampleProj*
now assemble, but they fail validation.
Fixes#1624
[val] Add extra context to error messages.
This CL extends the error messages produced by the validator to output the
disassembly of the errored line.
The validation_id messages have also been updated to print the line number of
the error instead of the word number. Note, the error number is from the start
of the SPIR-V, it does not include any headers printed in the disassembled code.
Fixes#670, #1581
This CL reverts the revert of 'Disallow array-of-arrays with DescriptorSets when
validating." Other changes have been committed which should aleviate the
AppVeryor resource constraints.
This reverts commit f2c93c6e12.
This CL adds validation to disallow using an array-of-arrays when attached to a
DescriptorSet.
Fixes#1522
Validate Ids before DataRules.
The DataRule validators call FindDefs with the assumption that they
definitions being looked at can be found. This may not be true if we
have not validated identifiers first.
This CL flips the IdPass and DataRulesPass to fix this issue.
Fixes#491
* Basic blocks now have a link to the terminator
* Check all case sepecific rules
* Missing check for branching into the middle of a case (#1618)
Fixes#1120
Checks that all static uses of the Input and Output variables are listed
as interfaces in each corresponding entry point declaration.
* Changed validation state to track interface lists
* updated many tests
* Modified validation state to store entry point names
* Combined with interface list and called EntryPointDescription
* Updated uses
* Changed interface validation error messages to output entry point name
in addtion to ID
We replace the std::vector in the Operand class by a new class that does
a small size optimization. This helps improve compile time on Windows.
Tested on three sets of shaders. Trying various values for the small
vector. The optimal value for the operand class was 2. However, for
the Instruction class, using an std::vector was optimal. Size of "0"
means that an std::vector was used.
Instruction size
0 4 8
Operand Size
0 489 544 684
1 593 487
2 469 570
4 473
8 505
This is a single thread run of ~120 shaders. For the multithreaded run
the results were the similar. The basline time was ~62sec. The
optimal configuration was an 2 for the OperandData and an
std::vector for the OperandList with a compile time of ~38sec. Similar
expiriments were done with other sets of shaders. The compile time still
improved, but not as much.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1609.
- Fix tests for basic group operations (e.g. Reduce) to allow for
new capabilities in SPIR-V 1.3 that enable them.
- Refactor operand capability check to avoid code duplication and
to put all checks that don't need table lookup before any table
lookup.
- Test round trip assembly/disassembly support for extension
SPV_NV_viewport_array2
- Test assembly and validation of decoration ViewportRelativeNV
Fixes#1596
Fixes#1281
* New structured cfg check: all non-construct header blocks'
predecessors must come from within the construct
* New function to calculate blocks in a construct
* Fixed a bug in BasicBlock type bitset
Relaxing check to not consider unreachable predecessors
* Fixing broken common uniform elim test
This CL updates the validate_id code to output the name of the object along with
the id number. There were a few instances which already output the name, this
just extends to all of them. Now, the output should say 123[obj] instead of just
123.
Issue #1581
* Disallow array-of-arrays with DescriptorSets when validating.
This CL adds validation to disallow using an array-of-arrays when attached to a
DescriptorSet.
Fixes#1522
The following passes are updated to preserve the inst-to-block and
def-use analysies:
private-to-local
aggressive dead-code elimination
dead branch elimination
local-single-block elimination
local-single-store elimination
reduce load size
compact ids (inst-to-block only)
merge block
dead-insert elimination
ccp
The one execption is that compact ids still kills the def-use manager.
This is because it changes so many ids it is faster to kill and rebuild.
Does everything in
https://github.com/KhronosGroup/SPIRV-Tools/issues/1593 except for the
changes to merge return.
By using forward pointers, we are able to define a struct that has a
pointer to itself. This could be directly or indirectly. The current
implementation of the type manager did not handle this case. There are
three changes that are made in this commit inorder to handle this case:
1) Change the handling of OpTypeForwardPointer
The current handling of OpTypeForwardsPointer is broken if there is a
reference to the pointer before the real definition. When build the
type that contain the forward delared pointer, the type manager will ask
for the type for that ID, and will get a nullptr because it does not
exists. This nullptr is not handleded very well.
The change is to keep track of the incomplete types the first time
through all of the types. An incomplete type is a ForwardPointer or any
type that references an incomplete type.
Then we implement a second pass through the incomplete types that will
complete them.
2) Hashing types.
When hashing a type, we want to uses all of the subtypes as part of the
hash. However, with types that reference them selves, this creates an
infinite recursion. To get around this, we keep track of which types
have been seen on the path from the root type. If we have see the
current type already then we can stop the recursion.
3) Comparing types.
In order to check if two types are the same, we must check that all of
their subtypes are the same as well. This also causes an infinit
recursion. The solution is to stop comparing the subtypes if we are
trying to compare two pointer types that we are already in the middle of
comparing. The ideas is that if the two pointer are different, then in
progress compare will return false itself.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1578.
We add a new rule to the folding rules to fold an FMix feeding an
extract when the alpha value for the element being extracted is either
0 or 1. In those case, we can simple extract from one of the operands
to the FMix.
With that change the simplification pass completely subsumes the
insert-extract elimination pass. So we remove the insert-extract
elimination passes and replce them with calls to the simplification
pass.
In a follow up PR, we should delete the insert-extract elimination pass.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
Currently it's impossible for external code to register a pass because
the only source file that can create pass tokens is optimizer.cpp. This
makes it hard to add passes that can't be upstreamed since you can't run
them from the usual pass sequence without reimplementing Optimizer.
This change adds a PassToken constructor that takes unique_ptr to
opt::Pass; if out-of-tree code implements opt::Pass it can register a
custom pass without having to add it to SPIRV-Tools source code.
According to the SPIR-V Spec, section 2.4 Logical Layout of a Module there
should be a single required OpMemoryModel instruction provided. This CL adds
validation that OpMemoryModel is provided to the SPIR-V validator.
Fixes#1207
Removes the limit on scalar replacement for the lagalization passes.
This is done by adding an option to the pass (and command line option)
to set the limit on maximum size of the composite that scalar
replacement is willing to divide.
Fixes#1494.
ADCE does not treat OpCopyMemory as an instruction that references
memory. Because of that stores are removed that should not be.
This change teaches ADCE that OpCopyMemory and OpCopyMemorySize both
loads from and stores to memory. This will keep other stores live when
needed, and will allows ADCE to remove OpCopyMemory instructions as
well.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1556.
SPV_EXT_shader_viewport_index_layer enables using ViewportIndex
and Layer in vertex and tessellation shaders.
Also, as per the Vulkan spec:
> The ViewportIndex decoration must be used only within vertex,
> tessellation evaluation, geometry, and fragment shaders.
> In a vertex, tessellation evaluation, or geometry shader, any
> variable decorated with ViewportIndex must be declared using
> the Output storage class.
> In a fragment shader, any variable decorated with ViewportIndex
> must be declared using the Input storage class.
Similarly for Layer.
Currently in scalar replacement, we create a new variable for every
memeber of the composite being divided. It is often overkill, because
not all of those members will be used. This change will check which
elements are used and only create variable for the members that are
used.
This reduces the compile time for one set of shader from 248s to 165s.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
The code patterns generated by DXC around function calls can cause many
store to be storing the same value that was just loaded from the same
location:
```
%10 = OpLoad %type %var
OpStore %var %10
```
We want to clean these up very early on because they can cause other
transformations to do a lot of work. For the cases I see, they can be
removed during local-single-block-elim.
For one set of shaders the compile time goes from 248s to 182s. A 26%
improvement.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element. This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.
However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.
Fixes#1547.
An FClamp instruction forces a values to be within a certain interval.
When the upper or lower bound of the FClamp is a constant and the value
being compared with is a constant, then in some case we can fold the
compared because the entire range is say less than the value.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1549.
If there is a shader with a variable in the workgroup storage class that
is stored to, but not loadeds, then we know nothing will read those
loads. It should be safe to remove them.
This is implemented in ADCE by treating workgroup variables the same
way that private variables are treated.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1550.
At this time, DCE will only remove an instruction if it is a combinator.
However, there are certain non-combinator instructions that can be
safely removed if their results are not used. The derivative
instructions are on example.
We are also missing some instructions from the list of combinators
those are added as the same time.
When doing if-conversion, we do not currently move code out of the side
nodes. The reason for this is that it can increase the number of
instructions that get executed because both side nods will have to be
executed now.
In this commit, we add code to move an instruction, and all of the
instructions it depends on, out of a side node and into the header of
the selection construct. However to keep the cost down, we only do it
when the two values in the OpPhi node compute the same value. This way
we have to move only one of the instructions and the other becomes
unused most of the time. So no real extra cost.
Makes the value number table an alalysis in the ir context.
Added more opcodes to list of code motion safe opcodes.
Fixes#1526.
Previously, the loop class used the terms latch and continue block
interchangeably. This patch splits the two and corrects and tests some
uses of the old uses of GetLatchBlock.
This pass will look for adjacent loops that are compatible and legal to
be fused.
Loops are compatible if:
- they both have one induction variable
- they have the same upper and lower bounds
- same initial value
- same condition
- they have the same update step
- they are adjacent
- there are no break/continue in either of them
Fusion is legal if:
- fused loops do not have any dependencies with dependence distance
greater than 0 that did not exist in the original loops.
- there are no function calls in the loops (could have side-effects)
- there are no barriers in the loops
It will fuse all such loops as long as the number of registers used for
the fused loop stays under the threshold defined by
max_registers_per_loop.
Adds support for spliting loops whose register pressure exceeds a user
provided level. This pass will split a loop into two or more loops given
that the loop is a top level loop and that spliting the loop is legal.
Control flow is left intact for dead code elimination to remove.
This pass is enabled with the --loop-fission flag to spirv-opt.
Track live scalars in VDCE as if they were single element vectors.
Handle the extended instructions for GLSL in VDCE.
Handle composite construct instructions in VDCE.
If one of the operands to an OpVectorTimesScalar instruction is zero,
then the result will be the 0 vector. Currently we do not fold the
insturction unless both operands are constants. This change fixes that.
We also allow folding of OpPhi instructions where the incoming values
are either an OpUndef or the OpPhi instruction itself. As with other
cases, this can be simplified to the OpUndef.
Track live scalars in VDCE as if they were single element vectors.
Handle the extended instructions for GLSL in VDCE.
Handle composite construct instructions in VDCE.
Fixes#1511.
Eliminate unused store to variable if followed by store to same
variable in same block.
Most significantly, this cleans up stores made unused by this pass.
These useless stores can inhibit subsequent optimizations, specifically
LocalSingleStoreElim. Eliminating them makes subsequent optimization more
effective.
The main effect of this pass is to simplify the work done by the SSA
rewriter. It catches many local loads/stores that help speeding up the
work done by the main rewriter.
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.
It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.
It also includes utilities to simulate the register pressure for loop
fusion and fission.
The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
* Adds new pass for validating non-uniform group instructions
* Currently on checks execution scope for Vulkan 1.1 and SPIR-V 1.3
* Added test framework
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large. These large data structures
then take a very long time to be destroyed. I've seen cases around 40%
if the time.
I've rewritten that algorithm to not use as much memory. This give a
significant improvement when running a large number of shader through
DXC.
I've also made a small change to local-single-block-elim to delete the
loads that is has replaced. That way local-single-store-elim will not
have to look at those. local-single-store-elim now does the same thing.
The time for one set goes from 309s down to 126s. For another set, the
time goes from 102s down to 88s.
GCD MIV test as described in Chapter 3 of "Optimizing Compilers for
Modern Architectures: A Dependence-Based Approach" by Randy Allen, and
Ken Kennedy.
Delta test as described in Figure 3 of "Practical Dependence Testing" by
Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.
* Reworked how execution model limitations are checked
* Now OpFunction checks which entry points call it and checks its
registered limitations instead of building a call stack in the entry
point
* New tests
* Moving function to entry point mapping into VState
Relaxs checks for per-vertex builtin variables. If the builtin
decoration is applied to a variable, then those checks now allow a level
of arraying on the variable before checking the type consistency.
* Allows arrays of variables to be present for the per-vertex variables:
* Position
* PointSize
* ClipDistance
* CullDistance
* Updated tests
Add test for case where OpBranch branches to a value (a function value).
Previous tests only checked a label value (name of a block.).
Update validate_id.cpp to remove the TODO for OpBranch and say that it
is already checked in validate_cfg.cpp
The unordered_set in ADCE that holds all of the live instructions takes
a very long time to be destroyed. In some shaders, it takes over 40% of
the time.
If we look at the unique ids of the live instructions, I believe they
are dense enough make a simple bit vector a good choice for to hold that
data. When I check the density of the bit vector for larger shaders, we
are usually using less than 4 bytes per element in the vector, and
almost always less than 16.
So, in this commit, I introduce a simple bit vector class, and
use it in ADCE.
This help improve the compile time for some shaders on windows by the
40% mentioned above.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.
To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
Provides functionality to perform ZIV and SIV dependency analysis tests
between a load and store within the same loop.
Dependency tests rely on scalar analysis to prove and disprove dependencies
with regard to the loop being analysed.
Based on the 1990 paper Practical Dependence Testing by Goff, Kennedy, Tseng
Adds support for marking loops in the loop nest as IRRELEVANT.
Loops are marked IRRELEVANT if the analysed instructions contain
no induction variables for the loops, i.e. the loops induction
variable is not relevent to the dependence of the store and load.
Adding three rules to fold OpDot (implemented as two).
- When an OpDot has two constants, then fold to the resulting const.
- When one of the inputs is the 0 vector, then fold to zero.
- When one of the inputs is a single 1 with 0s, then rewrite to an
OpCompositeExtract of the appropriate element. This will help find
even more folding opportunities.
Contributes to #709.
According to Vulkan spec 1.1.72:
> The PrimitiveId decoration must be used only within fragment,
> tessellation control, tessellation evaluation, and geometry shaders.
> In a tessellation control or tessellation evaluation shader, any
> variable decorated with PrimitiveId must be declared using the Input
> storage class.
We were enforcing that PrimitiveId can only be used with Output
storage class for TCS and TES before.
From the test case, the slice of the CFG that is interesting for the bug
is
25
|
v
30
|
v
31<-+
| |
v |
34--+
1. In block 25, we have a Phi candidate for %f with arguments
%47 = Phi[%float_0, %0]. This merges %float_0 and a yet unknown
argument from the external loop backedge.
2. We are now processing block 34:
i. The load %35 = OpLoad %f triggers a Phi candidate to be placed in
block 31.
ii. The Phi candidate %50 = Phi needs two arguments. The one coming
from block 30 is %47. But the one coming from block 34 (which we
are now processing and have marked sealed), finds %50 itself as
the reaching def for %f.
3. This wrongfully marks %50 as a copy-of Phi, which ultimately makes
both %47 and %50 copy-of Phis that get eliminated.
Migrating to unified grammar means we sometimes have two fields
for a certain feature: version and extensions. It means the feature
in question can be used either in SPIR-V of advanced-enough
versions or in any SPIR-V with with the specified extensions.
Validator now respects the above rules.
At every definition of a builtin id, run at-reference-check rules on the
defining instruction as well.
Previosly the validation was missing the case when invalid storage class
was defined in the instruction which defines the built-in, and not in
the instruction which references the built-in.
Refactored validate built-ins to make
GetExecutionModels(entry_point)
and
GetExecutionModes(entry_point)
available in validation state.
Entry points are allowed to have multiple execution modes and execution
models.
Finished the last missing feature in Vulkan built-ins validation:
FragDepth requires DepthReplacing.
Currently OpImageTexelPointer operations are treat like a use of the
pointer, but it does
not look for the memory being referenced to make sure stores are not
removed.
This change teaches it so identify the memory being accessed, and
treats it as if that memory is loaded.
Fixes to #1445.
OpImageTexelPointer acts like a special kind of load. It is not an
array load, but it also cannot be removed the same way a regular
load can. The type of propagation that needs to be done is similar
to what we do for arrays, so I want to merge that code into that
optmization.
Contributers to #1445.
OpImageTexelPointer acts like a special kind of load. It is still
safe to change the storage class of a variable used in a
OpImageTexalPointer instruction.
Contributes to #1445.
CPPreference.com has this description of digits10:
“The value of std::numeric_limits<T>::digits10 is the number of
base-10 digits that can be represented by the type T without change,
that is, any number with this many significant decimal digits can be
converted to a value of type T and back to decimal form, without
change due to rounding or overflow.”
This means that any number with this many digits can be represented
accurately in the corresponding type. A change in any digit in a
number after that may or may not cause it a different bitwise
representation. Therefore this isn’t necessarily enough precision to
accurately represent the value in text. Instead we need max_digits10
which has the following description:
“The value of std::numeric_limits<T>::max_digits10 is the number of
base-10 digits that are necessary to uniquely represent all distinct
values of the type T, such as necessary for
serialization/deserialization to text.”
The patch includes a test case in hex_float_test which tries to do a
round-robin conversion of a number that requires more than 6 decimal
places to be accurately represented. This would fail without the
patch.
Sadly this also breaks a bunch of other tests. Some of the tests in
hex_float_test use ldexp and then compare it with a value which is not
the same as the one returned by ldexp but instead is the value rounded
to 6 decimals. Others use values that are not evenly representable as
a binary floating fraction but then happened to generate the same
value when rounded to 6 decimals. Where the actual value didn’t seem
to matter these have been changed with different values that can be
represented as a binary fraction.
When building C code with gcc and the
-Wstrict-prototypes option, function declarations
and definitions that don't specify their argument
types generate warnings. Functions that don't
take parameters need to specify (void) as their
parameter list, rather than leaving it empty.
Note this only applies to C, so only the functions
exported in C-compatible headers need fixing. In
C++ functions can't be declared/defined without a
parameter list, so C++ can safely allow an empty
parameter list to imply (void).
Previously we use symbols in spv_target_env as the minimum version
requirements for features. That makes version check implicitly
relies on the order of entries in the spv_target_env enum, which
also contains client APIs. Instead, we should use the standard
scheme for constructing SPIR-V version; and by doing that we can
also map client API entries to universial SPIR-V versions.
When the original code copies an entire array or struct one element at a
time, this turns into a series of OpCompositeInsert instruction followed
by a store of the whole array. We currently miss opportunities in copy
propagate arrays because we do not recognize this as a copy.
This commit adds code to copy propagate arrays to identify this code
pattern.
Also updates the performance passed to run array copy propagation.
The first implementation of MemroyObject, which is used in copy
propagate arrays, forced the access chain to be like the access chains
in OpCompositeExtract. This excluded the possibility of the memory
object from representing an array element that was extracted with a
variable index. Looking at the code, that restriction is not
neccessary. I also see some opportunities for doing this in some real
shaders.
Contributes to #1430.