Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.
It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.
To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
The sprir-v generated from HLSL code contain many copyies of very large
arrays. Not only are these time consumming, but they also cause
problems for drivers because they require too much space.
To work around this, we will implement an array copy propagation. Note
that we will not implement a complete array data flow analysis in order
to implement this. We will be looking for very simple cases:
1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.
The hard part is keeping all of the types correct. We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.
Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent. With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values. See |Struct::GetExtraHashWords|.
Contributes to #1416.
This pass replaces the load/store elimination passes. It implements the
SSA re-writing algorithm proposed in
Simple and Efficient Construction of Static Single Assignment Form.
Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
In: Jhala R., De Bosschere K. (eds)
Compiler Construction. CC 2013.
Lecture Notes in Computer Science, vol 7791.
Springer, Berlin, Heidelberg
https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6
In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.
When a target variable is loaded, it queries the variable's reaching
definition. If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.
The algorithm avoids repeated lookups using memoization.
For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA. That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
This patch adds initial support for loop unrolling in the form of a
series of utility classes which perform the unrolling. The pass can
be run with the command spirv-opt --loop-unroll. This will unroll
loops within the module which have the unroll hint set. The unroller
imposes a number of requirements on the loops it can unroll. These are
documented in the comments for the LoopUtils::CanPerformUnroll method in
loop_utils.h. Some of the restrictions will be lifted in future patches.
Creates a pass that will remove instructions that are invalid for the
current shader stage. For the instruction to be considered for replacement
1) The opcode must be valid for a shader modules.
2) The opcode must be invalid for the current shader stage.
3) All entry points to the module must be for the same shader stage.
4) The function containing the instruction must be reachable from an entry point.
Fixes#1247.
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
* does not perform code motion currently so the converted values must
dominate the join point (e.g. can't be defined in the branches)
* limited for now to two predecessors, but can be extended to handle
more cases
* Adding if conversion to -O and -Os
We have come across a driver bug where and OpUnreachable inside a loop
is causing the shader to go into an infinite loop. This commit will try
to avoid this bug by turning OpUnreachable instructions that are
contained in a loop into branches to the loop merge block.
This is not added to "-O" and "-Os" because it should only be used if
the driver being targeted has this problem.
Fixes#1209.
This implements the conditional constant propagation pass proposed in
Constant propagation with conditional branches,
Wegman and Zadeck, ACM TOPLAS 13(2):181-210.
The main logic resides in CCPPass::VisitInstruction. Instruction that
may produce a constant value are evaluated with the constant folder. If
they produce a new constant, the instruction is considered interesting.
Otherwise, it's considered varying (for unfoldable instructions) or
just not interesting (when not enough operands have a constant value).
The other main piece of logic is in CCPPass::VisitBranch. This
evaluates the selector of the branch. When it's found to be a known
value, it computes the destination basic block and sets it. This tells
the propagator which branches to follow.
The patch required extensions to the constant manager as well. Instead
of hashing the Constant pointers, this patch changes the constant pool
to hash the contents of the Constant. This allows the lookups to be
done using the actual values of the Constant, preventing duplicate
definitions.
When a private variable is used in a single function, it can be
converted to a function scope variable in that function. This adds a
pass that does that. The pass can be enabled using the option
`--private-to-local`.
This transformation allows other transformations to act on these
variables.
Also moved `FindPointerToType` from the inline class to the type manager.
types. This allows the lookup of type declaration ids from arbitrarily
constructed types. Users should be cautious when dealing with non-unique
types (structs and potentially pointers) to get the exact id if
necessary.
* Changed the spec composite constant folder to handle ambiguous composites
* Added functionality to create necessary instructions for a type
* Added ability to remove ids from the type manager
Adds a scalar replacement pass. The pass considers all function scope
variables of composite type. If there are accesses to individual
elements (and it is legal) the pass replaces the variable with a
variable for each composite element and updates all the uses.
Added the pass to -O
Added NumUses and NumUsers to DefUseManager
Added some helper methods for the inst to block mapping in context
Added some helper methods for specific constant types
No longer generate duplicate pointer types.
* Now searches for an existing pointer of the appropriate type instead
of failing validation
* Fixed spec constant extracts
* Addressed changes for review
* Changed RunSinglePassAndMatch to be able to run validation
* current users do not enable it
Added handling of acceptable decorations.
* Decorations are also transfered where appropriate
Refactored extension checking into FeatureManager
* Context now owns a feature manager
* consciously NOT an analysis
* added some test
* fixed some minor issues related to decorates
* added some decorate related tests for scalar replacement
Adds a pass that looks for redundant instruction in a function, and
removes them. The algorithm is a hash table based value numbering
algorithm that traverses the dominator tree.
This pass removes completely redundant instructions, not partially
redundant ones.
Re-formatted the source tree with the command:
$ /usr/bin/clang-format -style=file -i \
$(find include source tools test utils -name '*.cpp' -or -name '*.h')
This required a fix to source/val/decoration.h. It was not including
spirv.h, which broke builds when the #include headers were re-ordered by
clang-format.
Creates a pass that removes redundant instructions within the same basic
block. This will be implemented using a hash based value numbering
algorithm.
Added a number of functions that check for the Vulkan descriptor types.
These are used to determine if we are variables are read-only or not.
Implemented a function to check if loads and variables are read-only.
Implemented kernel specific and shader specific versions.
A big change is that the Combinator analysis in ADCE is factored out
into the IRContext as an analysis. This was done because it is being
reused in the value number table.
Works with current DefUseManager infrastructure.
Added merge return to the standard opts.
Added validation to passes.
Disabled pass for shader capabilty.
NFC. This just makes sure every file is formatted following the
formatting definition in .clang-format.
Re-formatted with:
$ clang-format -i $(find source tools include -name '*.cpp')
$ clang-format -i $(find source tools include -name '*.h')
There does not seem to be any pass that remove global variables. I
think we could use one. This pass will look specifically for global
variables that are not referenced and are not exported. Any decoration
associated with the variable will also be removed. However, this could
cause types or constants to become unreferenced. They will not be
removed. Another pass will have to be called to remove those.
- Adds a new pass CFGCleanupPass. This serves as an umbrella pass to
remove unnecessary cruft from a CFG.
- Currently, the only cleanup operation done is the removal of
unreachable basic blocks.
- Adds unit tests.
- Adds a flag to spirvopt to execute the pass (--cfg-cleanup).
Creates a pass called eliminate dead functions that looks for functions
that could never be called, and deletes them from the module.
To support this change a new function was added to the Pass class to
traverse the call trees from diffent starting points.
Includes a test to ensure that annotations are removed when deleting a
dead function. They were not, so fixed that up as well.
Did some cleanup of the assembly for the test in pass_test.cpp. Trying
to make them smaller and easier to read.
Create a new optimization pass, strength reduction, which will replace
integer multiplication by a constant power of 2 with an equivalent bit
shift. More changes could be added later.
- Does not duplicate constants
- Adds vector |Concat| utility function to a common test header.
Only inline calls to functions with opaque params or return
TODO: Handle parameter type or return type where the opqaue
type is buried within an array.
- UniformElim: Only process reachable blocks
- UniformElim: Don't reuse loads of samplers and images across blocks.
Added a second phase which only reuses loads within a block for samplers
and images.
- UniformElim: Upgrade CopyObject skipping in GetPtr
- UniformElim: Add extensions whitelist
Currently disallowing SPV_KHR_variable_pointers because it doesn't
handle extended pointer forms.
- UniformElim: Do not process shaders with GroupDecorate
- UniformElim: Bail on shaders with non-32-bit ints.
- UniformElim: Document support for only single index and add TODO.
Create aggressive dead code elimination pass
This pass eliminates unused code from functions. In addition,
it detects and eliminates code which may have spurious uses but which do
not contribute to the output of the function. The most common cause of
such code sequences is summations in loops whose result is no longer used
due to dead code elimination. This optimization has additional compile
time cost over standard dead code elimination.
This pass only processes entry point functions. It also only processes
shaders with logical addressing. It currently will not process functions
with function calls. It currently only supports the GLSL.std.450 extended
instruction set. It currently does not support any extensions.
This pass will be made more effective by first running passes that remove
dead control flow and inlines function calls.
This pass can be especially useful after running Local Access Chain
Conversion, which tends to cause cycles of dead code to be left after
Store/Load elimination passes are completed. These cycles cannot be
eliminated with standard dead code elimination.
Additionally: This transform uses a whitelist of instructions that it
knows do have side effects, (a.k.a. combinators). It assumes other
instructions have side effects: it will not remove them, and assumes
they have side effects via their ID operands.
A SSA local variable load/store elimination pass.
For every entry point function, eliminate all loads and stores of function
scope variables only referenced with non-access-chain loads and stores.
Eliminate the variables as well.
The presence of access chain references and function calls can inhibit
the above optimization.
Only shader modules with logical addressing are currently processed.
Currently modules with any extensions enabled are not processed. This
is left for future work.
This pass is most effective if preceeded by Inlining and
LocalAccessChainConvert. LocalSingleStoreElim and LocalSingleBlockElim
will reduce the work that this pass has to do.
Add --flatten-decorations to spirv-opt
Flattens decoration groups. That is, replace OpDecorationGroup
and its uses in OpGroupDecorate and OpGroupMemberDecorate with
ordinary OpDecorate and OpMemberDecorate instructions.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/602
The pass instance is constructed with a map from spec id (uint32_t) to
default values in string format. The default value strings will be
parsed to numbers according to the target spec constant type.
If the Spec Id decoration is found to be applied on multiple different
target ids, that decoration instruction (OpDecorate or OpGroupDecorate)
will be skipped. But other decoration instrucitons may still be
processed.
De-duplicate constants and unifies the uses of constants for a SPIR-V
module. If two constants are defined exactly the same, only one of them
will be kept and all the uses of the removed constant will be redirected
to the kept one.
This pass handles normal constants (defined with
OpConstant{|True|False|Composite}), some spec constants (those defined
with OpSpecConstant{Op|Composite}) and null constants (defined with
OpConstantNull).
There are several cases not handled by this pass:
1) If there are decorations for the result id of a constant defining
instruction, that instruction will not be processed. This means the
instruction won't be used to replace other instructions and other
instructions won't be used to replace it either.
2) This pass does not unify null constants (defined with
OpConstantNull instruction) with their equivalent zero-valued normal
constants (defined with OpConstant{|False|Composite} with zero as the
operand values or component values).
For the spec constants defined by OpSpecConstantOp and
OpSpecContantComposite, if all of their operands are constants with
determined values (normal constants whose values are fixed), calculate
the correct values of the spec constants and re-define them as normal
constants.
In short, this pass replaces all the spec constants defined by
OpSpecContantOp and OpSpecConstantComposite with normal constants when
possible. So far not all valid operations of OpSpecConstantOp are
supported, we have several constriction here:
1) Only 32-bit integer and boolean (both scalar and vector) are
supported for any arithmetic operations. Integers in other width (like
64-bit) are not supported.
2) OpSConvert, OpFConvert, OpQuantizeToF16, and all the
operations under Kernel capability, are not supported.
3) OpCompositeInsert is not supported.
Note that this pass does not unify normal constants. This means it is
possible to have new generatd constants defining the same values.
Add a pass to freeze spec constants to their default values. This pass does
not fold the frozen spec constants and does not handle SpecConstantOp
instructions and SpecConstantComposite instructions.