This CL moves the files in opt/ to consistenly be under the opt::
namespace. This frees up the ir:: namespace so it can be used to make a
shared ir represenation.
Many optimization will run on function scope symbols only. When symbols
are moved from private scope to function scople, then these optimizations
can do more.
I believe it is a good idea to run this pass with both -O and -Os. To
get the most out of it it should be run ASAP after inlining and something
that remove all of the dead functions.
Revert "Don't merge types of resources"
This reverts commit f393b0e480, but leaves
the tests that were added. Added new test. These test are the so that,
if someone tries the same change I made, they will see the test that
they need to handle.
Don't run remove duplicates in -O and -Os
Romve duplicates was run to help reduce compile time when looking for
types in the type manager. I've run compile time test on three sets
of shaders, and the compile time does not seem to change.
It should be safe to remove it.
We add a new rule to the folding rules to fold an FMix feeding an
extract when the alpha value for the element being extracted is either
0 or 1. In those case, we can simple extract from one of the operands
to the FMix.
With that change the simplification pass completely subsumes the
insert-extract elimination pass. So we remove the insert-extract
elimination passes and replce them with calls to the simplification
pass.
In a follow up PR, we should delete the insert-extract elimination pass.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
Currently it's impossible for external code to register a pass because
the only source file that can create pass tokens is optimizer.cpp. This
makes it hard to add passes that can't be upstreamed since you can't run
them from the usual pass sequence without reimplementing Optimizer.
This change adds a PassToken constructor that takes unique_ptr to
opt::Pass; if out-of-tree code implements opt::Pass it can register a
custom pass without having to add it to SPIRV-Tools source code.
Removes the limit on scalar replacement for the lagalization passes.
This is done by adding an option to the pass (and command line option)
to set the limit on maximum size of the composite that scalar
replacement is willing to divide.
Fixes#1494.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element. This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.
However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.
Fixes#1547.
This pass will look for adjacent loops that are compatible and legal to
be fused.
Loops are compatible if:
- they both have one induction variable
- they have the same upper and lower bounds
- same initial value
- same condition
- they have the same update step
- they are adjacent
- there are no break/continue in either of them
Fusion is legal if:
- fused loops do not have any dependencies with dependence distance
greater than 0 that did not exist in the original loops.
- there are no function calls in the loops (could have side-effects)
- there are no barriers in the loops
It will fuse all such loops as long as the number of registers used for
the fused loop stays under the threshold defined by
max_registers_per_loop.
Adds support for spliting loops whose register pressure exceeds a user
provided level. This pass will split a loop into two or more loops given
that the loop is a top level loop and that spliting the loop is legal.
Control flow is left intact for dead code elimination to remove.
This pass is enabled with the --loop-fission flag to spirv-opt.
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.
It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.
To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
When the original code copies an entire array or struct one element at a
time, this turns into a series of OpCompositeInsert instruction followed
by a store of the whole array. We currently miss opportunities in copy
propagate arrays because we do not recognize this as a copy.
This commit adds code to copy propagate arrays to identify this code
pattern.
Also updates the performance passed to run array copy propagation.
The sprir-v generated from HLSL code contain many copyies of very large
arrays. Not only are these time consumming, but they also cause
problems for drivers because they require too much space.
To work around this, we will implement an array copy propagation. Note
that we will not implement a complete array data flow analysis in order
to implement this. We will be looking for very simple cases:
1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.
The hard part is keeping all of the types correct. We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.
Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent. With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values. See |Struct::GetExtraHashWords|.
Contributes to #1416.
This patch adds a new option --time-report to spirv-opt. For each pass
executed by spirv-opt, the flag prints resource utilization for the pass
(CPU time, wall time, RSS and page faults)
This fixes issue #1378
This pass replaces the load/store elimination passes. It implements the
SSA re-writing algorithm proposed in
Simple and Efficient Construction of Static Single Assignment Form.
Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
In: Jhala R., De Bosschere K. (eds)
Compiler Construction. CC 2013.
Lecture Notes in Computer Science, vol 7791.
Springer, Berlin, Heidelberg
https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6
In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.
When a target variable is loaded, it queries the variable's reaching
definition. If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.
The algorithm avoids repeated lookups using memoization.
For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA. That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
We are seeing shaders that have multiple returns in a functions. These
functions must get inlined for legalization purposes; however, the
inliner does not know how to inline functions that have multiple
returns.
The solution we will go with it to improve the merge return pass to
handle structured control flow.
Note that the merge return pass will assume the cfg has been cleanedup
by dead branch elimination.
Fixes#857.
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
The algorithm used in DCEInst to remove dead code is very slow. It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.
This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.
A number of different iterations of the order of optimization, and I
believe this is the best I could find.
The results I have on 3 sets of shaders are:
Legalization:
Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26
Performance passes:
Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74
Size reduction passes:
Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71
Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions. That is why we don't see much gain there.
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
In some shaders there are a lot of very large and deeply nested
structures. This creates a lot of work for scalar replacement. Also,
since commit ca4457b we have been very aggressive as rewriting
variables. This has causes a large increase in compile time in creating
and then deleting the instructions.
To help low the costs, I want to run a cleanup of some of the easy loads
and stores to remove. This reduces the number of symbols sroa has to
work on. It also reduces the amount of code the simplifier has to
simplify because it was not generated by sroa.
To confirm the improvement, I ran numbers on three different sets of
shaders:
Time to run --legalize-hlsl:
Set #1: 55.89s -> 12.0s
Set #2: 1m44s -> 1m40.5s
Set #3: 6.8s -> 5.7s
Time to run -O
Set #1: 18.8s -> 10.9s
Set #2: 5m44s -> 4m17s
Set #3: 7.8s -> 7.8s
Contributes to #1328.
The simplification pass works better after all of the dead branches are
removed. So swapping them around in the legalization passes. Also
adding the simplification pass to performance passes right after dead
branch elimination.
Added CCP to the legalization passes so we can propagate the constants
into the branchs, and remove as many branches a possible. CCP is
designed to still get opportunities even if the branches are dead, so it
is a good place for it.
Fixes#1118
This patch adds initial support for loop unrolling in the form of a
series of utility classes which perform the unrolling. The pass can
be run with the command spirv-opt --loop-unroll. This will unroll
loops within the module which have the unroll hint set. The unroller
imposes a number of requirements on the loops it can unroll. These are
documented in the comments for the LoopUtils::CanPerformUnroll method in
loop_utils.h. Some of the restrictions will be lifted in future patches.
Implementation of the simplification pass.
- Create pass that calls the instruction folder on each instruction and
propagate instructions that fold to a copy. This will do copy
propagation as well.
- Did not use the propagator engine because I want to modify the instruction
as we go along.
- Change folding to not allocate new instructions, but make changes in
place. This change had a big impact on compile time.
- Add simplification pass to the legalization passes in place of
insert-extract elimination.
- Added test cases for new folding rules.
- Added tests for the simplification pass
- Added a method to the CFG to apply a function to the basic blocks in
reverse post order.
Contributes to #1164.
* Moved initial insert/extract passes later to cover more opportunities
* Added an extra set of passes to clean up opportunities exposed later
in the pipeline
Creates a pass that will remove instructions that are invalid for the
current shader stage. For the instruction to be considered for replacement
1) The opcode must be valid for a shader modules.
2) The opcode must be invalid for the current shader stage.
3) All entry points to the module must be for the same shader stage.
4) The function containing the instruction must be reachable from an entry point.
Fixes#1247.
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
* does not perform code motion currently so the converted values must
dominate the join point (e.g. can't be defined in the branches)
* limited for now to two predecessors, but can be extended to handle
more cases
* Adding if conversion to -O and -Os
We have come across a driver bug where and OpUnreachable inside a loop
is causing the shader to go into an infinite loop. This commit will try
to avoid this bug by turning OpUnreachable instructions that are
contained in a loop into branches to the loop merge block.
This is not added to "-O" and "-Os" because it should only be used if
the driver being targeted has this problem.
Fixes#1209.
With work that Alan has done, some passes have become redundant. ADCE
now removed unused variables. Dead branch elimination removes
unreachable blocks. This means we can remove CFG Cleanup and dead
variable elimination.
Modified ADCE to remove dead globals.
* Entry point and execution mode instructions are marked as alive
* Reachable functions and their parameters are marked as alive
* Instruction deletion now deferred until the end of the pass
* Eliminated dead insts set, added IsDead to calculate that value
instead
* Ported applicable dead variable elimination tests
* Ported dead constant elim tests
Added dead function elimination to ADCE
* ported dead function elim tests
Added handling of decoration groups in ADCE
* Uses a custom sorter to traverse decorations in a specific order
* Simplifies necessary checks
Updated -O and -Os pass lists.
This fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1143.
When an instruction transitions from constant to bottom (varying) in the
lattice, we were telling the propagator that the instruction was
varying, but never updating the actual value in the values table.
This led to incorrect value substitutions at the end of propagation.
The patch also re-enables CCP in -O and -Os.
Adds optimizer API to write disassembly to a given output stream
before each pass, and after the last pass.
Adds spirv-opt --print-all option to write disassembly to stderr
before each pass, and after the last pass.
I've a few passes the legalization passes. The first is to add the
more specialized load-store removal passes to help improve the compile
time, as was suggested in #1118.
I've also added dead branch elimination while we wait for the behaviour
of dead branch elimination to be folded into CFG cleanup.
I did not add CCP because it seems like most of the constant propagation
what is needed is already being done by the load-store removal passes,
which call `ReplaceAllUsesWith`. We can reconsider this if needed.
In CCP we should not need to insert Phi nodes because CCP never looks at
loads/stores. This required adjusting two tests that relied on Phi
instructions being inserted. I changed the tests to have the Phi
instructions pre-inserted.
I also added a new test to make sure that CCP does not try to look
through stores and loads.
Finally, given that CCP does not handle loads/stores, it's better to run
mem2reg before it. I've changed the -O/-Os schedules to run local
multi-store elimination before CCP.
Although this is just an efficiency fix for CCP, it is
also working around a bug in Phi insertion. When Phi instructions are
inserted, they are never associated a basic block. This causes a
segfault when the propagator tries to lookup CFG edges when analyzing
Phi instructions.
This implements the conditional constant propagation pass proposed in
Constant propagation with conditional branches,
Wegman and Zadeck, ACM TOPLAS 13(2):181-210.
The main logic resides in CCPPass::VisitInstruction. Instruction that
may produce a constant value are evaluated with the constant folder. If
they produce a new constant, the instruction is considered interesting.
Otherwise, it's considered varying (for unfoldable instructions) or
just not interesting (when not enough operands have a constant value).
The other main piece of logic is in CCPPass::VisitBranch. This
evaluates the selector of the branch. When it's found to be a known
value, it computes the destination basic block and sets it. This tells
the propagator which branches to follow.
The patch required extensions to the constant manager as well. Instead
of hashing the Constant pointers, this patch changes the constant pool
to hash the contents of the Constant. This allows the lookups to be
done using the actual values of the Constant, preventing duplicate
definitions.
Changes the set of optimizations done for legalization. While doing
this, I added documentation to explain why we want each optimization.
A new option "--legalize-hlsl" is added so the legalization passes can
be easily run from the command line.
The legalize option implies skip-validation.
When a private variable is used in a single function, it can be
converted to a function scope variable in that function. This adds a
pass that does that. The pass can be enabled using the option
`--private-to-local`.
This transformation allows other transformations to act on these
variables.
Also moved `FindPointerToType` from the inline class to the type manager.
types. This allows the lookup of type declaration ids from arbitrarily
constructed types. Users should be cautious when dealing with non-unique
types (structs and potentially pointers) to get the exact id if
necessary.
* Changed the spec composite constant folder to handle ambiguous composites
* Added functionality to create necessary instructions for a type
* Added ability to remove ids from the type manager
Adds a scalar replacement pass. The pass considers all function scope
variables of composite type. If there are accesses to individual
elements (and it is legal) the pass replaces the variable with a
variable for each composite element and updates all the uses.
Added the pass to -O
Added NumUses and NumUsers to DefUseManager
Added some helper methods for the inst to block mapping in context
Added some helper methods for specific constant types
No longer generate duplicate pointer types.
* Now searches for an existing pointer of the appropriate type instead
of failing validation
* Fixed spec constant extracts
* Addressed changes for review
* Changed RunSinglePassAndMatch to be able to run validation
* current users do not enable it
Added handling of acceptable decorations.
* Decorations are also transfered where appropriate
Refactored extension checking into FeatureManager
* Context now owns a feature manager
* consciously NOT an analysis
* added some test
* fixed some minor issues related to decorates
* added some decorate related tests for scalar replacement
Adds a pass that looks for redundant instruction in a function, and
removes them. The algorithm is a hash table based value numbering
algorithm that traverses the dominator tree.
This pass removes completely redundant instructions, not partially
redundant ones.
Creates a pass that removes redundant instructions within the same basic
block. This will be implemented using a hash based value numbering
algorithm.
Added a number of functions that check for the Vulkan descriptor types.
These are used to determine if we are variables are read-only or not.
Implemented a function to check if loads and variables are read-only.
Implemented kernel specific and shader specific versions.
A big change is that the Combinator analysis in ADCE is factored out
into the IRContext as an analysis. This was done because it is being
reused in the value number table.
Each instruction is given an unique id that can be used for ordering
purposes. The ids are generated via the IRContext.
Major changes:
* Instructions now contain a uint32_t for unique id and a cached context
pointer
* Most constructors have been modified to take a context as input
* unfortunately I cannot remove the default and copy constructors, but
developers should avoid these
* Added accessors to parents of basic block and function
* Removed the copy constructors for BasicBlock and Function and replaced
them with Clone functions
* Reworked BuildModule to return an IRContext owning the built module
* Since all instructions require a context, the context now becomes the
basic unit for IR
* Added a constructor to context to create an owned module internally
* Replaced uses of Instruction's copy constructor with Clone whereever I
found them
* Reworked the linker functionality to perform clones into a different
context instead of moves
* Updated many tests to be consistent with the above changes
* Still need to add new tests to cover added functionality
* Added comparison operators to Instruction
* Added an internal option to LinkerOptions to verify merged ids are
unique
* Added a test for the linker to verify merged ids are unique
* Updated MergeReturnPass to supply a context
* Updated DecorationManager to supply a context for cloned decorations
* Reworked several portions of the def use tests in anticipation of next
set of changes