* Handle id overflow in the ssa rewriter.
Remove LocalSSAElim pass at the same time. It does the same thing as the SSARewrite pass. Then even share almost all of the same code.
Fixes crbug.com/997246
Currently the IRContext is passed into the Pass::Process method. It is
then up to the individual pass to store the context into the context_
variable. This CL changes the Run method to store the context before
calling Process which no-longer receives the context as a parameter.
This CL moves the files in opt/ to consistenly be under the opt::
namespace. This frees up the ir:: namespace so it can be used to make a
shared ir represenation.
This pass replaces the load/store elimination passes. It implements the
SSA re-writing algorithm proposed in
Simple and Efficient Construction of Static Single Assignment Form.
Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
In: Jhala R., De Bosschere K. (eds)
Compiler Construction. CC 2013.
Lecture Notes in Computer Science, vol 7791.
Springer, Berlin, Heidelberg
https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6
In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.
When a target variable is loaded, it queries the variable's reaching
definition. If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.
The algorithm avoids repeated lookups using memoization.
For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA. That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
The algorithm used in DCEInst to remove dead code is very slow. It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.
This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.
A number of different iterations of the order of optimization, and I
believe this is the best I could find.
The results I have on 3 sets of shaders are:
Legalization:
Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26
Performance passes:
Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74
Size reduction passes:
Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71
Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions. That is why we don't see much gain there.
On some shader code we have in our testsuite, Phi insertion is showing
massive compile time slowdowns, particularly during destruction. The
specific shader I was looking at has about 600 variables to keep track
of and around 3200 basic blocks. The algorithm is currently O(var x
blocks), which means maps with around 2M entries. This was taking about
8 minutes of compile time.
This patch changes the tracking of stored variables to be more sparse.
Instead of having every basic block contain all the tracked variables in
the map, they now have only the variables actually stored in that block.
This speeds up deallocation, which brings down compile time to about
1m20s.
Note that this is not the definite fix for this. I will re-write Phi
insertion to use a standard SSA rewriting algorithm
(https://github.com/KhronosGroup/SPIRV-Tools/issues/893).
This contributes to
https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
Pass now paints live blocks and fixes constant branches and switches as
it goes. No longer requires structured control flow. It also removes
unreachable blocks as a side effect. It fixes the IR (phis) before doing
any code removal (other than terminator changes).
Added several unit tests for updated/new functionality.
Does not remove dead edge from a phi node:
* Checks that incoming edges are live in order to retain them
* Added BasicBlock::IsSuccessor
* added test
Fixing phi updates in the presence of extra backedge blocks
* Added tests to catch bug
Reworked how phis are updated
* Instead of creating a new Phi and RAUW'ing the old phi with it, I now
replace the phi operands, but maintain the def/use manager correctly.
For unreachable merge:
* When considering unreachable continue blocks the code now properly
checks whether the incoming edge will continue to be live.
Major refactoring for review
* Broke into 4 major functions
* marking live blocks
* marking structured targets
* fixing phis
* deleting blocks
A few optimizations are updates to handle code that is suppose to be
using the logical addressing mode, but still has variables that contain
pointers as long as the pointer are to opaque objects. This is called
"relaxed logical addressing".
|Instruction::GetBaseAddress| will check that pointers that are use meet
the relaxed logical addressing rules. Optimization that now handle
relaxed logical addressing instead of logical addressing are:
- aggressive dead-code elimination
- local access chain convert
- local store elimination passes.
The current method of removing an instruction is to call ToNop. The
problem with this is that it leaves around an instruction that later
passes will look at. We should just delete the instruction.
In MemPass there is a utility routine called DCEInst. It can delete
essentially any instruction, which can invalidate pointers now that they
are actually deleted. The interface was changed to add a call back that
can be used to update any local data structures that contain
ir::Intruction*.
To make the decoration manger available everywhere, and to reduce the
number of times it needs to be build, I add one the IRContext.
As the same time, I move code that modifies decoration instruction into
the IRContext from mempass and the decoration manager. This will make
it easier to keep everything up to date.
This should take care of issue #928.
This analysis builds a map from instructions to the basic block that
contains them. It is accessed via get_instr_block(). Once built, it is kept
up-to-date by the IRContext, as long as instructions are removed via
KillInst.
I have not yet marked passes that preserve this analysis. I will do it
in a separate change.
Other changes:
- Add documentation about analysis values requirement to be powers of 2.
- Force a re-build of the def-use manager in tests.
- Fix AllPreserveFirstOnlyAfterPassWithChange to use the
DummyPassPreservesFirst pass.
- Fix sentinel value for IRContext::Analysis enum.
- Fix logic for checking if the instr<->block mapping is valid in KillInst.
NFC. This just makes sure every file is formatted following the
formatting definition in .clang-format.
Re-formatted with:
$ clang-format -i $(find source tools include -name '*.cpp')
$ clang-format -i $(find source tools include -name '*.h')
This class moves some of the CFG-related functionality into a new
class opt::CFG. There is some other code related to the CFG in the
inliner and in opt::LocalSingleStoreElimPass that should also be moved,
but that require more changes than this pure restructuring.
I will move those bits in a follow-up PR.
Currently, the CFG is computed every time a pass is instantiated, but
this should be later moved to the new IRContext class that @s-perron is
working on.
Other re-factoring:
- Add BasicBlock::ContinueBlockIdIfAny. Re-factored out of MergeBlockIdIfAny
- Rewrite IsLoopHeader in terms of GetLoopMergeInst.
- Run clang-format on some files.
This is the first part of adding the IRContext. This class is meant to
hold the extra data that is build on top of the module that it
owns.
The first part will simply create the IRContext class and get it passed
to the passes in place of the module. For now it does not have any
functionality of its own, but it acts more as a wrapper for the module.
The functions that I added to the IRContext are those that either
traverse the headers or add to them. I did this because we may decide
to have other ways of dealing with these sections (for example adding a
type pool, or use the decoration manager).
I also added the function that add to the header because the IRContext
needs to know when an instruction is added to update other data
structures appropriately.
Note that there is still lots of work that needs to be done. There are
still many places that change the module, and do not inform the context.
That will be the next step.
Mark structured conditional branches live only if one or more instructions
in their associated construct is marked live. After closure, replace dead
structured conditional branches with a branch to its merge and remove
dead blocks.
ADCE: Dead If Elim: Remove duplicate StructuredOrder code
Also generalize ComputeStructuredOrder so that the caller can specify the
root block for the order. Phi insertion uses pseudo_entry_block and adce and
dead branch elim use the first block of the function.
ADCE: Dead If Elim: Pull redundant code out of InsertPhiInstructions
ADCE: Dead If Elim: Encapsulate CFG Cleanup Initialization
ADCE: Dead If Elim: Remove redundant code from ADCE initialization
ADCE: Dead If: Use CFGCleanup to eliminate newly dead blocks
Moved bulk of CFG Cleanup code into MemPass.
This implements two cleanups suggested by @s-perron
(https://github.com/KhronosGroup/SPIRV-Tools/pull/921):
- Move FindNamedOrDecoratedIds() into MemPass::InitializeProcessing().
- Remove FinalizeNextId(). Always call SetIdBound() from
Pass::TakeNextId().
Including a re-factor of common behaviour into class Pass:
The following functions are now in class Pass:
- IsLoopHeader.
- ComputeStructuredOrder
- ComputeStructuredSuccessors (annoyingly, I could not re-factor all
instances of this function, the copy in common_uniform_elim_pass.cpp
is slightly different and fails with the common implementation).
- GetPointeeTypeId
- TakeNextId
- FinalizeNextId
- MergeBlockIdIfAny
This is a NFC (non-functional change)
Includes code to deal correctly with OpFunctionParameter. This
is needed by opaque propagation which may not exhaustively inline
entry point functions.
Adds ProcessEntryPointCallTree: a method to do work on the
functions in the entry point call trees in a deterministic order.
ADCE will now generate correct code in the presence of function calls.
This is needed for opaque type optimization needed by glslang. Currently
all function calls are marked as live. TODO: mark calls live only if they
write a non-local.
This avoids conversion on variables which will not ultimately be optimized.
Also removed an obsolete restriction from FindTargetVars(). Also added
decorates to supported refs (eg. RelaxedPrecision). Also fixed name to
IsNonTypeDecorate().