* Change implementation of post order CFG traversal
It seems like the recursion is going very deep, and causing problems
in particular situations. I've reimplemented the CFG post order
traversal to not use recursion.
Fixes #2539.
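A minimal sketch of a non-recursive post order traversal, for illustration only. The `Graph` alias and `PostOrder` function are hypothetical stand-ins and not the actual opt::CFG code; the idea is just that an explicit stack of (block, next successor index) pairs replaces the call stack.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CFG: succs[b] lists the successors of block b.
using Graph = std::vector<std::vector<std::size_t>>;

// Returns the blocks reachable from `entry` in post order. An explicit stack
// is used instead of recursion, so a very deep CFG cannot overflow the call
// stack.
std::vector<std::size_t> PostOrder(const Graph& succs, std::size_t entry) {
  assert(entry < succs.size());
  std::vector<bool> seen(succs.size(), false);
  std::vector<std::size_t> order;
  // Each entry is (block, index of the next successor of that block to visit).
  std::vector<std::pair<std::size_t, std::size_t>> stack;
  seen[entry] = true;
  stack.emplace_back(entry, 0);
  while (!stack.empty()) {
    const std::size_t block = stack.back().first;
    const std::size_t next = stack.back().second;
    if (next < succs[block].size()) {
      // Remember where to resume, then descend into the successor if new.
      stack.back().second = next + 1;
      const std::size_t succ = succs[block][next];
      if (!seen[succ]) {
        seen[succ] = true;
        stack.emplace_back(succ, 0);
      }
    } else {
      // All successors are done, so the block itself can be emitted.
      order.push_back(block);
      stack.pop_back();
    }
  }
  return order;
}
```

Memory use is bounded by the number of blocks rather than by the depth of the call stack, which is the property the change is after.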
Added documentation to the ir context to indicate that TakeNextId()
returns 0 when the max id is reached. TODOs were added to each call
site so that we know where we have to start to handle this case.
Handle id overflow in |SplitLoopHeader|.
Handle id overflow in |GetOrCreatePreHeaderBlock|.
Handle failure to create preheader in LICM.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1841.
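A rough sketch of what handling the overflow at a call site might look like. `IdAllocator` and `CreateBlockId` are hypothetical names invented for this example, and the limit semantics are an assumption; only the "0 means no id left" convention comes from the description above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the id allocation done by the IR context.
class IdAllocator {
 public:
  // Ids are handed out from `first_id` up to, but not including, `limit`.
  IdAllocator(std::uint32_t first_id, std::uint32_t limit)
      : next_id_(first_id), limit_(limit) {
    assert(first_id != 0);  // 0 is reserved to signal failure.
  }

  // Returns the next unused id, or 0 when the ids are exhausted.
  std::uint32_t TakeNextId() {
    if (next_id_ >= limit_) return 0;
    return next_id_++;
  }

 private:
  std::uint32_t next_id_;
  std::uint32_t limit_;
};

// Hypothetical call site: reports failure instead of using the invalid id 0.
bool CreateBlockId(IdAllocator& ids, std::uint32_t* out_id) {
  const std::uint32_t id = ids.TakeNextId();
  if (id == 0) return false;  // Id overflow: the caller must bail out.
  *out_id = id;
  return true;
}
```

A caller such as a pass would then propagate the failure upward rather than continue with a half-built block.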
The current implementation of merge return can create bad, but correct,
code. When it is not in a loop construct, it will insert a lot of
extra branches around code. The potentially large number of branches is
bad. At the same time, it can separate stores to variables from
their uses, hiding the fact that the store dominates the load.
This hurts later analyses because the compiler thinks that multiple
values can reach a load, when there is really only one. This poorer
analysis leads to missed optimizations.
The solution is to create a dummy loop around the entire body of the
function, then we can break from that loop with a single branch. Also,
the only new merge nodes would be those at the end of loops, meaning that
most analyses will not be hurt.
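As a source-level analogy only (the pass rewrites SPIR-V control flow, not C++), the dummy loop idea looks roughly like the following. `ClampBefore` and `ClampAfter` are hypothetical functions made up for this illustration.

```cpp
#include <cassert>

// Before: the function has several returns.
int ClampBefore(int x) {
  if (x < 0) return 0;
  if (x > 255) return 255;
  return x;
}

// After: the body is wrapped in a loop that runs exactly once. Each former
// return stores its value and breaks out with a single branch, so the
// function ends with one return.
int ClampAfter(int x) {
  int result = x;
  do {
    if (x < 0) {
      result = 0;
      break;
    }
    if (x > 255) {
      result = 255;
      break;
    }
  } while (false);
  assert(result >= 0 && result <= 255);
  return result;
}
```

Each early exit is a single break to the loop's merge point, instead of a chain of extra branches guarding the rest of the body.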
Remove dead code for cases that are no longer possible.
It seems like some drivers expect there to be an OpSelectionMerge
before conditional branches, even if they are not strictly needed.
So we add them.
Many of the files have using std::<foo> statements in them, but then the
use of <foo> will be inconsistently std::<foo> or <foo> scattered
through the file. This CL removes all of the using statements and
updates the code to have the required std:: prefix.
This CL removes the context() method from opt::Function. In the places
where the context() was used we can retrieve, or provide, the context in
another fashion.
This CL moves the files in opt/ to consistently be under the opt::
namespace. This frees up the ir:: namespace so it can be used to make a
shared ir representation.
For each function, the analysis determines which SSA registers are live
at the beginning of each basic block and which ones are killed at
the end of the basic block.
It also includes utilities to simulate the register pressure for loop
fusion and fission.
The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" by
Boissinot et al.
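To make "live at the beginning of each basic block" concrete, here is a small sketch. Note that it uses the classic iterative backward data-flow formulation, not the non-iterative algorithm from the paper, and it ignores phi instructions. The `Block` struct and `LiveIn` function are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Hypothetical, simplified basic block.
struct Block {
  std::set<int> defs;              // Registers defined in the block.
  std::set<int> uses;              // Registers used before any local def.
  std::vector<std::size_t> succs;  // Indices of successor blocks.
};

// Computes, for each block, the registers live at its beginning:
//   live_in(b) = uses(b) + (union of live_in(s) over successors s) - defs(b)
// and repeats until nothing changes.
std::vector<std::set<int>> LiveIn(const std::vector<Block>& blocks) {
  std::vector<std::set<int>> live_in(blocks.size());
  bool changed = true;
  while (changed) {
    changed = false;
    // Visiting blocks backwards tends to converge faster.
    for (std::size_t i = blocks.size(); i-- > 0;) {
      const Block& b = blocks[i];
      std::set<int> in = b.uses;
      for (std::size_t s : b.succs) {
        assert(s < blocks.size());
        for (int reg : live_in[s]) {
          if (b.defs.count(reg) == 0) in.insert(reg);
        }
      }
      if (in != live_in[i]) {
        live_in[i] = std::move(in);
        changed = true;
      }
    }
  }
  return live_in;
}
```

The number of registers live at a point is what a register pressure estimate would be built on.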
We are seeing shaders that have multiple returns in a function. These
functions must get inlined for legalization purposes; however, the
inliner does not know how to inline functions that have multiple
returns.
The solution we will go with is to improve the merge return pass to
handle structured control flow.
Note that the merge return pass will assume the cfg has been cleaned up
by dead branch elimination.
Fixes #857.
Implementation of the simplification pass.
- Create a pass that calls the instruction folder on each instruction and
propagates instructions that fold to a copy. This will do copy
propagation as well.
- Did not use the propagator engine because I want to modify the instruction
as we go along.
- Change folding to not allocate new instructions, but make changes in
place. This change had a big impact on compile time.
- Add simplification pass to the legalization passes in place of
insert-extract elimination.
- Added test cases for new folding rules.
- Added tests for the simplification pass
- Added a method to the CFG to apply a function to the basic blocks in
reverse post order.
Contributes to #1164.
This patch adds a LoopUtils class to handle some loop related transformations. For now it has 2 transformations that simplify other transformations such as loop unroll or unswitch:
- Dedicate exit blocks: this ensures that all exit basic blocks
(out-of-loop basic blocks that have a predecessor in the loop)
have all their predecessors in the loop;
- Loop Closed SSA (LCSSA): this ensures that all definitions in a loop are used inside the loop
or in a phi instruction in an exit basic block.
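The dedicated exit property above can be stated as a small check. This is only a sketch over a hypothetical predecessor table; `HasDedicatedExits` is not a LoopUtils method.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// preds[b] lists the predecessors of block b, and `loop` holds the blocks
// that belong to the loop. Returns true if every exit block (an out-of-loop
// block with a predecessor in the loop) has all of its predecessors in the
// loop.
bool HasDedicatedExits(const std::vector<std::vector<std::size_t>>& preds,
                       const std::set<std::size_t>& loop) {
  for (std::size_t block = 0; block < preds.size(); ++block) {
    if (loop.count(block) != 0) continue;  // Only out-of-loop blocks matter.
    bool from_loop = false;
    bool from_outside = false;
    for (std::size_t pred : preds[block]) {
      assert(pred < preds.size());
      if (loop.count(pred) != 0) {
        from_loop = true;
      } else {
        from_outside = true;
      }
    }
    // An exit block that is also reachable from outside is not dedicated.
    if (from_loop && from_outside) return false;
  }
  return true;
}
```

When the check fails, the transformation would have to split the offending exit block so that the edges from the loop get a block of their own.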
It also adds the following capabilities:
- Loop::IsLCSSA to test if the loop is in LCSSA form
- Loop::GetOrCreatePreHeaderBlock that can build a loop preheader if required;
- New methods to allow on the fly updates of the loop descriptors.
- New methods to allow on the fly updates of the CFG analysis.
- Instruction::SetOperand to allow expression of the index relative to Instruction::NumOperands (to be compatible with the index returned by DefUseManager::ForEachUse)
This ensures that all basic blocks in a function have a valid entry in the CFG object.
The entry block has no predecessors but remains a valid basic block
for which we might want to query the number of predecessors.
Some unreachable basic blocks may not have predecessors as well.
In order to keep track of all of the implicit capabilities as well as
the explicit ones, we will add them all to the feature manager. That is
the object that needs to be queried when checking if a capability is
enabled.
The name of the "HasCapability" function in the module was changed to
make it more obvious that it does not check for implied capabilities.
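A sketch of the idea of recording implied capabilities next to the explicit ones. The `FeatureSet` class and the string-keyed `ImpliedCapabilities` table are hypothetical and deliberately partial; they are not the feature manager's real interface. The two implications listed (Geometry implies Shader, Shader implies Matrix) follow the SPIR-V specification.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, partial table of which capabilities a capability implies.
const std::map<std::string, std::vector<std::string>>& ImpliedCapabilities() {
  static const std::map<std::string, std::vector<std::string>> table = {
      {"Geometry", {"Shader"}},
      {"Shader", {"Matrix"}},
  };
  return table;
}

// Hypothetical stand-in for the feature manager.
class FeatureSet {
 public:
  // Records `cap` and, transitively, everything it implies.
  void AddCapability(const std::string& cap) {
    assert(!cap.empty());
    if (!caps_.insert(cap).second) return;  // Already recorded.
    const auto it = ImpliedCapabilities().find(cap);
    if (it == ImpliedCapabilities().end()) return;
    for (const std::string& implied : it->second) AddCapability(implied);
  }

  // True for explicit and implied capabilities alike.
  bool HasCapability(const std::string& cap) const {
    return caps_.count(cap) != 0;
  }

 private:
  std::set<std::string> caps_;
};
```

With this shape, a query made after adding only Geometry also answers true for Shader and Matrix, which is the behavior a check on the module's explicit list alone would miss.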
Keep an spv_context and AssemblyGrammar in IRContext
Re-formatted the source tree with the command:
$ /usr/bin/clang-format -style=file -i \
$(find include source tools test utils -name '*.cpp' -or -name '*.h')
This required a fix to source/val/decoration.h. It was not including
spirv.h, which broke builds when the #include headers were re-ordered by
clang-format.
Each instruction is given a unique id that can be used for ordering
purposes. The ids are generated via the IRContext.
Major changes:
* Instructions now contain a uint32_t for unique id and a cached context
pointer
* Most constructors have been modified to take a context as input
* unfortunately I cannot remove the default and copy constructors, but
developers should avoid these
* Added accessors to parents of basic block and function
* Removed the copy constructors for BasicBlock and Function and replaced
them with Clone functions
* Reworked BuildModule to return an IRContext owning the built module
* Since all instructions require a context, the context now becomes the
basic unit for IR
* Added a constructor to context to create an owned module internally
* Replaced uses of Instruction's copy constructor with Clone wherever I
found them
* Reworked the linker functionality to perform clones into a different
context instead of moves
* Updated many tests to be consistent with the above changes
* Still need to add new tests to cover added functionality
* Added comparison operators to Instruction
* Added an internal option to LinkerOptions to verify merged ids are
unique
* Added a test for the linker to verify merged ids are unique
* Updated MergeReturnPass to supply a context
* Updated DecorationManager to supply a context for cloned decorations
* Reworked several portions of the def use tests in anticipation of next
set of changes
This change moves some of the CFG-related functionality into a new
class opt::CFG. There is some other code related to the CFG in the
inliner and in opt::LocalSingleStoreElimPass that should also be moved,
but that requires more changes than this pure restructuring.
I will move those bits in a follow-up PR.
Currently, the CFG is computed every time a pass is instantiated, but
this should be later moved to the new IRContext class that @s-perron is
working on.
Other re-factoring:
- Add BasicBlock::ContinueBlockIdIfAny. Re-factored out of MergeBlockIdIfAny
- Rewrite IsLoopHeader in terms of GetLoopMergeInst.
- Run clang-format on some files.