* When handling unreachable merges and continues, do not optimize to the
same IR
* pass did not check whether the unreachable blocks were in the
optimized form before transforming them
* added a test to catch this issue
* Should handle all possibilities
* Stricter checks for what is disallowed:
* header and header
* merge and merge
* Allow header and merge blocks to be merged
* Erases the structured control declaration if merging header and
merge blocks together.
* If the dead branch elim is performed on a module without structured
control flow, the OpSelectionMerge may not be present
* Add a check for pointer validity before dereferencing
* Added a test to catch the bug
* Forces traversal of phis if the def has changed to varying
* Mark a phi as varying if all incoming values are varying
* added a test to catch the bug
This adds Dead Insert Elimination to the end of the
--eliminate-insert-extract pass. See the new tests for examples of code
that will benefit.
Essentially, this removes OpCompositeInsert instructions which are not
used, either because there is no instruction which uses the value at the
index it is inserted, or because a subsequent insert intercepts any such
use.
This code has been seen to remove significant amounts of dead code from
real-life HLSL shaders being ported to Vulkan. In fact, it is needed to
remove dead texture samples which cause Vulkan validation layer errors
(unbound textures and samplers) if not removed . Such DCE is thus
required for fxc equivalence and legalization.
This analysis operates across "chains" of Inserts which can also contain
Phi instructions.
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
* does not perform code motion currently so the converted values must
dominate the join point (e.g. can't be defined in the branches)
* limited for now to two predecessors, but can be extended to handle
more cases
* Adding if conversion to -O and -Os
The current folding routines have a very cumbersome interface, make them
harder to use, and not a obvious how to extend.
This change is to create a new interface for the folding routines, and
show how it can be used by calling it from CCP.
This does not make a significant change to the behaviour of CCP. In
general it should produce the same code as before; however it is
possible that an instruction that takes 32-bit integers as inputs and
the result is not a 32-bit integer or bool will not be folded as before.
It seems like andriod has a problem with INT32_MAX and the like. I'll
explicitly define those if the are not already defined.
The class factorize the instruction building process.
Def-use manager analysis can be updated on the fly to maintain coherency.
To be updated to take into account more analysis.
* AddToWorklist can now be called unconditionally
* It will only add instructions that have not already been marked as
live
* Fixes a case where a merge was not added to the worklist because the
branch was already marked as live
* Added two similar tests that fail without the fix
We have come across a driver bug where and OpUnreachable inside a loop
is causing the shader to go into an infinite loop. This commit will try
to avoid this bug by turning OpUnreachable instructions that are
contained in a loop into branches to the loop merge block.
This is not added to "-O" and "-Os" because it should only be used if
the driver being targeted has this problem.
Fixes#1209.
At the moment specialization constants look like constants to ccp. This
causes a problem because they are handled differently by the constant
manager.
I choose to simply skip over them, and not try to add them to the value
table. We can do specialization before ccp if we want to be able to
propagate these values.
Fixes#1199.
The current code expects the users of the constant manager to initialize
it with all of the constants in the module. The problem is that you do
not want to redo the work multiple times. So I decided to move that
code to the constructor of the constant manager. This way it will
always be initialized on first use.
I also removed an assert that expects all constant instructions to be
successfully mapped. This is because not all OpConstant* instruction
can map to a constant, and neither do the OpSpecConstant* instructions.
The real problem is that an OpConstantComposite can contain a member
that is OpUndef. I tried to treat OpUndef like OpConstantNull, but this
failed because an OpSpecConstantComposite with an OpUndef cannot be
changed to an OpConstantComposite. Since I feel this case will not be
common, I decided to not complicate the code.
Fixes#1193.
* Added for Instruction, BasicBlock, Function and Module
* Uses new disassembly functionality that can disassemble individual
instructions
* For debug use only (no caching is done)
* Each output converts module to binary, parses and outputs an
individual instruction
* Added a test for whole module output
* Disabling Microsoft checked iterator warnings
* Updated check_copyright.py to accept 2018
* Changed MemPass::InsertPhiInstructions to set basic blocks for new
phis
* Local SSA elim now maintains instr to block mapping
* Added a test and confirmed it fails without the updated phis
* IRContext::set_instr_block no longer builds the map if the analysis is
invalid
* Added instruction to block mapping verification to
IRContext::IsConsistent()
This improves Extract replacement to continue through VectorShuffle.
It will also handle Mix with 0.0 or 1.0 in the a-value of the desired
component.
To facilitate optimization of VectorShuffle, the algorithm was refactored
to pass around the indices of the extract in a vector rather than pass the
extract instruction itself. This allows the indices to be modified as the
algorithm progresses.
Modified ADCE to remove dead globals.
* Entry point and execution mode instructions are marked as alive
* Reachable functions and their parameters are marked as alive
* Instruction deletion now deferred until the end of the pass
* Eliminated dead insts set, added IsDead to calculate that value
instead
* Ported applicable dead variable elimination tests
* Ported dead constant elim tests
Added dead function elimination to ADCE
* ported dead function elim tests
Added handling of decoration groups in ADCE
* Uses a custom sorter to traverse decorations in a specific order
* Simplifies necessary checks
Updated -O and -Os pass lists.
Pass now paints live blocks and fixes constant branches and switches as
it goes. No longer requires structured control flow. It also removes
unreachable blocks as a side effect. It fixes the IR (phis) before doing
any code removal (other than terminator changes).
Added several unit tests for updated/new functionality.
Does not remove dead edge from a phi node:
* Checks that incoming edges are live in order to retain them
* Added BasicBlock::IsSuccessor
* added test
Fixing phi updates in the presence of extra backedge blocks
* Added tests to catch bug
Reworked how phis are updated
* Instead of creating a new Phi and RAUW'ing the old phi with it, I now
replace the phi operands, but maintain the def/use manager correctly.
For unreachable merge:
* When considering unreachable continue blocks the code now properly
checks whether the incoming edge will continue to be live.
Major refactoring for review
* Broke into 4 major functions
* marking live blocks
* marking structured targets
* fixing phis
* deleting blocks
This fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1143.
When an instruction transitions from constant to bottom (varying) in the
lattice, we were telling the propagator that the instruction was
varying, but never updating the actual value in the values table.
This led to incorrect value substitutions at the end of propagation.
The patch also re-enables CCP in -O and -Os.
Add post-order tree iterator.
Add DominatorTreeNode extensions:
- Add begin/end methods to do pre-order and post-order tree traversal from a given DominatorTreeNode
Add DominatorTree extensions:
- Add begin/end methods to do pre-order and post-order tree traversal
- Tree traversal ignore by default the pseudo entry block
- Retrieve a DominatorTreeNode from a basic block
Add loop descriptor:
- Add a LoopDescriptor class to register all loops in a given function.
- Add a Loop class to describe a loop:
- Loop parent
- Nested loops
- Loop depth
- Loop header, merge, continue and preheader
- Basic blocks that belong to the loop
Correct a bug that forced dominator tree to be constantly rebuilt.
In value numbering, we treat loads and stores of images, ie OpImageLoad,
as a memory operation where it is interested in the "base address" of
the instruction. In those cases, it is an image instruction.
The problem is that `Instruction::GetBaseAddress()` does not account for
the image instructions, so the assert at the end to make sure it found
a valid base address for its addressing mode fails.
The solution is to look at the load/store instruction to determine how
the assertion should be done.
Fixes#1160.
This fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1159. I
had missed a nuance in the original algorithm. When simulating Phi
instructions, the SSA edges out of a Phi instruction should never be
added to the list of edges to simulate.
Phi instructions can be in SSA def-use cycles with other Phi
instructions. This was causing the propagator to fall into an infinite
loop when the same def-use edge kept being added to the queue.
The original algorithm in the paper specifically separates the visit of
a Phi instruction vs the visit of a regular instruction. This fix makes
the implementation match the original algorithm.
In CCP we should not need to insert Phi nodes because CCP never looks at
loads/stores. This required adjusting two tests that relied on Phi
instructions being inserted. I changed the tests to have the Phi
instructions pre-inserted.
I also added a new test to make sure that CCP does not try to look
through stores and loads.
Finally, given that CCP does not handle loads/stores, it's better to run
mem2reg before it. I've changed the -O/-Os schedules to run local
multi-store elimination before CCP.
Although this is just an efficiency fix for CCP, it is
also working around a bug in Phi insertion. When Phi instructions are
inserted, they are never associated a basic block. This causes a
segfault when the propagator tries to lookup CFG edges when analyzing
Phi instructions.
This implements the conditional constant propagation pass proposed in
Constant propagation with conditional branches,
Wegman and Zadeck, ACM TOPLAS 13(2):181-210.
The main logic resides in CCPPass::VisitInstruction. Instruction that
may produce a constant value are evaluated with the constant folder. If
they produce a new constant, the instruction is considered interesting.
Otherwise, it's considered varying (for unfoldable instructions) or
just not interesting (when not enough operands have a constant value).
The other main piece of logic is in CCPPass::VisitBranch. This
evaluates the selector of the branch. When it's found to be a known
value, it computes the destination basic block and sets it. This tells
the propagator which branches to follow.
The patch required extensions to the constant manager as well. Instead
of hashing the Constant pointers, this patch changes the constant pool
to hash the contents of the Constant. This allows the lookups to be
done using the actual values of the Constant, preventing duplicate
definitions.
In order to keep track of all of the implicit capabilities as well as
the explicit ones, we will add them all to the feature manager. That is
the object that needs to be queried when checking if a capability is
enabled.
The name of the "HasCapability" function in the module was changed to
make it more obvious that it does not check for implied capabilities.
Keep an spv_context and AssemblyGrammar in IRContext
* changed the way duplicate types are removed to stop copying
instructions
* Reworked RemoveDuplicatesPass::AreTypesSame to use type manager and
type equality
* Reworked TypeManager memory management to store a pool of unique
pointers of types
* removed unique pointers from id map
* fixed instances where free'd memory could be accessed
A few optimizations are updates to handle code that is suppose to be
using the logical addressing mode, but still has variables that contain
pointers as long as the pointer are to opaque objects. This is called
"relaxed logical addressing".
|Instruction::GetBaseAddress| will check that pointers that are use meet
the relaxed logical addressing rules. Optimization that now handle
relaxed logical addressing instead of logical addressing are:
- aggressive dead-code elimination
- local access chain convert
- local store elimination passes.
When a private variable is used in a single function, it can be
converted to a function scope variable in that function. This adds a
pass that does that. The pass can be enabled using the option
`--private-to-local`.
This transformation allows other transformations to act on these
variables.
Also moved `FindPointerToType` from the inline class to the type manager.
types. This allows the lookup of type declaration ids from arbitrarily
constructed types. Users should be cautious when dealing with non-unique
types (structs and potentially pointers) to get the exact id if
necessary.
* Changed the spec composite constant folder to handle ambiguous composites
* Added functionality to create necessary instructions for a type
* Added ability to remove ids from the type manager
This fixes issue #1075
- Mark continue when conditional branch with merge block.
Only mark if merge block is not continue block.
- Handle conditional branch break with preceding merge
Inlining is not setting the parent (function) for each basic block.
This can cause problems for later optimizations. The solution is to set
the parent for each new block just before it is linked into the
function.
Adds a scalar replacement pass. The pass considers all function scope
variables of composite type. If there are accesses to individual
elements (and it is legal) the pass replaces the variable with a
variable for each composite element and updates all the uses.
Added the pass to -O
Added NumUses and NumUsers to DefUseManager
Added some helper methods for the inst to block mapping in context
Added some helper methods for specific constant types
No longer generate duplicate pointer types.
* Now searches for an existing pointer of the appropriate type instead
of failing validation
* Fixed spec constant extracts
* Addressed changes for review
* Changed RunSinglePassAndMatch to be able to run validation
* current users do not enable it
Added handling of acceptable decorations.
* Decorations are also transfered where appropriate
Refactored extension checking into FeatureManager
* Context now owns a feature manager
* consciously NOT an analysis
* added some test
* fixed some minor issues related to decorates
* added some decorate related tests for scalar replacement
Adds a pass that looks for redundant instruction in a function, and
removes them. The algorithm is a hash table based value numbering
algorithm that traverses the dominator tree.
This pass removes completely redundant instructions, not partially
redundant ones.
Currently when inlining a call, the name and decorations for the result of the
call is not deleted. This should be changed. Added a test for this as well.
This fixes issue #622.
Support for dominator and post dominator analysis on ir::Functions. This patch contains a DominatorTree class for building the tree and DominatorAnalysis and DominatorAnalysisPass classes for interfacing and caching the built trees.