Implementation of the simplification pass.
- Create pass that calls the instruction folder on each instruction and
propagate instructions that fold to a copy. This will do copy
propagation as well.
- Did not use the propagator engine because I want to modify the instruction
as we go along.
- Change folding to not allocate new instructions, but make changes in
place. This change had a big impact on compile time.
- Add simplification pass to the legalization passes in place of
insert-extract elimination.
- Added test cases for new folding rules.
- Added tests for the simplification pass
- Added a method to the CFG to apply a function to the basic blocks in
reverse post order.
Contributes to #1164.
Add pkg-config file for shared libraries
Properly build SPIRV-Tools DLL
Test C interface with shared library
Set PATH to shared library file for c_interface_shared test
Otherwise, the test won't find SPIRV-Tools-shared.dll.
Do not use private functions when testing with shared library
Make all symbols hidden by default for shared library target
* Added TypeManager::RebuildType
* rebuilds the type and its constituent types in terms of memory owned
by the manager.
* Used by TypeManager::RegisterType to properly allocate memory
* Adding an unit test to expose the issue
* Added some tests to provide coverage of RebuildType
* Added an accessor to the target pointer for a forward pointer
The combinator initialization was only looking at the capabilities
in the shader and not the inferred capabilities. Geometry and tessellation
shaders were not setting the Shader capability which is inferred. So the
combinator set was not initialized correctly causing problems for ADCE.
Create the folding engine that will
1) attempt to fold an instruction.
2) iterates on the folding so small folding rules can be easily combined.
3) insert new instructions when needed.
I've added the minimum number of rules needed to test the features above.
* Moved initial insert/extract passes later to cover more opportunities
* Added an extra set of passes to clean up opportunities exposed later
in the pipeline
This patch adds LoopUtils class to handle some loop related transformations. For now it has 2 transformations that simplifies other transformations such as loop unroll or unswitch:
- Dedicate exit blocks: this ensure that all exit basic block
(out-of-loop basic blocks that have a predecessor in the loop)
have all their predecessors in the loop;
- Loop Closed SSA (LCSSA): this ensure that all definitions in a loop are used inside the loop
or in a phi instruction in an exit basic block.
It also adds the following capabilities:
- Loop::IsLCSSA to test if the loop is in a LCSSA form
- Loop::GetOrCreatePreHeaderBlock that can build a loop preheader if required;
- New methods to allow on the fly updates of the loop descriptors.
- New methods to allow on the fly updates of the CFG analysis.
- Instruction::SetOperand to allow expression of the index relative to Instruction::NumOperands (to be compatible with the index returned by DefUseManager::ForEachUse)
Creates a pass that will remove instructions that are invalid for the
current shader stage. For the instruction to be considered for replacement
1) The opcode must be valid for a shader modules.
2) The opcode must be invalid for the current shader stage.
3) All entry points to the module must be for the same shader stage.
4) The function containing the instruction must be reachable from an entry point.
Fixes#1247.
* Had to remove templating from InstructionBuilder as a result
* now preserved analyses are specified as a constructor argument
* updated tests and uses
* changed static_assert to a runtime assert
* this should probably get further changes in the future
* When handling unreachable merges and continues, do not optimize to the
same IR
* pass did not check whether the unreachable blocks were in the
optimized form before transforming them
* added a test to catch this issue
* Should handle all possibilities
* Stricter checks for what is disallowed:
* header and header
* merge and merge
* Allow header and merge blocks to be merged
* Erases the structured control declaration if merging header and
merge blocks together.
* If the dead branch elim is performed on a module without structured
control flow, the OpSelectionMerge may not be present
* Add a check for pointer validity before dereferencing
* Added a test to catch the bug
* Forces traversal of phis if the def has changed to varying
* Mark a phi as varying if all incoming values are varying
* added a test to catch the bug
This adds Dead Insert Elimination to the end of the
--eliminate-insert-extract pass. See the new tests for examples of code
that will benefit.
Essentially, this removes OpCompositeInsert instructions which are not
used, either because there is no instruction which uses the value at the
index it is inserted, or because a subsequent insert intercepts any such
use.
This code has been seen to remove significant amounts of dead code from
real-life HLSL shaders being ported to Vulkan. In fact, it is needed to
remove dead texture samples which cause Vulkan validation layer errors
(unbound textures and samplers) if not removed . Such DCE is thus
required for fxc equivalence and legalization.
This analysis operates across "chains" of Inserts which can also contain
Phi instructions.
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
* does not perform code motion currently so the converted values must
dominate the join point (e.g. can't be defined in the branches)
* limited for now to two predecessors, but can be extended to handle
more cases
* Adding if conversion to -O and -Os
Ban floating point case for OpAtomicLoad, OpAtomicExchange,
OpAtomicCompareExchange. In graphics (Shader) environments, these
instructions only operate on scalar integers. Ban the floating point
case. OpenCL supports atomic_float.
Implemented Vulkan-specific rules:
- OpTypeImage must declare a scalar 32-bit float or 32-bit integer type
for the “Sampled Type”.
- OpSampledImage must only consume an “Image” operand whose type has its
“Sampled” operand set to 1.
The current folding routines have a very cumbersome interface, make them
harder to use, and not a obvious how to extend.
This change is to create a new interface for the folding routines, and
show how it can be used by calling it from CCP.
This does not make a significant change to the behaviour of CCP. In
general it should produce the same code as before; however it is
possible that an instruction that takes 32-bit integers as inputs and
the result is not a 32-bit integer or bool will not be folded as before.
It seems like andriod has a problem with INT32_MAX and the like. I'll
explicitly define those if the are not already defined.
The class factorize the instruction building process.
Def-use manager analysis can be updated on the fly to maintain coherency.
To be updated to take into account more analysis.
* AddToWorklist can now be called unconditionally
* It will only add instructions that have not already been marked as
live
* Fixes a case where a merge was not added to the worklist because the
branch was already marked as live
* Added two similar tests that fail without the fix
We have come across a driver bug where and OpUnreachable inside a loop
is causing the shader to go into an infinite loop. This commit will try
to avoid this bug by turning OpUnreachable instructions that are
contained in a loop into branches to the loop merge block.
This is not added to "-O" and "-Os" because it should only be used if
the driver being targeted has this problem.
Fixes#1209.
This ensure that all basic blocks in a function have a valid entry the CFG object.
The entry block has no predecessors but remains a valid basic block
for which we might want to query the number of predecessors.
Some unreachable basic blocks may not have predecessors as well.