Commit Graph

88 Commits

Author SHA1 Message Date
Steven Perron
4b64beb1ae
Add descriptor array scalar replacement (#2742)
Creates a pass that will replace a descriptor array with individual variables.  See #2740 for details.

Fixes #2740.
2019-08-08 10:53:19 -04:00
David Neto
31590104ec
Add pass to inject code for robust-buffer-access semantics (#2771)
spirv-opt: Add --graphics-robust-access

Clamps access chain indices so they are always
in bounds.

Assumes:
- Logical addressing mode
- No runtime-array-descriptor-indexing
- No variable pointers

Adds stub code for clamping coordinate and samples
for OpImageTexelPointer.

Adds SinglePassRunAndFail optimizer test fixture.

Android.mk: add source/opt/graphics_robust_access_pass.cpp

Adds Constant::GetSignExtendedValue, Constant::GetZeroExtendedValue
2019-07-30 19:52:46 -04:00
greg-lunarg
92c41ff1e7 Remove Common Uniform Elimination Pass (#2731)
Remove Common Uniform Elimination Pass

Fixes #2520.
2019-07-12 11:02:10 -04:00
Ryan Harrison
f6d9a17843
Add pass to fix some invalid unreachable blocks for WebGPU (#2563)
Attempts to split up unreachable blocks that are used both as a
merge-block and a continue-target.

Fixes #2429
2019-05-09 12:56:10 -04:00
Steven Perron
32af42616a
Change implementation of post order CFG traversal (#2543)
* Change implementation of post order CFG traversal

It seems like the recursion is going very deep, and causing some problem
is particular situations.  I've reimplemented the CFG post order
traversal to not use recursion.

Fixes #2539.
2019-04-29 17:09:20 -04:00
Ryan Harrison
048dcd38ce
Implement WebGPU->Vulkan initializer conversion for 'Function' variables (#2513)
WebGPU requires certain variables to be initialized, whereas there are
known issues with using initializers in Vulkan. This PR is the first
of three implementing a pass to decompose initialized variables into
a variable declaration followed by a store. This has been broken up
into multiple PRs, because there 3 distinct cases that need to be
handled, which require separate implementations.

This first PR implements the basic infrastructure that is needed, and
handling of Function storage class variables. Private and Output will
be handled in future PRs.

This is part of resolving #2388
2019-04-16 14:31:36 -04:00
Ryan Harrison
102e430a88
Add pass to legalize OpVectorShuffle for WebGPU (#2509)
In WebGPU, the component operand 0xFFFFFFFF is forbidden, but in
Vulkan it is used to indicate a value is undefined. When converting to
WebGPU, 0xFFFFFFFF needs to converted to a legal value, though the
specific one does not matter, since it was used to indicate an
undefined entry in the original code. Choosing to use 0, since the
operands are required to be on [0, N-1], so 0 is guaranteed to always
be valid.

Fixes #2349
2019-04-12 12:14:23 -04:00
Steven Perron
3a0bc9e724
Add fix storage class code. (#2434)
This pass tries to fix validation error due to a mismatch of storage classes
in instructions.  There is no guarantee that all such error will be fixed,
and it is possible that in fixing these errors, it could lead to other
errors.

Fixes #2430.
2019-04-05 13:12:08 -04:00
Ryan Harrison
01964e325f
Add pass to generate needed initializers for WebGPU (#2481)
Fixes #2387
2019-04-03 11:44:09 -04:00
Ryan Harrison
e545522146
Add --strip-atomic-counter-memory (#2413)
Adds an optimization pass to remove usages of AtomicCounterMemory
bit. This bit is ignored in Vulkan environments and outright forbidden
in WebGPU ones.

Fixes #2242
2019-03-14 13:34:33 -04:00
Steven Perron
1b0047f210
Add pass to remove dead members. (#2379)
Add a pass that looks for members of structs whose values do not affects
the output of the shader. Those members are then removed and just
treated like padding in the struct.
2019-02-14 13:42:35 -05:00
Steven Perron
dd4157dcee
Sink (#2284)
Add code sinking pass. It will move OpLoad and OpAccessChain instructions as close as possible to their uses.

Part of #1611.
2019-01-17 15:56:36 -05:00
alan-baker
06c9dc07bd
Upgrade modf and frexp (#2266)
Fixes #2138

* Modf and frexp are upgraded to use the struct version of the
instruction and generate an explicit store whose flags can be upgraded
separately
* Fixed major bug where availability and visibility were reversed for
non-copy memory instructions
* Fixed bug where availability and visibility scope operands were reversed for copy memory
* Upgraded all opt tests to use SPV_ENV_UNIVERSAL_1_3
* Upgrade tests moved into unified tests and removed standalone test
2019-01-07 12:36:38 -05:00
Jeff Bolz
8fc8dfe4a5 Add PCH_FILE for upgrade_memory_model target. MSVC doesn't like building pass_utils.cpp twice in the same folder with different PCH settings. (#2186) 2018-12-10 10:54:19 -05:00
alan-baker
e510b1bac5
Update memory model (#1904)
Upgrade to VulkanKHR memory model

* Converts Logical GLSL450 memory model to Logical VulkanKHR
* Adds extension and capability
* Removes deprecated decorations and replaces them with appropriate
flags on downstream instructions
* Support for Workgroup upgrades
* Support for copy memory
* Adding support for image functions
* Adding barrier upgrades and tests
* Use QueueFamilyKHR scope instead of device
2018-11-30 14:15:51 -05:00
Steven Perron
2d2a512691
Don't inline recursive functions. (#2130)
* Move ProcessFunction* function from pass to the context.

There are a few functions that are used to traverse the call tree.
They currently live in the Pass class, but they have nothing to do with
a pass, and may be needed outside of a pass.  They would be better in
the ir context, or in a specific call tree class if we ever have a need
for it.

* Don't inline recursive functions.

Inlining does not check if a function is recursive or not.  This has
been fine as long as the shader was a Vulkan shader, which forbid
recursive functions.  However, not all shaders are vulkan, so either
we limit inlining to Vulkan shaders or we teach it to look for recursive
functions.

I prefer to keep the passes as general as is reasonable.  The change
does not require much new code in inlining and gives a reason to refactor
some other code.

The changes are to add a member function to the Function class that
checks if that function is recursive or not.

Then this is used in inlining to not inlining a function call if it calls
a recursive function.

* Add id to function analysis

There are a few places that build a map from ids to Function whose
result is that id.  I decided to add an analysis to the context for this
to reduce that code, and simplify some of the functions.

* Add missing file.
2018-11-29 14:24:58 -05:00
greg-lunarg
c37388f1ad Add passes to propagate and eliminate redundant line instructions (#2027). (#2039)
These are bookend passes designed to help preserve line information
across passes which delete, move and clone instructions. The propagation
pass attaches a debug line instruction to every instruction based on
SPIR-V line propagation rules. It should be performed before optimization.
The redundant line elimination pass eliminates all line instructions
which match the previous line instruction. This pass should be performed
at the end of optimization to reduce physical SPIR-V file size.

Fixes #2027.
2018-11-15 14:06:17 -05:00
greg-lunarg
1e9fc1aac1 Add base and core bindless validation instrumentation classes (#2014)
* Add base and core bindless validation instrumentation classes

* Fix formatting.

* Few more formatting fixes

* Fix build failure

* More build fixes

* Need to call non-const functions in order.

Specifically, these are functions which call TakeNextId(). These need to
be called in a specific order to guarantee that tests which do exact
compares will work across all platforms. c++ pretty much does not
guarantee order of evaluation of operands, so any such functions need to
be called separately in individual statements to guarantee order.

* More ordering.

* And more ordering.

* And more formatting.

* Attempt to fix NDK build

* Another attempt to address NDK build problem.

* One more attempt at NDK build failure

* Add instrument.hpp to BUILD.gn

* Some name improvement in instrument.hpp

* Change all types in instrument.hpp to int.

* Improve documentation in instrument.hpp

* Format fixes

* Comment clean up in instrument.hpp

* imageInst -> image_inst

* Fix GetLabel() issue.
2018-11-08 13:54:54 -05:00
Jeff Bolz
c06a35b902 Rename PCH macro to spvtools_pch to avoid conflicts with other projects. Also add pch to test/opt. (#2034) 2018-11-07 09:15:04 -05:00
dan sinclair
9e6f5134d1
Reduce number of test targets (#2024)
This CL takes the various opt unit tests and makes a single executable
instead of one per test. This reduces the number of build targets by
~125 when building with ninja.
2018-11-01 10:19:37 -04:00
Steven Perron
5f599e700e
Fix infinite loop in dead-branch-elimination (#1891)
* Create structed cfg analysis.

There are lots of optimization that have to traverse the CFG in a
structured order just because it wants to know which constructs a
basic block in contained in.  This adds extra complexity to these
optimizations, for causes too much refactoring of older optimizations.

To help with this problem, I have written an analysis that can give this
information.

* Identify branches breaking from loops.

Dead branch elimination does a search for a conditional branch to the
end of the current selection construct.  This search assumes that the
only way to leave the construct is through the merge node.  But that is
not true.  The code can jump to the merge node of a loop that contains
the construct.

The search needs to take this into consideration.
2018-09-17 13:00:24 -04:00
Alan Baker
755e5c9420 Transform to combine consecutive access chains
* Combines OpAccessChain, OpInBoundsAccessChain, OpPtrAccessChain and
OpInBoundsPtrAccessChain
* New folding rule to fold add with 0 for integers
 * Converts to a bitcast if the result type does not match the operand
 type
V
2018-07-31 13:42:47 -04:00
Steven Perron
9ecbcf5fc8 Make sure the constant folder get the correct type.
There are a few locations where we need to handle duplicate types.  We
cannot merge them because they may be needed for reflection.  When this
happens we need do some extra lookups in the type manager.

The specific fixes are:

1) When generating a constant through `GetDefiningInstruction` accept
and use an id for the desired type of the constant.  This will make sure
you get the type that is needed.

2) In Private-to-local, make sure we to update the def-use chains when a
new pointer type is created.

3) In the type manager, make sure that `FindPointerToType` returns a
pointer that points to the given type and not a duplicate type.

4) In scalar replacment, make sure the null constants that are created
are the correct type.
2018-07-05 14:34:30 -04:00
Steven Perron
af430ec822 Add pass to fold a load feeding an extract.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element.  This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.

However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.

Fixes #1547.
2018-05-14 15:40:34 -04:00
Steven Perron
2c0ce87210
Vector DCE (#1512)
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.

It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
2018-04-23 11:13:07 -04:00
Victor Lomuller
0ec08c28c1 Add register liveness analysis.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.

It also includes utilities to simulate the register pressure for loop
fusion and fission.

The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
2018-04-20 09:45:15 -04:00
Alexander Johnston
61b50b3bfa ZIV and SIV loop dependence analysis.
Provides functionality to perform ZIV and SIV dependency analysis tests
between a load and store within the same loop.

Dependency tests rely on scalar analysis to prove and disprove dependencies
with regard to the loop being analysed.

Based on the 1990 paper Practical Dependence Testing by Goff, Kennedy, Tseng

Adds support for marking loops in the loop nest as IRRELEVANT.
Loops are marked IRRELEVANT if the analysed instructions contain
no induction variables for the loops, i.e. the loops induction
variable is not relevent to the dependence of the store and load.
2018-04-11 09:32:42 -04:00
Stephen McGroarty
ad7e4b8401 Initial patch for scalar evolution analysis
This patch adds support for the analysis of scalars in loops. It works
by traversing the defuse chain to build a DAG of scalar operations and
then simplifies the DAG by folding constants and grouping like terms.
It represents induction variables as recurrent expressions with respect
to a given loop and can simplify DAGs containing recurrent expression by
rewritting the entire DAG to be a recurrent expression with respect to
the same loop.
2018-03-28 16:34:23 -04:00
Steven Perron
c4dc046399 Copy propagate arrays
The sprir-v generated from HLSL code contain many copyies of very large
arrays.  Not only are these time consumming, but they also cause
problems for drivers because they require too much space.

To work around this, we will implement an array copy propagation.  Note
that we will not implement a complete array data flow analysis in order
to implement this.  We will be looking for very simple cases:

1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.

The hard part is keeping all of the types correct.  We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.

Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent.  With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values.  See |Struct::GetExtraHashWords|.

Contributes to #1416.
2018-03-26 14:44:41 -04:00
David Neto
844e186cf7 Add --strip-reflect pass
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
2018-03-15 21:20:42 -04:00
Steven Perron
06cdb96984 Make use of the instruction folder.
Implementation of the simplification pass.

- Create pass that calls the instruction folder on each instruction and
  propagate instructions that fold to a copy.  This will do copy
  propagation as well.

- Did not use the propagator engine because I want to modify the instruction
  as we go along.

- Change folding to not allocate new instructions, but make changes in
  place.  This change had a big impact on compile time.

- Add simplification pass to the legalization passes in place of
  insert-extract elimination.

- Added test cases for new folding rules.

- Added tests for the simplification pass

- Added a method to the CFG to apply a function to the basic blocks in
  reverse post order.

Contributes to #1164.
2018-02-07 23:01:47 -05:00
Steven Perron
61d8c0384b Add pass to reaplce invalid opcodes
Creates a pass that will remove instructions that are invalid for the
current shader stage.  For the instruction to be considered for replacement

1) The opcode must be valid for a shader modules.
2) The opcode must be invalid for the current shader stage.
3) All entry points to the module must be for the same shader stage.
4) The function containing the instruction must be reachable from an entry point.

Fixes #1247.
2018-02-01 15:25:09 -05:00
GregF
f28b106173 InsertExtractElim: Split out DeadInsertElim as separate pass 2018-01-30 08:52:14 -05:00
Alan Baker
2e93e806e4 Initial implementation of if conversion
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
 * does not perform code motion currently so the converted values must
 dominate the join point (e.g. can't be defined in the branches)
 * limited for now to two predecessors, but can be extended to handle
 more cases
* Adding if conversion to -O and -Os
2018-01-25 09:42:00 -08:00
Steven Perron
6c409e30a2 Add generic folding function and use in CCP
The current folding routines have a very cumbersome interface, make them
harder to use, and not a obvious how to extend.

This change is to create a new interface for the folding routines, and
show how it can be used by calling it from CCP.

This does not make a significant change to the behaviour of CCP.  In
general it should produce the same code as before; however it is
possible that an instruction that takes 32-bit integers as inputs and
the result is not a 32-bit integer or bool will not be folded as before.

It seems like andriod has a problem with INT32_MAX and the like.  I'll
explicitly define those if the are not already defined.
2018-01-22 14:26:49 -05:00
Victor Lomuller
cf3b2a58c4 Introduce an instruction builder helper class.
The class factorize the instruction building process.
Def-use manager analysis can be updated on the fly to maintain coherency.
To be updated to take into account more analysis.
2018-01-19 10:17:45 -05:00
Steven Perron
34d4294c2c Create a pass to work around a driver bug related to OpUnreachable.
We have come across a driver bug where and OpUnreachable inside a loop
is causing the shader to go into an infinite loop.  This commit will try
to avoid this bug by turning OpUnreachable instructions that are
contained in a loop into branches to the loop merge block.

This is not added to "-O" and "-Os" because it should only be used if
the driver being targeted has this problem.

Fixes #1209.
2018-01-18 20:31:46 -05:00
Victor Lomuller
e8ad02f3dd Add loop descriptors and some required dominator tree extensions.
Add post-order tree iterator.

Add DominatorTreeNode extensions:
 - Add begin/end methods to do pre-order and post-order tree traversal from a given DominatorTreeNode

Add DominatorTree extensions:
  - Add begin/end methods to do pre-order and post-order tree traversal
  - Tree traversal ignore by default the pseudo entry block
  - Retrieve a DominatorTreeNode from a basic block

Add loop descriptor:
  - Add a LoopDescriptor class to register all loops in a given function.
  - Add a Loop class to describe a loop:
    - Loop parent
    - Nested loops
    - Loop depth
    - Loop header, merge, continue and preheader
    - Basic blocks that belong to the loop

Correct a bug that forced dominator tree to be constantly rebuilt.
2018-01-08 09:31:13 -05:00
Diego Novillo
4ba9dcc8a0 Implement SSA CCP (SSA Conditional Constant Propagation).
This implements the conditional constant propagation pass proposed in

Constant propagation with conditional branches,
Wegman and Zadeck, ACM TOPLAS 13(2):181-210.

The main logic resides in CCPPass::VisitInstruction.  Instruction that
may produce a constant value are evaluated with the constant folder. If
they produce a new constant, the instruction is considered interesting.
Otherwise, it's considered varying (for unfoldable instructions) or
just not interesting (when not enough operands have a constant value).

The other main piece of logic is in CCPPass::VisitBranch.  This
evaluates the selector of the branch.  When it's found to be a known
value, it computes the destination basic block and sets it.  This tells
the propagator which branches to follow.

The patch required extensions to the constant manager as well. Instead
of hashing the Constant pointers, this patch changes the constant pool
to hash the contents of the Constant.  This allows the lookups to be
done using the actual values of the Constant, preventing duplicate
definitions.
2017-12-21 14:29:45 -05:00
Alan Baker
1ab8ad654a Fixing bugs in type manager memory management
* changed the way duplicate types are removed to stop copying
instructions
* Reworked RemoveDuplicatesPass::AreTypesSame to use type manager and
type equality
* Reworked TypeManager memory management to store a pool of unique
pointers of types
 * removed unique pointers from id map
 * fixed instances where free'd memory could be accessed
2017-12-21 08:59:06 -05:00
Pierre Moreau
424f744db1 Opt: Fix implementation and comment of AreDecorationsTheSame
Target should not be ignored when comparing decorations in RemoveDuplicates
Opt: Remove unused code in RemoveDuplicateDecorations
2017-12-19 15:36:47 -05:00
Steven Perron
b86eb6842b Convert private variables to function scope.
When a private variable is used in a single function, it can be
converted to a function scope variable in that function.  This adds a
pass that does that.  The pass can be enabled using the option
`--private-to-local`.

This transformation allows other transformations to act on these
variables.

Also moved `FindPointerToType` from the inline class to the type manager.
2017-12-19 14:21:04 -05:00
Alan Baker
867451f49e Add scalar replacement
Adds a scalar replacement pass. The pass considers all function scope
variables of composite type. If there are accesses to individual
elements (and it is legal) the pass replaces the variable with a
variable for each composite element and updates all the uses.

Added the pass to -O
Added NumUses and NumUsers to DefUseManager
Added some helper methods for the inst to block mapping in context
Added some helper methods for specific constant types

No longer generate duplicate pointer types.

* Now searches for an existing pointer of the appropriate type instead
of failing validation
* Fixed spec constant extracts
* Addressed changes for review
* Changed RunSinglePassAndMatch to be able to run validation
 * current users do not enable it

Added handling of acceptable decorations.

* Decorations are also transfered where appropriate

Refactored extension checking into FeatureManager

* Context now owns a feature manager
 * consciously NOT an analysis
 * added some test
* fixed some minor issues related to decorates
* added some decorate related tests for scalar replacement
2017-12-11 10:51:13 -05:00
Steven Perron
5d602abd66 Add global redundancy elimination
Adds a pass that looks for redundant instruction in a function, and
removes them.  The algorithm is a hash table based value numbering
algorithm that traverses the dominator tree.

This pass removes completely redundant instructions, not partially
redundant ones.
2017-12-07 18:35:38 -05:00
Stephen McGroarty
8ba68fa9b9 Dominator Tree Analysis (#3)
Support for dominator and post dominator analysis on ir::Functions. This patch contains a DominatorTree class for building the tree and DominatorAnalysis and DominatorAnalysisPass classes for interfacing and caching the built trees.
2017-12-05 22:59:43 -05:00
Diego Novillo
74327845aa Generic value propagation engine.
This class implements a generic value propagation algorithm based on the
conditional constant propagation algorithm proposed in

     Constant propagation with conditional branches,
     Wegman and Zadeck, ACM TOPLAS 13(2):181-210.

The implementation is based on

     A Propagation Engine for GCC
     Diego Novillo, GCC Summit 2005
     http://ols.fedoraproject.org/GCC/Reprints-2005/novillo-Reprint.pdf

The purpose of this implementation is to act as a common framework for any
transformation that needs to propagate values from statements producing new
values to statements using those values.
2017-11-27 23:32:06 -05:00
Steven Perron
28c415500d Create a local value numbering pass
Creates a pass that removes redundant instructions within the same basic
block.  This will be implemented using a hash based value numbering
algorithm.

Added a number of functions that check for the Vulkan descriptor types.
These are used to determine if we are variables are read-only or not.

Implemented a function to check if loads and variables are read-only.
Implemented kernel specific and shader specific versions.

A big change is that the Combinator analysis in ADCE is factored out
into the IRContext as an analysis. This was done because it is being
reused in the value number table.
2017-11-23 11:45:09 -05:00
Alan Baker
a92d69b43d Initial implementation of merge return pass.
Works with current DefUseManager infrastructure.

Added merge return to the standard opts.

Added validation to passes.

Disabled pass for shader capabilty.
2017-11-15 10:27:04 -05:00
Steven Perron
f32d11f74b Add the IRContext (part 2): Add def-use manager
This change will move the instances of the def-use manager to the
IRContext.  This allows it to persists across optimization, and does
not have to be rebuilt multiple times.

Added test to ensure that the IRContext is validating and invalidating
the analyses correctly.
2017-11-08 13:35:34 -05:00
Steven Perron
5834719fc1 Add pass to remove dead variables at the module level.
There does not seem to be any pass that remove global variables.  I
think we could use one.  This pass will look specifically for global
variables that are not referenced and are not exported.  Any decoration
associated with the variable will also be removed.  However, this could
cause types or constants to become unreferenced.  They will not be
removed.  Another pass will have to be called to remove those.
2017-10-23 13:57:05 -04:00