Commit Graph

428 Commits

Author SHA1 Message Date
Steven Perron
2c0ce87210
Vector DCE (#1512)
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.

It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
2018-04-23 11:13:07 -04:00
Victor Lomuller
efc5061929 Dominator analysis interface clean.
Remove the CFG requirement when querying a dominator/post-dominator from an IRContext.

Updated all uses of the function and tests.
2018-04-20 15:41:59 -04:00
Jaebaek Seo
48802bad72 Constant folding for OpVectorTimesScalar 2018-04-20 13:43:04 -04:00
Victor Lomuller
0ec08c28c1 Add register liveness analysis.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.

It also includes utilities to simulate the register pressure for loop
fusion and fission.

The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
2018-04-20 09:45:15 -04:00
David Neto
e7c2e91ded Fix for old XCode: std::set has explicit ctor 2018-04-19 16:33:12 -04:00
Greg Fischer
df7f00f60e DeadInsertElim: Don't revisit select phi nodes during MarkInsertChain
Fixes #1487.
2018-04-19 14:40:00 -04:00
Jaebaek Seo
430a29335e Fix broken pointer of CommonUniformElimPass 2018-04-19 09:36:10 -04:00
Steven Perron
c20a718e00 Rewrite local-single-store-elim to not create large data structures.
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large.  These large data structures
then take a very long time to be destroyed.  I've seen cases around 40%
if the time.

I've rewritten that algorithm to not use as much memory.  This give a
significant improvement when running a large number of shader through
DXC.

I've also made a small change to local-single-block-elim to delete the
loads that is has replaced.  That way local-single-store-elim will not
have to look at those.  local-single-store-elim now does the same thing.

The time for one set goes from 309s down to 126s.  For another set, the
time goes from 102s down to 88s.
2018-04-18 16:38:18 -04:00
Jaebaek Seo
0fa42996b5
Merge pull request #1461 from jaebaek/fnegate
Add constant folding for OpFNegate

Contributes to #709
2018-04-18 13:46:10 -04:00
Toomas Remmelg
0f335cf87e Add support for MIV and Delta test dependence analysis.
GCD MIV test as described in Chapter 3 of "Optimizing Compilers for
Modern Architectures: A Dependence-Based Approach" by Randy Allen, and
Ken Kennedy.

Delta test as described in Figure 3 of "Practical Dependence Testing" by
Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.
2018-04-17 13:57:02 -04:00
Jaebaek Seo
d8b9306a4f Add more unit tests 2018-04-17 12:08:45 -04:00
Jaebaek Seo
79491259e0 Add constant folding for FNegate 2018-04-17 12:08:45 -04:00
David Neto
152b9a681e ADCE: Remove OpDecorateStringGOOGLE
Also fix a few failures to set "modified" status when removing
global values.

Add OpDecorateStringGOOGLE to decoration ordering

Fixes #1492
2018-04-17 10:24:30 -04:00
Steven Perron
d42f65e7c1 Use a bit vector in ADCE
The unordered_set in ADCE that holds all of the live instructions takes
a very long time to be destroyed.  In some shaders, it takes over 40% of
the time.

If we look at the unique ids of the live instructions, I believe they
are dense enough make a simple bit vector a good choice for to hold that
data.  When I check the density of the bit vector for larger shaders, we
are usually using less than 4 bytes per element in the vector, and
almost always less than 16.

So, in this commit, I introduce a simple bit vector class, and
use it in ADCE.

This help improve the compile time for some shaders on windows by the
40% mentioned above.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
2018-04-13 16:38:02 -04:00
Steven Perron
8190c26270 Change parameter to Mempass::RemovePhiOperands
Pass a hashtable by const ref instead of by value.  Big impact on
compile time.
2018-04-13 09:53:37 -04:00
Steven Perron
bc648fd76a Delete unused code in MemPass
Since the SSA rewriter was added, the code old phi insertion code is no
longer used.  It is going stale and should be deleted.
2018-04-11 15:40:33 -04:00
Steven Perron
c584ac4fc6 Don't allow an instance of a pass to be run multiple times. 2018-04-11 12:02:30 -04:00
Victor Lomuller
10e5d7cf13 Add a loop peeling pass.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.

To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
2018-04-11 15:41:29 +01:00
Alexander Johnston
61b50b3bfa ZIV and SIV loop dependence analysis.
Provides functionality to perform ZIV and SIV dependency analysis tests
between a load and store within the same loop.

Dependency tests rely on scalar analysis to prove and disprove dependencies
with regard to the loop being analysed.

Based on the 1990 paper Practical Dependence Testing by Goff, Kennedy, Tseng

Adds support for marking loops in the loop nest as IRRELEVANT.
Loops are marked IRRELEVANT if the analysed instructions contain
no induction variables for the loops, i.e. the loops induction
variable is not relevent to the dependence of the store and load.
2018-04-11 09:32:42 -04:00
Steven Perron
53bc1623ec Fold OpDot
Adding three rules to fold OpDot (implemented as two).

- When an OpDot has two constants, then fold to the resulting const.

- When one of the inputs is the 0 vector, then fold to zero.

- When one of the inputs is a single 1 with 0s, then rewrite to an
OpCompositeExtract of the appropriate element.  This will help find
even more folding opportunities.

Contributes to #709.
2018-04-10 13:09:37 -04:00
David Neto
a91cbfbf75 Optimizer: update extension whitelists
Add two new extensions:
- SPV_NV_shader_subgroup_partitioned
- SPV_EXT_descriptor_indexing
2018-04-06 15:56:20 -04:00
GregF
6fbfe1c016 Fix SSA rewrite for nested loops.
From the test case, the slice of the CFG that is interesting for the bug
is

25
|
v
30
|
v
31<-+
|   |
v   |
34--+

1. In block 25, we have a Phi candidate for %f with arguments
   %47 = Phi[%float_0, %0]. This merges %float_0 and a yet unknown
   argument from the external loop backedge.
2. We are now processing block 34:
   i. The load %35 = OpLoad %f triggers a Phi candidate to be placed in
      block 31.
  ii. The Phi candidate %50 = Phi needs two arguments. The one coming
      from block 30 is %47. But the one coming from block 34 (which we
      are now processing and have marked sealed), finds %50 itself as
      the reaching def for %f.
3. This wrongfully marks %50 as a copy-of Phi, which ultimately makes
   both %47 and %50 copy-of Phis that get eliminated.
2018-04-06 15:17:52 -04:00
Steven Perron
742454968d OpName and decorations should not stop array copy prop. 2018-04-04 22:24:10 -04:00
Steven Perron
7c5d49bf2a Teach ADCE about OpImageTexelPointer
Currently OpImageTexelPointer operations are treat like a use of the
pointer, but it does
not look for the memory being referenced to make sure stores are not
removed.

This change teaches it so identify the memory being accessed, and
treats it as if that memory is loaded.

Fixes to #1445.
2018-04-04 13:45:29 -04:00
Steven Perron
c33af63264 Teach array copy propagation about OpImageTexelPointer.
OpImageTexelPointer acts like a special kind of load.  It is not an
array load, but it also cannot be removed the same way a regular
load can.  The type of propagation that needs to be done is similar
to what we do for arrays, so I want to merge that code into that
optmization.

Contributers to #1445.
2018-04-04 13:42:51 -04:00
Steven Perron
e64a4656b3 Teach the private to local about OpImageTexelPointer.
OpImageTexelPointer acts like a special kind of load.  It is still
safe to change the storage class of a variable used in a
OpImageTexalPointer instruction.

Contributes to #1445.
2018-04-04 13:42:35 -04:00
Steven Perron
cbceeceab4 In copy-prop-arrays, indentify copies via OpCompositeInsert
When the original code copies an entire array or struct one element at a
time, this turns into a series of OpCompositeInsert instruction followed
by a store of the whole array.  We currently miss opportunities in copy
propagate arrays because we do not recognize this as a copy.

This commit adds code to copy propagate arrays to identify this code
pattern.

Also updates the performance passed to run array copy propagation.
2018-03-29 09:39:55 -04:00
Steven Perron
d8ca09821d Handle non-constant accesses in memory objects (copy prop arrays)
The first implementation of MemroyObject, which is used in copy
propagate arrays, forced the access chain to be like the access chains
in OpCompositeExtract.  This excluded the possibility of the memory
object from representing an array element that was extracted with a
variable index.   Looking at the code, that restriction is not
neccessary.  I also see some opportunities for doing this in some real
shaders.

Contributes to #1430.
2018-03-28 20:23:47 -04:00
Stephen McGroarty
ad7e4b8401 Initial patch for scalar evolution analysis
This patch adds support for the analysis of scalars in loops. It works
by traversing the defuse chain to build a DAG of scalar operations and
then simplifies the DAG by folding constants and grouping like terms.
It represents induction variables as recurrent expressions with respect
to a given loop and can simplify DAGs containing recurrent expression by
rewritting the entire DAG to be a recurrent expression with respect to
the same loop.
2018-03-28 16:34:23 -04:00
Steven Perron
c26866ee74 Preserve analyses after copy propagate arrays
Contributes to #1430.
2018-03-28 10:38:52 -04:00
Steven Perron
5e07ab1358 Handle more cases in copy propagate arrays.
When we change the type of an object that gets stored, we do not want to
change the type of the memory location being stored to.  In order to
still be able to do the rewrite, we will decompose and rebuild the
object so it is the type that can be stored.

Fixes #1416.
2018-03-27 11:04:49 -04:00
Steven Perron
c4dc046399 Copy propagate arrays
The sprir-v generated from HLSL code contain many copyies of very large
arrays.  Not only are these time consumming, but they also cause
problems for drivers because they require too much space.

To work around this, we will implement an array copy propagation.  Note
that we will not implement a complete array data flow analysis in order
to implement this.  We will be looking for very simple cases:

1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.

The hard part is keeping all of the types correct.  We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.

Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent.  With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values.  See |Struct::GetExtraHashWords|.

Contributes to #1416.
2018-03-26 14:44:41 -04:00
Eleni Maria Stea
045cc8f75b Fixes compile errors generated with -Wpedantic
This patch fixes the compile errors generated when the options
SPIRV_WARN_EVERYTHING and SPIRV_WERROR (that force -Wpedantic) are
set to cmake.
2018-03-22 09:40:11 -04:00
Steven Perron
dbb35c4260 Fixed remaining review comments from #1380 2018-03-21 16:47:01 -04:00
Diego Novillo
2e644e4578 Fix VS2013 build failures. 2018-03-20 21:44:17 -04:00
Jaebaek Seo
3b594e1630 Add --time-report to spirv-opt
This patch adds a new option --time-report to spirv-opt.  For each pass
executed by spirv-opt, the flag prints resource utilization for the pass
(CPU time, wall time, RSS and page faults)

This fixes issue #1378
2018-03-20 21:30:06 -04:00
Diego Novillo
735d8a579e SSA rewrite pass.
This pass replaces the load/store elimination passes.  It implements the
SSA re-writing algorithm proposed in

     Simple and Efficient Construction of Static Single Assignment Form.
     Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
     In: Jhala R., De Bosschere K. (eds)
     Compiler Construction. CC 2013.
     Lecture Notes in Computer Science, vol 7791.
     Springer, Berlin, Heidelberg

     https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6

In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.

When a target variable is loaded, it queries the variable's reaching
definition.  If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.

The algorithm avoids repeated lookups using memoization.

For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA.  That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
2018-03-20 20:56:55 -04:00
Victor Lomuller
bdf421cf40 Add loop peeling utility
The loop peeler util takes a loop as input and create a new one before.
The iterator of the duplicated loop then set to accommodate the number
of iteration required for the peeling.

The loop peeling pass that decided to do the peeling and profitability
analysis is left for a follow-up PR.
2018-03-20 10:21:10 -04:00
Steven Perron
b3daa93b46 Change merge return pass to handle structured cfg.
We are seeing shaders that have multiple returns in a functions.  These
functions must get inlined for legalization purposes; however, the
inliner does not know how to inline functions that have multiple
returns.

The solution we will go with it to improve the merge return pass to
handle structured control flow.

Note that the merge return pass will assume the cfg has been cleanedup
by dead branch elimination.

Fixes #857.
2018-03-19 13:49:04 -04:00
David Neto
844e186cf7 Add --strip-reflect pass
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
2018-03-15 21:20:42 -04:00
David Neto
2e3aec23ca Add recent Google extensions to optimizer whitelists
Optimizations should work in the presence of recent
SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1

SPV_GOOGLE_decorate_string:
- Adds operation OpDecorateStringGOOGLE to decorate an object with decorations
  having string operands.

SPV_GOOGLE_hlsl_functionality1:
- Adds HlslSemanticGOOGLE, used to decorate an interface variable with
  an HLSL semantic string.  Optimizations already preserve those variables
  as required because they are interface variables (with uses), independent
  of whether they have HLSL decorations.

- Adds HlslCounterBufferGOOGLE, used to associate a buffer with a
  counter variable.

Fixes #1391
2018-03-15 11:16:20 -04:00
Alan Baker
9f3a1c85cc NFC: Speed up dead insert phi traversal on Windows. 2018-03-14 17:45:47 -04:00
David Neto
884933366b Teach DecorationManager about OpDecorateStringGOOGLE
Also add more decoration manager test coverage for OpDecorateId.

Fixes #1396
2018-03-13 22:18:33 -04:00
Alan Baker
7e03e76a5f Fixes #1402. Don't merge non-branch terminators into loop header.
Added tests
2018-03-13 22:16:17 -04:00
Alan Baker
43d1609183 Fixes #1407. Removing assertion against void pointer
Added test
2018-03-13 19:45:20 -04:00
Alan Baker
4065adf05d Fixes #1404. Don't DCE workgroup size
Added test.
2018-03-13 19:38:31 -04:00
Greg Fischer
077249b67f Fix InsertFeedingExtract rule when extract remains. 2018-03-12 22:06:23 -04:00
Pierre Moreau
5bd55f10cd Reimplement the DecorationManager
This reimplementation fixes several issues when removing decorations associated
to an ID (partially addresses #1174 and gives tools for fixing #898), as well
as making it easier to remove groups; a few additional tests have been added.

DecorationManager::RemoveDecoration() will still not delete dead decorations it
created, but I do not think it is its job either; given the following input

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %2 Restrict
%2      = OpDecorationGroup
OpGroupDecorate %2 %1 %3
OpDecorate %4 Invariant
%4      = OpDecorationGroup
OpGroupDecorate %4 %2
%uint   = OpTypeInt 32 0
%1      = OpVariable %uint Uniform
%3      = OpVariable %uint Uniform
```

which of the following two outputs would you expect RemoveDecoration(2) to produce:

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
%uint = OpTypeInt 32 0
%1 = OpVariable %uint Uniform
%3 = OpVariable %uint Uniform
```

or

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %4 Invariant
%4      = OpDecorationGroup
%uint   = OpTypeInt 32 0
%1      = OpVariable %uint Uniform
%3      = OpVariable %uint Uniform
```

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/924
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1174
2018-03-12 09:56:14 -04:00
David Neto
340370eddb Remove extension whitelist from some transforms
Remove extension whitelists from transforms that are essentially
combinatorial (and avoiding pointers) or which affect only control flow.
It's very very unlikely an extension will add a new control flow construct.

Remove from:
- dead branch elimination
- dead insertion elimination
- insert extract elimination
- block merge

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1392
2018-03-08 12:25:49 -05:00
Rex Xu
314cfa29b2 Add missing SPV extension strings 2018-03-08 21:54:00 +08:00
Alan Baker
bc9cfee6fa Fixes #1385. Grab correct input to calculate indices.
* Added tests to catch the bug
2018-03-07 16:07:40 -05:00
Alan Baker
5f50e6209c Fixes #1376. Don't handle half folding gracefully.
* Added early returns to folding rules to prevent half attempts
* Added some tests
2018-03-06 14:00:02 -05:00
David Neto
5f69f75126 Support SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1
This commit add assembling, disassembling, and basic validation for two
Google extensions to better support HLSL translation.
2018-03-05 13:34:13 -05:00
Steven Perron
9ba50e34f2 Avoid generating duplicate names when merging types
The merging types we do not remove other information related to the
types.  We simply leave it duplicated, and hope it is removed later.
This is what happens with decorations.  They are removed in the next
phase of remove duplicates.  However, for OpNames that is not the case.
We end up with two different names for the same id, which does not make
sense.

The solution is to remove the names and decorations for the type being
removed instead of rewriting them to refer to the other type.

Note that it is possible that if the first type does not have a name,
then the types will end up with no name.  That is fine because the names
should not have any semantic significance anyway.

The was identified in issue #1372, but this does not fix that issue.
2018-03-05 12:02:50 -05:00
Alan Baker
52bceb3569 Handles more cases of redundant selects
* Handles OpConstantNull and vector types
 * vector selects (except against a null) are converted to vector
 shuffles
* Added tests
2018-03-02 14:28:08 -05:00
Alan Baker
824625760b Fixes #1361. Mark all non-constant global values as varying in CCP
* Also mark function parameters as varying
* Conservatively mark assignment instructions as varying if any input is
varying after attempting to fold
* Added a test to catch this case
2018-03-01 15:24:41 -05:00
Alan Baker
ce5941a642 Fixes #1357. Support null constants better in folding
* getFloatConstantKind() now handles OpConstantNull
* PerformOperation() now handles OpConstantNull for vectors
* Fixed some instances where we would attempt to merge a division by 0
* added tests
2018-02-28 23:12:27 -05:00
GregF
bdaf8d56fb Opt: Add constant folding for FToI and IToF 2018-02-28 23:08:52 -05:00
Alan Baker
9457cabbce Fixes #1354. Do not merge integer division.
* Removes merging of div with a div or mul for integers
* Updated tests
2018-02-28 13:33:21 -05:00
Steven Perron
588f4fcc95 Add more folding rules for vector shuffle.
Adds rule to fold OpVectorShuffle with constant inputs.

Adds rules to fold OpCompositeExtrac being fed by an OpVectorShuffle.
2018-02-27 21:20:22 -05:00
Victor Lomuller
90e1637ce4 Remove Function::GetBlocks pushed by accident 2018-02-27 21:07:10 -05:00
Steven Perron
2cb589cc14 Remove uses DCEInst and call ADCE
The algorithm used in DCEInst to remove dead code is very slow.  It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.

This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.

A number of different iterations of the order of optimization, and I
believe this is the best I could find.

The results I have on 3 sets of shaders are:

Legalization:

Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26

Performance passes:

Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74

Size reduction passes:

Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71

Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions.  That is why we don't see much gain there.
2018-02-27 21:06:08 -05:00
David Neto
0c13467161 Consistently include latest spirv.h header file.
Use indirection through latest_version_spirv.h

Also, when generating enum tables, use the unified1 JSON grammar since
it now has FragmentFullyCoveredEXT but the other JSON grammars don't.
They are starting to fall behind.
2018-02-27 18:47:29 -05:00
Alan Baker
802cf053c7 Merge arithmetic with non-trivial constant operands
Adding basis of arithmetic merging

* Refactored constant collection in ConstantManager
* New rules:
 * consecutive negates
 * negate of arithmetic op with a constant
 * consecutive muls
 * reciprocal of div

* Removed IRContext::CanFoldFloatingPoint
 * replaced by Instruction::IsFloatingPointFoldingAllowed
* Fixed some bad tests
* added some header comments

Added PerformIntegerOperation

* minor fixes to constants and tests
* fixed IntMultiplyBy1 to work with 64 bit ints
* added tests for integer mul merging

Adding test for vector integer multiply merging

Adding support for merging integer add and sub through negate

* Added tests

Adding rules to merge mult with preceding divide

* Has a couple tests, but needs more
* Added more comments

Fixed bug in integer division folding

* Will no longer merge through integer division if there would be a
remainder in the division
* Added a bunch more tests

Adding rules to merge divide and multiply through divide

* Improved comments
* Added tests

Adding rules to handle mul or div of a negation

* Added tests

Changes for review

* Early exit if no constants are involved in more functions
* fixed some comments
* removed unused declaration
* clarified some logic

Adding new rules for add and subtract

* Fold adds of adds, subtracts or negates
* Fold subtracts of adds, subtracts or negates
* Added tests
2018-02-27 13:02:13 -05:00
Victor Lomuller
3497a94460 Add loop unswitch pass.
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
2018-02-27 08:52:46 -05:00
Stephen McGroarty
e354984b09 Unroller support for multiple induction variables
Support for multiple induction variables within a loop and support for
loop condition operands <= and >=.
2018-02-27 11:50:08 +00:00
Steven Perron
94af58a350 Clean up variables before sroa
In some shaders there are a lot of very large and deeply nested
structures.  This creates a lot of work for scalar replacement.  Also,
since commit ca4457b we have been very aggressive as rewriting
variables.  This has causes a large increase in compile time in creating
and then deleting the instructions.

To help low the costs, I want to run a cleanup of some of the easy loads
and stores to remove.  This reduces the number of symbols sroa has to
work on.  It also reduces the amount of code the simplifier has to
simplify because it was not generated by sroa.

To confirm the improvement, I ran numbers on three different sets of
shaders:

Time to run --legalize-hlsl:

Set #1: 55.89s -> 12.0s
Set #2: 1m44s -> 1m40.5s
Set #3: 6.8s -> 5.7s

Time to run -O

Set #1: 18.8s -> 10.9s
Set #2: 5m44s -> 4m17s
Set #3: 7.8s -> 7.8s

Contributes to #1328.
2018-02-22 21:40:58 -05:00
Steven Perron
3f19c2031a Preserve analysies in the simplification pass
Fixes a bug at the same time.  In `UpdateDefUse`, if the definition
already exists, we are not suppose to analyse it again.  When you do
the entries for the definition are deleted, and we don't want that.
The check for this was wrong.
2018-02-22 16:06:30 -05:00
GregF
46a9ec9d23 Opt: Check for side-effects in DCEInst()
This function now checks for side-effects before adding operand
instructions to the dead instruction work list.

Because this fix puts more pressure on IsCombinatorInstruction() to
be correct, this commit adds all OpConstant* and OpType* instructions
to combinator_ops_ set.

Fixes #1341.
2018-02-22 12:24:13 -05:00
Alan Baker
01760d2f0f Fixes #1338. Handle OpConstantNull in branch/switch conditions
* No longer assume the branch/switch condition must be bool or int
constants (respectively)
* Added a couple unit tests for each case
2018-02-21 10:22:39 -05:00
Steven Perron
51ecc7318f Reduce instruction create and deletion during inlining.
When inlining a function call the instructions in the same basic block
as the call get cloned.  The clone is added to the set of new blocks
containing the inlined code, and the original instructions are deleted.

This PR will change this so that we simply move the instructions to the
new blocks.  This saves on the creation and deletion of the
instructions.

Contributes to #1328.
2018-02-21 09:50:47 -05:00
Steven Perron
c1b936637e Add Insert-extract elimination back into legalization passes.
Fixes #1326.
2018-02-21 09:46:51 -05:00
Arseny Kapoulkine
309be423cc Add folding for redundant add/sub/mul/div/mix operations
This change implements instruction folding for arithmetic operations
that are redundant, specifically:

  x + 0 = 0 + x = x
  x - 0 = x
  0 - x = -x
  x * 0 = 0 * x = 0
  x * 1 = 1 * x = x
  0 / x = 0
  x / 1 = x
  mix(a, b, 0) = a
  mix(a, b, 1) = b

Cache ExtInst import id in feature manager

This allows us to avoid string lookups during optimization; for now we
just cache GLSL std450 import id but I can imagine caching more sets as
they become utilized by the optimizer.

Add tests for add/sub/mul/div/mix folding

The tests cover scalar float/double cases, and some vector cases.

Since most of the code for floating point folding is shared, the tests
for vector folding are not as exhaustive as scalar.

To test sub->negate folding I had to implement a custom fixture.
2018-02-20 18:29:27 -05:00
Steven Perron
fa3ac3cc33 Revert "Preserve analysies in the simplification pass"
This reverts commit ec3bbf093e.
2018-02-20 18:21:25 -05:00
Steven Perron
ec3bbf093e Preserve analysies in the simplification pass
Building the def-use chains is very expensive, so we do not want to
invalidate them it if is not necessary.  At the moment, it seems like
most optimizatoins are good at not invalidating the def-use chains, but
simplification does.

This PR get the simlification pass to keep the analysies valid.

Contributes to #1328.
2018-02-20 14:45:08 -05:00
Diego Novillo
6c75050136 Speed up Phi insertion.
On some shader code we have in our testsuite, Phi insertion is showing
massive compile time slowdowns, particularly during destruction.  The
specific shader I was looking at has about 600 variables to keep track
of and around 3200 basic blocks.  The algorithm is currently O(var x
blocks), which means maps with around 2M entries.  This was taking about
8 minutes of compile time.

This patch changes the tracking of stored variables to be more sparse.
Instead of having every basic block contain all the tracked variables in
the map, they now have only the variables actually stored in that block.

This speeds up deallocation, which brings down compile time to about
1m20s.

Note that this is not the definite fix for this.  I will re-write Phi
insertion to use a standard SSA rewriting algorithm
(https://github.com/KhronosGroup/SPIRV-Tools/issues/893).

This contributes to
https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
2018-02-20 12:04:06 -05:00
Steven Perron
9d95a91a9f Fix folding insert feeding extract
I mixed up two cases when folding an OpCompositeExtract that is feed by
and OpCompositeInsert.  The specific cases are demonstracted in the new
test.  I mixed up the conditions for the cases, and treated one like the
other.

Fixes #1323.
2018-02-20 11:22:51 -05:00
Alan Baker
c3f34d8bf3 Fixes #1300. Adding checks for bad CCP transitions and unsettled values
* Now track propagation status and assert on bad statuses
 * Added helper methods to access instruction propagation status
* Modified the phi meet operator to properly reflect the paper it is
based on
* Modified SSA edge addition so that all edge are added, but only on
state changes
* Fixed a bug in instruction simulation where interesting conditional
branches would not mark the interesting edge as executed
 * Added a test to catch this bug
* Added an ostream operator for SSAPropagator::PropStatus
2018-02-18 19:41:34 -05:00
Steven Perron
04cd63e5b9 Make better use of simplification pass
The simplification pass works better after all of the dead branches are
removed.  So swapping them around in the legalization passes.  Also
adding the simplification pass to performance passes right after dead
branch elimination.

Added CCP to the legalization passes so we can propagate the constants
into the branchs, and remove as many branches a possible.  CCP is
designed to still get opportunities even if the branches are dead, so it
is a good place for it.

Fixes #1118
2018-02-16 20:46:49 -05:00
Arseny Kapoulkine
1054413600 Add constant folding rules for floating-point comparison
This change handles all 6 regular comparison types in two variations,
ordered (true if values are ordered *and* comparison is true) and
unordered (true if values are unordered *or* comparison is true).

Ordered comparison matches the default floating-point behavior on host
but we use std::isnan to check ordering explicitly anyway.

This change also slightly reworks the floating-point folding support
code to make it possible to define a folding operation that returns
boolean instead of floating point.

These tests exhaustively test ordered/unordered comparisons for
float/double.

Since for NaN inputs the comparison result doesn't depend on the
comparison function, we just test == and !=; NaN inputs result in true
unordered comparisons and false ordered comparisons.
2018-02-16 20:41:22 -05:00
Arseny Kapoulkine
27d23a92a0 Remove constants from constant manager in KillInst
Registering a constant in constant manager establishes a relation
between instruction that defined it and constant object. On complex
shaders this could result in the constant definition getting removed as
part of one of the DCE pass, and a subsequent simplification pass trying
to use the defining instruction for the constant.

To fix this, we now remove associated constant entries from constant
manager when killing constant instructions; the constant object is still
registered and can be remapped to a new instruction later.

GetDefiningInstruction shouldn't ever return nullptr after this change
so add an assertion to check for that.
2018-02-16 20:37:12 -05:00
Steven Perron
50f307f889 Simplify OpPhi instructions referencing unreachable continues
In dead branch elimination, we already recognize unreachable continue
blocks, and update OpPhi instruction accordingly.  This change adds an
extra check: if the head block has exactly 1 other incoming edge, then
replace the OpPhi with the value from that edge.

Fixes #1314.
2018-02-16 18:58:03 -05:00
Steven Perron
3756b387f3 Get CCP to use the constant floating point rules.
Fixes #1311
2018-02-16 13:49:47 -05:00
Lei Zhang
f3a10470d3
Avoid using static unordered_map (#1304)
unordered_map is not POD. Using it as static may cause problems
when operator new() and operator delete() is customized.

Also changed some function signatures to use const char* instead
of std::string, which will give caller the flexibility to avoid
creating a std::string.
2018-02-15 10:19:15 -05:00
Arseny Kapoulkine
32a8e04c7d Add folding of redundant OpSelect insns
We can fold OpSelect into one of the operands in two cases:

- condition is constant
- both results are the same

Even if the original shader doesn't have either of these, if-conversion
pass sometimes ends up generating instructions like

   %7127 = OpSelect %int %3220 %7058 %7058

And this optimization cleans them up.
2018-02-15 10:03:22 -05:00
Steven Perron
0e9f2f948a Add id to name map
Adding a map from an id to it set of OpName and OpMemberName
instructions.  This will be used in KillNameAndDecorates to kill the
names for the ids that are being removed.

In my test, the compile time for 50 shaders went from 1m57s to 55s.
This was on linux using the release build.

Fixes #1290.
2018-02-14 15:53:13 -05:00
Steven Perron
6669d8163d Fold binary floating point operators.
Adds the floating rules for FAdd, FDiv, FMul, and FSub.

Contributes to #1164.
2018-02-14 15:48:15 -05:00
Stephen McGroarty
dd8400e150 Initial support for loop unrolling.
This patch adds initial support for loop unrolling in the form of a
series of utility classes which perform the unrolling. The pass can
be run with the command spirv-opt --loop-unroll. This will unroll
loops within the module which have the unroll hint set. The unroller
imposes a number of requirements on the loops it can unroll. These are
documented in the comments for the LoopUtils::CanPerformUnroll method in
loop_utils.h. Some of the restrictions will be lifted in future patches.
2018-02-14 15:44:38 -05:00
Alan Baker
229ebc0665 Fixes #1295. Mark undef values as varying in ccp.
* Undef now marked as varying in ccp
 * this prevents incorrect meet operations since phis were always not
 interesting
* added a test to catch the bug
2018-02-14 10:21:26 -05:00
Diego Novillo
08699920ad Cleanup. Use proper #include guard. NFC. 2018-02-12 13:21:48 -05:00
Steven Perron
06b437dedc Avoid using the def-use manager during inlining.
There seems to only be a single location where the def-use manager is
used.  It is to get information about a type.  We can do that with the
type manager instead.

Fixes #1285
2018-02-12 09:47:55 -05:00
Arseny Kapoulkine
70bf3514e8 Fix spirv.h include to rely on include paths
This is important when SPIRV-Headers are not checked out to external/
folder and mirrors other places in the code where spirv.h is included.
2018-02-09 18:29:17 -08:00
Steven Perron
1d7b1423f9 Add folding of OpCompositeExtract and OpConstantComposite constant instructions.
Create files for constant folding rules.

Add the rules for OpConstantComposite and OpCompositeExtract.
2018-02-09 17:52:33 -05:00
Steven Perron
1a849ffb60 Add header files missing from CMakeLists.txt 2018-02-08 23:02:22 -05:00
Alexander Johnston
84ccd0b9ae Loop invariant code motion initial implementation 2018-02-08 22:55:47 -05:00
GregF
ca4457b4b6 SROA: Do replacement on structs with no partial references. 2018-02-08 15:20:02 -05:00
Steven Perron
06cdb96984 Make use of the instruction folder.
Implementation of the simplification pass.

- Create pass that calls the instruction folder on each instruction and
  propagate instructions that fold to a copy.  This will do copy
  propagation as well.

- Did not use the propagator engine because I want to modify the instruction
  as we go along.

- Change folding to not allocate new instructions, but make changes in
  place.  This change had a big impact on compile time.

- Add simplification pass to the legalization passes in place of
  insert-extract elimination.

- Added test cases for new folding rules.

- Added tests for the simplification pass

- Added a method to the CFG to apply a function to the basic blocks in
  reverse post order.

Contributes to #1164.
2018-02-07 23:01:47 -05:00
Alan Baker
871022772e Registering a type now rebuilds it out of memory owned by the manager.
* Added TypeManager::RebuildType
 * rebuilds the type and its constituent types in terms of memory owned
 by the manager.
 * Used by TypeManager::RegisterType to properly allocate memory
* Adding an unit test to expose the issue
* Added some tests to provide coverage of RebuildType
* Added an accessor to the target pointer for a forward pointer
2018-02-06 10:17:56 -05:00
GregF
860b2ee5fc ADCE: Fix combinator initialization
The combinator initialization was only looking at the capabilities
in the shader and not the inferred capabilities. Geometry and tessellation
shaders were not setting the Shader capability which is inferred. So the
combinator set was not initialized correctly causing problems for ADCE.
2018-02-05 16:54:03 -05:00
David Neto
9e19fc0f31 VS2013: LoopDescriptor LoopContainerType can't contain unique_ptr
The loop descriptor must explicitly manage the storage for contained
Loop objects.

Fixes #1262
2018-02-05 14:19:21 -05:00