Commit Graph

1540 Commits

Author SHA1 Message Date
Steven Perron
9ba0879ddf Improve Vector DCE
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.
2018-04-30 11:55:50 -04:00
Steven Perron
a00a0a09ae Revert "Improvements to vector dce."
This reverts commit 2813722993.

A regression was found.  Undoing the change until it is fixed.
2018-04-27 10:33:19 -04:00
Alan Baker
4246abdc74 Fixes handling of kill and unreachable ops in inlining.
Fixes #1527

* Adds handling for copying OpKill and OpUnreachable and forces the
generation of a new basic block
* Adds tests to check
2018-04-27 09:42:37 -04:00
Steven Perron
e1bcd2b2d8 Fold OpVectorTimesScalar and OpPhi better.
If one of the operands to an OpVectorTimesScalar instruction is zero,
then the result will be the 0 vector. Currently we do not fold the
insturction unless both operands are constants. This change fixes that.

We also allow folding of OpPhi instructions where the incoming values
are either an OpUndef or the OpPhi instruction itself. As with other
cases, this can be simplified to the OpUndef.
2018-04-26 12:41:16 -04:00
Steven Perron
2813722993 Improvements to vector dce.
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.

Fixes #1511.
2018-04-26 11:07:48 -04:00
Cort Stratton
72524db2de Fixes #1521: PadToWord() should use std::move() in && variant 2018-04-25 22:03:14 -04:00
Greg Fischer
268be6143d LocalSingleBlockElim: Add store-store elimination
Eliminate unused store to variable if followed by store to same
variable in same block.

Most significantly, this cleans up stores made unused by this pass.
These useless stores can inhibit subsequent optimizations, specifically
LocalSingleStoreElim. Eliminating them makes subsequent optimization more
effective.

The main effect of this pass is to simplify the work done by the SSA
rewriter.  It catches many local loads/stores that help speeding up the
work done by the main rewriter.
2018-04-25 10:30:18 -04:00
Steven Perron
ee8cd5c847 Add Dead insert elmination back in. 2018-04-24 10:10:30 -04:00
Steven Perron
2c0ce87210
Vector DCE (#1512)
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.

It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
2018-04-23 11:13:07 -04:00
David Neto
7a59283587 Another fix for old XCode: std::set explicit ctor in test code 2018-04-20 15:58:01 -04:00
Victor Lomuller
efc5061929 Dominator analysis interface clean.
Remove the CFG requirement when querying a dominator/post-dominator from an IRContext.

Updated all uses of the function and tests.
2018-04-20 15:41:59 -04:00
Jaebaek Seo
48802bad72 Constant folding for OpVectorTimesScalar 2018-04-20 13:43:04 -04:00
Victor Lomuller
0ec08c28c1 Add register liveness analysis.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.

It also includes utilities to simulate the register pressure for loop
fusion and fission.

The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
2018-04-20 09:45:15 -04:00
Alan Baker
09c206b6fb Fixes #1480. Validate group non-uniform scopes.
* Adds new pass for validating non-uniform group instructions
 * Currently on checks execution scope for Vulkan 1.1 and SPIR-V 1.3
* Added test framework
2018-04-20 09:25:00 -04:00
David Neto
e7c2e91ded Fix for old XCode: std::set has explicit ctor 2018-04-19 16:33:12 -04:00
GregF
1c89da46ff Test/DependencyAnalysis: Fix uninitialized variables 2018-04-19 15:34:15 -04:00
Greg Fischer
df7f00f60e DeadInsertElim: Don't revisit select phi nodes during MarkInsertChain
Fixes #1487.
2018-04-19 14:40:00 -04:00
Jaebaek Seo
430a29335e Fix broken pointer of CommonUniformElimPass 2018-04-19 09:36:10 -04:00
Steven Perron
c20a718e00 Rewrite local-single-store-elim to not create large data structures.
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large.  These large data structures
then take a very long time to be destroyed.  I've seen cases around 40%
if the time.

I've rewritten that algorithm to not use as much memory.  This give a
significant improvement when running a large number of shader through
DXC.

I've also made a small change to local-single-block-elim to delete the
loads that is has replaced.  That way local-single-store-elim will not
have to look at those.  local-single-store-elim now does the same thing.

The time for one set goes from 309s down to 126s.  For another set, the
time goes from 102s down to 88s.
2018-04-18 16:38:18 -04:00
Jaebaek Seo
0fa42996b5
Merge pull request #1461 from jaebaek/fnegate
Add constant folding for OpFNegate

Contributes to #709
2018-04-18 13:46:10 -04:00
Jaebaek Seo
3c5bd26668 Typo 2018-04-17 14:13:19 -04:00
Toomas Remmelg
0f335cf87e Add support for MIV and Delta test dependence analysis.
GCD MIV test as described in Chapter 3 of "Optimizing Compilers for
Modern Architectures: A Dependence-Based Approach" by Randy Allen, and
Ken Kennedy.

Delta test as described in Figure 3 of "Practical Dependence Testing" by
Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.
2018-04-17 13:57:02 -04:00
Jaebaek Seo
ff92339fff Format 2018-04-17 12:12:48 -04:00
Jaebaek Seo
d8b9306a4f Add more unit tests 2018-04-17 12:08:45 -04:00
Jaebaek Seo
79491259e0 Add constant folding for FNegate 2018-04-17 12:08:45 -04:00
Alan Baker
38359ba800 Fixes #1483. Validating Vulkan 1.1 barrier execution scopes
* Reworked how execution model limitations are checked
 * Now OpFunction checks which entry points call it and checks its
 registered limitations instead of building a call stack in the entry
 point
* New tests
* Moving function to entry point mapping into VState
2018-04-17 10:26:38 -04:00
David Neto
152b9a681e ADCE: Remove OpDecorateStringGOOGLE
Also fix a few failures to set "modified" status when removing
global values.

Add OpDecorateStringGOOGLE to decoration ordering

Fixes #1492
2018-04-17 10:24:30 -04:00
Alan Baker
0e80b86dbe Fixes #1472. Per-vertex variable validation fixes.
Relaxs checks for per-vertex builtin variables. If the builtin
decoration is applied to a variable, then those checks now allow a level
of arraying on the variable before checking the type consistency.

* Allows arrays of variables to be present for the per-vertex variables:
 * Position
 * PointSize
 * ClipDistance
 * CullDistance
* Updated tests
2018-04-16 12:58:35 -04:00
Rex Xu
7fe186476a Fix validation issues relevant to SPV_AMD_gpu_shader_int16.
Frexp/FrexpStruct allows exp to be either 16-bit or 32 bit integer if
SPV_AMD_gpu_shader_int16 is enabled.
2018-04-16 10:49:01 -04:00
Lei Zhang
a3bb782745 Travis CI: change to use the default email notification behavior 2018-04-16 10:44:42 -04:00
David Neto
e8814be732 Add validator test for OpBranch
Add test for case where OpBranch branches to a value (a function value).
Previous tests only checked a label value (name of a block.).

Update validate_id.cpp to remove the TODO for OpBranch and say that it
is already checked in validate_cfg.cpp
2018-04-16 10:27:51 -04:00
Steven Perron
d42f65e7c1 Use a bit vector in ADCE
The unordered_set in ADCE that holds all of the live instructions takes
a very long time to be destroyed.  In some shaders, it takes over 40% of
the time.

If we look at the unique ids of the live instructions, I believe they
are dense enough make a simple bit vector a good choice for to hold that
data.  When I check the density of the bit vector for larger shaders, we
are usually using less than 4 bytes per element in the vector, and
almost always less than 16.

So, in this commit, I introduce a simple bit vector class, and
use it in ADCE.

This help improve the compile time for some shaders on windows by the
40% mentioned above.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
2018-04-13 16:38:02 -04:00
Steven Perron
8190c26270 Change parameter to Mempass::RemovePhiOperands
Pass a hashtable by const ref instead of by value.  Big impact on
compile time.
2018-04-13 09:53:37 -04:00
Alan Baker
e805d1f8d7 Fixes #1469. Allow subgroup memory scope for Vulkan 1.1
* New error that prevents CrossDevice memory scope for all vulkan
* Old error specifically references Vulkan 1.0
* New tests
2018-04-12 13:16:04 -04:00
Alan Baker
c3ee210563 Fixes #1471. Adds missing environments to spriv-val help
* spirv-val: Added environments referenced in --version,
  but not mentioned in --help
2018-04-12 13:11:52 -04:00
Alan Baker
c522b697bf Fixes #1470. Don't restrict WGS storage class
* Removed restriction that workgroup size can only be on Input storage
class
* added test
2018-04-12 09:22:34 -04:00
Steven Perron
bc648fd76a Delete unused code in MemPass
Since the SSA rewriter was added, the code old phi insertion code is no
longer used.  It is going stale and should be deleted.
2018-04-11 15:40:33 -04:00
Steven Perron
c584ac4fc6 Don't allow an instance of a pass to be run multiple times. 2018-04-11 12:02:30 -04:00
Victor Lomuller
10e5d7cf13 Add a loop peeling pass.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.

To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
2018-04-11 15:41:29 +01:00
Alexander Johnston
61b50b3bfa ZIV and SIV loop dependence analysis.
Provides functionality to perform ZIV and SIV dependency analysis tests
between a load and store within the same loop.

Dependency tests rely on scalar analysis to prove and disprove dependencies
with regard to the loop being analysed.

Based on the 1990 paper Practical Dependence Testing by Goff, Kennedy, Tseng

Adds support for marking loops in the loop nest as IRRELEVANT.
Loops are marked IRRELEVANT if the analysed instructions contain
no induction variables for the loops, i.e. the loops induction
variable is not relevent to the dependence of the store and load.
2018-04-11 09:32:42 -04:00
Steven Perron
53bc1623ec Fold OpDot
Adding three rules to fold OpDot (implemented as two).

- When an OpDot has two constants, then fold to the resulting const.

- When one of the inputs is the 0 vector, then fold to zero.

- When one of the inputs is a single 1 with 0s, then rewrite to an
OpCompositeExtract of the appropriate element.  This will help find
even more folding opportunities.

Contributes to #709.
2018-04-10 13:09:37 -04:00
Alan Baker
3020104ff2 Adding tests for OpenCL 1.2 and embedded profiles 2018-04-09 09:02:50 -04:00
Alan Baker
42840d15e4 Fixes #1433. Validate binary version
* Validates SPIR-V binary version against target environment
2018-04-06 22:41:50 -04:00
Lei Zhang
26a698c347 Fix PrimitiveId builtin check for Vulkan
According to Vulkan spec 1.1.72:

> The PrimitiveId decoration must be used only within fragment,
> tessellation control, tessellation evaluation, and geometry shaders.

> In a tessellation control or tessellation evaluation shader, any
> variable decorated with PrimitiveId must be declared using the Input
> storage class.

We were enforcing that PrimitiveId can only be used with Output
storage class for TCS and TES before.
2018-04-06 22:38:32 -04:00
David Neto
5f53c42a1e Update CHANGES 2018-04-06 16:42:56 -04:00
David Neto
a91cbfbf75 Optimizer: update extension whitelists
Add two new extensions:
- SPV_NV_shader_subgroup_partitioned
- SPV_EXT_descriptor_indexing
2018-04-06 15:56:20 -04:00
GregF
6fbfe1c016 Fix SSA rewrite for nested loops.
From the test case, the slice of the CFG that is interesting for the bug
is

25
|
v
30
|
v
31<-+
|   |
v   |
34--+

1. In block 25, we have a Phi candidate for %f with arguments
   %47 = Phi[%float_0, %0]. This merges %float_0 and a yet unknown
   argument from the external loop backedge.
2. We are now processing block 34:
   i. The load %35 = OpLoad %f triggers a Phi candidate to be placed in
      block 31.
  ii. The Phi candidate %50 = Phi needs two arguments. The one coming
      from block 30 is %47. But the one coming from block 34 (which we
      are now processing and have marked sealed), finds %50 itself as
      the reaching def for %f.
3. This wrongfully marks %50 as a copy-of Phi, which ultimately makes
   both %47 and %50 copy-of Phis that get eliminated.
2018-04-06 15:17:52 -04:00
David Neto
e025145c5d Test asm/dis support for SPV_EXT_descriptor_indexing 2018-04-06 13:33:34 -04:00
David Neto
6f80608b8a Test asm/dis support for SPV_NV_shader_subgroup_partitioned 2018-04-06 13:33:24 -04:00
Alan Baker
e66e305b46 Re-enabled checks for UConvert 2018-04-06 10:51:57 -04:00