Commit Graph

38 Commits

Author SHA1 Message Date
Steven Perron
bc648fd76a Delete unused code in MemPass
Since the SSA rewriter was added, the code old phi insertion code is no
longer used.  It is going stale and should be deleted.
2018-04-11 15:40:33 -04:00
Steven Perron
7c5d49bf2a Teach ADCE about OpImageTexelPointer
Currently OpImageTexelPointer operations are treat like a use of the
pointer, but it does
not look for the memory being referenced to make sure stores are not
removed.

This change teaches it so identify the memory being accessed, and
treats it as if that memory is loaded.

Fixes to #1445.
2018-04-04 13:45:29 -04:00
Diego Novillo
735d8a579e SSA rewrite pass.
This pass replaces the load/store elimination passes.  It implements the
SSA re-writing algorithm proposed in

     Simple and Efficient Construction of Static Single Assignment Form.
     Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
     In: Jhala R., De Bosschere K. (eds)
     Compiler Construction. CC 2013.
     Lecture Notes in Computer Science, vol 7791.
     Springer, Berlin, Heidelberg

     https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6

In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.

When a target variable is loaded, it queries the variable's reaching
definition.  If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.

The algorithm avoids repeated lookups using memoization.

For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA.  That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
2018-03-20 20:56:55 -04:00
Steven Perron
2cb589cc14 Remove uses DCEInst and call ADCE
The algorithm used in DCEInst to remove dead code is very slow.  It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.

This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.

A number of different iterations of the order of optimization, and I
believe this is the best I could find.

The results I have on 3 sets of shaders are:

Legalization:

Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26

Performance passes:

Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74

Size reduction passes:

Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71

Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions.  That is why we don't see much gain there.
2018-02-27 21:06:08 -05:00
Victor Lomuller
3497a94460 Add loop unswitch pass.
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
2018-02-27 08:52:46 -05:00
GregF
46a9ec9d23 Opt: Check for side-effects in DCEInst()
This function now checks for side-effects before adding operand
instructions to the dead instruction work list.

Because this fix puts more pressure on IsCombinatorInstruction() to
be correct, this commit adds all OpConstant* and OpType* instructions
to combinator_ops_ set.

Fixes #1341.
2018-02-22 12:24:13 -05:00
Diego Novillo
6c75050136 Speed up Phi insertion.
On some shader code we have in our testsuite, Phi insertion is showing
massive compile time slowdowns, particularly during destruction.  The
specific shader I was looking at has about 600 variables to keep track
of and around 3200 basic blocks.  The algorithm is currently O(var x
blocks), which means maps with around 2M entries.  This was taking about
8 minutes of compile time.

This patch changes the tracking of stored variables to be more sparse.
Instead of having every basic block contain all the tracked variables in
the map, they now have only the variables actually stored in that block.

This speeds up deallocation, which brings down compile time to about
1m20s.

Note that this is not the definite fix for this.  I will re-write Phi
insertion to use a standard SSA rewriting algorithm
(https://github.com/KhronosGroup/SPIRV-Tools/issues/893).

This contributes to
https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
2018-02-20 12:04:06 -05:00
David Neto
87f9cfaba3 Disambiguate between const and nonconst ForEachSuccessorLabel
This helps VisualStudio 2013 compile the code.

Contributes to #1262
2018-02-02 17:54:40 -05:00
Alan Baker
6587d3f8a3 Adding early exit versions of several ForEach* methods
* Looked through code for instances where code would benefit from early
exit
 * Added a corresponding WhileEach* method and updated the code
2018-01-12 17:05:09 -05:00
Alan Baker
eb0c73dad6 Maintain instruction to block mapping in phi insertion
* Changed MemPass::InsertPhiInstructions to set basic blocks for new
phis
* Local SSA elim now maintains instr to block mapping
 * Added a test and confirmed it fails without the updated phis
* IRContext::set_instr_block no longer builds the map if the analysis is
invalid
* Added instruction to block mapping verification to
IRContext::IsConsistent()
2018-01-12 10:16:53 -05:00
Alan Baker
1b6cfd3409 Rewriting dead branch elimination.
Pass now paints live blocks and fixes constant branches and switches as
it goes. No longer requires structured control flow. It also removes
unreachable blocks as a side effect. It fixes the IR (phis) before doing
any code removal (other than terminator changes).

Added several unit tests for updated/new functionality.

Does not remove dead edge from a phi node:
* Checks that incoming edges are live in order to retain them
* Added BasicBlock::IsSuccessor
* added test

Fixing phi updates in the presence of extra backedge blocks

* Added tests to catch bug

Reworked how phis are updated

* Instead of creating a new Phi and RAUW'ing the old phi with it, I now
replace the phi operands, but maintain the def/use manager correctly.

For unreachable merge:

* When considering unreachable continue blocks the code now properly
checks whether the incoming edge will continue to be live.

Major refactoring for review

* Broke into 4 major functions
 * marking live blocks
 * marking structured targets
 * fixing phis
 * deleting blocks
2018-01-09 12:21:39 -05:00
Steven Perron
756b277fb8 Store all enabled capabilities in the feature manger.
In order to keep track of all of the implicit capabilities as well as
the explicit ones, we will add them all to the feature manager.  That is
the object that needs to be queried when checking if a capability is
enabled.

The name of the "HasCapability" function in the module was changed to
make it more obvious that it does not check for implied capabilities.

Keep an spv_context and AssemblyGrammar in IRContext
2017-12-21 11:14:53 -05:00
Steven Perron
79a00649b4 Allow pointers to pointers in logical addressing mode.
A few optimizations are updates to handle code that is suppose to be
using the logical addressing mode, but still has variables that contain
pointers as long as the pointer are to opaque objects.  This is called
"relaxed logical addressing".

|Instruction::GetBaseAddress| will check that pointers that are use meet
the relaxed logical addressing rules.  Optimization that now handle
relaxed logical addressing instead of logical addressing are:

 - aggressive dead-code elimination
 - local access chain convert
 - local store elimination passes.
2017-12-19 14:29:14 -05:00
GregF
78c025abe9 MultiStore: Support OpVariable Initialization
Treat an OpVariable with initialization as if it was an OpStore.
With PR #1073, this completes work for issue #1017.
2017-12-11 10:37:14 -05:00
Steven Perron
fd3a22042b DCEInst kill the same instruction twice.
In DCEInst, it is possible that the same instruction ends up in the
queue multiple times, if the same id is used multiple times in the
same instruction.

The solution is to keep the ids in a set, to ensure no duplication in
the list.
2017-12-04 18:15:35 -05:00
Steven Perron
65046eca7c Change IRContext::KillInst to delete instructions.
The current method of removing an instruction is to call ToNop.  The
problem with this is that it leaves around an instruction that later
passes will look at.  We should just delete the instruction.

In MemPass there is a utility routine called DCEInst.  It can delete
essentially any instruction, which can invalidate pointers now that they
are actually deleted.  The interface was changed to add a call back that
can be used to update any local data structures that contain
ir::Intruction*.
2017-12-04 11:07:45 -05:00
Pierre Moreau
69043963e4 Opt: Remove unused lambda captures
Those are reported as errors by clang 5.0.0, due to the flags -Werror
and -Wunused-lambda-capture.
2017-12-01 09:54:37 -05:00
Diego Novillo
83228137e1 Re-format source tree - NFC.
Re-formatted the source tree with the command:

$ /usr/bin/clang-format -style=file -i \
    $(find include source tools test utils -name '*.cpp' -or -name '*.h')

This required a fix to source/val/decoration.h.  It was not including
spirv.h, which broke builds when the #include headers were re-ordered by
clang-format.
2017-11-27 14:31:49 -05:00
Alan Baker
746bfd210a Adding new def -> use mapping container
Replaced representation of uses

* Changed uses from unordered_map<uint32_t, UseList> to
set<pairInstruction*, Instruction*>>
* Replaced GetUses with ForEachUser and ForEachUse functions
* updated passes to use new functions
* partially updated tests
* lots of cleanup still todo

Adding an unique id to Instruction generated by IRContext

Each instruction is given an unique id that can be used for ordering
purposes. The ids are generated via the IRContext.

Major changes:
* Instructions now contain a uint32_t for unique id and a cached context
pointer
 * Most constructors have been modified to take a context as input
 * unfortunately I cannot remove the default and copy constructors, but
 developers should avoid these
* Added accessors to parents of basic block and function
* Removed the copy constructors for BasicBlock and Function and replaced
them with Clone functions
* Reworked BuildModule to return an IRContext owning the built module
 * Since all instructions require a context, the context now becomes the
basic unit for IR
* Added a constructor to context to create an owned module internally
* Replaced uses of Instruction's copy constructor with Clone whereever I
found them
* Reworked the linker functionality to perform clones into a different
context instead of moves
* Updated many tests to be consistent with the above changes
 * Still need to add new tests to cover added functionality
* Added comparison operators to Instruction

Adding tests for Instruction, IRContext and IR loading

Fixed some header comments for BuildModule

Fixes to get tests passing again

* Reordered two linker steps to avoid use/def problems
* Fixed def/use manager uses in merge return pass
* Added early return for GetAnnotations
* Changed uses of Instruction::ToNop in passes to IRContext::KillInst

Simplifying the uses for some contexts in passes
2017-11-23 16:40:02 -05:00
GregF
e28edd458b Optimize loads/stores on nested structs
Also fix LocalAccessChainConvert test: nested structs now convert

Add InsertExtractElim test for nested struct
2017-11-21 17:56:03 -05:00
Alan Baker
a771713e42 Adding an unique id to Instruction generated by IRContext
Each instruction is given an unique id that can be used for ordering
purposes. The ids are generated via the IRContext.

Major changes:
* Instructions now contain a uint32_t for unique id and a cached context
pointer
 * Most constructors have been modified to take a context as input
 * unfortunately I cannot remove the default and copy constructors, but
 developers should avoid these
* Added accessors to parents of basic block and function
* Removed the copy constructors for BasicBlock and Function and replaced
them with Clone functions
* Reworked BuildModule to return an IRContext owning the built module
 * Since all instructions require a context, the context now becomes the
basic unit for IR
* Added a constructor to context to create an owned module internally
* Replaced uses of Instruction's copy constructor with Clone whereever I
found them
* Reworked the linker functionality to perform clones into a different
context instead of moves
* Updated many tests to be consistent with the above changes
 * Still need to add new tests to cover added functionality
* Added comparison operators to Instruction
* Added an internal option to LinkerOptions to verify merged ids are
unique
* Added a test for the linker to verify merged ids are unique

* Updated MergeReturnPass to supply a context
* Updated DecorationManager to supply a context for cloned decorations

* Reworked several portions of the def use tests in anticipation of next
set of changes
2017-11-20 17:49:10 -05:00
Steven Perron
eb4653a67f Add the decoration manager to the IRContext.
To make the decoration manger available everywhere, and to reduce the
number of times it needs to be build, I add one the IRContext.

As the same time, I move code that modifies decoration instruction into
the IRContext from mempass and the decoration manager.  This will make
it easier to keep everything up to date.

This should take care of issue #928.
2017-11-15 12:48:03 -05:00
Diego Novillo
98281ed411 Add analysis to compute mappings between instructions and basic blocks.
This analysis builds a map from instructions to the basic block that
contains them.  It is accessed via get_instr_block().  Once built, it is kept
up-to-date by the IRContext, as long as instructions are removed via
KillInst.

I have not yet marked passes that preserve this analysis. I will do it
in a separate change.

Other changes:

- Add documentation about analysis values requirement to be powers of 2.
- Force a re-build of the def-use manager in tests.
- Fix AllPreserveFirstOnlyAfterPassWithChange to use the
  DummyPassPreservesFirst pass.
- Fix sentinel value for IRContext::Analysis enum.

- Fix logic for checking if the instr<->block mapping is valid in KillInst.
2017-11-13 13:21:48 -05:00
Steven Perron
efe12ff5a1 Have all MemPasses preserve the def-use manager.
Originally the passes that extended from MemPass were those that are
of the def-use manager.  I am assuming they would be able to preserve
it because of that.

Added a check to verify consistency of the IRContext. The IRContext
relies on the pass to tell it if something is invalidated.
It is possible that the pass lied.  To help identify those situations,
we will check if the valid analyses are correct after each pass.

This will be enabled by default for the debug build, and disabled in the
production build.  It can be disabled in the debug build by adding
"-DSPIRV_CHECK_CONTEXT=OFF" to the cmake command.
2017-11-10 11:17:12 -05:00
Diego Novillo
d2938e4842 Re-format files in source, source/opt, source/util, source/val and tools.
NFC. This just makes sure every file is formatted following the
formatting definition in .clang-format.

Re-formatted with:

$ clang-format -i $(find source tools include -name '*.cpp')
$ clang-format -i $(find source tools include -name '*.h')
2017-11-08 14:03:08 -05:00
Steven Perron
f32d11f74b Add the IRContext (part 2): Add def-use manager
This change will move the instances of the def-use manager to the
IRContext.  This allows it to persists across optimization, and does
not have to be rebuilt multiple times.

Added test to ensure that the IRContext is validating and invalidating
the analyses correctly.
2017-11-08 13:35:34 -05:00
GregF
ac04b2faea Opt: Fix HasLoads to not report decoration as load. 2017-11-07 17:39:58 -05:00
Diego Novillo
fef669f30f Add a new class opt::CFG to represent the CFG for the module.
This class moves some of the CFG-related functionality into a new
class opt::CFG.  There is some other code related to the CFG in the
inliner and in opt::LocalSingleStoreElimPass that should also be moved,
but that require more changes than this pure restructuring.

I will move those bits in a follow-up PR.

Currently, the CFG is computed every time a pass is instantiated, but
this should be later moved to the new IRContext class that @s-perron is
working on.

Other re-factoring:

- Add BasicBlock::ContinueBlockIdIfAny. Re-factored out of MergeBlockIdIfAny
- Rewrite IsLoopHeader in terms of GetLoopMergeInst.
- Run clang-format on some files.
2017-11-02 10:37:03 -04:00
Steven Perron
476cae6f7d Add the IRContext (part 1)
This is the first part of adding the IRContext.  This class is meant to
hold the extra data that is build on top of the module that it
owns.

The first part will simply create the IRContext class and get it passed
to the passes in place of the module.  For now it does not have any
functionality of its own, but it acts more as a wrapper for the module.

The functions that I added to the IRContext are those that either
traverse the headers or add to them.  I did this because we may decide
to have other ways of dealing with these sections (for example adding a
type pool, or use the decoration manager).

I also added the function that add to the header because the IRContext
needs to know when an instruction is added to update other data
structures appropriately.

Note that there is still lots of work that needs to be done.  There are
still many places that change the module, and do not inform the context.
That will be the next step.
2017-10-31 13:46:05 -04:00
GregF
94bec26afe ADCE: Dead if elimination
Mark structured conditional branches live only if one or more instructions
in their associated construct is marked live. After closure, replace dead
structured conditional branches with a branch to its merge and remove
dead blocks.

ADCE: Dead If Elim: Remove duplicate StructuredOrder code

Also generalize ComputeStructuredOrder so that the caller can specify the
root block for the order. Phi insertion uses pseudo_entry_block and adce and
dead branch elim use the first block of the function.

ADCE: Dead If Elim: Pull redundant code out of InsertPhiInstructions

ADCE: Dead If Elim: Encapsulate CFG Cleanup Initialization

ADCE: Dead If Elim: Remove redundant code from ADCE initialization

ADCE: Dead If: Use CFGCleanup to eliminate newly dead blocks

Moved bulk of CFG Cleanup code into MemPass.
2017-10-31 11:51:30 -04:00
Diego Novillo
1040a95b3f Re-factor Phi insertion code out of LocalMultiStoreElimPass
Including a re-factor of common behaviour into class Pass:

The following functions are now in class Pass:

- IsLoopHeader.
- ComputeStructuredOrder
- ComputeStructuredSuccessors (annoyingly, I could not re-factor all
  instances of this function, the copy in common_uniform_elim_pass.cpp
  is slightly different and fails with the common implementation).
- GetPointeeTypeId
- TakeNextId
- FinalizeNextId
- MergeBlockIdIfAny

This is a NFC (non-functional change)
2017-10-27 15:28:08 -04:00
GregF
1e7994c085 Opt: Move *NextId functionality into MemPass 2017-10-13 15:22:19 -04:00
Pierre Moreau
86627f7b3f Implement Linker (module combiner)
Add extra iterators for ir::Module's sections
Add extra getters to ir::Function
Add a const version of BasicBlock::GetLabelInst()

Use the max of all inputs' version as version

Split debug in debug1 and debug2
- Debug1 instructions have to be placed before debug2 instructions.

Error out if different addressing or memory models are found

Exit early if no binaries were given

Error out if entry points are redeclared

Implement copy ctors for Function and BasicBlock
- Visual Studio ends up generating copy constructors that call deleted
  functions while compiling the linker code, while GCC and clang do not.
  So explicitly write those functions to avoid Visual Studio messing up.

Move removing duplicate capabilities to its own pass

Add functions running on all IDs present in an instruction

Remove duplicate SpvOpExtInstImport

Give default options value for link functions

Remove linkage capability if not making a library

Check types before allowing to link

Detect if two types/variables/functions have different decorations

Remove decorations of imported variables/functions and their types

Add a DecorationManager

Add a method for removing all decorations of id

Add methods for removing operands from instructions

Error out if one of the modules has a non-zero schema

Update README.md to talk about the linker

Do not freak out if an imported built-in variable has no export
2017-10-06 18:33:53 -04:00
GregF
c8c86a0d36 Opt: Have "size" passes process full entry point call tree.
Includes code to deal correctly with OpFunctionParameter. This
is needed by opaque propagation which may not exhaustively inline
entry point functions.

Adds ProcessEntryPointCallTree: a method to do work on the
functions in the entry point call trees in a deterministic order.
2017-08-18 10:16:01 -04:00
GregF
b0310a4156 ADCE: Add support for function calls
ADCE will now generate correct code in the presence of function calls.
This is needed for opaque type optimization needed by glslang. Currently
all function calls are marked as live. TODO: mark calls live only if they
write a non-local.
2017-08-10 17:30:05 -04:00
GregF
f0fe601dc8 AccessChainConvert: Add HasOnlySupportedRefs()
This avoids conversion on variables which will not ultimately be optimized.
Also removed an obsolete restriction from FindTargetVars(). Also added
decorates to supported refs (eg. RelaxedPrecision). Also fixed name to
IsNonTypeDecorate().
2017-08-04 18:11:44 -04:00
GregF
d9a450121e Mem2Reg: Allow Image and Sampler types as base target types. 2017-08-04 17:52:32 -04:00
GregF
c1b46eedbd Add MemPass, move all shared functions to it. 2017-08-02 14:24:02 -04:00