Commit Graph

1467 Commits

Author SHA1 Message Date
Steven Perron
93c4c184d5 Handle types with self references.
By using forward pointers, we are able to define a struct that has a
pointer to itself.  This could be directly or indirectly.  The current
implementation of the type manager did not handle this case.  There are
three changes that are made in this commit inorder to handle this case:

1) Change the handling of OpTypeForwardPointer

The current handling of OpTypeForwardsPointer is broken if there is a
reference to the pointer before the real definition.  When build the
type that contain the forward delared pointer, the type manager will ask
for the type for that ID, and will get a nullptr because it does not
exists.  This nullptr is not handleded very well.

The change is to keep track of the incomplete types the first time
through all of the types.  An incomplete type is a ForwardPointer or any
type that references an incomplete type.

Then we implement a second pass through the incomplete types that will
complete them.

2) Hashing types.

When hashing a type, we want to uses all of the subtypes as part of the
hash.  However, with types that reference them selves, this creates an
infinite recursion.  To get around this, we keep track of which types
have been seen on the path from the root type.  If we have see the
current type already then we can stop the recursion.

3) Comparing types.

In order to check if two types are the same, we must check that all of
their subtypes are the same as well.  This also causes an infinit
recursion.  The solution is to stop comparing the subtypes if we are
trying to compare two pointer types that we are already in the middle of
comparing.  The ideas is that if the two pointer are different, then in
progress compare will return false itself.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1578.
2018-05-30 15:48:38 -04:00
David Neto
6b83643cfe Start v2018.4-dev 2018-05-25 22:14:30 -04:00
David Neto
545d6ca26d Finalize v2018.3 2018-05-25 22:12:25 -04:00
David Neto
6a986d02a7 Update CHANGES 2018-05-25 21:58:26 -04:00
Steven Perron
745dd00af9 Fold FMix feeding Extract, and use the simplification pass.
We add a new rule to the folding rules to fold an FMix feeding an
extract when the alpha value for the element being extracted is either
0 or 1.  In those case, we can simple extract from one of the operands
to the FMix.

With that change the simplification pass completely subsumes the
insert-extract elimination pass.  So we remove the insert-extract
elimination passes and replce them with calls to the simplification
pass.

In a follow up PR, we should delete the insert-extract elimination pass.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
2018-05-25 14:42:59 -04:00
Arseny Kapoulkine
f765d16bd9 Add external interface for creating a pass token
Currently it's impossible for external code to register a pass because
the only source file that can create pass tokens is optimizer.cpp. This
makes it hard to add passes that can't be upstreamed since you can't run
them from the usual pass sequence without reimplementing Optimizer.

This change adds a PassToken constructor that takes unique_ptr to
opt::Pass; if out-of-tree code implements opt::Pass it can register a
custom pass without having to add it to SPIRV-Tools source code.
2018-05-25 09:19:43 -04:00
dan sinclair
0a14a1f748 Validate that only a single OpMemoryModel is provided.
This CL adds validation that only a single OpMemoryModel is provided in the
SPIR-V binary.

Fixes #1574
2018-05-24 08:43:14 -04:00
dan sinclair
3b87dac56b Validate presence of OpMemoryModel.
According to the SPIR-V Spec, section 2.4 Logical Layout of a Module there
should be a single required OpMemoryModel instruction provided. This CL adds
validation that OpMemoryModel is provided to the SPIR-V validator.

Fixes #1207
2018-05-23 08:17:39 -04:00
Steven Perron
a579e720a8 Remove the limit on struct size in SROA.
Removes the limit on scalar replacement for the lagalization passes.
This is done by adding an option to the pass (and command line option)
to set the limit on maximum size of the composite that scalar
replacement is willing to divide.

Fixes #1494.
2018-05-18 10:03:46 -04:00
Steven Perron
f1f7cc870e Get ADCE to handle OpCopyMemory
ADCE does not treat OpCopyMemory as an instruction that references
memory.  Because of that stores are removed that should not be.

This change teaches ADCE that OpCopyMemory and OpCopyMemorySize both
loads from and stores to memory.  This will keep other stores live when
needed, and will allows ADCE to remove OpCopyMemory instructions as
well.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1556.
2018-05-16 13:50:47 -04:00
Lei Zhang
b09e3ce842
Allow ViewportIndex & Layer to be used in VS/DS with extension
SPV_EXT_shader_viewport_index_layer enables using ViewportIndex
and Layer in vertex and tessellation shaders.

Also, as per the Vulkan spec:

> The ViewportIndex decoration must be used only within vertex,
> tessellation evaluation, geometry, and fragment shaders.

> In a vertex, tessellation evaluation, or geometry shader, any
> variable decorated with ViewportIndex must be declared using
> the Output storage class.

> In a fragment shader, any variable decorated with ViewportIndex
> must be declared using the Input storage class.

Similarly for Layer.
2018-05-16 13:16:27 -04:00
Steven Perron
9b1a938ea1 SROA: Only create symbols that are loaded.
Currently in scalar replacement, we create a new variable for every
memeber of the composite being divided.  It is often overkill, because
not all of those members will be used.  This change will check which
elements are used and only create variable for the members that are
used.

This reduces the compile time for one set of shader from 248s to 165s.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
2018-05-16 10:48:25 -04:00
Steven Perron
0e1b7e5aef Fix getting operand without checking opcode.
Fixes https://github.com/KhronosGhttps://github.com/KhronosGroup/SPIRV-Tools/issues/1559roup/SPIRV-Tools/issues/1559.

There is an load of an operand of an instruction that was suppose to be
only for the OpCompositeExtract case.  However, an error caused it to
be loaded for every opcode, even those that do not have an operand in
that position.

We fix up that bug, and a couple other things noticed that the same
time.
2018-05-16 09:34:43 -04:00
Lei Zhang
efcc33e8a9
Support SpvOpExecutionModeId in SPIR-V logical layout 2018-05-16 08:43:50 -04:00
alan-baker
18ad1be7f9 Fixing MacOS compiler error 2018-05-15 12:23:27 -04:00
Steven Perron
f46f2d3e5d Remove redundant stores.
The code patterns generated by DXC around function calls can cause many
store to be storing the same value that was just loaded from the same
location:

```
%10 = OpLoad %type %var
OpStore %var %10
```

We want to clean these up very early on because they can cause other
transformations to do a lot of work.  For the cases I see, they can be
removed during local-single-block-elim.

For one set of shaders the compile time goes from 248s to 182s.  A 26%
improvement.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
2018-05-15 10:24:05 -04:00
Steven Perron
af430ec822 Add pass to fold a load feeding an extract.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element.  This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.

However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.

Fixes #1547.
2018-05-14 15:40:34 -04:00
Steven Perron
804e8884c4 Fold fclamp feeding compare.
An FClamp instruction forces a values to be within a certain interval.
When the upper or lower bound of the FClamp is a constant and the value
being compared with is a constant, then in some case we can fold the
compared because the entire range is say less than the value.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1549.
2018-05-14 10:27:49 -04:00
Lei Zhang
e9cda70261 Adjust tests according to grammar change
* ConstOffsets now requires ImageGatherExtended
* Int8 does not require Kernel anymore
2018-05-10 16:32:59 -04:00
Steven Perron
9ec3f81e5c Remove dead Workgroup variables in ADCE.
If there is a shader with a variable in the workgroup storage class that
is stored to, but not loadeds, then we know nothing will read those
loads.  It should be safe to remove them.

This is implemented in ADCE by treating workgroup variables the same
way that private variables are treated.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1550.
2018-05-09 16:07:26 -04:00
Steven Perron
0856997df6 Allow ADCE to remove more instructions.
At this time, DCE will only remove an instruction if it is a combinator.
However, there are certain non-combinator instructions that can be
safely removed if their results are not used.  The derivative
instructions are on example.

We are also missing some instructions from the list of combinators
those are added as the same time.
2018-05-05 09:15:28 -04:00
Steven Perron
7d01643132 Allow hoisting code in if-conversion.
When doing if-conversion, we do not currently move code out of the side
nodes.  The reason for this is that it can increase the number of
instructions that get executed because both side nods will have to be
executed now.

In this commit, we add code to move an instruction, and all of the
instructions it depends on, out of a side node and into the header of
the selection construct.  However to keep the cost down, we only do it
when the two values in the OpPhi node compute the same value.  This way
we have to move only one of the instructions and the other becomes
unused most of the time.  So no real extra cost.

Makes the value number table an alalysis in the ir context.

Added more opcodes to list of code motion safe opcodes.

Fixes #1526.
2018-05-04 12:56:29 -04:00
Stephen McGroarty
1c2cbaf569 Add GetContinueBlock to loop class.
Previously, the loop class used the terms latch and continue block
interchangeably. This patch splits the two and corrects and tests some
uses of the old uses of GetLatchBlock.
2018-05-03 14:30:41 -04:00
Steven Perron
70bb3c1cc2 Fold divide and multiply by same value.
We want to fold code like (x*y)/x and other permutations of this.

Fixes #1531.
2018-05-02 10:18:37 -04:00
Diego Novillo
9843484736 Fix build. 2018-05-01 17:46:47 -04:00
Toomas Remmelg
1dc2458060 Add a loop fusion pass.
This pass will look for adjacent loops that are compatible and legal to
be fused.

Loops are compatible if:

- they both have one induction variable
- they have the same upper and lower bounds
    - same initial value
    - same condition
- they have the same update step
- they are adjacent
- there are no break/continue in either of them

Fusion is legal if:

- fused loops do not have any dependencies with dependence distance
  greater than 0 that did not exist in the original loops.
- there are no function calls in the loops (could have side-effects)
- there are no barriers in the loops

It will fuse all such loops as long as the number of registers used for
the fused loop stays under the threshold defined by
max_registers_per_loop.
2018-05-01 15:40:37 -04:00
Stephen McGroarty
9a5dd6fe88 Support loop fission.
Adds support for spliting loops whose register pressure exceeds a user
provided level. This pass will split a loop into two or more loops given
that the loop is a top level loop and that spliting the loop is legal.
Control flow is left intact for dead code elimination to remove.

This pass is enabled with the --loop-fission flag to spirv-opt.
2018-05-01 15:15:10 -04:00
Steven Perron
9ba0879ddf Improve Vector DCE
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.
2018-04-30 11:55:50 -04:00
Steven Perron
a00a0a09ae Revert "Improvements to vector dce."
This reverts commit 2813722993.

A regression was found.  Undoing the change until it is fixed.
2018-04-27 10:33:19 -04:00
Alan Baker
4246abdc74 Fixes handling of kill and unreachable ops in inlining.
Fixes #1527

* Adds handling for copying OpKill and OpUnreachable and forces the
generation of a new basic block
* Adds tests to check
2018-04-27 09:42:37 -04:00
Steven Perron
e1bcd2b2d8 Fold OpVectorTimesScalar and OpPhi better.
If one of the operands to an OpVectorTimesScalar instruction is zero,
then the result will be the 0 vector. Currently we do not fold the
insturction unless both operands are constants. This change fixes that.

We also allow folding of OpPhi instructions where the incoming values
are either an OpUndef or the OpPhi instruction itself. As with other
cases, this can be simplified to the OpUndef.
2018-04-26 12:41:16 -04:00
Steven Perron
2813722993 Improvements to vector dce.
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.

Fixes #1511.
2018-04-26 11:07:48 -04:00
Cort Stratton
72524db2de Fixes #1521: PadToWord() should use std::move() in && variant 2018-04-25 22:03:14 -04:00
Greg Fischer
268be6143d LocalSingleBlockElim: Add store-store elimination
Eliminate unused store to variable if followed by store to same
variable in same block.

Most significantly, this cleans up stores made unused by this pass.
These useless stores can inhibit subsequent optimizations, specifically
LocalSingleStoreElim. Eliminating them makes subsequent optimization more
effective.

The main effect of this pass is to simplify the work done by the SSA
rewriter.  It catches many local loads/stores that help speeding up the
work done by the main rewriter.
2018-04-25 10:30:18 -04:00
Steven Perron
ee8cd5c847 Add Dead insert elmination back in. 2018-04-24 10:10:30 -04:00
Steven Perron
2c0ce87210
Vector DCE (#1512)
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.

It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
2018-04-23 11:13:07 -04:00
David Neto
7a59283587 Another fix for old XCode: std::set explicit ctor in test code 2018-04-20 15:58:01 -04:00
Victor Lomuller
efc5061929 Dominator analysis interface clean.
Remove the CFG requirement when querying a dominator/post-dominator from an IRContext.

Updated all uses of the function and tests.
2018-04-20 15:41:59 -04:00
Jaebaek Seo
48802bad72 Constant folding for OpVectorTimesScalar 2018-04-20 13:43:04 -04:00
Victor Lomuller
0ec08c28c1 Add register liveness analysis.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.

It also includes utilities to simulate the register pressure for loop
fusion and fission.

The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
2018-04-20 09:45:15 -04:00
Alan Baker
09c206b6fb Fixes #1480. Validate group non-uniform scopes.
* Adds new pass for validating non-uniform group instructions
 * Currently on checks execution scope for Vulkan 1.1 and SPIR-V 1.3
* Added test framework
2018-04-20 09:25:00 -04:00
David Neto
e7c2e91ded Fix for old XCode: std::set has explicit ctor 2018-04-19 16:33:12 -04:00
GregF
1c89da46ff Test/DependencyAnalysis: Fix uninitialized variables 2018-04-19 15:34:15 -04:00
Greg Fischer
df7f00f60e DeadInsertElim: Don't revisit select phi nodes during MarkInsertChain
Fixes #1487.
2018-04-19 14:40:00 -04:00
Jaebaek Seo
430a29335e Fix broken pointer of CommonUniformElimPass 2018-04-19 09:36:10 -04:00
Steven Perron
c20a718e00 Rewrite local-single-store-elim to not create large data structures.
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large.  These large data structures
then take a very long time to be destroyed.  I've seen cases around 40%
if the time.

I've rewritten that algorithm to not use as much memory.  This give a
significant improvement when running a large number of shader through
DXC.

I've also made a small change to local-single-block-elim to delete the
loads that is has replaced.  That way local-single-store-elim will not
have to look at those.  local-single-store-elim now does the same thing.

The time for one set goes from 309s down to 126s.  For another set, the
time goes from 102s down to 88s.
2018-04-18 16:38:18 -04:00
Jaebaek Seo
0fa42996b5
Merge pull request #1461 from jaebaek/fnegate
Add constant folding for OpFNegate

Contributes to #709
2018-04-18 13:46:10 -04:00
Jaebaek Seo
3c5bd26668 Typo 2018-04-17 14:13:19 -04:00
Toomas Remmelg
0f335cf87e Add support for MIV and Delta test dependence analysis.
GCD MIV test as described in Chapter 3 of "Optimizing Compilers for
Modern Architectures: A Dependence-Based Approach" by Randy Allen, and
Ken Kennedy.

Delta test as described in Figure 3 of "Practical Dependence Testing" by
Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.
2018-04-17 13:57:02 -04:00
Jaebaek Seo
ff92339fff Format 2018-04-17 12:12:48 -04:00