Commit Graph

833 Commits

Author SHA1 Message Date
alan-baker
b74d92a8c3
ADCE support for SPIR-V 1.4 entry points (#2561)
Fixes #2551

* Add support for 1.4 entry point interface lists
  * only input and output variables are automatically live
  * can clean up interfaces after DCE
  * added tests
* allow opt tests to specify a target environment
2019-05-07 14:52:22 -04:00
Steven Perron
6d04da22c6
Fix up type mismatches. (#2545)
Add functionality to fix-storage-class so that it can fix up mismatched
data types for pointers as well.

Fixes bugs in when fixing up storage class.

Move GenerateCopy to the Pass class to be reused.

The spirv-opt change for #2535.
2019-05-02 09:31:46 -04:00
Steven Perron
32af42616a
Change implementation of post order CFG traversal (#2543)
* Change implementation of post order CFG traversal

It seems like the recursion is going very deep, and causing some problem
is particular situations.  I've reimplemented the CFG post order
traversal to not use recursion.

Fixes #2539.
2019-04-29 17:09:20 -04:00
Steven Perron
64faf6d9cb
Fix undefined bit shift in sroa. (#2532)
There was a bit shift done on 32-bit values, but they should have been
done on 64-bit values.  This is fixed.  At the same time, uses of size_t
are repalaced by uint64_t to ensure these values are 64-bit.

A test case cannot be created because the code that was change is not
run at the moment since we do not split up vectors or matricies.  I do
not want to delete the code because I like to experitment with it every
once in a while.

Fixes #2528.
2019-04-26 12:52:23 -04:00
Ryan Harrison
b68af7ca8e
Add support for Private & Output to initializer decompose flag (#2537)
Fixes #2388
2019-04-25 16:24:32 -04:00
Ryan Harrison
048dcd38ce
Implement WebGPU->Vulkan initializer conversion for 'Function' variables (#2513)
WebGPU requires certain variables to be initialized, whereas there are
known issues with using initializers in Vulkan. This PR is the first
of three implementing a pass to decompose initialized variables into
a variable declaration followed by a store. This has been broken up
into multiple PRs, because there 3 distinct cases that need to be
handled, which require separate implementations.

This first PR implements the basic infrastructure that is needed, and
handling of Function storage class variables. Private and Output will
be handled in future PRs.

This is part of resolving #2388
2019-04-16 14:31:36 -04:00
Ryan Harrison
102e430a88
Add pass to legalize OpVectorShuffle for WebGPU (#2509)
In WebGPU, the component operand 0xFFFFFFFF is forbidden, but in
Vulkan it is used to indicate a value is undefined. When converting to
WebGPU, 0xFFFFFFFF needs to converted to a legal value, though the
specific one does not matter, since it was used to indicate an
undefined entry in the original code. Choosing to use 0, since the
operands are required to be on [0, N-1], so 0 is guaranteed to always
be valid.

Fixes #2349
2019-04-12 12:14:23 -04:00
Steven Perron
9047de51cb
Accept OpBitCast in fix storage class. (#2505)
Fixes http://crbug.com/950889.
2019-04-09 14:10:35 -04:00
Steven Perron
7ce37d66a8
Fix use of Logf to avoid format security warning (#2498)
When -Wformat-security is enabled, we are getting an error.  I do not
claim to fully understand when the warning is triggered or not, but this
one can be avoided by calling "Log" instead of "Logf" because the
formating string is not needed.
2019-04-08 11:06:48 -04:00
Ryan Harrison
0cb2d4079e
Add WebGPU->Vulkan and Vulkan->WebGPU flags in spirv-opt (#2496)
Renames the existing flag '--webgpu-mode' to '--vulkan-to-webgpu' for
the Vulkan->WebGPU operation, and adds a new flag '--webgpu-to-vulkan'
for the WebGPU->Vulkan operation.

Currently '--webgpu-to-vulkan' doesn't have any passes associated with
it yet, but further patches will implement them.

Fixes #2495
2019-04-05 15:12:26 -04:00
JasperNV
9766b22b33 spirv-opt: Behave a bit better in the face of unknown instructions (#2487)
* opt/ir_loader: Don't silently drop unknown instructions on the floor

Currently, if spirv-opt sees an instruction it does not know, it will
silently ignore it and move to the next one. This changes it
to be an error, as dropping it on the floor is likely to generate
invalid SPIR-V output.

* opt/optimizer: Complain a bit louder for unexpected binary changes

If a binary change happens despite a pass saying that the binaries
should be identical, this is indicative of a bug in the pass itself.

This does not change behavior for it to be an error, but simply emits a warning in this case.
2019-04-05 13:36:42 -04:00
Steven Perron
3a0bc9e724
Add fix storage class code. (#2434)
This pass tries to fix validation error due to a mismatch of storage classes
in instructions.  There is no guarantee that all such error will be fixed,
and it is possible that in fixing these errors, it could lead to other
errors.

Fixes #2430.
2019-04-05 13:12:08 -04:00
alan-baker
236bdc0065 Change prioritization of unreachable merge and continue (#2460)
Fixes #2452

Swaps priority of handling unreachable merge and continues so that the
back-edge is retained in the case a block is both a loop continue and
loop merge
2019-04-03 12:50:08 -04:00
Steven Perron
12e4a7b649
Handle variable pointer in some optimizations (#2490)
* Check var pointer capability in ADCE.

* Check var ptr capability for common uniform.

* Check var ptr capability in access chain convert.

Since we want this pass to run even if there are variable pointer on
storage buffers, we had to remove asserts that assumed there were no
variable pointers.  The functions with the asserts will now work, it
becomes the responsibility of the callers to deal with the output as
appropriate.

* Single block elimination and variable pointers.

It seems like the code in local single block elimination is able to
handle cases with variable pointers already.  This is because the
function `HasOnlySupportedRefs` ensures that variables that feed a
variable pointer are not candidates.

* Single store elimination and variable pointers.

It seems like the code in local single stroe elimination is able to
handle cases with variable pointers already.  This is because the
function `FindSingleStoreAndCheckUses` ensures that variables that feed
a variable pointer are not candidates.

* SSA rewriter and variable pointers.

It seems like the code in the two passes that call the SSA rewriter are
able to  handle cases with variable pointers already.  This is because the
function `HasOnlySupportedRefs` ensures that variables that feed
a variable pointer are not candidates.

Fixes #2458.
2019-04-03 12:47:51 -04:00
Ryan Harrison
01964e325f
Add pass to generate needed initializers for WebGPU (#2481)
Fixes #2387
2019-04-03 11:44:09 -04:00
alan-baker
4bd106b089
Handle dead infinite loops in DCE (#2471)
Fixes #2456

* When eliminating a structured construct that has an unreachable merge,
replace that unreachable terminator with an appropriate return
* New tests
2019-04-03 10:30:12 -04:00
alan-baker
c9874e5090
Fix merge return in the face of breaks (#2466)
Fixes #2453

* Enable addition of OpPhi instructions when the loop has multiple
predecessors of the merge due to a break
 * This can result in some values no longer dominating their uses
* Track return blocks in structured flow to produce OpPhis that have
multiple undef and non-undef arguments
* New tests to catch the bug
* When a block is predicated, mark the new body as a return if the old
block as already a return
2019-04-02 10:05:28 -04:00
alan-baker
0300a464a4 Maintain inst to block mapping in merge return (#2469)
Fixes #2455

Properly maintains instruction to block mapping for newly created phi instructions in merge return
2019-04-01 13:14:10 -04:00
Paul Thomson
fcb8453104
reduce: fix loop to selection pass for loops with combined header/continue block (#2480)
* Fix #2478. The fix is to just not try to simplify such loops. 
* Also added `BasicBlock::MergeBlockId()` and `BasicBlock::ContinueBlockId()`. 
* Some minor changes to `structured_loop_to_selection_reduction_opportunity.cpp`. 
* Added test.
2019-03-29 11:29:24 +00:00
alan-baker
2ff54e34ed
Handle function decls in Structured CFG analysis (#2474)
Fixes #2451

* Structured cfg analysis now handles functions with no basic blocks
* Added a test
2019-03-26 14:39:16 -04:00
alan-baker
42e6f1aa62
Add option to validate after each pass (#2462)
* New command-line option to opt: --validate-after-all
 * Pass manager will validate after each pass it runs
2019-03-26 14:38:59 -04:00
greg-lunarg
e1a76269b6 Bindless Validation: Descriptor Initialization Check (#2419)
If SPV_EXT_descriptor_indexing is enabled, add check that for a
descriptor-based reference, the descriptor is initialized. Initialization
data is stored in the debug input buffer, added to the length information
already there. This feature must be seperately enabled on the pass
creation routine. NOTE: Currently just supports image references; buffer
references are still TODO.
2019-03-19 09:53:43 -04:00
Ryan Harrison
e545522146
Add --strip-atomic-counter-memory (#2413)
Adds an optimization pass to remove usages of AtomicCounterMemory
bit. This bit is ignored in Vulkan environments and outright forbidden
in WebGPU ones.

Fixes #2242
2019-03-14 13:34:33 -04:00
Steven Perron
5186ffedb3
Remove duplicates from list of interface IDs in OpEntryPoint instruction (#2449)
* Remove duplicates from list of interface IDs in OpEntryPoint instruction

Fixes #2002.
2019-03-13 15:46:31 -04:00
Steven Perron
9d29c37ac5
Removing decorations when doing constant propagation. (#2444)
In constant propagation, decoration are transfered from the original
expression to the constant that will replace it.  This can be wrong
because there are no decorations that apply to constants.  We choose to
simply delete the decorations.

Fixes #2441
2019-03-13 10:40:49 -04:00
Steven Perron
d800bbbac9
Handle back edges better in dead branch elim. (#2417)
* Handle back edges better in dead branch elim.

Loop header must have exactly one back edge.  Sometimes the branch
with the back edge can be folded.  However, it should not be folded
if it removes the back edge.

The code to check this simply avoids folding the branch in the
continue block.  That needs to be changed to not fold the back edge,
wherever it is.

At the same time, the branch can be folded if it folds to a branch to
the header, because the back edge will still exist.

Fixes #2391.
2019-02-26 09:06:51 -05:00
Jeff Bolz
002ef361ca Add validation for SPV_NV_cooperative_matrix (#2404) 2019-02-25 17:43:11 -05:00
Steven Perron
fde69dcd80
Fix OpDot folding of half float vectors. (#2411)
* Fix OpDot folding of half float vectors.

The code that folds OpDot does not handle half floats correctly.  After
trying to multiple the first components, we get a nullptr because we
don't fold half float values.  This nullptr gets passed to the code that
does the addition, and causes an assert.

Fixes #2405.
2019-02-20 20:05:08 -05:00
Steven Perron
8eddde2e70
Don't change type of input and output var in dead member elim (#2412)
The types of input and output variables must match for the pipeline.  We
cannot see the uses in all of the shader, so dead member
elimination cannot safely change the type of input and output variables.
2019-02-20 18:59:41 -05:00
greg-lunarg
2f84b5de9a Bindless: Fix computation of set and binding for runtime bounds check (#2384)
Also fix test to use non-zero set and binding which will make error
more obvious.
2019-02-19 11:43:30 -05:00
dan sinclair
528fea2b1e
Fixup unused variables (#2402) 2019-02-19 11:11:04 -05:00
Steven Perron
78ac954c41
Mark type id of unknown instructions at fully used. (#2399) 2019-02-15 10:49:49 -05:00
greg-lunarg
9540f2d981 Instrumentation: Fix instruction index when multiple functions (#2389) 2019-02-15 09:49:18 -05:00
Steven Perron
1b0047f210
Add pass to remove dead members. (#2379)
Add a pass that looks for members of structs whose values do not affects
the output of the shader. Those members are then removed and just
treated like padding in the struct.
2019-02-14 13:42:35 -05:00
alan-baker
354205b3dc
Don't merge unreachable blocks (#2375)
Fixes #2374

* Block merging no longer merges unreachable blocks into their
successors
 * added a test
2019-02-12 09:24:01 -05:00
Ryan Harrison
12b3d7e9d6 Add strip-debug to webgpu-mode passes (#2368)
Fixes #2366
2019-02-08 14:26:17 -05:00
Alastair Donaldson
34c5ac614c
Fixes #2358. Added to the reducer the ability to remove a function t… (#2361)
* Fixes #2358.  Added to the reducer the ability to remove a function that is not directly called.  Factored out some code from the optimizer to help with this.
2019-02-08 16:20:29 +00:00
greg-lunarg
cf21146137 Expand bindless bounds checking to runtime-sized descriptor arrays (#2316) 2019-02-07 14:00:36 -05:00
Ryan Harrison
0f4bf0720a
Add flatten-decorations flag to webgpu-mode flags (#2348)
Fixes #2272
2019-02-05 14:07:53 -05:00
alan-baker
63e032f910
Remove unused lambda capture (#2350) 2019-01-31 15:57:45 -05:00
Alastair Donaldson
3b6fee3dae Fixes #2338. Added functionality to remove OpPhi instructions (replacing their uses) when merging blocks (#2339)
* Fixes #2338.  Added check for phi node before merging blocks.

* Added functionality to merge blocks A and B even when B starts with OpPhi instructions, by replacing uses of the OpPhi results with the definitions coming from A.  Added some tests for this.

* Fixed assertion.
2019-01-31 09:36:05 -05:00
Steven Perron
9ab1c0ddd0
Remove code sinking for -O. (#2340)
Community feedback says it is not generaly benificial, so we will remove
it from the standard optimization set.
2019-01-28 11:50:50 -05:00
Alastair Donaldson
3345fe6a9d
Extracted block merging functionality into its own utility file (#2325)
* Extracted useful functionality from block merger and exposed it as stand-alone methods.

* Separated these methods into a utility file.
2019-01-25 10:57:13 +00:00
greg-lunarg
a64c651e18 Fix Constants Analyses bug inserted by #2302 (#2306)
Need to also remove Constants from the valid_analyses set when
invalidated, otherwise Constants is not reinitialized before used.
2019-01-21 12:34:12 -05:00
Steven Perron
8df947d2d6
Handle instructions not in blocks in code sinking. (#2308)
When looking at the uses of the result of an instruction, code sinking
assumes that all uses are in a basic block.  However, this is not true
if there is a decoration or name for the result of that insturction.
This commit checks for this.

Fixes https://crbug.com/923243.
2019-01-21 12:09:56 -05:00
greg-lunarg
d14db341b8 Invalidate ConstantManager if TypeManager is invalidated... (#2302)
...as the ConstantManager contains pointers into the TypeManager.
2019-01-18 15:49:00 -05:00
Steven Perron
d6c067630d Handle extract with no index in VDCE. (#2305)
It is legal, but not generated by any SPIR-V producer: an OpCompositeExtract
with no indexes.  This is essentially just a copy of the object, so we
treat them that way.  We simply propagate the live variables of the
result to the operand.

Fixes https://crbug.com/919181.
2019-01-18 15:43:36 -05:00
Steven Perron
81fb2649bf
Handle access chain with no index in SROA. (#2304)
It is legal, but not generated by any SPIR-V producer: an OpAccessChain
with no indexes.  This is essentially just a copy of the pointer.

I have decided to treat it like an OpCopyObject.  In CheckUses, we
return that it is not okay.

When looking at this I realized that we had code in GetUsedComponents
that cannot be reached.  If there is a use in an OpCopyObject the it
will not call GetUsedComponents.  I removed that dead code.

Fixes https://crbug.com/918311.
2019-01-18 14:19:43 -05:00
Steven Perron
213e15e100
Fix overflow when negating INT_MIN. (#2293)
When doing (-INT_MIN) is considered overflow, so we cannot fold it by
actually performing the negation.

Fixes https://crbug.com/917991
2019-01-17 17:01:55 -05:00
Steven Perron
99c2c21cf4
Fix memory leak in unrolling. (#2301)
During unrolling a new loop is created, but its ownership is not clear
as it gets passed through the code. Changed something to unique_ptr to
make that clearer.

Fixes #2299.

Fixing other memory leaks at the same time.

Fixes #2296
Fixes #2297
2019-01-17 16:02:43 -05:00
Steven Perron
dd4157dcee
Sink (#2284)
Add code sinking pass. It will move OpLoad and OpAccessChain instructions as close as possible to their uses.

Part of #1611.
2019-01-17 15:56:36 -05:00
greg-lunarg
8d2d66f30c Fix vertex instrumentation to use VertexIndex and InstanceIndex (#2294)
...instead of VertexId and InstanceId
2019-01-16 18:02:07 -05:00
Steven Perron
49b5b0abc6
Fix up bit shifts by 32. (#2292)
In C++, a bit shift of the same size as the type is undefined, but it is
defined in spir-v.  When folding those cases, we have to be careful.  We
cannot simply do the shift in C++.

Fixes https://crbug.com/917697.
2019-01-16 15:52:23 -05:00
greg-lunarg
83bfdc976a Instrumentation: Add ArrayStride decoration to debug output buffer array (#2290) 2019-01-16 10:01:40 -05:00
alan-baker
06c9dc07bd
Upgrade modf and frexp (#2266)
Fixes #2138

* Modf and frexp are upgraded to use the struct version of the
instruction and generate an explicit store whose flags can be upgraded
separately
* Fixed major bug where availability and visibility were reversed for
non-copy memory instructions
* Fixed bug where availability and visibility scope operands were reversed for copy memory
* Upgraded all opt tests to use SPV_ENV_UNIVERSAL_1_3
* Upgrade tests moved into unified tests and removed standalone test
2019-01-07 12:36:38 -05:00
Steven Perron
241644a5a3
Have replace load size handle extact with no index. (#2261)
Fixes https://crbug.com/917774
2019-01-03 13:02:10 -05:00
Steven Perron
9f36c8bb72
Handle CompositeInsert with no indices in VDCE (#2258)
* Handle CompositeInsert with no indices in VDCE

In the spec, there it nothing that forces an OpCompositeInsert to have
an index, but VDCE assumes there is at least 1 in a couple places.

This commit updates VDCE to handle these cases.
2019-01-02 14:00:04 -05:00
Steven Perron
bdc2ab9356
In LICM don't place code between merge instruction and branch. (#2252)
Fixes #2210.
2018-12-20 18:33:52 -05:00
Steven Perron
c2013e248b
Make the constant and type manager analyses. (#2250)
Currently it is impossible to invalidate the constnat and type manager.
However, the compact ids pass changes the ids for the types and
constants, which makes them invalid.  This change will make them
analyses that have to been explicitly marked as preserved by passes.
This will allow compact ids to invalidate them.

Fixes #2220.
2018-12-20 18:00:05 +00:00
kholtnv
e49bd96f2c Added additional changes for the new AccelerationStructureNV type. (#2218)
* Added additional changes for the new AccelerationStructureNV type.

* Added additional changes for the new AccelerationStructureNV type.  Change tabs to space...

* Added additional changes for the new accelerationStructureNV type -- add proper type name.

Fix TypeManager.TypeStrings test:
[----------] 29 tests from TypeManager
[ RUN      ] TypeManager.TypeStrings
[       OK ] TypeManager.TypeStrings (7 ms)
2018-12-19 21:42:39 +00:00
Steven Perron
68b69e16aa
Update the continue target in merge return. (#2249)
When we are predicating the continue target for a loop, it can no longer
be the continue target because it will have a branch that exits the loop
and is not the bach edge.  The continue target will have to be the
target of that branch that is still in the loop.

Fixes #2211.
2018-12-19 21:24:49 +00:00
Steven Perron
ac7feace90
Fix missing OpPhi after merge return. (#2248)
The function `UpdatePhiNodes` was being called inconsistently.  In one
case, the cfg had already been updated to include the new edge, and in
another place the cfg was not updated.  This caused the function to
miss flagging a block as needing new phi nodes.  I picked that the cfg
should not be updated before making the call.  I documented it, and
change the call sites to match.

Fixes #2207.
2018-12-19 18:17:42 +00:00
Steven Perron
9d04f82bef
Ensure SROA gets the correct pointer type. (#2247)
We initially assumed that if the type manager returned the correct id
for the pointee type, that we would get the correct pointer type back,
but that is not true.  See the unit test added with this commit.  We
need to fall back to the linear search any time we are looking for a
pointer to a type that may not be unique.

At the same time, SROA considered an OpName on a variable to be a use of
the entire variable.  That has been fixed.

Fixes #2209.
2018-12-19 17:07:29 +00:00
Steven Perron
9e81c337f9
Place load after OpPhi instructions in block. (#2246)
We currently place the load instructions at the start of the basic block
that dominates all of the loads.  If that basic block contains OpPhi
instructions, then this will generate invalid code.  We just need to
search for a location that comes after all of the OpPhi instructions.

Fixes #2204.
2018-12-19 15:18:22 +00:00
Steven Perron
5ec2d1a8cd
Don't fold specialized branches in loop unswitch (#2245)
* Don't fold specialized branchs in loop unswitch

Folding branches can have a lot of special cases, and can be a little
error prone.  So I only want it in one place.  That will be in dead
branch elimination.  I will change loop unswitching to set the branches
that were being folded to have a constant condition.  Then subsequent
pass of dead branch elimination will be able to remove the code.

At the same time, I added a check that loop unswitching will not
unswitch a branch with a constant condition.  It is not useful to do it
because dead branch elimination will simple fold the branch anyway.
Also it avoid an infinite loop that would other wise be introduced by my
first change.

Fixes #2203.
2018-12-19 04:40:30 +00:00
Ryan Harrison
47c08a79c4
Implement initial --webgpu-mode flag (#2217)
Fixes #2166
2018-12-18 15:10:34 -05:00
Steven Perron
acd2781952
Handle id overflow in inlining. (#2196)
Have inlining return Failure if the ids overflow.

Part of #1841.
2018-12-18 19:34:03 +00:00
Steven Perron
1254335d13
Don't unswitch the latch block. (#2205)
Loop unswitching is unswitching the conditional branch that creates the
back-edge. In the version of the loop, where the bachedge is not taken,
there is no back-edge. This is what causes the validator to complain.

The solution I will go with will be to now unswitch a condition with a
back-edge. At this time we do not now if loop unswitching is used. We do
not include it in the optimization sets provided, nor is it used in
glslang's set. When there are opportunities and no breaks from the loop,
the loop with either be a single iteration loop, or an infinite loop.
There is no performance advantage to performing loop unswitching in
either of those cases. If there is a break, maintaining structured
control flow will be tricky. Unless we see a clear advantage to handling
these case, I would go with the safer simpler solution.

Fixes #2201.
2018-12-18 18:15:00 +00:00
Steven Perron
ff07c6df83
SSA-rewriter: make sure phi entries are unique. (#2206)
If there are multiple edges to a basic block, then the ssa rewriter will
create OpPhi instructions with duplicate entries.  This is invalid, and
it is fixed in this commit.

Fixes #2202.
2018-12-18 18:14:27 +00:00
Ryan Harrison
e0292c269d
Add --target-env flag to spirv-opt (#2216)
Fixes #2199
2018-12-17 16:54:23 -05:00
Jeff Bolz
24328a0554 Recognize OpTypeAccelerationStructureNV as a type instruction (#2190) 2018-12-11 19:03:55 -05:00
Steven Perron
e07dabc25f
Invalidate the decoration manager at the start of ADCE. (#2189)
* Invalidate the decoration manager at the start of ADCE.

If the decoration manager is kept live the the contex will try to keep
it up to date.  ADCE deals with group decorations by changing the
operands in |OpGroupDecorate| instructions directly without informing
the decoration manager.  This puts it in an invalid state, which will
cause an error when the context tries to update it.  To Avoid this
problem, we will invalidate the decoration manager upfront.

At the same time, the decoration manager is now considered when checking
the consistency of the decoration manager.
2018-12-10 13:24:33 -05:00
Steven Perron
0bc66a8ba9
Fix invalid OpPhi generated by merge-return. (#2172)
* Fix invalid OpPhi generated by merge-return.

When we create a new phi node for a value say %10, we have to replace
all of the uses of %10 that are no longer dominated by the def of %10
by the result id of the new phi.  However, if the use is in a phi node,
it is possible that the bb contains the use is not dominated by either.
In this case, needs to be handled differently.

* Split loop headers before add a new branch to them.

In merge return, Phi node in loop header that are also merges for loop
do not get updated correctly.  Those cases do not fit in with our
current analysis.  Doing this will simplify the code by reducing the
number of cases that have to be handled.
2018-12-07 14:10:30 -05:00
Steven Perron
2e4563d94f
Document in the context what happens with id overflow. (#2159)
Added documentation to the ir context to indicates that TakeNextId()
returns 0 when the max id is reached.  TODOs were added to each call
sight so that we know where we have to start to handle this case.

Handle id overflow in |SplitLoopHeader|.

Handle id overflow in |GetOrCreatePreHeaderBlock|.

Handle failure to create preheader in LICM.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1841.
2018-12-06 09:07:00 -05:00
Steven Perron
17cba4695c
Remove undefined behaviour when folding shifts. (#2157)
We currently simulate all shift operations when the two operand are
constants.  The problem is that if the shift amount is larger than
32, the result is undefined.

I'm changing the folder to return 0 if the shift value is too high.
That way, we will have defined behaviour.

https://crbug.com/910937.
2018-12-04 10:04:02 -05:00
alan-baker
e510b1bac5
Update memory model (#1904)
Upgrade to VulkanKHR memory model

* Converts Logical GLSL450 memory model to Logical VulkanKHR
* Adds extension and capability
* Removes deprecated decorations and replaces them with appropriate
flags on downstream instructions
* Support for Workgroup upgrades
* Support for copy memory
* Adding support for image functions
* Adding barrier upgrades and tests
* Use QueueFamilyKHR scope instead of device
2018-11-30 14:15:51 -05:00
Steven Perron
2d2a512691
Don't inline recursive functions. (#2130)
* Move ProcessFunction* function from pass to the context.

There are a few functions that are used to traverse the call tree.
They currently live in the Pass class, but they have nothing to do with
a pass, and may be needed outside of a pass.  They would be better in
the ir context, or in a specific call tree class if we ever have a need
for it.

* Don't inline recursive functions.

Inlining does not check if a function is recursive or not.  This has
been fine as long as the shader was a Vulkan shader, which forbid
recursive functions.  However, not all shaders are vulkan, so either
we limit inlining to Vulkan shaders or we teach it to look for recursive
functions.

I prefer to keep the passes as general as is reasonable.  The change
does not require much new code in inlining and gives a reason to refactor
some other code.

The changes are to add a member function to the Function class that
checks if that function is recursive or not.

Then this is used in inlining to not inlining a function call if it calls
a recursive function.

* Add id to function analysis

There are a few places that build a map from ids to Function whose
result is that id.  I decided to add an analysis to the context for this
to reduce that code, and simplify some of the functions.

* Add missing file.
2018-11-29 14:24:58 -05:00
Alastair Donaldson
3b13040cf9 New spirv-reduce reduction pass: operand to dominating id. (#2099)
* Added a reduction pass to replace ids with ids of the same type that dominate them.
* Introduce helper method for querying whether an operand type is an input id.
2018-11-26 17:06:21 -05:00
Daniel Koch
3b210d6a63 Add basic support for EXT_fragment_invocation_density (#2100)
Whitelisting the extension in optimizations
* copying what was done for NV_shading_rate
2018-11-23 10:21:19 -05:00
dan sinclair
15fdcf94d7 Add missing override to ProcessLinesPass 2018-11-19 19:24:48 -05:00
greg-lunarg
c37388f1ad Add passes to propagate and eliminate redundant line instructions (#2027). (#2039)
These are bookend passes designed to help preserve line information
across passes which delete, move and clone instructions. The propagation
pass attaches a debug line instruction to every instruction based on
SPIR-V line propagation rules. It should be performed before optimization.
The redundant line elimination pass eliminates all line instructions
which match the previous line instruction. This pass should be performed
at the end of optimization to reduce physical SPIR-V file size.

Fixes #2027.
2018-11-15 14:06:17 -05:00
Greg Fischer
d4a10590b7 Fix Instruction::IsFloatingPointFoldingAllowed()
Was looking for decorations based on opcode. Should use result_id.
2018-11-14 15:25:51 -07:00
Steven Perron
dc9d155d62
Fix folding of volatile store. (#2048)
When looking for the Volatile mask on a store, the instruction folder
accesses an out-of-bounds element.  We fix that up.

Fixes crbug.com/903530.
2018-11-14 13:52:18 -05:00
Steven Perron
a6150a3fe7
Don't assert on void function parameters. (#2047)
The type manager in spirv-opt currently asserts if a function parameter
has type void.  It is not exactly clear from the spec that this is
disallowed, even if it probably will be disallowed.  In either case,
asserts should be used to verify assumptions that will actually make a
difference to the code.  As far as the optimizer is concerned, a void
parameter does not matter.  I don't see the point of the assert.  I'll
just remove it and let the validator decide whether to accept it or not.

No test was added because it is not clear that it is legal, and should
not force us to accept it in the future unless the spec make it clear
that it is legal.

Fixes crbug.com/903088.
2018-11-14 12:43:43 -05:00
Steven Perron
ec5574a9c6
Instruction::GetBaseAddress to handle OpPtrAccessChain (#2050)
That function currently only handled OpPtrAccessChain if it was in the
middle of the chain, but not at the start.  Fixing that up.

Fixes crbug.com/905271.
2018-11-14 12:42:25 -05:00
dan sinclair
f343a15764
Add missing overrides (#2041) 2018-11-12 15:11:32 -05:00
greg-lunarg
1e9fc1aac1 Add base and core bindless validation instrumentation classes (#2014)
* Add base and core bindless validation instrumentation classes

* Fix formatting.

* Few more formatting fixes

* Fix build failure

* More build fixes

* Need to call non-const functions in order.

Specifically, these are functions which call TakeNextId(). These need to
be called in a specific order to guarantee that tests which do exact
compares will work across all platforms. c++ pretty much does not
guarantee order of evaluation of operands, so any such functions need to
be called separately in individual statements to guarantee order.

* More ordering.

* And more ordering.

* And more formatting.

* Attempt to fix NDK build

* Another attempt to address NDK build problem.

* One more attempt at NDK build failure

* Add instrument.hpp to BUILD.gn

* Some name improvement in instrument.hpp

* Change all types in instrument.hpp to int.

* Improve documentation in instrument.hpp

* Format fixes

* Comment clean up in instrument.hpp

* imageInst -> image_inst

* Fix GetLabel() issue.
2018-11-08 13:54:54 -05:00
greg-lunarg
6721478ef1 Don't assume one return means function can be inlined. (#2018) (#2025)
If there is only 1 return and it is in a loop, then the function cannot be inlined.

Fix condition when inlined code needs one-trip loop wrapper.  The dummy loop is needed when there is a return inside a selection construct.  Even if there is only 1 return.
2018-11-08 09:11:20 -05:00
Jeff Bolz
c06a35b902 Rename PCH macro to spvtools_pch to avoid conflicts with other projects. Also add pch to test/opt. (#2034) 2018-11-07 09:15:04 -05:00
Jeff Bolz
60fac96c6b Enable precompiled headers for spirv-tools(-shared) and some unit tests (#2026) 2018-11-06 09:26:23 -05:00
Steven Perron
f2cc71e5cb
Handle OpMemberDecorateStringGOOGLE in ACDE (#2029)
Add missing case to the switch statement for the annotation
instructions.

See https://github.com/KhronosGroup/glslang/issues/1561.
2018-11-02 13:42:45 -04:00
Jeff Bolz
fb996dce75 Add /Zm flag as a workaround for VS2013 build (#2023) 2018-10-31 07:59:43 -04:00
Steven Perron
6647884a13
Remove MemberDecorateStringGOOGLE during stript-refect. (#2021)
The strip-reflect pass is not removing the reflection decorations that
are decorating members.  With this commit, they will now be removed.

Fixes #2019.
2018-10-30 16:17:35 -04:00
alelenv
1c1e749f0b Add support for nv-raytracing-final (#2010)
Add support for nv-raytracing (non-experimental)
2018-10-25 14:07:46 -04:00
Steven Perron
18fe6d59e5
Fix dead branch elim infinite loop. (#2009)
When looking for a break from a selection construct, we do not realize
that a jump to the continue target of a loop containing the selection
is a break.  This causes and infinit loop, or possibly other failures.

Fixes #2004.
2018-10-24 09:10:30 -04:00
Steven Perron
0ba35798c3
Fix dead branch elim infinite loop. (#1997)
When looking for a break from a selection construct, we do not need to
look inside nested constructs.  However, if a loop header has an
unconditional branch, then we enter the loop.  Entering the loop causes
an infinite loop because we keep going through the loop.

The solution is to look for a merge block, if one exsits, even for block
terminated by an OpBranch.

Fixes #1979.
2018-10-22 13:59:20 -04:00
alan-baker
6e85d1a6fc
Fix restrictions in if conversion (#1998)
Fixes #1991

* Improved identification of potential conditional branches
* Pass changed to only work for shaders
* added a test to catch the bug
2018-10-19 15:16:46 -04:00
Jeff Bolz
dd1e837e1c Use per-configuration location for pch file (#1989) 2018-10-19 14:58:26 -04:00
Steven Perron
715afb0cea
Add a nullptr check to array copy propagation. (#1987)
We are missing a check for a nullptr that is causing things to fail.

Added an extra test case, and fixed up others.

This is the fix for https://github.com/Microsoft/DirectXShaderCompiler/issues/1598.
2018-10-19 12:53:40 -04:00
greg-lunarg
c4687889b7 Fix ADCE to treat OpUnreachable correctly during liveness analysis (#1984)
ADCE liveness algorithm should treat OpUnreachable at least like other
branch instructions. It was being treated as always live which was
preventing useless structured constructs from being eliminated.
OpUnreachable is generated by dead branch elimination which is now
being required by merge return, so this fix should accompany that
change.
2018-10-19 10:16:35 -04:00
Steven Perron
0e68bb3632
Only run merge-returnon reachable functions. (#1983)
We currently run merge-return on all functions, but
dead-branch-elimination only runs on function reachable from an entry
point or exported function.  Since dead-branch-elimination is needed for
merge-return, they have to match.

Fixes #1976.
2018-10-18 08:48:27 -04:00
greg-lunarg
ab45d69154 Fix ADCE liveness to include all enclosing control structures. (#1975)
Was removing control structures which didn't have data dependency
with enclosed live loop and otherwise did not contain live code.
An example is a counting loop around a live loop.

Fixes #1967.
2018-10-16 08:00:07 -04:00
Jeff Bolz
339d23275d Enable precompiled headers for MSVC (#1969) 2018-10-15 11:12:02 -04:00
greg-lunarg
e545564887 Consider atomics that load when analyzing live stores in ADCE (#1956) (#1958)
Consider atomics that load when analyzing live stores in ADCE.

Previously it asserted that the base of an OpImageTexelPointer should
be an image. It is actually a pointer to an image, so IsValidBasePointer
should suffice.
2018-10-12 08:46:35 -04:00
Steven Perron
82663f34c9
Check for unreachable blocks in merge-return. (#1966)
Merge return assumes that the only unreachable blocks are those needed
to keep the structured cfg valid.  Even those must be essentially empty
blocks.

If this is not the case, we get unpredictable behaviour.  This commit
add a check in merge return, and emits an error if it is not the case.

Added a pass of dead branch elimination before merge return in both the
performance and size passes.  It is a precondition of merge return.

Fixes #1962.
2018-10-10 15:18:15 -04:00
Steven Perron
4e266f775a
Fold divisions by 0. (#1963)
The current implementation in the folder when seeing a division by zero
is to assert.  In the release build, the compiler will attempt to
compute the value, which causes its own problems.

The solution I will go with is to fold the division, and just give it
the value of 0.  The same goes for remainder and mod operations.

Fixes #1961.
2018-10-10 11:17:26 -04:00
Steven Perron
497958d899 Removing HLSLCounterBuffer decorations when not needed. (#1954)
The HlslCounterBufferGOOGLE that was introduced changed the OpDecorateId
so that is can now reference an id other than the target.  If that other
id is used only in the decoration, then the definition of the id will be
removed because decoration do not count as real uses.

However, if the target of the decoration is still live the decoration
will not be removed.  This leaves a reference to an id that is not
defined.

There are two solutions to consider.  The first is that is the decoration
is kept, then the definition of the id should be kept live.  Implementing
this change would be involved because the way ADCE handles decorations
will have to be reimplemented.

The other solution is to remove the decoration the id is otherwise dead.
This works for this specific case.  Also this is the more desirable
behaviour in this case.  The id will always be the id of a variable that
belongs to a descriptor set.  If that variable is not bound and we do
not remove it, the driver will complain.

I chose to implement the second solution.  The first will be left to when
a case for it comes up.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1885.
2018-10-05 08:23:09 -04:00
Alan Baker
3b5960174f Don't scalarize spec constant sized arrays
Fixes #1952

* Prevent scalarization of arrays that are sized by a specialization
constant
2018-10-04 11:58:23 -04:00
Steven Perron
146eb3bdcf
Fix erroneous uses of the type manager in copy-prop-arrays. (#1942)
There are a few spots where copy propagate arrays is trying
to go from a Type to an id, but the type is not unique.  When generating
code this pass needs specific ids, otherwise we get type mismatches.
However, the ambigous types means we can sometimes get the wrong type
and generate invalid code.

That code has been rewritten to not rely on the type manager, and just
look at the instructions instead.

I have opened https://github.com/KhronosGroup/SPIRV-Tools/issues/1939 to
try to get a way to make this more robust.
2018-10-01 14:45:44 -04:00
Jeff Bolz
fe90a1d2dc Enable /MP4 (parallel build across 4 cores for MSVC) for SPIRV-Tools/source[/opt] (#1930) 2018-10-01 10:47:39 -04:00
Steven Perron
ddc705933d
Analyze uses for all instructions. (#1937)
* Analyze uses for all instructions.

The def-use manager needs to fill in the `inst_to_used_ids_` field for
every instruction.  This means we have to analyze the uses for every
instruction, even if they do not have any uses.

This mistake was not found earlier because there was a typo in the
equality check for def-use managers.  No new tests are needed.

While looking into this I found redundant work in block merge.  Cleaning
that up at the same time.

* Fix other transformations

Aggressive dead code elimination did not update the OpGroupDecorate
and the OpGroupMemberDecorate instructions properly when they are
updated.  That is fixed.

Dead branch elimination did not analyze the OpUnreachable instructions
that is would add.  That is taken care of.
2018-09-28 14:39:06 -04:00
Steven Perron
32381e30ef
Handle decoration groups with no decorations. (#1921)
In DecorationManager::RemoveDecorationsFrom, we do not remove the id
from a decoration group if the group has no decorations.  This causes
problems because KillNamesAndDecorates is suppose to remove all
references to the id, but in this case, there is still a reference.

This is fixed by adding a special case.

Also, there is the possibility of a double free because
RemoveDecorationsFrom will delete the instructions defining |id| when
|id| is a decoration group.  Later, KillInst would later write to memory
that has been deleted when trying to turn it into a Nop.  To fix this,
we will only remove the decorations that use |id| and not its definition
in RemoveDecorationsFrom.
2018-09-28 14:16:04 -04:00
Steven Perron
80564a56ec
Keep analyses live in unrolling (#1929)
Add code to keep the def-use manger and the inst-to-block mapping up-to-date. This means we do not have to rebuild them later.

To make this work, we will have to have to find places to update the
def-use manager. Updating the def-use manager is not straight forward
because we are unrolling loops, and we have circular references.

This forces one pass to register all of the definitions. A second one
to analyze the uses. Also because there will be references to the new
instructions in the old code, we want to register the definitions of the
new instructions early, so we can update the uses of the older code as
we go along.

The inst-to-block mapping is not too difficult. It can be done as instructions are created.

Fixes #1928.
2018-09-26 17:36:27 -04:00
Steven Perron
0e5fc7d75e
Allow 0 as argument to scalar replacement. (#1917)
A limit of 0 for the scalar replacement options it used to indicate that
there is no limit.  The current implementation does not allow 0.  This
should be fixed.
2018-09-26 09:58:28 -04:00
Steven Perron
b85fb4a300
Get KillNameAndDecorates to handle group decorations. (#1919)
It seems like the current implementation of KillNameAndDecorates does
not handle group decorations correctly.  The id being removed is not
removed from the OpGroupDecorate instructions.  Even worst, any
decorations that apply to that group are removed.

The solution is to use the function in the decoration manager that will
remove the decorations and update the instructions instead of doing the
work itself.
2018-09-25 12:57:44 -04:00
Chao Chen
6e2dab2ffd Add support for Nvidia Turing extensions 2018-09-19 20:46:14 -04:00
Steven Perron
9fbcce4ca1
Add unrolling to the legalization passes (#1903)
Adds unrolling to the legalization passes.

After enabling unrolling I found a bug when there is a self-referencing
phi node.  That has been fixed.

The test that checks for that the order of optimizations is correct also
needed to be updated.
2018-09-19 16:40:09 -04:00
Steven Perron
7075c49923
Add dummy loop in merge-return. (#1896)
The current implementation of merge return can create bad, but correct,
code.  When it is not in a loop construct, it will insert a lot of
extra branch around code.  The potentially large number of branches are
bad.  At the same time, it can separate code store to variables from
its uses hiding the fact that the store dominates the load.

This hurts the later analysis because the compiler thinks that multiple
values can reach a load, when there is really only 1.  This poorer
analysis leads to missed optimizations.

The solution is to create a dummy loop around the entire body of the
function, then we can break from that loop with a single branch.  Also
only new merge nodes would be those at the end of loops meaning that
most analysies will not be hurt.

Remove dead code for cases that are no longer possible.

It seems like some drivers expect there the be an OpSelectionMerge
before conditional branches, even if they are not strictly needed.
So we add them.
2018-09-18 08:52:47 -04:00
Steven Perron
5f599e700e
Fix infinite loop in dead-branch-elimination (#1891)
* Create structed cfg analysis.

There are lots of optimization that have to traverse the CFG in a
structured order just because it wants to know which constructs a
basic block in contained in.  This adds extra complexity to these
optimizations, for causes too much refactoring of older optimizations.

To help with this problem, I have written an analysis that can give this
information.

* Identify branches breaking from loops.

Dead branch elimination does a search for a conditional branch to the
end of the current selection construct.  This search assumes that the
only way to leave the construct is through the merge node.  But that is
not true.  The code can jump to the merge node of a loop that contains
the construct.

The search needs to take this into consideration.
2018-09-17 13:00:24 -04:00
Diego Novillo
4a4632264e Add IR dumping functions to use during debugging.
When using lldb and/or gdb I frequently get odd std::string failures
when using the IR printing instructions we have now.  This adds the
methods  Instruction::Dump(), BasicBlock::Dump() and Function::Dump() to
emit the output of the pretty print to stderr.

With this I can now reliably print IR from gdb and lldb sessions.
2018-09-14 14:28:34 -04:00
Steven Perron
6d5f1bc2e8
Allow merge blocks to merge two header blocks in some cases. (#1890)
In merge blocks, we do not allow the merging of two blocks with merge
instructions.  This is because if the two block are merged only 1 of
those instructions can exists.  However, if the successor block is the
merge block of the predecessor, then we can delete the merge instruction
in the predecessor.  In this case, we are able to merge the blocks.
2018-09-14 13:37:18 -04:00
Steven Perron
75c1bf2843
Add option for the max id bound. (#1870)
* Create a new entry point for the optimizer

Creates a new struct to hold the options for the optimizer, and creates
an entry point that take the optimizer options as a parameter.

The old entry point that takes validator options are now deprecated.
The validator options will be one of the optimizer options.

Part of the optimizer options will also be the upper bound on the id bound.

* Add a command line option to set the max value for the id bound.  The default is 0x3FFFFF.

* Modify `TakeNextIdBound` to return 0 when the limit is reached.
2018-09-10 11:49:41 -04:00
Steven Perron
416b1ab4f3
Have the constant manager take ownership of constants. (#1866)
* Have the constant manager take ownership of constants.

Right now the owner of an object of type contant that is in the
|const_pool_| of the constant manager is unclear.  The constant
manager does not delete them, there is no other reasonable owner.  This
causes memory leaks.

This change fixes the memory leaks by having the constant manager
take ownership of the constant that is stores in |const_pool_|.  Other
changes include interface changes to make it explicit that the constant
manager takes ownership of the object when a constant is registered
with the constant manager.

Fixes #1865.
2018-08-27 09:53:47 -04:00
Steven Perron
47ee776a2c Revert "Have the constant manager take ownership of constants."
This reverts commit b938b74bac.
2018-08-24 15:12:49 -04:00
Steven Perron
b938b74bac Have the constant manager take ownership of constants.
Right now the owner of an object of type contant that is in the
|const_pool_| of the constant manager is unclear.  The constant
manager does not delete them, there is no other reasonable owner.  This
causes memory leaks.

This change fixes the memory leaks by having the constant manager
take ownership of the constant that is stores in |const_pool_|.  Other
changes include interface changes to make it explicit that the constant
manager takes ownership of the object when a constant is registered
with the constant manager.
2018-08-24 15:08:12 -04:00
Steven Perron
d746681fe9
Copy decorations when creating new ids. (#1843)
* Copy decorations when creating new ids.

When creating a new value based on an old value, we need to copy the
decorations to the new id.  This change does this in 3 places:

1) The variable holding the return value of the function generated by
merge return should get decorations from the function.

2) The results of the OpPhi instructions should get decorations from the
variable they are replacing in the ssa writer.

3) In local access chain convert the intermediate struct (result of
OpCompositeInsert) generated for the store replacement should get its
decorations from the variable being stored to.

Fixes #1787.
2018-08-24 11:55:39 -04:00
Steven Perron
b4d3618f77
Don't "break" from selection constructs. (#1862)
If seems like at least 1 driver does not like a condition jump to the end
of a selection construct.  We are generating these in the merge return
pass.  This change stops merge return from generating this sequence.

Part of #1861.
2018-08-23 14:38:25 -04:00
Steven Perron
6c73b1fb70
Update the order when predicating blocks. (#1859)
When doing predicate blocks, we need to traverse every block in
structured order in order to keep track of which construct a block is
contained in.  The standard way of traversing code in structured order
is to create a list with all of the nodes in order.  However, when
predicating blocks, new blocks are created, and those blocks are missed.
This causes branches that go too far.

The solution is to update the order as new blocks are created.  Since
we are using an std::list, we do not have to worry about invalidation of
iterators when changing the list.
2018-08-23 12:59:31 -04:00
Steven Perron
d91d34e150
Fix VS2013 build break. (#1853) 2018-08-21 13:50:47 -04:00
Steven Perron
19264ef42c
Have PredicateBlocks jump the existing merge blocks. (#1849)
* Refactor PredicateBlocks

Refactor PredicateBlocks so that we know which constructs a return
is contained in.  Will be used later.

* Have PredicateBlocks jump the existing merge blocks.

In PredicateBlocks, we currently skip instructions with side effects,
but it still follows the same control flow (sort-of).  This causes a
problem, when we are trying to predicate code in a loop.  We skip all
of the code with side effects (IV increment), but still follow the
same control flow (jump back the start of the loop).  This creates an
infinite loop because the code will keep jumping back to the start of
the loop without changing the values that effect the exit condition.

This is a large change to merge-return.  When predicating a block that
is in a loop or merge construct, it will jump to the merge block of the
construct.  Once out of all constructs we will generate code as we did
before.
2018-08-21 12:04:08 -04:00
Steven Perron
d693a83e36
Handle breaks from structured-ifs in DCE. (#1848)
* Handle breaks from structured-ifs in DCE.

dead code elimination assumes that are conditional branches except for
breaks and continues in loops will have an OpSelectionMerge before them.
That is not true when breaking out of a selection construct.

The fix is to look for breaks in selection constructs in the same place
we look for breaks and continues for loops.
2018-08-21 11:54:44 -04:00
Steven Perron
45c235d41f
Have dead-branch-elim handle conditional exits from selections. (#1850)
When dead-branch-elim folds a conditional branch, it also deletes the
OpSelectionMerge instruction.  If that construct contains a
conditional branch to the merge node, it will not have its own
OpSelectionMerge.  When the headers merge instruction is deleted, the
the inner conditional branch will no longer be legal.  It will be a
selection to a node that is not a merge node.

We fix this up by moving the OpSelectionMerge to a new location if it is
still needed.
2018-08-21 11:49:56 -04:00
Diego Novillo
03000a3a38 Add testing framework for tools.
This forks the testing harness from https://github.com/google/shaderc
to allow testing CLI tools.

New features needed for SPIRV-Tools include:

1- A new PlaceHolder subclass for spirv shaders.  This place holder
   calls spirv-as to convert assembly input into SPIRV bytecode. This is
   required for most tools in SPIRV-Tools.

2- A minimal testing file for testing basic functionality of spirv-opt.

Add tests for all flags in spirv-opt.

1. Adds tests to check that known flags match the names that each pass
   advertises.
2. Adds tests to check that -O, -Os and --legalize-hlsl schedule the
   expected passes.
3. Adds more functionality to Expect classes to support regular
   expression matching on stderr.
4. Add checks for integer arguments to optimization flags.
5. Fixes #1817 by modifying the parsing of integer arguments in
   flags that take them.
6. Fixes -Oconfig file parsing (#1778). It reads every line of the file
   into a string and then parses that string by tokenizing every group of
   characters between whitespaces (using the standard cin reading
   operator).  This mimics shell command-line parsing, but it does not
   support quoting (and I'm not planning to).
2018-08-17 15:03:14 -04:00
Steven Perron
e065cc208f
Keep decorations when replacing loads in access-chain-convert. (#1829)
In local-access-chain-convert, we replace loads by load the entire
variable, then doing the extract.  The extract will have the same value
as the load.  However, if the load has a decoration on it, the
decoration is lost because we do not copy any them to the new id.

This is fixed by rewritting the load into the extract and keeping the
same result id.

This change has the effect that we do not call DCEInst on the loads
because the load is not being deleted, but replaced.  This could leave
OpAccessChain instructions around that are not used.  This is not a
problem for -O and -Os.  They run local_single_*_elim passes and then
dead code elimination.  The dce will remove the unused access chains,
and the load elimination passes work even if there are unused access
chains.  I have added test to them to ensure they will not loss
opportunities.

Fixes #1787.
2018-08-15 09:14:21 -04:00
dan sinclair
1963a2dbda
Use MakeUnique. (#1837)
This CL replaces instances of reset(new ..) with MakeUnique.
2018-08-14 15:01:50 -04:00
dan sinclair
1553025f4c
Move make_unique to source/util. (#1836)
This MakeUnique code is used in places other then source/opt so move it
to source/utils.
2018-08-14 12:44:54 -04:00
Steven Perron
bf24d9b4ac
Don't copy decorations twice when rebuilding a type. (#1835)
In `TypeManager::RebuildType`, the base cases call `Clone`, which will
copy the decorations for the type.  After that it breaks out of the
switch statement and copies the decorations again.

This has not causes any real problems yet because none of those types
are allowed to have decorations.  However to make the code more robust
it is best to not copy twice because it should be empty.

This way if a new base type or decoration is added that changes this
rule the code will be correct.
2018-08-14 11:26:14 -04:00
Steven Perron
bcb0b6935c
Reenable --skip-validation. (#1820)
In previous changes, the option `--skip-validation` was disabled.  This
change is to reenable it.
2018-08-13 13:18:46 -04:00
Steven Perron
5c8b4f5a1c
Validate the input to Optimizer::Run (#1799)
* Run the validator in the optimization fuzzers.

The optimizers assumes that the input to the optimizer is valid.  Since
the fuzzers do not check that the input is valid before passing the
spir-v to the optimizer, we are getting a few errors.

The solution is to run the validator in the optimizer to validate the
input.

For the legalization passes, we need to add an extra option to the
validator to accept certain types of variable pointers, even if the
capability is not given.  At the same time, we changed the option
"--legalize-hlsl" to relax the validator in the same way instead of
turning it off.
2018-08-08 11:16:19 -04:00
dan sinclair
9991d661f8
Fix readbility/braces warnings (#1804) 2018-08-07 09:09:47 -04:00
dan sinclair
eda2cfbe12
Cleanup includes. (#1795)
This Cl cleans up the include paths to be relative to the top level
directory. Various include-what-you-use fixes have been added.
2018-08-03 15:06:09 -04:00
dan sinclair
58a6876cee
Rewrite include guards (#1793)
This CL rewrites the include guards to make PRESUBMIT.py include guard
check happy.
2018-08-03 08:05:33 -04:00
Steven Perron
ce644d4a24
Update OpPhi instructions after splitting block. (#1783)
In the merge return pass, we will split a block, but not update the phi
instructions that reference the block.  Since the branch in the original
block is now part of the block with the new id, the phi nodes must be
updated.

This commit will change this.

I have also considered other places where an id of a basic block could
be referenced, and I don't think any of them need to change.

1) Branch and merge instructions: These jump to the start of the
original block, and so we want them to jump to the block that uses the
original id.  Nothing needs to change.

2) Names and decorations: I don't think it matters with block keeps the
name, and there are no decorations that apply to basic blocks.

Fixes #1736.
2018-08-02 11:02:50 -04:00
dan sinclair
a5a5ea0e2d
Remove using std::<foo> statements. (#1756)
Many of the files have using std::<foo> statements in them, but then the
use of <foo> will be inconsistently std::<foo> or <foo> scattered
through the file. This CL removes all of the using statements and
updates the code to have the required std:: prefix.
2018-08-01 14:58:12 -04:00
Steven Perron
c8c724cba7
Don't change decorations and names in merge return. (#1777)
When creating a new phi for a value in the function, merge return will
rewrite all uses of an id that are no longer dominated by its
definition.  Uses that are not in a basic block, like OpName or
decorations, are not dominated, but they should not be replaced.

Fixes #1736.
2018-08-01 13:47:09 -04:00
Alan Baker
755e5c9420 Transform to combine consecutive access chains
* Combines OpAccessChain, OpInBoundsAccessChain, OpPtrAccessChain and
OpInBoundsPtrAccessChain
* New folding rule to fold add with 0 for integers
 * Converts to a bitcast if the result type does not match the operand
 type
V
2018-07-31 13:42:47 -04:00
Diego Novillo
99fe61e724 Add API to create passes out of a list of command-line flags.
This re-implements the -Oconfig=<file> flag to use a new API that takes
a list of command-line flags representing optimization passes.

This moves the processing of flags that create new optimization passes
out of spirv-opt and into the library API.  Useful for other tools that
want to incorporate a facility similar to -Oconfig.

The main changes are:

1- Add a new public function Optimizer::RegisterPassesFromFlags. This
   takes a vector of strings.  Each string is assumed to have the form
   '--pass_name[=pass_args]'.  It creates and registers into the pass
   manager all the passes specified in the vector.  Each pass is
   validated internally.  Failure to create a pass instance causes the
   function to return false and a diagnostic is emitted to the
   registered message consumer.

2- Re-implements -Oconfig in spirv-opt to use the new API.
2018-07-27 15:10:08 -04:00
Alan Baker
b49f76fd62 Handle undef literal value in vector shuffle
Fixes #1731

* Updated folding rules related to vector shuffle to account for the
undef literal value:
 * FoldVectorShuffleFeedingShuffle
 * FoldVectorShuffleFeedingExtract
 * FoldVectorShuffleWithConstants
* These rules would commit memory violations due to treating the undef
literal value as an accessible composite component
2018-07-20 11:32:43 -04:00
dan sinclair
effafedcee
Replace opt::Instruction type and result cache with flags. (#1718)
Currentlty opt::Instruction class holds a cache of the result_id and
type_id for the instruction. That cache needs to be updated if the
underlying operand values are changes.

This CL changes the cache to being a flag if there is a type or result
id for the instruction. We then retrieve the value if needed from the
operands.
2018-07-20 11:09:30 -04:00
Alan Baker
3c19651733 Add variable pointer support to IsValidBasePointer
Fixes #1729

* Adds supported opcodes to IsValidBasePointer() enable by
VariablePointers and VariablePointersStorageBuffer capabilities
 * Added tests
2018-07-19 14:43:59 -04:00
Alan Baker
28199b80b7 Fix block ordering in dead branch elim
Fixes #1727

* If the pass finds any dead branches it can optimize then at the end of
the pass it reorders basic blocks to ensure they satisfy block ordering
requirements
 * Added some new tests
* While investigating this issue, found and fixed a non-deterministic
ordering of dominators
 * Now the edges used to construct the dominator tree are sorted
 according to posorder traversal indices
2018-07-19 11:17:57 -04:00
Steven Perron
208921efe8 Fix finding constant with particular type. (#1724)
With current implementation, the constant manager does not keep around
two constant with the same value but different types when the types
hash to the same value. So when you start looking for that constant you
will get a constant with the wrong type back.

I've made a few changes to the constant manager to fix this.  First off,
I have changed the map from constant to ids to be an std::multimap.
This way a single constant can be mapped to mutiple ids each
representing a different type.

Then when asking for an id of a constant, we can search all of the ids
associated with that constant in order to find the one with the correct
type.
2018-07-16 12:36:53 -04:00
Steven Perron
95b4d47e34 Fix infinite loop while folding OpVectorShuffle (#1722)
When folding an OpVectorShuffle where the first operand is defined by
an OpVectorShuffle, is unused, and is equal to the second, we end up
with an infinite loop.  This is because we think we change the
instruction, but it does not actually change.  So we keep trying to
folding the same instruction.

This commit fixes up that specific issue.  When the operand is unused,
we replace it with Null.
2018-07-13 12:43:00 -04:00
Steven Perron
63c1d8fb15
Fix size error when folding vector shuffle. (#1721)
When folding a vector shuffle that feeds another vector shuffle causes
the size of the first operand to change, when other indices have to be
adjusted reletive to the new size.
2018-07-13 11:20:02 -04:00
dan sinclair
c7da51a085
Cleanup extraneous namespace qualifies in source/opt. (#1716)
This CL follows up on the opt namespacing CLs by removing the
unnecessary opt:: and opt::analysis:: namespace prefixes.
2018-07-12 15:14:43 -04:00
dan sinclair
e477e7573e
Remove the module from opt::Function. (#1717)
The function class provides a {Set|Get}Parent call in order to provide
the context to the LoopDescriptor methods. This CL removes the module
from Function and provides the needed context directly to LoopDescriptor
on creation.
2018-07-12 14:42:05 -04:00
dan sinclair
3ded745f21
Cleanup CFG header. (#1715)
This CL removes some unused methods from CFG, makes the constructor
explicit and moves the using statement to the cpp file where it's used.
2018-07-12 14:40:40 -04:00
dan sinclair
6803e42bb5
Cleanup some pass code to get context directly. (#1714)
Instead of going through the instruction we can access the context()
directly from the pass.

Issue #1703
2018-07-12 11:13:32 -04:00
dan sinclair
a5e4a53217
Remove context() method from opt::Function (#1700)
This CL removes the context() method from opt::Function. In the places
where the context() was used we can retrieve, or provide, the context in
another fashion.
2018-07-12 10:16:15 -04:00
dan sinclair
4cc6cd184a
Pass the IRContext into the folding rules. (#1709)
This CL updates the folding rules to receive the IRContext as a paramter
instead of retrieving off of the Instruction.

Issue #1703
2018-07-12 09:12:23 -04:00
dan sinclair
f96b7f1cb9
use Pass::Run to set the context on each pass. (#1708)
Currently the IRContext is passed into the Pass::Process method. It is
then up to the individual pass to store the context into the context_
variable. This CL changes the Run method to store the context before
calling Process which no-longer receives the context as a parameter.
2018-07-12 09:08:45 -04:00
Steven Perron
e63551deac Add folding rule to merge a vector shuffle feeding another one. 2018-07-11 14:44:46 -04:00
dan sinclair
2cce2c5b97
Move tests into namespaces (#1689)
This CL moves the test into namespaces based on their directories.
2018-07-11 09:24:49 -04:00
Steven Perron
cbdbbe9a26 Fix up code to make ClangTidy happy.
Just a few changes to pass `std::function` objects by const reference
instead of by value.
2018-07-10 13:59:01 -04:00
dan sinclair
84846b7e76
Cleanup whitespace lint warnings. (#1690)
This CL cleans up the whitespace warnings and enables the check when
running 'git cl presubmit --all -uf'.
2018-07-10 13:09:46 -04:00
dan sinclair
e6b953361d
Move the ir namespace to opt. (#1680)
This CL moves the files in opt/ to consistenly be under the opt::
namespace. This frees up the ir:: namespace so it can be used to make a
shared ir represenation.
2018-07-09 11:32:29 -04:00
dan sinclair
3dad1cda11
Change libspirv to spvtools namespace (#1678)
This CL changes all of the libspirv namespace code to spvtools to match
the rest of the code base.
2018-07-07 09:38:00 -04:00
dan sinclair
76e0bde196 Move utils/ to spvtools::utils
Currently the utils/ folder uses both spvutils:: and spvtools::utils.
This CL changes the namespace to consistenly be spvtools::utils to match
the rest of the codebase.
2018-07-06 16:47:46 -04:00
Steven Perron
a45d4cac61 Move folding routines into a class
The folding routines are currently global functions.  They also rely on
data in an std::map that holds the folding rules for each opcode.  This
causes that map to not have a clear owner, and therefore never gets
deleted.

There has been a request to delete this map.  To implement this, we will
create a InstructionFolder class that owns the maps.  The IRContext will
own the InstructionFolder instance.  Then the global functions will
become public memeber functions of the InstructionFolder.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1659.
2018-07-05 17:52:43 -04:00
Steven Perron
9ecbcf5fc8 Make sure the constant folder get the correct type.
There are a few locations where we need to handle duplicate types.  We
cannot merge them because they may be needed for reflection.  When this
happens we need do some extra lookups in the type manager.

The specific fixes are:

1) When generating a constant through `GetDefiningInstruction` accept
and use an id for the desired type of the constant.  This will make sure
you get the type that is needed.

2) In Private-to-local, make sure we to update the def-use chains when a
new pointer type is created.

3) In the type manager, make sure that `FindPointerToType` returns a
pointer that points to the given type and not a duplicate type.

4) In scalar replacment, make sure the null constants that are created
are the correct type.
2018-07-05 14:34:30 -04:00
Steven Perron
101a9bcbb0 Add private to local to optimization and size passes.
Many optimization will run on function scope symbols only.  When symbols
are moved from private scope to function scople, then these optimizations
can do more.

I believe it is a good idea to run this pass with both -O and -Os.  To
get the most out of it it should be run ASAP after inlining and something
that remove all of the dead functions.
2018-07-04 21:26:09 -04:00
Steven Perron
465f2815cb Revert change and stop running remove duplicates.
Revert "Don't merge types of resources"

This reverts commit f393b0e480, but leaves
the tests that were added.  Added new test. These test are the so that,
if someone tries the same change I made, they will see the test that
they need to handle.

Don't run remove duplicates in -O and -Os

Romve duplicates was run to help reduce compile time when looking for
types in the type manager.  I've run compile time test on three sets
of shaders, and the compile time does not seem to change.

It should be safe to remove it.
2018-06-29 14:09:44 -04:00
Steven Perron
2eb9bfb5b6 Remove stores of undef.
When storing an undef, any value is valid, including the one already in
that memory location.  So we can avoid the store.
2018-06-29 09:49:19 -04:00
Greg Roth
4717d24e24 Fix assert during compact IDs pass (#1649)
During the compact IDs optimization pass, the result IDs of some
basic blocks can change. In spite of this, GetPreservedAnalyses
indicated that the CFG was preserved. But the CFG relies on
the basic blocks having the same IDs. Simply removing this flag
resolves the issue by preventing the CFG check.

Also Removes combinators and namemap preserved analyses from
compact IDs pass.
2018-06-27 19:29:08 -04:00
Steven Perron
f393b0e480 Don't merge types of resources
When doing reflection users care about the names of the variable, the
name of the type, and the name of the members.  Remove duplicates breaks
this because it removes the names one of the types when merging.

To fix this we have to keep the different types around for each
resource.  This commit adds code to remove duplicates to look for the
types uses to describe resources, and make sure they do not get merged.

However, allow merging of a type used in a resource with something not
used in a resource.  Was done when the non resource type came second.

This could have a negative effect on compile time, but it was not
expected to be much.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1372.
2018-06-27 13:57:07 -04:00
Alan Baker
0d43e10b4a Use type id when looking up vector type
Fixes #1634

* Vector components of composite constructs used wrong accessor
2018-06-25 09:47:29 -04:00
Steven Perron
1f7b1f1bf7 Small vector optimization for operands.
We replace the std::vector in the Operand class by a new class that does
a small size optimization.  This helps improve compile time on Windows.

Tested on three sets of shaders.  Trying various values for the small
vector.  The optimal value for the operand class was 2.  However, for
the Instruction class, using an std::vector was optimal.  Size of "0"
means that an std::vector was used.

                Instruction size
	        0      4      8
Operand Size

0               489    544    684
1               593    487
2               469    570
4               473
8               505

This is a single thread run of ~120 shaders.  For the multithreaded run
the results were the similar.  The basline time was ~62sec.  The
optimal configuration was an 2 for the OperandData and an
std::vector for the OperandList with a compile time of ~38sec.  Similar
expiriments were done with other sets of shaders.  The compile time still
improved, but not as much.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1609.
2018-06-12 13:41:08 -04:00
Steven Perron
a1f9e1342e Preserve inst-to-block and def-use in passes.
The following passes are updated to preserve the inst-to-block and
def-use analysies:

	private-to-local
	aggressive dead-code elimination
	dead branch elimination
	local-single-block elimination
	local-single-store elimination
	reduce load size
	compact ids (inst-to-block only)
	merge block
	dead-insert elimination
	ccp

The one execption is that compact ids still kills the def-use manager.
This is because it changes so many ids it is faster to kill and rebuild.

Does everything in
https://github.com/KhronosGroup/SPIRV-Tools/issues/1593 except for the
changes to merge return.
2018-06-04 13:48:30 -04:00
Steven Perron
fe2fbee294 Delete the insert-extract-elim pass.
Replaces anything that creates an insert-extract-elim pass and create
a simplifiation pass instead.  Then delete the implementation of the
pass.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
2018-06-01 10:13:39 -04:00
Steven Perron
9a008835f4 Add store for var initializer in inlining.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1591.
2018-06-01 09:44:42 -04:00
Alan Baker
badcf73d00 Allow duplicate pointer types
Fixes #1577

* Remove validation requiring unique pointer types unless variable
pointers extension enabled
* Modified scalar replacement to always look for an undecorated pointer
2018-05-31 09:14:38 -04:00
Steven Perron
93c4c184d5 Handle types with self references.
By using forward pointers, we are able to define a struct that has a
pointer to itself.  This could be directly or indirectly.  The current
implementation of the type manager did not handle this case.  There are
three changes that are made in this commit inorder to handle this case:

1) Change the handling of OpTypeForwardPointer

The current handling of OpTypeForwardsPointer is broken if there is a
reference to the pointer before the real definition.  When build the
type that contain the forward delared pointer, the type manager will ask
for the type for that ID, and will get a nullptr because it does not
exists.  This nullptr is not handleded very well.

The change is to keep track of the incomplete types the first time
through all of the types.  An incomplete type is a ForwardPointer or any
type that references an incomplete type.

Then we implement a second pass through the incomplete types that will
complete them.

2) Hashing types.

When hashing a type, we want to uses all of the subtypes as part of the
hash.  However, with types that reference them selves, this creates an
infinite recursion.  To get around this, we keep track of which types
have been seen on the path from the root type.  If we have see the
current type already then we can stop the recursion.

3) Comparing types.

In order to check if two types are the same, we must check that all of
their subtypes are the same as well.  This also causes an infinit
recursion.  The solution is to stop comparing the subtypes if we are
trying to compare two pointer types that we are already in the middle of
comparing.  The ideas is that if the two pointer are different, then in
progress compare will return false itself.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1578.
2018-05-30 15:48:38 -04:00
Steven Perron
745dd00af9 Fold FMix feeding Extract, and use the simplification pass.
We add a new rule to the folding rules to fold an FMix feeding an
extract when the alpha value for the element being extracted is either
0 or 1.  In those case, we can simple extract from one of the operands
to the FMix.

With that change the simplification pass completely subsumes the
insert-extract elimination pass.  So we remove the insert-extract
elimination passes and replce them with calls to the simplification
pass.

In a follow up PR, we should delete the insert-extract elimination pass.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
2018-05-25 14:42:59 -04:00
Arseny Kapoulkine
f765d16bd9 Add external interface for creating a pass token
Currently it's impossible for external code to register a pass because
the only source file that can create pass tokens is optimizer.cpp. This
makes it hard to add passes that can't be upstreamed since you can't run
them from the usual pass sequence without reimplementing Optimizer.

This change adds a PassToken constructor that takes unique_ptr to
opt::Pass; if out-of-tree code implements opt::Pass it can register a
custom pass without having to add it to SPIRV-Tools source code.
2018-05-25 09:19:43 -04:00
Steven Perron
a579e720a8 Remove the limit on struct size in SROA.
Removes the limit on scalar replacement for the lagalization passes.
This is done by adding an option to the pass (and command line option)
to set the limit on maximum size of the composite that scalar
replacement is willing to divide.

Fixes #1494.
2018-05-18 10:03:46 -04:00
Steven Perron
f1f7cc870e Get ADCE to handle OpCopyMemory
ADCE does not treat OpCopyMemory as an instruction that references
memory.  Because of that stores are removed that should not be.

This change teaches ADCE that OpCopyMemory and OpCopyMemorySize both
loads from and stores to memory.  This will keep other stores live when
needed, and will allows ADCE to remove OpCopyMemory instructions as
well.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1556.
2018-05-16 13:50:47 -04:00
Steven Perron
9b1a938ea1 SROA: Only create symbols that are loaded.
Currently in scalar replacement, we create a new variable for every
memeber of the composite being divided.  It is often overkill, because
not all of those members will be used.  This change will check which
elements are used and only create variable for the members that are
used.

This reduces the compile time for one set of shader from 248s to 165s.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
2018-05-16 10:48:25 -04:00
Steven Perron
0e1b7e5aef Fix getting operand without checking opcode.
Fixes https://github.com/KhronosGhttps://github.com/KhronosGroup/SPIRV-Tools/issues/1559roup/SPIRV-Tools/issues/1559.

There is an load of an operand of an instruction that was suppose to be
only for the OpCompositeExtract case.  However, an error caused it to
be loaded for every opcode, even those that do not have an operand in
that position.

We fix up that bug, and a couple other things noticed that the same
time.
2018-05-16 09:34:43 -04:00
Steven Perron
f46f2d3e5d Remove redundant stores.
The code patterns generated by DXC around function calls can cause many
store to be storing the same value that was just loaded from the same
location:

```
%10 = OpLoad %type %var
OpStore %var %10
```

We want to clean these up very early on because they can cause other
transformations to do a lot of work.  For the cases I see, they can be
removed during local-single-block-elim.

For one set of shaders the compile time goes from 248s to 182s.  A 26%
improvement.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
2018-05-15 10:24:05 -04:00
Steven Perron
af430ec822 Add pass to fold a load feeding an extract.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element.  This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.

However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.

Fixes #1547.
2018-05-14 15:40:34 -04:00
Steven Perron
804e8884c4 Fold fclamp feeding compare.
An FClamp instruction forces a values to be within a certain interval.
When the upper or lower bound of the FClamp is a constant and the value
being compared with is a constant, then in some case we can fold the
compared because the entire range is say less than the value.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1549.
2018-05-14 10:27:49 -04:00
Steven Perron
9ec3f81e5c Remove dead Workgroup variables in ADCE.
If there is a shader with a variable in the workgroup storage class that
is stored to, but not loadeds, then we know nothing will read those
loads.  It should be safe to remove them.

This is implemented in ADCE by treating workgroup variables the same
way that private variables are treated.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1550.
2018-05-09 16:07:26 -04:00
Steven Perron
0856997df6 Allow ADCE to remove more instructions.
At this time, DCE will only remove an instruction if it is a combinator.
However, there are certain non-combinator instructions that can be
safely removed if their results are not used.  The derivative
instructions are on example.

We are also missing some instructions from the list of combinators
those are added as the same time.
2018-05-05 09:15:28 -04:00
Steven Perron
7d01643132 Allow hoisting code in if-conversion.
When doing if-conversion, we do not currently move code out of the side
nodes.  The reason for this is that it can increase the number of
instructions that get executed because both side nods will have to be
executed now.

In this commit, we add code to move an instruction, and all of the
instructions it depends on, out of a side node and into the header of
the selection construct.  However to keep the cost down, we only do it
when the two values in the OpPhi node compute the same value.  This way
we have to move only one of the instructions and the other becomes
unused most of the time.  So no real extra cost.

Makes the value number table an alalysis in the ir context.

Added more opcodes to list of code motion safe opcodes.

Fixes #1526.
2018-05-04 12:56:29 -04:00
Stephen McGroarty
1c2cbaf569 Add GetContinueBlock to loop class.
Previously, the loop class used the terms latch and continue block
interchangeably. This patch splits the two and corrects and tests some
uses of the old uses of GetLatchBlock.
2018-05-03 14:30:41 -04:00
Steven Perron
70bb3c1cc2 Fold divide and multiply by same value.
We want to fold code like (x*y)/x and other permutations of this.

Fixes #1531.
2018-05-02 10:18:37 -04:00
Toomas Remmelg
1dc2458060 Add a loop fusion pass.
This pass will look for adjacent loops that are compatible and legal to
be fused.

Loops are compatible if:

- they both have one induction variable
- they have the same upper and lower bounds
    - same initial value
    - same condition
- they have the same update step
- they are adjacent
- there are no break/continue in either of them

Fusion is legal if:

- fused loops do not have any dependencies with dependence distance
  greater than 0 that did not exist in the original loops.
- there are no function calls in the loops (could have side-effects)
- there are no barriers in the loops

It will fuse all such loops as long as the number of registers used for
the fused loop stays under the threshold defined by
max_registers_per_loop.
2018-05-01 15:40:37 -04:00
Stephen McGroarty
9a5dd6fe88 Support loop fission.
Adds support for spliting loops whose register pressure exceeds a user
provided level. This pass will split a loop into two or more loops given
that the loop is a top level loop and that spliting the loop is legal.
Control flow is left intact for dead code elimination to remove.

This pass is enabled with the --loop-fission flag to spirv-opt.
2018-05-01 15:15:10 -04:00
Steven Perron
9ba0879ddf Improve Vector DCE
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.
2018-04-30 11:55:50 -04:00
Steven Perron
a00a0a09ae Revert "Improvements to vector dce."
This reverts commit 2813722993.

A regression was found.  Undoing the change until it is fixed.
2018-04-27 10:33:19 -04:00
Alan Baker
4246abdc74 Fixes handling of kill and unreachable ops in inlining.
Fixes #1527

* Adds handling for copying OpKill and OpUnreachable and forces the
generation of a new basic block
* Adds tests to check
2018-04-27 09:42:37 -04:00
Steven Perron
e1bcd2b2d8 Fold OpVectorTimesScalar and OpPhi better.
If one of the operands to an OpVectorTimesScalar instruction is zero,
then the result will be the 0 vector. Currently we do not fold the
insturction unless both operands are constants. This change fixes that.

We also allow folding of OpPhi instructions where the incoming values
are either an OpUndef or the OpPhi instruction itself. As with other
cases, this can be simplified to the OpUndef.
2018-04-26 12:41:16 -04:00
Steven Perron
2813722993 Improvements to vector dce.
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.

Fixes #1511.
2018-04-26 11:07:48 -04:00
Greg Fischer
268be6143d LocalSingleBlockElim: Add store-store elimination
Eliminate unused store to variable if followed by store to same
variable in same block.

Most significantly, this cleans up stores made unused by this pass.
These useless stores can inhibit subsequent optimizations, specifically
LocalSingleStoreElim. Eliminating them makes subsequent optimization more
effective.

The main effect of this pass is to simplify the work done by the SSA
rewriter.  It catches many local loads/stores that help speeding up the
work done by the main rewriter.
2018-04-25 10:30:18 -04:00
Steven Perron
ee8cd5c847 Add Dead insert elmination back in. 2018-04-24 10:10:30 -04:00
Steven Perron
2c0ce87210
Vector DCE (#1512)
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.

It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
2018-04-23 11:13:07 -04:00
Victor Lomuller
efc5061929 Dominator analysis interface clean.
Remove the CFG requirement when querying a dominator/post-dominator from an IRContext.

Updated all uses of the function and tests.
2018-04-20 15:41:59 -04:00
Jaebaek Seo
48802bad72 Constant folding for OpVectorTimesScalar 2018-04-20 13:43:04 -04:00
Victor Lomuller
0ec08c28c1 Add register liveness analysis.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.

It also includes utilities to simulate the register pressure for loop
fusion and fission.

The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
2018-04-20 09:45:15 -04:00
David Neto
e7c2e91ded Fix for old XCode: std::set has explicit ctor 2018-04-19 16:33:12 -04:00
Greg Fischer
df7f00f60e DeadInsertElim: Don't revisit select phi nodes during MarkInsertChain
Fixes #1487.
2018-04-19 14:40:00 -04:00
Jaebaek Seo
430a29335e Fix broken pointer of CommonUniformElimPass 2018-04-19 09:36:10 -04:00
Steven Perron
c20a718e00 Rewrite local-single-store-elim to not create large data structures.
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large.  These large data structures
then take a very long time to be destroyed.  I've seen cases around 40%
if the time.

I've rewritten that algorithm to not use as much memory.  This give a
significant improvement when running a large number of shader through
DXC.

I've also made a small change to local-single-block-elim to delete the
loads that is has replaced.  That way local-single-store-elim will not
have to look at those.  local-single-store-elim now does the same thing.

The time for one set goes from 309s down to 126s.  For another set, the
time goes from 102s down to 88s.
2018-04-18 16:38:18 -04:00
Jaebaek Seo
0fa42996b5
Merge pull request #1461 from jaebaek/fnegate
Add constant folding for OpFNegate

Contributes to #709
2018-04-18 13:46:10 -04:00
Toomas Remmelg
0f335cf87e Add support for MIV and Delta test dependence analysis.
GCD MIV test as described in Chapter 3 of "Optimizing Compilers for
Modern Architectures: A Dependence-Based Approach" by Randy Allen, and
Ken Kennedy.

Delta test as described in Figure 3 of "Practical Dependence Testing" by
Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.
2018-04-17 13:57:02 -04:00
Jaebaek Seo
d8b9306a4f Add more unit tests 2018-04-17 12:08:45 -04:00
Jaebaek Seo
79491259e0 Add constant folding for FNegate 2018-04-17 12:08:45 -04:00
David Neto
152b9a681e ADCE: Remove OpDecorateStringGOOGLE
Also fix a few failures to set "modified" status when removing
global values.

Add OpDecorateStringGOOGLE to decoration ordering

Fixes #1492
2018-04-17 10:24:30 -04:00
Steven Perron
d42f65e7c1 Use a bit vector in ADCE
The unordered_set in ADCE that holds all of the live instructions takes
a very long time to be destroyed.  In some shaders, it takes over 40% of
the time.

If we look at the unique ids of the live instructions, I believe they
are dense enough make a simple bit vector a good choice for to hold that
data.  When I check the density of the bit vector for larger shaders, we
are usually using less than 4 bytes per element in the vector, and
almost always less than 16.

So, in this commit, I introduce a simple bit vector class, and
use it in ADCE.

This help improve the compile time for some shaders on windows by the
40% mentioned above.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
2018-04-13 16:38:02 -04:00
Steven Perron
8190c26270 Change parameter to Mempass::RemovePhiOperands
Pass a hashtable by const ref instead of by value.  Big impact on
compile time.
2018-04-13 09:53:37 -04:00
Steven Perron
bc648fd76a Delete unused code in MemPass
Since the SSA rewriter was added, the code old phi insertion code is no
longer used.  It is going stale and should be deleted.
2018-04-11 15:40:33 -04:00
Steven Perron
c584ac4fc6 Don't allow an instance of a pass to be run multiple times. 2018-04-11 12:02:30 -04:00
Victor Lomuller
10e5d7cf13 Add a loop peeling pass.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.

To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
2018-04-11 15:41:29 +01:00
Alexander Johnston
61b50b3bfa ZIV and SIV loop dependence analysis.
Provides functionality to perform ZIV and SIV dependency analysis tests
between a load and store within the same loop.

Dependency tests rely on scalar analysis to prove and disprove dependencies
with regard to the loop being analysed.

Based on the 1990 paper Practical Dependence Testing by Goff, Kennedy, Tseng

Adds support for marking loops in the loop nest as IRRELEVANT.
Loops are marked IRRELEVANT if the analysed instructions contain
no induction variables for the loops, i.e. the loops induction
variable is not relevent to the dependence of the store and load.
2018-04-11 09:32:42 -04:00
Steven Perron
53bc1623ec Fold OpDot
Adding three rules to fold OpDot (implemented as two).

- When an OpDot has two constants, then fold to the resulting const.

- When one of the inputs is the 0 vector, then fold to zero.

- When one of the inputs is a single 1 with 0s, then rewrite to an
OpCompositeExtract of the appropriate element.  This will help find
even more folding opportunities.

Contributes to #709.
2018-04-10 13:09:37 -04:00
David Neto
a91cbfbf75 Optimizer: update extension whitelists
Add two new extensions:
- SPV_NV_shader_subgroup_partitioned
- SPV_EXT_descriptor_indexing
2018-04-06 15:56:20 -04:00
GregF
6fbfe1c016 Fix SSA rewrite for nested loops.
From the test case, the slice of the CFG that is interesting for the bug
is

25
|
v
30
|
v
31<-+
|   |
v   |
34--+

1. In block 25, we have a Phi candidate for %f with arguments
   %47 = Phi[%float_0, %0]. This merges %float_0 and a yet unknown
   argument from the external loop backedge.
2. We are now processing block 34:
   i. The load %35 = OpLoad %f triggers a Phi candidate to be placed in
      block 31.
  ii. The Phi candidate %50 = Phi needs two arguments. The one coming
      from block 30 is %47. But the one coming from block 34 (which we
      are now processing and have marked sealed), finds %50 itself as
      the reaching def for %f.
3. This wrongfully marks %50 as a copy-of Phi, which ultimately makes
   both %47 and %50 copy-of Phis that get eliminated.
2018-04-06 15:17:52 -04:00
Steven Perron
742454968d OpName and decorations should not stop array copy prop. 2018-04-04 22:24:10 -04:00
Steven Perron
7c5d49bf2a Teach ADCE about OpImageTexelPointer
Currently OpImageTexelPointer operations are treat like a use of the
pointer, but it does
not look for the memory being referenced to make sure stores are not
removed.

This change teaches it so identify the memory being accessed, and
treats it as if that memory is loaded.

Fixes to #1445.
2018-04-04 13:45:29 -04:00
Steven Perron
c33af63264 Teach array copy propagation about OpImageTexelPointer.
OpImageTexelPointer acts like a special kind of load.  It is not an
array load, but it also cannot be removed the same way a regular
load can.  The type of propagation that needs to be done is similar
to what we do for arrays, so I want to merge that code into that
optmization.

Contributers to #1445.
2018-04-04 13:42:51 -04:00
Steven Perron
e64a4656b3 Teach the private to local about OpImageTexelPointer.
OpImageTexelPointer acts like a special kind of load.  It is still
safe to change the storage class of a variable used in a
OpImageTexalPointer instruction.

Contributes to #1445.
2018-04-04 13:42:35 -04:00
Steven Perron
cbceeceab4 In copy-prop-arrays, indentify copies via OpCompositeInsert
When the original code copies an entire array or struct one element at a
time, this turns into a series of OpCompositeInsert instruction followed
by a store of the whole array.  We currently miss opportunities in copy
propagate arrays because we do not recognize this as a copy.

This commit adds code to copy propagate arrays to identify this code
pattern.

Also updates the performance passed to run array copy propagation.
2018-03-29 09:39:55 -04:00
Steven Perron
d8ca09821d Handle non-constant accesses in memory objects (copy prop arrays)
The first implementation of MemroyObject, which is used in copy
propagate arrays, forced the access chain to be like the access chains
in OpCompositeExtract.  This excluded the possibility of the memory
object from representing an array element that was extracted with a
variable index.   Looking at the code, that restriction is not
neccessary.  I also see some opportunities for doing this in some real
shaders.

Contributes to #1430.
2018-03-28 20:23:47 -04:00
Stephen McGroarty
ad7e4b8401 Initial patch for scalar evolution analysis
This patch adds support for the analysis of scalars in loops. It works
by traversing the defuse chain to build a DAG of scalar operations and
then simplifies the DAG by folding constants and grouping like terms.
It represents induction variables as recurrent expressions with respect
to a given loop and can simplify DAGs containing recurrent expression by
rewritting the entire DAG to be a recurrent expression with respect to
the same loop.
2018-03-28 16:34:23 -04:00
Steven Perron
c26866ee74 Preserve analyses after copy propagate arrays
Contributes to #1430.
2018-03-28 10:38:52 -04:00
Steven Perron
5e07ab1358 Handle more cases in copy propagate arrays.
When we change the type of an object that gets stored, we do not want to
change the type of the memory location being stored to.  In order to
still be able to do the rewrite, we will decompose and rebuild the
object so it is the type that can be stored.

Fixes #1416.
2018-03-27 11:04:49 -04:00
Steven Perron
c4dc046399 Copy propagate arrays
The sprir-v generated from HLSL code contain many copyies of very large
arrays.  Not only are these time consumming, but they also cause
problems for drivers because they require too much space.

To work around this, we will implement an array copy propagation.  Note
that we will not implement a complete array data flow analysis in order
to implement this.  We will be looking for very simple cases:

1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.

The hard part is keeping all of the types correct.  We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.

Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent.  With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values.  See |Struct::GetExtraHashWords|.

Contributes to #1416.
2018-03-26 14:44:41 -04:00
Eleni Maria Stea
045cc8f75b Fixes compile errors generated with -Wpedantic
This patch fixes the compile errors generated when the options
SPIRV_WARN_EVERYTHING and SPIRV_WERROR (that force -Wpedantic) are
set to cmake.
2018-03-22 09:40:11 -04:00
Steven Perron
dbb35c4260 Fixed remaining review comments from #1380 2018-03-21 16:47:01 -04:00
Diego Novillo
2e644e4578 Fix VS2013 build failures. 2018-03-20 21:44:17 -04:00
Jaebaek Seo
3b594e1630 Add --time-report to spirv-opt
This patch adds a new option --time-report to spirv-opt.  For each pass
executed by spirv-opt, the flag prints resource utilization for the pass
(CPU time, wall time, RSS and page faults)

This fixes issue #1378
2018-03-20 21:30:06 -04:00
Diego Novillo
735d8a579e SSA rewrite pass.
This pass replaces the load/store elimination passes.  It implements the
SSA re-writing algorithm proposed in

     Simple and Efficient Construction of Static Single Assignment Form.
     Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
     In: Jhala R., De Bosschere K. (eds)
     Compiler Construction. CC 2013.
     Lecture Notes in Computer Science, vol 7791.
     Springer, Berlin, Heidelberg

     https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6

In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.

When a target variable is loaded, it queries the variable's reaching
definition.  If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.

The algorithm avoids repeated lookups using memoization.

For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA.  That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
2018-03-20 20:56:55 -04:00
Victor Lomuller
bdf421cf40 Add loop peeling utility
The loop peeler util takes a loop as input and create a new one before.
The iterator of the duplicated loop then set to accommodate the number
of iteration required for the peeling.

The loop peeling pass that decided to do the peeling and profitability
analysis is left for a follow-up PR.
2018-03-20 10:21:10 -04:00
Steven Perron
b3daa93b46 Change merge return pass to handle structured cfg.
We are seeing shaders that have multiple returns in a functions.  These
functions must get inlined for legalization purposes; however, the
inliner does not know how to inline functions that have multiple
returns.

The solution we will go with it to improve the merge return pass to
handle structured control flow.

Note that the merge return pass will assume the cfg has been cleanedup
by dead branch elimination.

Fixes #857.
2018-03-19 13:49:04 -04:00
David Neto
844e186cf7 Add --strip-reflect pass
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
2018-03-15 21:20:42 -04:00
David Neto
2e3aec23ca Add recent Google extensions to optimizer whitelists
Optimizations should work in the presence of recent
SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1

SPV_GOOGLE_decorate_string:
- Adds operation OpDecorateStringGOOGLE to decorate an object with decorations
  having string operands.

SPV_GOOGLE_hlsl_functionality1:
- Adds HlslSemanticGOOGLE, used to decorate an interface variable with
  an HLSL semantic string.  Optimizations already preserve those variables
  as required because they are interface variables (with uses), independent
  of whether they have HLSL decorations.

- Adds HlslCounterBufferGOOGLE, used to associate a buffer with a
  counter variable.

Fixes #1391
2018-03-15 11:16:20 -04:00
Alan Baker
9f3a1c85cc NFC: Speed up dead insert phi traversal on Windows. 2018-03-14 17:45:47 -04:00
David Neto
884933366b Teach DecorationManager about OpDecorateStringGOOGLE
Also add more decoration manager test coverage for OpDecorateId.

Fixes #1396
2018-03-13 22:18:33 -04:00
Alan Baker
7e03e76a5f Fixes #1402. Don't merge non-branch terminators into loop header.
Added tests
2018-03-13 22:16:17 -04:00
Alan Baker
43d1609183 Fixes #1407. Removing assertion against void pointer
Added test
2018-03-13 19:45:20 -04:00
Alan Baker
4065adf05d Fixes #1404. Don't DCE workgroup size
Added test.
2018-03-13 19:38:31 -04:00
Greg Fischer
077249b67f Fix InsertFeedingExtract rule when extract remains. 2018-03-12 22:06:23 -04:00
Pierre Moreau
5bd55f10cd Reimplement the DecorationManager
This reimplementation fixes several issues when removing decorations associated
to an ID (partially addresses #1174 and gives tools for fixing #898), as well
as making it easier to remove groups; a few additional tests have been added.

DecorationManager::RemoveDecoration() will still not delete dead decorations it
created, but I do not think it is its job either; given the following input

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %2 Restrict
%2      = OpDecorationGroup
OpGroupDecorate %2 %1 %3
OpDecorate %4 Invariant
%4      = OpDecorationGroup
OpGroupDecorate %4 %2
%uint   = OpTypeInt 32 0
%1      = OpVariable %uint Uniform
%3      = OpVariable %uint Uniform
```

which of the following two outputs would you expect RemoveDecoration(2) to produce:

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
%uint = OpTypeInt 32 0
%1 = OpVariable %uint Uniform
%3 = OpVariable %uint Uniform
```

or

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %4 Invariant
%4      = OpDecorationGroup
%uint   = OpTypeInt 32 0
%1      = OpVariable %uint Uniform
%3      = OpVariable %uint Uniform
```

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/924
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1174
2018-03-12 09:56:14 -04:00
David Neto
340370eddb Remove extension whitelist from some transforms
Remove extension whitelists from transforms that are essentially
combinatorial (and avoiding pointers) or which affect only control flow.
It's very very unlikely an extension will add a new control flow construct.

Remove from:
- dead branch elimination
- dead insertion elimination
- insert extract elimination
- block merge

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1392
2018-03-08 12:25:49 -05:00
Rex Xu
314cfa29b2 Add missing SPV extension strings 2018-03-08 21:54:00 +08:00
Alan Baker
bc9cfee6fa Fixes #1385. Grab correct input to calculate indices.
* Added tests to catch the bug
2018-03-07 16:07:40 -05:00
Alan Baker
5f50e6209c Fixes #1376. Don't handle half folding gracefully.
* Added early returns to folding rules to prevent half attempts
* Added some tests
2018-03-06 14:00:02 -05:00
David Neto
5f69f75126 Support SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1
This commit add assembling, disassembling, and basic validation for two
Google extensions to better support HLSL translation.
2018-03-05 13:34:13 -05:00
Steven Perron
9ba50e34f2 Avoid generating duplicate names when merging types
The merging types we do not remove other information related to the
types.  We simply leave it duplicated, and hope it is removed later.
This is what happens with decorations.  They are removed in the next
phase of remove duplicates.  However, for OpNames that is not the case.
We end up with two different names for the same id, which does not make
sense.

The solution is to remove the names and decorations for the type being
removed instead of rewriting them to refer to the other type.

Note that it is possible that if the first type does not have a name,
then the types will end up with no name.  That is fine because the names
should not have any semantic significance anyway.

The was identified in issue #1372, but this does not fix that issue.
2018-03-05 12:02:50 -05:00
Alan Baker
52bceb3569 Handles more cases of redundant selects
* Handles OpConstantNull and vector types
 * vector selects (except against a null) are converted to vector
 shuffles
* Added tests
2018-03-02 14:28:08 -05:00
Alan Baker
824625760b Fixes #1361. Mark all non-constant global values as varying in CCP
* Also mark function parameters as varying
* Conservatively mark assignment instructions as varying if any input is
varying after attempting to fold
* Added a test to catch this case
2018-03-01 15:24:41 -05:00
Alan Baker
ce5941a642 Fixes #1357. Support null constants better in folding
* getFloatConstantKind() now handles OpConstantNull
* PerformOperation() now handles OpConstantNull for vectors
* Fixed some instances where we would attempt to merge a division by 0
* added tests
2018-02-28 23:12:27 -05:00
GregF
bdaf8d56fb Opt: Add constant folding for FToI and IToF 2018-02-28 23:08:52 -05:00
Alan Baker
9457cabbce Fixes #1354. Do not merge integer division.
* Removes merging of div with a div or mul for integers
* Updated tests
2018-02-28 13:33:21 -05:00
Steven Perron
588f4fcc95 Add more folding rules for vector shuffle.
Adds rule to fold OpVectorShuffle with constant inputs.

Adds rules to fold OpCompositeExtrac being fed by an OpVectorShuffle.
2018-02-27 21:20:22 -05:00
Victor Lomuller
90e1637ce4 Remove Function::GetBlocks pushed by accident 2018-02-27 21:07:10 -05:00
Steven Perron
2cb589cc14 Remove uses DCEInst and call ADCE
The algorithm used in DCEInst to remove dead code is very slow.  It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.

This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.

A number of different iterations of the order of optimization, and I
believe this is the best I could find.

The results I have on 3 sets of shaders are:

Legalization:

Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26

Performance passes:

Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74

Size reduction passes:

Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71

Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions.  That is why we don't see much gain there.
2018-02-27 21:06:08 -05:00
David Neto
0c13467161 Consistently include latest spirv.h header file.
Use indirection through latest_version_spirv.h

Also, when generating enum tables, use the unified1 JSON grammar since
it now has FragmentFullyCoveredEXT but the other JSON grammars don't.
They are starting to fall behind.
2018-02-27 18:47:29 -05:00
Alan Baker
802cf053c7 Merge arithmetic with non-trivial constant operands
Adding basis of arithmetic merging

* Refactored constant collection in ConstantManager
* New rules:
 * consecutive negates
 * negate of arithmetic op with a constant
 * consecutive muls
 * reciprocal of div

* Removed IRContext::CanFoldFloatingPoint
 * replaced by Instruction::IsFloatingPointFoldingAllowed
* Fixed some bad tests
* added some header comments

Added PerformIntegerOperation

* minor fixes to constants and tests
* fixed IntMultiplyBy1 to work with 64 bit ints
* added tests for integer mul merging

Adding test for vector integer multiply merging

Adding support for merging integer add and sub through negate

* Added tests

Adding rules to merge mult with preceding divide

* Has a couple tests, but needs more
* Added more comments

Fixed bug in integer division folding

* Will no longer merge through integer division if there would be a
remainder in the division
* Added a bunch more tests

Adding rules to merge divide and multiply through divide

* Improved comments
* Added tests

Adding rules to handle mul or div of a negation

* Added tests

Changes for review

* Early exit if no constants are involved in more functions
* fixed some comments
* removed unused declaration
* clarified some logic

Adding new rules for add and subtract

* Fold adds of adds, subtracts or negates
* Fold subtracts of adds, subtracts or negates
* Added tests
2018-02-27 13:02:13 -05:00
Victor Lomuller
3497a94460 Add loop unswitch pass.
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
2018-02-27 08:52:46 -05:00
Stephen McGroarty
e354984b09 Unroller support for multiple induction variables
Support for multiple induction variables within a loop and support for
loop condition operands <= and >=.
2018-02-27 11:50:08 +00:00
Steven Perron
94af58a350 Clean up variables before sroa
In some shaders there are a lot of very large and deeply nested
structures.  This creates a lot of work for scalar replacement.  Also,
since commit ca4457b we have been very aggressive as rewriting
variables.  This has causes a large increase in compile time in creating
and then deleting the instructions.

To help low the costs, I want to run a cleanup of some of the easy loads
and stores to remove.  This reduces the number of symbols sroa has to
work on.  It also reduces the amount of code the simplifier has to
simplify because it was not generated by sroa.

To confirm the improvement, I ran numbers on three different sets of
shaders:

Time to run --legalize-hlsl:

Set #1: 55.89s -> 12.0s
Set #2: 1m44s -> 1m40.5s
Set #3: 6.8s -> 5.7s

Time to run -O

Set #1: 18.8s -> 10.9s
Set #2: 5m44s -> 4m17s
Set #3: 7.8s -> 7.8s

Contributes to #1328.
2018-02-22 21:40:58 -05:00
Steven Perron
3f19c2031a Preserve analysies in the simplification pass
Fixes a bug at the same time.  In `UpdateDefUse`, if the definition
already exists, we are not suppose to analyse it again.  When you do
the entries for the definition are deleted, and we don't want that.
The check for this was wrong.
2018-02-22 16:06:30 -05:00
GregF
46a9ec9d23 Opt: Check for side-effects in DCEInst()
This function now checks for side-effects before adding operand
instructions to the dead instruction work list.

Because this fix puts more pressure on IsCombinatorInstruction() to
be correct, this commit adds all OpConstant* and OpType* instructions
to combinator_ops_ set.

Fixes #1341.
2018-02-22 12:24:13 -05:00
Alan Baker
01760d2f0f Fixes #1338. Handle OpConstantNull in branch/switch conditions
* No longer assume the branch/switch condition must be bool or int
constants (respectively)
* Added a couple unit tests for each case
2018-02-21 10:22:39 -05:00
Steven Perron
51ecc7318f Reduce instruction create and deletion during inlining.
When inlining a function call the instructions in the same basic block
as the call get cloned.  The clone is added to the set of new blocks
containing the inlined code, and the original instructions are deleted.

This PR will change this so that we simply move the instructions to the
new blocks.  This saves on the creation and deletion of the
instructions.

Contributes to #1328.
2018-02-21 09:50:47 -05:00
Steven Perron
c1b936637e Add Insert-extract elimination back into legalization passes.
Fixes #1326.
2018-02-21 09:46:51 -05:00
Arseny Kapoulkine
309be423cc Add folding for redundant add/sub/mul/div/mix operations
This change implements instruction folding for arithmetic operations
that are redundant, specifically:

  x + 0 = 0 + x = x
  x - 0 = x
  0 - x = -x
  x * 0 = 0 * x = 0
  x * 1 = 1 * x = x
  0 / x = 0
  x / 1 = x
  mix(a, b, 0) = a
  mix(a, b, 1) = b

Cache ExtInst import id in feature manager

This allows us to avoid string lookups during optimization; for now we
just cache GLSL std450 import id but I can imagine caching more sets as
they become utilized by the optimizer.

Add tests for add/sub/mul/div/mix folding

The tests cover scalar float/double cases, and some vector cases.

Since most of the code for floating point folding is shared, the tests
for vector folding are not as exhaustive as scalar.

To test sub->negate folding I had to implement a custom fixture.
2018-02-20 18:29:27 -05:00
Steven Perron
fa3ac3cc33 Revert "Preserve analysies in the simplification pass"
This reverts commit ec3bbf093e.
2018-02-20 18:21:25 -05:00
Steven Perron
ec3bbf093e Preserve analysies in the simplification pass
Building the def-use chains is very expensive, so we do not want to
invalidate them it if is not necessary.  At the moment, it seems like
most optimizatoins are good at not invalidating the def-use chains, but
simplification does.

This PR get the simlification pass to keep the analysies valid.

Contributes to #1328.
2018-02-20 14:45:08 -05:00
Diego Novillo
6c75050136 Speed up Phi insertion.
On some shader code we have in our testsuite, Phi insertion is showing
massive compile time slowdowns, particularly during destruction.  The
specific shader I was looking at has about 600 variables to keep track
of and around 3200 basic blocks.  The algorithm is currently O(var x
blocks), which means maps with around 2M entries.  This was taking about
8 minutes of compile time.

This patch changes the tracking of stored variables to be more sparse.
Instead of having every basic block contain all the tracked variables in
the map, they now have only the variables actually stored in that block.

This speeds up deallocation, which brings down compile time to about
1m20s.

Note that this is not the definite fix for this.  I will re-write Phi
insertion to use a standard SSA rewriting algorithm
(https://github.com/KhronosGroup/SPIRV-Tools/issues/893).

This contributes to
https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
2018-02-20 12:04:06 -05:00
Steven Perron
9d95a91a9f Fix folding insert feeding extract
I mixed up two cases when folding an OpCompositeExtract that is feed by
and OpCompositeInsert.  The specific cases are demonstracted in the new
test.  I mixed up the conditions for the cases, and treated one like the
other.

Fixes #1323.
2018-02-20 11:22:51 -05:00
Alan Baker
c3f34d8bf3 Fixes #1300. Adding checks for bad CCP transitions and unsettled values
* Now track propagation status and assert on bad statuses
 * Added helper methods to access instruction propagation status
* Modified the phi meet operator to properly reflect the paper it is
based on
* Modified SSA edge addition so that all edge are added, but only on
state changes
* Fixed a bug in instruction simulation where interesting conditional
branches would not mark the interesting edge as executed
 * Added a test to catch this bug
* Added an ostream operator for SSAPropagator::PropStatus
2018-02-18 19:41:34 -05:00
Steven Perron
04cd63e5b9 Make better use of simplification pass
The simplification pass works better after all of the dead branches are
removed.  So swapping them around in the legalization passes.  Also
adding the simplification pass to performance passes right after dead
branch elimination.

Added CCP to the legalization passes so we can propagate the constants
into the branchs, and remove as many branches a possible.  CCP is
designed to still get opportunities even if the branches are dead, so it
is a good place for it.

Fixes #1118
2018-02-16 20:46:49 -05:00
Arseny Kapoulkine
1054413600 Add constant folding rules for floating-point comparison
This change handles all 6 regular comparison types in two variations,
ordered (true if values are ordered *and* comparison is true) and
unordered (true if values are unordered *or* comparison is true).

Ordered comparison matches the default floating-point behavior on host
but we use std::isnan to check ordering explicitly anyway.

This change also slightly reworks the floating-point folding support
code to make it possible to define a folding operation that returns
boolean instead of floating point.

These tests exhaustively test ordered/unordered comparisons for
float/double.

Since for NaN inputs the comparison result doesn't depend on the
comparison function, we just test == and !=; NaN inputs result in true
unordered comparisons and false ordered comparisons.
2018-02-16 20:41:22 -05:00
Arseny Kapoulkine
27d23a92a0 Remove constants from constant manager in KillInst
Registering a constant in constant manager establishes a relation
between instruction that defined it and constant object. On complex
shaders this could result in the constant definition getting removed as
part of one of the DCE pass, and a subsequent simplification pass trying
to use the defining instruction for the constant.

To fix this, we now remove associated constant entries from constant
manager when killing constant instructions; the constant object is still
registered and can be remapped to a new instruction later.

GetDefiningInstruction shouldn't ever return nullptr after this change
so add an assertion to check for that.
2018-02-16 20:37:12 -05:00
Steven Perron
50f307f889 Simplify OpPhi instructions referencing unreachable continues
In dead branch elimination, we already recognize unreachable continue
blocks, and update OpPhi instruction accordingly.  This change adds an
extra check: if the head block has exactly 1 other incoming edge, then
replace the OpPhi with the value from that edge.

Fixes #1314.
2018-02-16 18:58:03 -05:00
Steven Perron
3756b387f3 Get CCP to use the constant floating point rules.
Fixes #1311
2018-02-16 13:49:47 -05:00
Lei Zhang
f3a10470d3
Avoid using static unordered_map (#1304)
unordered_map is not POD. Using it as static may cause problems
when operator new() and operator delete() is customized.

Also changed some function signatures to use const char* instead
of std::string, which will give caller the flexibility to avoid
creating a std::string.
2018-02-15 10:19:15 -05:00
Arseny Kapoulkine
32a8e04c7d Add folding of redundant OpSelect insns
We can fold OpSelect into one of the operands in two cases:

- condition is constant
- both results are the same

Even if the original shader doesn't have either of these, if-conversion
pass sometimes ends up generating instructions like

   %7127 = OpSelect %int %3220 %7058 %7058

And this optimization cleans them up.
2018-02-15 10:03:22 -05:00
Steven Perron
0e9f2f948a Add id to name map
Adding a map from an id to it set of OpName and OpMemberName
instructions.  This will be used in KillNameAndDecorates to kill the
names for the ids that are being removed.

In my test, the compile time for 50 shaders went from 1m57s to 55s.
This was on linux using the release build.

Fixes #1290.
2018-02-14 15:53:13 -05:00
Steven Perron
6669d8163d Fold binary floating point operators.
Adds the floating rules for FAdd, FDiv, FMul, and FSub.

Contributes to #1164.
2018-02-14 15:48:15 -05:00
Stephen McGroarty
dd8400e150 Initial support for loop unrolling.
This patch adds initial support for loop unrolling in the form of a
series of utility classes which perform the unrolling. The pass can
be run with the command spirv-opt --loop-unroll. This will unroll
loops within the module which have the unroll hint set. The unroller
imposes a number of requirements on the loops it can unroll. These are
documented in the comments for the LoopUtils::CanPerformUnroll method in
loop_utils.h. Some of the restrictions will be lifted in future patches.
2018-02-14 15:44:38 -05:00
Alan Baker
229ebc0665 Fixes #1295. Mark undef values as varying in ccp.
* Undef now marked as varying in ccp
 * this prevents incorrect meet operations since phis were always not
 interesting
* added a test to catch the bug
2018-02-14 10:21:26 -05:00
Diego Novillo
08699920ad Cleanup. Use proper #include guard. NFC. 2018-02-12 13:21:48 -05:00
Steven Perron
06b437dedc Avoid using the def-use manager during inlining.
There seems to only be a single location where the def-use manager is
used.  It is to get information about a type.  We can do that with the
type manager instead.

Fixes #1285
2018-02-12 09:47:55 -05:00
Arseny Kapoulkine
70bf3514e8 Fix spirv.h include to rely on include paths
This is important when SPIRV-Headers are not checked out to external/
folder and mirrors other places in the code where spirv.h is included.
2018-02-09 18:29:17 -08:00
Steven Perron
1d7b1423f9 Add folding of OpCompositeExtract and OpConstantComposite constant instructions.
Create files for constant folding rules.

Add the rules for OpConstantComposite and OpCompositeExtract.
2018-02-09 17:52:33 -05:00
Steven Perron
1a849ffb60 Add header files missing from CMakeLists.txt 2018-02-08 23:02:22 -05:00
Alexander Johnston
84ccd0b9ae Loop invariant code motion initial implementation 2018-02-08 22:55:47 -05:00