Commit Graph

680 Commits

Author SHA1 Message Date
greg-lunarg
95386f9e45 Instrument: Fix version 2 output record write for tess eval shaders. (#2782)
Fix output record write for tess eval shaders.

Also change command line for bindless instrumentation to use use
output record version 2.
2019-08-09 08:22:41 -04:00
Steven Perron
4b64beb1ae
Add descriptor array scalar replacement (#2742)
Creates a pass that will replace a descriptor array with individual variables.  See #2740 for details.

Fixes #2740.
2019-08-08 10:53:19 -04:00
greg-lunarg
29af42df12 Add SPV_EXT_physical_storage_buffer to opt whitelists (#2779)
This also fixes ADCE to not remove possibly needed OpTypeForwardPointer.
The bug, its fix and the corresponding test have a circular dependency
with the extension, so they are packaged together.
2019-08-08 09:45:59 -04:00
Steven Perron
b029d3697e
Handle RelaxedPrecision in SROA (#2788)
If a member of a struct has a relaxed precision, sroa will not split the
struct.  This means we do not get all cases.  This commit handles these
cases.  The other part is that the decoration needs to be passed on to
the new variables.

Fixes #2786
2019-08-07 12:17:26 -04:00
alan-baker
3726b500b1
Treat access chain indexes as signed in SROA (#2776)
Fixes #2768

* In scalar replacement, interpret access chain indexes as signed counts
* Use Constant::GetSignExtendedValue and Constant::GetZeroExtendedValue
where appropriate
* new tests
2019-07-31 15:39:33 -04:00
David Neto
31590104ec
Add pass to inject code for robust-buffer-access semantics (#2771)
spirv-opt: Add --graphics-robust-access

Clamps access chain indices so they are always
in bounds.

Assumes:
- Logical addressing mode
- No runtime-array-descriptor-indexing
- No variable pointers

Adds stub code for clamping coordinate and samples
for OpImageTexelPointer.

Adds SinglePassRunAndFail optimizer test fixture.

Android.mk: add source/opt/graphics_robust_access_pass.cpp

Adds Constant::GetSignExtendedValue, Constant::GetZeroExtendedValue
2019-07-30 19:52:46 -04:00
David Neto
7621034aae
Add opt test fixture method SinglePassRunAndFail (#2770)
Checks for failure status code and matches against
the expected error message.
2019-07-30 10:38:46 -04:00
Diego Novillo
49797609b7
Protect against out-of-bounds references when folding OpCompositeExtract (#2774)
This fixes #2608.

The original test case had an out-of-bounds reference that ended up
folding into OpCompositeExtract that was indexing right outside the
constant composite.

The returned constant would then cause a segfault during constant
propagation.
2019-07-29 13:27:40 -07:00
alan-baker
7fd2365b06
Don't move debug or decorations when folding (#2772)
Fixes #2764

* Don't replace all uses when simplifying instructions, instead only
update non-debug, non-decoration uses
  * added a test
* Add a new version of RAUW that takes a predicate to decide whether to
replace the use or not
  * used in simplification pass
2019-07-29 16:20:43 -04:00
Diego Novillo
9559cdbdf0
Fix #2609 - Handle out-of-bounds scalar replacements. (#2767)
* Fix #2609 - Handle out-of-bounds scalar replacements.

When SROA tries to do a replacement for an OpAccessChain that is exactly
one element out of bounds, the code was trying to access its internal
array of replacements and segfaulting.

This protects the code from doing this, and it additionally fixes the
way SROA works by not returning failure when it refuses to do a
replacement.  Instead of failing the optimization pass, SROA will now
simply refuse to do the replacement and keep going.

Additionally, this patch fixes the SROA logic to now return a proper status so we can
correctly state that the pass made no changes to the IR if it only found
invalid references.
2019-07-26 12:33:40 -04:00
Steven Perron
bb0e2f65bb
Fix check for unreachable blocks in merge-return (#2762)
Merge return expects unreachable merge block to look a certain way, and
unreachable continue blocks to look a certain way.  What if an
unreachable block is both a merge and a continue?  The continue is
suppose to take precedent, but merge-return implements it with the merge
taking precedent.  This change flips that around.

Fixes #2746
2019-07-25 09:34:18 -04:00
Steven Perron
c7fcb8c3b9
Process OpDecorateId in ADCE (#2761)
* Process OpDecorateId in ADCE

When there is an OpDecorateId instruction that is live,
the ids that is references must be kept live.  This change
adds them to the worklist.

I've also updated a validator check to allow OpDecorateId
to be able to apply to decoration groups.

Fixes #1759.

* Remove dead code.
2019-07-24 14:43:49 -04:00
Steven Perron
fb83b6fbb5
Record correct dominators in merge return (#2760)
In merge return, we need to know the original dominator for a block in order to
traverse code from the original dominator to the new dominator and add
appropriate Phi nodes.  The current code gets this wrong because the dominator
tree is build as needed.  The first time we get the immediate dominator for a
function we just built the dominator tree and it takes into account that a
block has been split.  The second time it does not.

This inconsistency needs to be fixed.  We do that by recording the original
dominator for all blocks at the start of the pass.

If we were to record just the basic block, that could change if the block is
split.  We want to traverse the code in the body of the original dominator,
whatever block it ends up in.  To make this easy to track, we not save the
terminator instruction to represent the original dominator.

Fixes #2745
2019-07-24 13:56:54 -04:00
Steven Perron
c9190a54da
SSA rewriter: Don't use trivial phis (#2757)
When a phi candidate is marked as trivial, we are suppose to update all
of its uses to the reference the value that it is being folded to.
However, the code updates the uses misses `defs_at_block_`.  So at a
later time, the id for the trivial phi can reemerge.

Fixes #2744
2019-07-23 17:59:30 -04:00
greg-lunarg
3855447d93 Bindless Instrument: Make init check depend solely on input_init_enabled (#2753)
* Bindless Instrument: Make init check depend solely on input_init_enabled

Previously was dependent on presense of descriptor_indexing extension
in SPIR-V, but this missed some cases. Tests updated to refect this new
policy.

* Fix format.
2019-07-22 13:51:39 -04:00
Steven Perron
aa9e8f5380
Revert "Do not inline OpKill Instructions (#2713)" (#2749)
This reverts commit fe7cc9c612.
2019-07-17 14:59:05 -04:00
Steven Perron
230c9e4371
Fix bug in merge return (#2734)
* Fix bug in merge return

The merge return pass seems to assume that the only new edges in the cfg
are from return block to merge blocks.  However, it is possible that a
merge block branches to a merge block when it did not before.

This change add a new variable to track all of the new edges.  It also
renames some other variables and cleans us the code to make it a bit
easier to read.

Fixes #2702.
2019-07-16 09:11:22 -04:00
Jason Macnak
1fedf72e50 Allow ray tracing shaders in inst bindle check pass. (#2733)
Adds the ray tracing stages (ray gen, intersection, any hit, closest hit,
miss, and callable) to the allowed stages in pass instrumentation and add
debug records for these stages to output the global launch id.

More information for ray tracing shaders:
- https://github.com/KhronosGroup/GLSL/blob/master/extensions/nv/GLSL_NV_ray_tracing.txt
2019-07-15 16:24:42 -04:00
greg-lunarg
92c41ff1e7 Remove Common Uniform Elimination Pass (#2731)
Remove Common Uniform Elimination Pass

Fixes #2520.
2019-07-12 11:02:10 -04:00
Steven Perron
5ce8cf781f
Change the order branches are simplified in dead branch elim (#2728)
Dead branch elimination needs to know about the constructs that a block is contained it when determining what to do with its merge instruction.  We currently fold branches in block as we see them, which is parent constructs before their children.  This causes the struct cfg analysis to crash because it tries to get the parent construct for a block after the parent has been folded.

This can be fixed by folding the branch of the children before the parents.

Fixes #2667.
2019-07-10 14:59:44 -04:00
Thomas Roughton
cd153db8ed Add —preserve-bindings and —preserve-spec-constants (#2693)
Add optimizer options to for preservation of spec constants and variable with
binding decorations.  They are to be preserved even if they are unused.
2019-07-10 14:12:19 -04:00
Steven Perron
86e45efe15
Handle decorations better in some optimizations (#2716)
There are a couple spots where we are not looking at decorations when we should.

1. Value numbering is suppose to assign a different value number to ids if they have different decorations.  However that is not being done for OpCopyObject and OpPhi.

1. Instruction simplification is propagating OpCopyObject instruction without checking for decorations.  It should only do that if no decorations are being lost.

Add a new function to the decoration manager to check if the decorations of one id are a subset of the decorations of another.

Fixes #2715.
2019-07-10 11:37:16 -04:00
Steven Perron
37e8f79946
Perform merge return with single return in loop. (#2714)
Inlining does not inline functions that have a single return that is in a loop.  This is because the return cannot be replaced by a branch outside of the loop easily.  Merge return knows how to rewrite the function so the return is replaced by a branch.

Fixes #2038.
2019-07-04 14:14:49 -04:00
Steven Perron
fe7cc9c612
Do not inline OpKill Instructions (#2713)
It is illegal to inline an OpKill instruction into a continue construct because the continue header will no longer dominate the backedge.

This commit adds a check for this, and does not inline.

If we still want to be able to inline a function that contains an OpKill, we can add a new pass that will wrap OpKill instructions into its own function with just the single instruction.

I do not believe that this is a common case right now, so I will not do that yet.

Fixes #2433.
2019-07-04 12:08:23 -04:00
Jason Macnak
e6e3e2ccc6 Update type for loaded builtin GlobalInvocationID in pass instrumentation (#2705)
When working on descriptor indexing validation for compute shaders, the
gl_GlobalInvocationID builtin was being loaded as uint which would cause
compute shaders instrumented by the bindless check pass to have:

%83 = OpLoad %uint %gl_GlobalInvocationID
%84 = OpCompositeExtract %uint %83 0
%85 = OpCompositeExtract %uint %83 1
%86 = OpCompositeExtract %uint %83 2

which results in validation failures:

error: line 127: Reached non-composite type while indexes still remain
to be traversed.
%84 = OpCompositeExtract %uint %83 0

for trying to extract a uint from a uint.
2019-06-28 09:46:16 -04:00
alan-baker
2090d7a2d2
Handle volatile memory semantics in upgrade (#2674)
* If an atomic is decorated with volatile add the volatile bit to its
memory semantics
2019-06-17 16:01:37 -04:00
alan-baker
59983a6010 Validate variable initializer type (#2668)
Fixes #249

* The pointed to type of Result Type must match the initializer type
* Had to update some opt tests to be valid
2019-06-15 00:34:18 -04:00
greg-lunarg
43fb2403a6 Instrument: Fix code for version 2 output format. (#2655)
Correct record size. Also bring version 2 tests up to version 1
equivalence.
2019-06-06 11:35:34 -04:00
David Neto
d01a3c3b4b
Optimizer: Handle array type with OpSpecConstantOp length (#2652)
When it's an OpConstant or OpSpecConstant, then the literal
values are compared.  If the OpSpecConstant also has a SpecId
decoration, then that's also compared.

Otherwise, it's an OpSpecConstantOp and we only compare the
ID of the OpSpecConstantOp instruction itself.

Fixes #2649
2019-06-05 16:35:50 -04:00
Pierre Moreau
e7866de4b1 Linker: Better type comparison for OpTypeArray and OpTypeForwardPointer (#2580)
* Types: Avoid comparing IDs for in Type::IsSameImpl

When linking, we end up with duplicate types for imported and exported
types, that needs to be removed. The current code would reject valid
import/export pairs of symbols due to IDs mismatch, even if the types or
constants behind those ID were the same.

Enabled remaining type_match_test

Fixes #2442
2019-05-29 16:12:02 -04:00
Ryan Harrison
0125b28ed4
Add compact ids to WebGPU <-> Vulkan transformations (#2639)
Fixes #2634
2019-05-29 12:58:37 -07:00
greg-lunarg
3d62cb8148 Instrument: Add version 2 of record formats (#2630)
New version has additional word in stage-specific section. Also
some changes in content for tesselation and compute shaders. Either
version can be invoked at pass creation. This is done to ease integration
and updating of validation layers. Version 1 is deprecated and eventually
will go away.

Also sneaking in fix to version 1 compute shaders.
2019-05-29 15:08:21 -04:00
Steven Perron
6c7db9c630
Handle nested breaks from switches. (#2624)
* Handle nested breaks from switches.

There was a recent decision made to allow branches to the merge node of
a switch even if the switch is not the first enclosing construct.  They
can be generated by glslang from break statements in switches.

Dead branch elimination seems to be the only optimization that will
break because of this change, so I will update that optimizations.

The change made are:

- Track switches in structured cfg analysis.
- In Dead branch elimination:
  - Look for nested breaks that will require a switch instruction.
  - Rewrite, but don't delete, switchs that are required even if it
    could be replaced by an unconditional branch.
  - When looking for the first break, consider the merge of a switch
    as well.

See #2612.

* Fix variable names and comments.

* Add tests for the struct cfg analysis and switches.

* Fix typos in comments.
2019-05-27 16:28:14 -04:00
Steven Perron
d9c00e1d2d Add folding rules for OpQuantizeToF16 (#2614)
Adding the folding rules for OpQuantizeToF16, and fixed some matching
tests to check identify new lines.
2019-05-21 23:15:01 -07:00
Steven Perron
0982f0212e
Using the instruction folder to fold OpSpecConstantOp (#2598)
In order to try to reduce code duplication and to be able
to fold more cases, we want to use the instruction folder
when folding an OpSpecConstantOp with constant operands.

A couple other changes are need to make this work.  First
GetDefiningInstruction| in the constant manager is able
to handle |type_id| being logically equivalent to another
type, so we updated the interface, and removed the assert.

Some tests were also updated because we not generate
better code because constants are not duplicated as much
as before.

No need for new tests.  The functionality of the instruction folder is
already tested.  There are tests check that the instruction folder is
being used correctly for OpCompositeExtract and OpVectorShuffle in the
existing test cases.

Fixes #2585.
2019-05-21 12:45:00 -04:00
greg-lunarg
9dfd4b8358 Bindless Validation: Instrument descriptor-based loads and stores (#2583)
Essentially, support UBOs and SSBOs, scalar and array (sized and unsized).
2019-05-15 19:43:23 -04:00
alan-baker
fc7b5d8c6a Mem model spv 1.4 (#2565)
* Update memory model support for SPIR-V 1.4

Fixes #2552

* Upgrade memory model now supports two memory access operands for
OpCopyMemory*
  * in all cases the pass will first generate two operands by either
  adding them or copying
  * updates accounts for multiple operands
  * tests
2019-05-15 19:06:37 -04:00
Steven Perron
84503583c6
Handle id overflow in sroa better. (#2582)
There is a case where sroa is not handling id overflow gracefully.  It
is handled and an error message is output when the ids overflow.

Fixes https://crbug.com/961030.
2019-05-15 09:29:28 -04:00
alan-baker
2947e88f79 Update instrumentation passes to handle 1.4 interfaces (#2573)
Fixes #2556

Added variables get added to entry point interfaces
Add to input buffer too
2019-05-10 11:08:28 -04:00
alan-baker
87c4ef8a9c
Do not fold floating point if float controls used (#2569)
Fixes #2558

* Mark floating point instructions as non-foldable if any
SPV_KHR_float_controls capabilities are present
  * tests
2019-05-10 11:03:22 -04:00
Ryan Harrison
f6d9a17843
Add pass to fix some invalid unreachable blocks for WebGPU (#2563)
Attempts to split up unreachable blocks that are used both as a
merge-block and a continue-target.

Fixes #2429
2019-05-09 12:56:10 -04:00
alan-baker
cc3e93c4e6
Add tests for folding 1.4 selects (#2568)
Fixes #2554

* Folding rules already handle 1.4 selects so I simply added some tests
2019-05-08 14:06:04 -04:00
alan-baker
ea5e1b62e1
Update priv-to-local for SPIR-V 1.4 (#2567)
Fixes #2555

* Fix a bug in validation where interfaces were considered non-unique
between different entry points targeting the same function
  * added a test
* Update private to local pass to remove localized private variables
from entry point interfaces
  * added tests
2019-05-08 12:38:49 -04:00
alan-baker
b74d92a8c3
ADCE support for SPIR-V 1.4 entry points (#2561)
Fixes #2551

* Add support for 1.4 entry point interface lists
  * only input and output variables are automatically live
  * can clean up interfaces after DCE
  * added tests
* allow opt tests to specify a target environment
2019-05-07 14:52:22 -04:00
David Neto
63f57d95d6
Support SPIR-V 1.4 (#2550)
* SPIR-V 1.4 headers, add SPV_ENV_UNIVERSAL_1_4

* Support --target-env spv1.4 in help for command line tools

* Support asm/dis of UniformId decoration

* Validate UniformId decoration

* Fix version check on instructions and operands

Also register decorations used with OpDecorateId

* Extension lists can differ between enums that match

Example: SubgroupMaskEq vs SubgroupMaskEqKHR

* Validate scope value for Uniform decoration, for SPIR-V 1.4

* More unioning of exts

* Preserve grammar order within an enum value

* 1.4: Validate OpSelect over composites

* Tools default to 1.4

* Add asm/dis test for OpCopyLogical

* 1.4: asm/dis tests for PtrEqual, PtrNotEqual, PtrDiff

* Basic asm/Dis test for OpCopyMemory

* Test asm/dis OpCopyMemory with 2-memory access

Add asm/dis tests for OpCopyMemorySized

Requires grammar update to add second optional memory access operand
to OpCopyMemory and OpCopyMemorySized

* Validate one or two memory accesses on OpCopyMemory*

* Check av/vis on CopyMemory source and target memory access

This is a proposed rule. See
https://gitlab.khronos.org/spirv/SPIR-V/issues/413

* Validate operation for OpSpecConstantOp

* Validate NonWritable decoration

Also permit NonWritable on members of UBO and SSBO.

* SPIR-V 1.4: NonWrtiable can decorate Function and Private vars

* Update optimizer CLI tests for SPIR-V 1.4

* Testing tools: Give expected SPIR-V version in message

* SPIR-V 1.4 validation for entry point interfaces

* Allow only unique interfaces
* Allow all global variables
* Check that all statically used global variables are listed
* new tests

* Add validation fixture CompileFailure

* Add 1.4 validation for pointer comparisons

* New tests

* Validate with image operands SignExtend, ZeroExtend

Since we don't actually know the image texel format, we can't fully
validate.  We need more context.

But we can make sure we allow the new image operands in known-good
cases.

* Validate OpCopyLogical

* Recursively checks subtypes
* new tests

* Add SPIR-V 1.4 tests for NoSignedWrap, NoUnsignedWrap

* Allow scalar conditions in 1.4 with OpSelect

* Allows scalar conditions with vector operands
* new tests

* Validate uniform id scope as an execution scope

* Validate the values of memory and execution scopes are valid scope
values
* new test

* Remove SPIR-V 1.4 Vulkan 1.0 environment

* SPIR-V 1.4 requires Vulkan 1.1

* FIX: include string for spvLog

* FIX: validate nonwritable

* FIX: test case suite for member decorate string

* FIX: test case for hlsl functionality1

* Validation test fixture: ease debugging

* Use binary version for SPIR-V 1.4 specific features

* Switch checks based on the SPIR-V version from the target environment
to instead use the version from the binary
* Moved header parsing into the ValidationState_t constructor (where
version based features are set)
* Added new versions of tests that assemble a 1.3 binary and validate a
1.4 environment

* Fix test for update to SPIR-V 1.4 headers

* Fix formatting

* Ext inst lookup: Add Vulkan 1.1 env with SPIR-V 1.4

* Update spirv-val help

* Operand version checks should use module version

Use the module version instead of the target environment version.

* Fix comment about two-access form of OpCopyMemory
2019-05-07 12:27:18 -04:00
Steven Perron
6d04da22c6
Fix up type mismatches. (#2545)
Add functionality to fix-storage-class so that it can fix up mismatched
data types for pointers as well.

Fixes bugs in when fixing up storage class.

Move GenerateCopy to the Pass class to be reused.

The spirv-opt change for #2535.
2019-05-02 09:31:46 -04:00
Steven Perron
32af42616a
Change implementation of post order CFG traversal (#2543)
* Change implementation of post order CFG traversal

It seems like the recursion is going very deep, and causing some problem
is particular situations.  I've reimplemented the CFG post order
traversal to not use recursion.

Fixes #2539.
2019-04-29 17:09:20 -04:00
Ryan Harrison
b68af7ca8e
Add support for Private & Output to initializer decompose flag (#2537)
Fixes #2388
2019-04-25 16:24:32 -04:00
Ryan Harrison
048dcd38ce
Implement WebGPU->Vulkan initializer conversion for 'Function' variables (#2513)
WebGPU requires certain variables to be initialized, whereas there are
known issues with using initializers in Vulkan. This PR is the first
of three implementing a pass to decompose initialized variables into
a variable declaration followed by a store. This has been broken up
into multiple PRs, because there 3 distinct cases that need to be
handled, which require separate implementations.

This first PR implements the basic infrastructure that is needed, and
handling of Function storage class variables. Private and Output will
be handled in future PRs.

This is part of resolving #2388
2019-04-16 14:31:36 -04:00
Ryan Harrison
102e430a88
Add pass to legalize OpVectorShuffle for WebGPU (#2509)
In WebGPU, the component operand 0xFFFFFFFF is forbidden, but in
Vulkan it is used to indicate a value is undefined. When converting to
WebGPU, 0xFFFFFFFF needs to converted to a legal value, though the
specific one does not matter, since it was used to indicate an
undefined entry in the original code. Choosing to use 0, since the
operands are required to be on [0, N-1], so 0 is guaranteed to always
be valid.

Fixes #2349
2019-04-12 12:14:23 -04:00
Steven Perron
9047de51cb
Accept OpBitCast in fix storage class. (#2505)
Fixes http://crbug.com/950889.
2019-04-09 14:10:35 -04:00
Ryan Harrison
0cb2d4079e
Add WebGPU->Vulkan and Vulkan->WebGPU flags in spirv-opt (#2496)
Renames the existing flag '--webgpu-mode' to '--vulkan-to-webgpu' for
the Vulkan->WebGPU operation, and adds a new flag '--webgpu-to-vulkan'
for the WebGPU->Vulkan operation.

Currently '--webgpu-to-vulkan' doesn't have any passes associated with
it yet, but further patches will implement them.

Fixes #2495
2019-04-05 15:12:26 -04:00
Steven Perron
3a0bc9e724
Add fix storage class code. (#2434)
This pass tries to fix validation error due to a mismatch of storage classes
in instructions.  There is no guarantee that all such error will be fixed,
and it is possible that in fixing these errors, it could lead to other
errors.

Fixes #2430.
2019-04-05 13:12:08 -04:00
alan-baker
236bdc0065 Change prioritization of unreachable merge and continue (#2460)
Fixes #2452

Swaps priority of handling unreachable merge and continues so that the
back-edge is retained in the case a block is both a loop continue and
loop merge
2019-04-03 12:50:08 -04:00
Steven Perron
12e4a7b649
Handle variable pointer in some optimizations (#2490)
* Check var pointer capability in ADCE.

* Check var ptr capability for common uniform.

* Check var ptr capability in access chain convert.

Since we want this pass to run even if there are variable pointer on
storage buffers, we had to remove asserts that assumed there were no
variable pointers.  The functions with the asserts will now work, it
becomes the responsibility of the callers to deal with the output as
appropriate.

* Single block elimination and variable pointers.

It seems like the code in local single block elimination is able to
handle cases with variable pointers already.  This is because the
function `HasOnlySupportedRefs` ensures that variables that feed a
variable pointer are not candidates.

* Single store elimination and variable pointers.

It seems like the code in local single stroe elimination is able to
handle cases with variable pointers already.  This is because the
function `FindSingleStoreAndCheckUses` ensures that variables that feed
a variable pointer are not candidates.

* SSA rewriter and variable pointers.

It seems like the code in the two passes that call the SSA rewriter are
able to  handle cases with variable pointers already.  This is because the
function `HasOnlySupportedRefs` ensures that variables that feed
a variable pointer are not candidates.

Fixes #2458.
2019-04-03 12:47:51 -04:00
Ryan Harrison
01964e325f
Add pass to generate needed initializers for WebGPU (#2481)
Fixes #2387
2019-04-03 11:44:09 -04:00
alan-baker
4bd106b089
Handle dead infinite loops in DCE (#2471)
Fixes #2456

* When eliminating a structured construct that has an unreachable merge,
replace that unreachable terminator with an appropriate return
* New tests
2019-04-03 10:30:12 -04:00
alan-baker
c9874e5090
Fix merge return in the face of breaks (#2466)
Fixes #2453

* Enable addition of OpPhi instructions when the loop has multiple
predecessors of the merge due to a break
 * This can result in some values no longer dominating their uses
* Track return blocks in structured flow to produce OpPhis that have
multiple undef and non-undef arguments
* New tests to catch the bug
* When a block is predicated, mark the new body as a return if the old
block as already a return
2019-04-02 10:05:28 -04:00
alan-baker
0300a464a4 Maintain inst to block mapping in merge return (#2469)
Fixes #2455

Properly maintains instruction to block mapping for newly created phi instructions in merge return
2019-04-01 13:14:10 -04:00
alan-baker
320a7de5c9
Validate that OpUnreacahble is not statically reachable (#2473)
* Adds a validator check that ensures no block reachable from the entry
block is terminated by OpUnreachable
* Updated tests
* Added new tests
2019-03-29 10:49:37 -04:00
alan-baker
2ff54e34ed
Handle function decls in Structured CFG analysis (#2474)
Fixes #2451

* Structured cfg analysis now handles functions with no basic blocks
* Added a test
2019-03-26 14:39:16 -04:00
greg-lunarg
e1a76269b6 Bindless Validation: Descriptor Initialization Check (#2419)
If SPV_EXT_descriptor_indexing is enabled, add check that for a
descriptor-based reference, the descriptor is initialized. Initialization
data is stored in the debug input buffer, added to the length information
already there. This feature must be seperately enabled on the pass
creation routine. NOTE: Currently just supports image references; buffer
references are still TODO.
2019-03-19 09:53:43 -04:00
Ryan Harrison
e545522146
Add --strip-atomic-counter-memory (#2413)
Adds an optimization pass to remove usages of AtomicCounterMemory
bit. This bit is ignored in Vulkan environments and outright forbidden
in WebGPU ones.

Fixes #2242
2019-03-14 13:34:33 -04:00
Steven Perron
5186ffedb3
Remove duplicates from list of interface IDs in OpEntryPoint instruction (#2449)
* Remove duplicates from list of interface IDs in OpEntryPoint instruction

Fixes #2002.
2019-03-13 15:46:31 -04:00
Steven Perron
9d29c37ac5
Removing decorations when doing constant propagation. (#2444)
In constant propagation, decoration are transfered from the original
expression to the constant that will replace it.  This can be wrong
because there are no decorations that apply to constants.  We choose to
simply delete the decorations.

Fixes #2441
2019-03-13 10:40:49 -04:00
Steven Perron
d800bbbac9
Handle back edges better in dead branch elim. (#2417)
* Handle back edges better in dead branch elim.

Loop header must have exactly one back edge.  Sometimes the branch
with the back edge can be folded.  However, it should not be folded
if it removes the back edge.

The code to check this simply avoids folding the branch in the
continue block.  That needs to be changed to not fold the back edge,
wherever it is.

At the same time, the branch can be folded if it folds to a branch to
the header, because the back edge will still exist.

Fixes #2391.
2019-02-26 09:06:51 -05:00
Sarah
4c43afcade
It is invalid to apply both Restrict and Aliased to the same <id> (#2408)
to fix #2408 - It is invalid to apply both Restrict and Aliased to the same
2019-02-21 12:03:52 -05:00
Steven Perron
fde69dcd80
Fix OpDot folding of half float vectors. (#2411)
* Fix OpDot folding of half float vectors.

The code that folds OpDot does not handle half floats correctly.  After
trying to multiple the first components, we get a nullptr because we
don't fold half float values.  This nullptr gets passed to the code that
does the addition, and causes an assert.

Fixes #2405.
2019-02-20 20:05:08 -05:00
Steven Perron
8eddde2e70
Don't change type of input and output var in dead member elim (#2412)
The types of input and output variables must match for the pipeline.  We
cannot see the uses in all of the shader, so dead member
elimination cannot safely change the type of input and output variables.
2019-02-20 18:59:41 -05:00
greg-lunarg
2f84b5de9a Bindless: Fix computation of set and binding for runtime bounds check (#2384)
Also fix test to use non-zero set and binding which will make error
more obvious.
2019-02-19 11:43:30 -05:00
Ryan Harrison
6d20f62570
Refactor webgpu-mode pass ran tests to be parameterized (#2395)
Fixes #2394
2019-02-15 11:08:05 -05:00
Steven Perron
78ac954c41
Mark type id of unknown instructions at fully used. (#2399) 2019-02-15 10:49:49 -05:00
greg-lunarg
9540f2d981 Instrumentation: Fix instruction index when multiple functions (#2389) 2019-02-15 09:49:18 -05:00
Steven Perron
1b0047f210
Add pass to remove dead members. (#2379)
Add a pass that looks for members of structs whose values do not affects
the output of the shader. Those members are then removed and just
treated like padding in the struct.
2019-02-14 13:42:35 -05:00
alan-baker
354205b3dc
Don't merge unreachable blocks (#2375)
Fixes #2374

* Block merging no longer merges unreachable blocks into their
successors
 * added a test
2019-02-12 09:24:01 -05:00
Ryan Harrison
12b3d7e9d6 Add strip-debug to webgpu-mode passes (#2368)
Fixes #2366
2019-02-08 14:26:17 -05:00
greg-lunarg
cf21146137 Expand bindless bounds checking to runtime-sized descriptor arrays (#2316) 2019-02-07 14:00:36 -05:00
Ryan Harrison
0f4bf0720a
Add flatten-decorations flag to webgpu-mode flags (#2348)
Fixes #2272
2019-02-05 14:07:53 -05:00
Alastair Donaldson
3b6fee3dae Fixes #2338. Added functionality to remove OpPhi instructions (replacing their uses) when merging blocks (#2339)
* Fixes #2338.  Added check for phi node before merging blocks.

* Added functionality to merge blocks A and B even when B starts with OpPhi instructions, by replacing uses of the OpPhi results with the definitions coming from A.  Added some tests for this.

* Fixed assertion.
2019-01-31 09:36:05 -05:00
Steven Perron
464111eaef
Remove use of deprecated googletest macro (#2286)
* Remove use of deprecated googletest macro

INSTANTIATE_TEST_CASE_P has been deprecated.  We need to use
INSTANTIATE_TEST_SUITE_P instead.

* Remove extra commas from test suites.
2019-01-29 18:56:52 -05:00
Steven Perron
8df947d2d6
Handle instructions not in blocks in code sinking. (#2308)
When looking at the uses of the result of an instruction, code sinking
assumes that all uses are in a basic block.  However, this is not true
if there is a decoration or name for the result of that insturction.
This commit checks for this.

Fixes https://crbug.com/923243.
2019-01-21 12:09:56 -05:00
Steven Perron
d6c067630d Handle extract with no index in VDCE. (#2305)
It is legal, but not generated by any SPIR-V producer: an OpCompositeExtract
with no indexes.  This is essentially just a copy of the object, so we
treat them that way.  We simply propagate the live variables of the
result to the operand.

Fixes https://crbug.com/919181.
2019-01-18 15:43:36 -05:00
Steven Perron
81fb2649bf
Handle access chain with no index in SROA. (#2304)
It is legal, but not generated by any SPIR-V producer: an OpAccessChain
with no indexes.  This is essentially just a copy of the pointer.

I have decided to treat it like an OpCopyObject.  In CheckUses, we
return that it is not okay.

When looking at this I realized that we had code in GetUsedComponents
that cannot be reached.  If there is a use in an OpCopyObject the it
will not call GetUsedComponents.  I removed that dead code.

Fixes https://crbug.com/918311.
2019-01-18 14:19:43 -05:00
Steven Perron
213e15e100
Fix overflow when negating INT_MIN. (#2293)
When doing (-INT_MIN) is considered overflow, so we cannot fold it by
actually performing the negation.

Fixes https://crbug.com/917991
2019-01-17 17:01:55 -05:00
Steven Perron
99c2c21cf4
Fix memory leak in unrolling. (#2301)
During unrolling a new loop is created, but its ownership is not clear
as it gets passed through the code. Changed something to unique_ptr to
make that clearer.

Fixes #2299.

Fixing other memory leaks at the same time.

Fixes #2296
Fixes #2297
2019-01-17 16:02:43 -05:00
Steven Perron
dd4157dcee
Sink (#2284)
Add code sinking pass. It will move OpLoad and OpAccessChain instructions as close as possible to their uses.

Part of #1611.
2019-01-17 15:56:36 -05:00
greg-lunarg
8d2d66f30c Fix vertex instrumentation to use VertexIndex and InstanceIndex (#2294)
...instead of VertexId and InstanceId
2019-01-16 18:02:07 -05:00
Steven Perron
49b5b0abc6
Fix up bit shifts by 32. (#2292)
In C++, a bit shift of the same size as the type is undefined, but it is
defined in spir-v.  When folding those cases, we have to be careful.  We
cannot simply do the shift in C++.

Fixes https://crbug.com/917697.
2019-01-16 15:52:23 -05:00
greg-lunarg
83bfdc976a Instrumentation: Add ArrayStride decoration to debug output buffer array (#2290) 2019-01-16 10:01:40 -05:00
alan-baker
06c9dc07bd
Upgrade modf and frexp (#2266)
Fixes #2138

* Modf and frexp are upgraded to use the struct version of the
instruction and generate an explicit store whose flags can be upgraded
separately
* Fixed major bug where availability and visibility were reversed for
non-copy memory instructions
* Fixed bug where availability and visibility scope operands were reversed for copy memory
* Upgraded all opt tests to use SPV_ENV_UNIVERSAL_1_3
* Upgrade tests moved into unified tests and removed standalone test
2019-01-07 12:36:38 -05:00
Steven Perron
241644a5a3
Have replace load size handle extact with no index. (#2261)
Fixes https://crbug.com/917774
2019-01-03 13:02:10 -05:00
Steven Perron
9f36c8bb72
Handle CompositeInsert with no indices in VDCE (#2258)
* Handle CompositeInsert with no indices in VDCE

In the spec, there it nothing that forces an OpCompositeInsert to have
an index, but VDCE assumes there is at least 1 in a couple places.

This commit updates VDCE to handle these cases.
2019-01-02 14:00:04 -05:00
Steven Perron
bdc2ab9356
In LICM don't place code between merge instruction and branch. (#2252)
Fixes #2210.
2018-12-20 18:33:52 -05:00
kholtnv
e49bd96f2c Added additional changes for the new AccelerationStructureNV type. (#2218)
* Added additional changes for the new AccelerationStructureNV type.

* Added additional changes for the new AccelerationStructureNV type.  Change tabs to space...

* Added additional changes for the new accelerationStructureNV type -- add proper type name.

Fix TypeManager.TypeStrings test:
[----------] 29 tests from TypeManager
[ RUN      ] TypeManager.TypeStrings
[       OK ] TypeManager.TypeStrings (7 ms)
2018-12-19 21:42:39 +00:00
Steven Perron
68b69e16aa
Update the continue target in merge return. (#2249)
When we are predicating the continue target for a loop, it can no longer
be the continue target because it will have a branch that exits the loop
and is not the bach edge.  The continue target will have to be the
target of that branch that is still in the loop.

Fixes #2211.
2018-12-19 21:24:49 +00:00
Steven Perron
ac7feace90
Fix missing OpPhi after merge return. (#2248)
The function `UpdatePhiNodes` was being called inconsistently.  In one
case, the cfg had already been updated to include the new edge, and in
another place the cfg was not updated.  This caused the function to
miss flagging a block as needing new phi nodes.  I picked that the cfg
should not be updated before making the call.  I documented it, and
change the call sites to match.

Fixes #2207.
2018-12-19 18:17:42 +00:00
Steven Perron
9d04f82bef
Ensure SROA gets the correct pointer type. (#2247)
We initially assumed that if the type manager returned the correct id
for the pointee type, that we would get the correct pointer type back,
but that is not true.  See the unit test added with this commit.  We
need to fall back to the linear search any time we are looking for a
pointer to a type that may not be unique.

At the same time, SROA considered an OpName on a variable to be a use of
the entire variable.  That has been fixed.

Fixes #2209.
2018-12-19 17:07:29 +00:00
Steven Perron
9e81c337f9
Place load after OpPhi instructions in block. (#2246)
We currently place the load instructions at the start of the basic block
that dominates all of the loads.  If that basic block contains OpPhi
instructions, then this will generate invalid code.  We just need to
search for a location that comes after all of the OpPhi instructions.

Fixes #2204.
2018-12-19 15:18:22 +00:00
Steven Perron
5ec2d1a8cd
Don't fold specialized branches in loop unswitch (#2245)
* Don't fold specialized branchs in loop unswitch

Folding branches can have a lot of special cases, and can be a little
error prone.  So I only want it in one place.  That will be in dead
branch elimination.  I will change loop unswitching to set the branches
that were being folded to have a constant condition.  Then subsequent
pass of dead branch elimination will be able to remove the code.

At the same time, I added a check that loop unswitching will not
unswitch a branch with a constant condition.  It is not useful to do it
because dead branch elimination will simple fold the branch anyway.
Also it avoid an infinite loop that would other wise be introduced by my
first change.

Fixes #2203.
2018-12-19 04:40:30 +00:00
Steven Perron
1254335d13
Don't unswitch the latch block. (#2205)
Loop unswitching is unswitching the conditional branch that creates the
back-edge. In the version of the loop, where the bachedge is not taken,
there is no back-edge. This is what causes the validator to complain.

The solution I will go with will be to now unswitch a condition with a
back-edge. At this time we do not now if loop unswitching is used. We do
not include it in the optimization sets provided, nor is it used in
glslang's set. When there are opportunities and no breaks from the loop,
the loop with either be a single iteration loop, or an infinite loop.
There is no performance advantage to performing loop unswitching in
either of those cases. If there is a break, maintaining structured
control flow will be tricky. Unless we see a clear advantage to handling
these case, I would go with the safer simpler solution.

Fixes #2201.
2018-12-18 18:15:00 +00:00
Steven Perron
ff07c6df83
SSA-rewriter: make sure phi entries are unique. (#2206)
If there are multiple edges to a basic block, then the ssa rewriter will
create OpPhi instructions with duplicate entries.  This is invalid, and
it is fixed in this commit.

Fixes #2202.
2018-12-18 18:14:27 +00:00
Jeff Bolz
24328a0554 Recognize OpTypeAccelerationStructureNV as a type instruction (#2190) 2018-12-11 19:03:55 -05:00
Steven Perron
e07dabc25f
Invalidate the decoration manager at the start of ADCE. (#2189)
* Invalidate the decoration manager at the start of ADCE.

If the decoration manager is kept live the the contex will try to keep
it up to date.  ADCE deals with group decorations by changing the
operands in |OpGroupDecorate| instructions directly without informing
the decoration manager.  This puts it in an invalid state, which will
cause an error when the context tries to update it.  To Avoid this
problem, we will invalidate the decoration manager upfront.

At the same time, the decoration manager is now considered when checking
the consistency of the decoration manager.
2018-12-10 13:24:33 -05:00
Jeff Bolz
8fc8dfe4a5 Add PCH_FILE for upgrade_memory_model target. MSVC doesn't like building pass_utils.cpp twice in the same folder with different PCH settings. (#2186) 2018-12-10 10:54:19 -05:00
Steven Perron
0bc66a8ba9
Fix invalid OpPhi generated by merge-return. (#2172)
* Fix invalid OpPhi generated by merge-return.

When we create a new phi node for a value say %10, we have to replace
all of the uses of %10 that are no longer dominated by the def of %10
by the result id of the new phi.  However, if the use is in a phi node,
it is possible that the bb contains the use is not dominated by either.
In this case, needs to be handled differently.

* Split loop headers before add a new branch to them.

In merge return, Phi node in loop header that are also merges for loop
do not get updated correctly.  Those cases do not fit in with our
current analysis.  Doing this will simplify the code by reducing the
number of cases that have to be handled.
2018-12-07 14:10:30 -05:00
David Neto
6df6194db8
Validate Uniform decoration (#2181) 2018-12-07 09:32:57 -05:00
Steven Perron
2e4563d94f
Document in the context what happens with id overflow. (#2159)
Added documentation to the ir context to indicates that TakeNextId()
returns 0 when the max id is reached.  TODOs were added to each call
sight so that we know where we have to start to handle this case.

Handle id overflow in |SplitLoopHeader|.

Handle id overflow in |GetOrCreatePreHeaderBlock|.

Handle failure to create preheader in LICM.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1841.
2018-12-06 09:07:00 -05:00
Steven Perron
17cba4695c
Remove undefined behaviour when folding shifts. (#2157)
We currently simulate all shift operations when the two operand are
constants.  The problem is that if the shift amount is larger than
32, the result is undefined.

I'm changing the folder to return 0 if the shift value is too high.
That way, we will have defined behaviour.

https://crbug.com/910937.
2018-12-04 10:04:02 -05:00
alan-baker
e510b1bac5
Update memory model (#1904)
Upgrade to VulkanKHR memory model

* Converts Logical GLSL450 memory model to Logical VulkanKHR
* Adds extension and capability
* Removes deprecated decorations and replaces them with appropriate
flags on downstream instructions
* Support for Workgroup upgrades
* Support for copy memory
* Adding support for image functions
* Adding barrier upgrades and tests
* Use QueueFamilyKHR scope instead of device
2018-11-30 14:15:51 -05:00
Steven Perron
2d2a512691
Don't inline recursive functions. (#2130)
* Move ProcessFunction* function from pass to the context.

There are a few functions that are used to traverse the call tree.
They currently live in the Pass class, but they have nothing to do with
a pass, and may be needed outside of a pass.  They would be better in
the ir context, or in a specific call tree class if we ever have a need
for it.

* Don't inline recursive functions.

Inlining does not check if a function is recursive or not.  This has
been fine as long as the shader was a Vulkan shader, which forbid
recursive functions.  However, not all shaders are vulkan, so either
we limit inlining to Vulkan shaders or we teach it to look for recursive
functions.

I prefer to keep the passes as general as is reasonable.  The change
does not require much new code in inlining and gives a reason to refactor
some other code.

The changes are to add a member function to the Function class that
checks if that function is recursive or not.

Then this is used in inlining to not inlining a function call if it calls
a recursive function.

* Add id to function analysis

There are a few places that build a map from ids to Function whose
result is that id.  I decided to add an analysis to the context for this
to reduce that code, and simplify some of the functions.

* Add missing file.
2018-11-29 14:24:58 -05:00
alan-baker
3d56cddb75
Validate pointer variables (#2111)
Fixes #2104

* Checks the rules for logical addressing and variable pointers
 * Has an out for relaxed logical pointers
* Updated PassFixture to expose validator options
 * enabled relaxed logical pointers for some tests
* New validator tests
2018-11-27 16:47:10 -05:00
Ryan Harrison
d7cd1203a4 Ensure for OpVariable that result type and storage class operand agree (#2052)
From SPIR-V spec, section 3.32.8 on OpVariable:
  Its Storage Class operand must be the same as the Storage Class
  operand of the result type.

Fixes #941
2018-11-16 11:22:11 -05:00
greg-lunarg
c37388f1ad Add passes to propagate and eliminate redundant line instructions (#2027). (#2039)
These are bookend passes designed to help preserve line information
across passes which delete, move and clone instructions. The propagation
pass attaches a debug line instruction to every instruction based on
SPIR-V line propagation rules. It should be performed before optimization.
The redundant line elimination pass eliminates all line instructions
which match the previous line instruction. This pass should be performed
at the end of optimization to reduce physical SPIR-V file size.

Fixes #2027.
2018-11-15 14:06:17 -05:00
Steven Perron
dc9d155d62
Fix folding of volatile store. (#2048)
When looking for the Volatile mask on a store, the instruction folder
accesses an out-of-bounds element.  We fix that up.

Fixes crbug.com/903530.
2018-11-14 13:52:18 -05:00
Steven Perron
ec5574a9c6
Instruction::GetBaseAddress to handle OpPtrAccessChain (#2050)
That function currently only handled OpPtrAccessChain if it was in the
middle of the chain, but not at the start.  Fixing that up.

Fixes crbug.com/905271.
2018-11-14 12:42:25 -05:00
greg-lunarg
1e9fc1aac1 Add base and core bindless validation instrumentation classes (#2014)
* Add base and core bindless validation instrumentation classes

* Fix formatting.

* Few more formatting fixes

* Fix build failure

* More build fixes

* Need to call non-const functions in order.

Specifically, these are functions which call TakeNextId(). These need to
be called in a specific order to guarantee that tests which do exact
compares will work across all platforms. c++ pretty much does not
guarantee order of evaluation of operands, so any such functions need to
be called separately in individual statements to guarantee order.

* More ordering.

* And more ordering.

* And more formatting.

* Attempt to fix NDK build

* Another attempt to address NDK build problem.

* One more attempt at NDK build failure

* Add instrument.hpp to BUILD.gn

* Some name improvement in instrument.hpp

* Change all types in instrument.hpp to int.

* Improve documentation in instrument.hpp

* Format fixes

* Comment clean up in instrument.hpp

* imageInst -> image_inst

* Fix GetLabel() issue.
2018-11-08 13:54:54 -05:00
greg-lunarg
6721478ef1 Don't assume one return means function can be inlined. (#2018) (#2025)
If there is only 1 return and it is in a loop, then the function cannot be inlined.

Fix condition when inlined code needs one-trip loop wrapper.  The dummy loop is needed when there is a return inside a selection construct.  Even if there is only 1 return.
2018-11-08 09:11:20 -05:00
Jeff Bolz
c06a35b902 Rename PCH macro to spvtools_pch to avoid conflicts with other projects. Also add pch to test/opt. (#2034) 2018-11-07 09:15:04 -05:00
Jeff Bolz
60fac96c6b Enable precompiled headers for spirv-tools(-shared) and some unit tests (#2026) 2018-11-06 09:26:23 -05:00
Steven Perron
f2cc71e5cb
Handle OpMemberDecorateStringGOOGLE in ACDE (#2029)
Add missing case to the switch statement for the annotation
instructions.

See https://github.com/KhronosGroup/glslang/issues/1561.
2018-11-02 13:42:45 -04:00
dan sinclair
9e6f5134d1
Reduce number of test targets (#2024)
This CL takes the various opt unit tests and makes a single executable
instead of one per test. This reduces the number of build targets by
~125 when building with ninja.
2018-11-01 10:19:37 -04:00
Steven Perron
6647884a13
Remove MemberDecorateStringGOOGLE during stript-refect. (#2021)
The strip-reflect pass is not removing the reflection decorations that
are decorating members.  With this commit, they will now be removed.

Fixes #2019.
2018-10-30 16:17:35 -04:00
Steven Perron
18fe6d59e5
Fix dead branch elim infinite loop. (#2009)
When looking for a break from a selection construct, we do not realize
that a jump to the continue target of a loop containing the selection
is a break.  This causes and infinit loop, or possibly other failures.

Fixes #2004.
2018-10-24 09:10:30 -04:00
Steven Perron
0ba35798c3
Fix dead branch elim infinite loop. (#1997)
When looking for a break from a selection construct, we do not need to
look inside nested constructs.  However, if a loop header has an
unconditional branch, then we enter the loop.  Entering the loop causes
an infinite loop because we keep going through the loop.

The solution is to look for a merge block, if one exsits, even for block
terminated by an OpBranch.

Fixes #1979.
2018-10-22 13:59:20 -04:00
alan-baker
6e85d1a6fc
Fix restrictions in if conversion (#1998)
Fixes #1991

* Improved identification of potential conditional branches
* Pass changed to only work for shaders
* added a test to catch the bug
2018-10-19 15:16:46 -04:00
Steven Perron
715afb0cea
Add a nullptr check to array copy propagation. (#1987)
We are missing a check for a nullptr that is causing things to fail.

Added an extra test case, and fixed up others.

This is the fix for https://github.com/Microsoft/DirectXShaderCompiler/issues/1598.
2018-10-19 12:53:40 -04:00
greg-lunarg
c4687889b7 Fix ADCE to treat OpUnreachable correctly during liveness analysis (#1984)
ADCE liveness algorithm should treat OpUnreachable at least like other
branch instructions. It was being treated as always live which was
preventing useless structured constructs from being eliminated.
OpUnreachable is generated by dead branch elimination which is now
being required by merge return, so this fix should accompany that
change.
2018-10-19 10:16:35 -04:00
Steven Perron
0e68bb3632
Only run merge-returnon reachable functions. (#1983)
We currently run merge-return on all functions, but
dead-branch-elimination only runs on function reachable from an entry
point or exported function.  Since dead-branch-elimination is needed for
merge-return, they have to match.

Fixes #1976.
2018-10-18 08:48:27 -04:00
greg-lunarg
ab45d69154 Fix ADCE liveness to include all enclosing control structures. (#1975)
Was removing control structures which didn't have data dependency
with enclosed live loop and otherwise did not contain live code.
An example is a counting loop around a live loop.

Fixes #1967.
2018-10-16 08:00:07 -04:00
Nuno Subtil
5bc30788fd Fix gtest.h include in test/opt/pass_utils.h
Fixes builds where googletest is outside the SPIRV-Tools tree.
2018-10-12 10:22:25 -04:00
Steven Perron
82663f34c9
Check for unreachable blocks in merge-return. (#1966)
Merge return assumes that the only unreachable blocks are those needed
to keep the structured cfg valid.  Even those must be essentially empty
blocks.

If this is not the case, we get unpredictable behaviour.  This commit
add a check in merge return, and emits an error if it is not the case.

Added a pass of dead branch elimination before merge return in both the
performance and size passes.  It is a precondition of merge return.

Fixes #1962.
2018-10-10 15:18:15 -04:00
Steven Perron
4e266f775a
Fold divisions by 0. (#1963)
The current implementation in the folder when seeing a division by zero
is to assert.  In the release build, the compiler will attempt to
compute the value, which causes its own problems.

The solution I will go with is to fold the division, and just give it
the value of 0.  The same goes for remainder and mod operations.

Fixes #1961.
2018-10-10 11:17:26 -04:00
alan-baker
fae1e61ab8
Fix bug in construct block calculation (#1964)
Fixes #1960

* Only allows blocks that are dominated by the header
* Fixed a bad loop fusion test
* Added a test derived from the reported bug
2018-10-10 11:14:01 -04:00
Steven Perron
497958d899 Removing HLSLCounterBuffer decorations when not needed. (#1954)
The HlslCounterBufferGOOGLE that was introduced changed the OpDecorateId
so that is can now reference an id other than the target.  If that other
id is used only in the decoration, then the definition of the id will be
removed because decoration do not count as real uses.

However, if the target of the decoration is still live the decoration
will not be removed.  This leaves a reference to an id that is not
defined.

There are two solutions to consider.  The first is that is the decoration
is kept, then the definition of the id should be kept live.  Implementing
this change would be involved because the way ADCE handles decorations
will have to be reimplemented.

The other solution is to remove the decoration the id is otherwise dead.
This works for this specific case.  Also this is the more desirable
behaviour in this case.  The id will always be the id of a variable that
belongs to a descriptor set.  If that variable is not bound and we do
not remove it, the driver will complain.

I chose to implement the second solution.  The first will be left to when
a case for it comes up.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1885.
2018-10-05 08:23:09 -04:00
Jaebaek Seo
ebcc58b5f8 Validator: function scope variable at start of entry block #1923
All OpVariable instructions in a function must be the first
instructions in the first block.
2018-10-04 15:05:47 -04:00
Alan Baker
3b5960174f Don't scalarize spec constant sized arrays
Fixes #1952

* Prevent scalarization of arrays that are sized by a specialization
constant
2018-10-04 11:58:23 -04:00
Steven Perron
c4c68712c4
Make EFFCEE required (#1943)
Fixes #1912.

Remove the non-effcee build as EFFCEE is now required.
2018-10-04 10:00:11 -04:00
Alan Baker
a77bb2e54b Add validation for execution modes
* Check rules from Execution Mode tables, 2.16.2 and the Vulkan
environment spec

* Allows MeshNV execution model with the following execution modes
 * LocalSize, LocalSizeId, OutputPoints and OutputVertices
 * Done to not break their validation
2018-10-02 10:22:23 -04:00
Steven Perron
146eb3bdcf
Fix erroneous uses of the type manager in copy-prop-arrays. (#1942)
There are a few spots where copy propagate arrays is trying
to go from a Type to an id, but the type is not unique.  When generating
code this pass needs specific ids, otherwise we get type mismatches.
However, the ambigous types means we can sometimes get the wrong type
and generate invalid code.

That code has been rewritten to not rely on the type manager, and just
look at the instructions instead.

I have opened https://github.com/KhronosGroup/SPIRV-Tools/issues/1939 to
try to get a way to make this more robust.
2018-10-01 14:45:44 -04:00
Steven Perron
32381e30ef
Handle decoration groups with no decorations. (#1921)
In DecorationManager::RemoveDecorationsFrom, we do not remove the id
from a decoration group if the group has no decorations.  This causes
problems because KillNamesAndDecorates is suppose to remove all
references to the id, but in this case, there is still a reference.

This is fixed by adding a special case.

Also, there is the possibility of a double free because
RemoveDecorationsFrom will delete the instructions defining |id| when
|id| is a decoration group.  Later, KillInst would later write to memory
that has been deleted when trying to turn it into a Nop.  To fix this,
we will only remove the decorations that use |id| and not its definition
in RemoveDecorationsFrom.
2018-09-28 14:16:04 -04:00
Steven Perron
b85fb4a300
Get KillNameAndDecorates to handle group decorations. (#1919)
It seems like the current implementation of KillNameAndDecorates does
not handle group decorations correctly.  The id being removed is not
removed from the OpGroupDecorate instructions.  Even worst, any
decorations that apply to that group are removed.

The solution is to use the function in the decoration manager that will
remove the decorations and update the instructions instead of doing the
work itself.
2018-09-25 12:57:44 -04:00
Lei Zhang
9bfe0eb25e Wrap tests needing effcee inside SPIRV_EFFCEE 2018-09-20 11:51:40 -04:00
Steven Perron
9fbcce4ca1
Add unrolling to the legalization passes (#1903)
Adds unrolling to the legalization passes.

After enabling unrolling I found a bug when there is a self-referencing
phi node.  That has been fixed.

The test that checks for that the order of optimizations is correct also
needed to be updated.
2018-09-19 16:40:09 -04:00
Steven Perron
7075c49923
Add dummy loop in merge-return. (#1896)
The current implementation of merge return can create bad, but correct,
code.  When it is not in a loop construct, it will insert a lot of
extra branch around code.  The potentially large number of branches are
bad.  At the same time, it can separate code store to variables from
its uses hiding the fact that the store dominates the load.

This hurts the later analysis because the compiler thinks that multiple
values can reach a load, when there is really only 1.  This poorer
analysis leads to missed optimizations.

The solution is to create a dummy loop around the entire body of the
function, then we can break from that loop with a single branch.  Also
only new merge nodes would be those at the end of loops meaning that
most analysies will not be hurt.

Remove dead code for cases that are no longer possible.

It seems like some drivers expect there the be an OpSelectionMerge
before conditional branches, even if they are not strictly needed.
So we add them.
2018-09-18 08:52:47 -04:00
Steven Perron
5f599e700e
Fix infinite loop in dead-branch-elimination (#1891)
* Create structed cfg analysis.

There are lots of optimization that have to traverse the CFG in a
structured order just because it wants to know which constructs a
basic block in contained in.  This adds extra complexity to these
optimizations, for causes too much refactoring of older optimizations.

To help with this problem, I have written an analysis that can give this
information.

* Identify branches breaking from loops.

Dead branch elimination does a search for a conditional branch to the
end of the current selection construct.  This search assumes that the
only way to leave the construct is through the merge node.  But that is
not true.  The code can jump to the merge node of a loop that contains
the construct.

The search needs to take this into consideration.
2018-09-17 13:00:24 -04:00
Steven Perron
6d5f1bc2e8
Allow merge blocks to merge two header blocks in some cases. (#1890)
In merge blocks, we do not allow the merging of two blocks with merge
instructions.  This is because if the two block are merged only 1 of
those instructions can exists.  However, if the successor block is the
merge block of the predecessor, then we can delete the merge instruction
in the predecessor.  In this case, we are able to merge the blocks.
2018-09-14 13:37:18 -04:00
Steven Perron
75c1bf2843
Add option for the max id bound. (#1870)
* Create a new entry point for the optimizer

Creates a new struct to hold the options for the optimizer, and creates
an entry point that take the optimizer options as a parameter.

The old entry point that takes validator options are now deprecated.
The validator options will be one of the optimizer options.

Part of the optimizer options will also be the upper bound on the id bound.

* Add a command line option to set the max value for the id bound.  The default is 0x3FFFFF.

* Modify `TakeNextIdBound` to return 0 when the limit is reached.
2018-09-10 11:49:41 -04:00
Steven Perron
d746681fe9
Copy decorations when creating new ids. (#1843)
* Copy decorations when creating new ids.

When creating a new value based on an old value, we need to copy the
decorations to the new id.  This change does this in 3 places:

1) The variable holding the return value of the function generated by
merge return should get decorations from the function.

2) The results of the OpPhi instructions should get decorations from the
variable they are replacing in the ssa writer.

3) In local access chain convert the intermediate struct (result of
OpCompositeInsert) generated for the store replacement should get its
decorations from the variable being stored to.

Fixes #1787.
2018-08-24 11:55:39 -04:00
Steven Perron
b4d3618f77
Don't "break" from selection constructs. (#1862)
If seems like at least 1 driver does not like a condition jump to the end
of a selection construct.  We are generating these in the merge return
pass.  This change stops merge return from generating this sequence.

Part of #1861.
2018-08-23 14:38:25 -04:00
Steven Perron
6c73b1fb70
Update the order when predicating blocks. (#1859)
When doing predicate blocks, we need to traverse every block in
structured order in order to keep track of which construct a block is
contained in.  The standard way of traversing code in structured order
is to create a list with all of the nodes in order.  However, when
predicating blocks, new blocks are created, and those blocks are missed.
This causes branches that go too far.

The solution is to update the order as new blocks are created.  Since
we are using an std::list, we do not have to worry about invalidation of
iterators when changing the list.
2018-08-23 12:59:31 -04:00
Steven Perron
19264ef42c
Have PredicateBlocks jump the existing merge blocks. (#1849)
* Refactor PredicateBlocks

Refactor PredicateBlocks so that we know which constructs a return
is contained in.  Will be used later.

* Have PredicateBlocks jump the existing merge blocks.

In PredicateBlocks, we currently skip instructions with side effects,
but it still follows the same control flow (sort-of).  This causes a
problem, when we are trying to predicate code in a loop.  We skip all
of the code with side effects (IV increment), but still follow the
same control flow (jump back the start of the loop).  This creates an
infinite loop because the code will keep jumping back to the start of
the loop without changing the values that effect the exit condition.

This is a large change to merge-return.  When predicating a block that
is in a loop or merge construct, it will jump to the merge block of the
construct.  Once out of all constructs we will generate code as we did
before.
2018-08-21 12:04:08 -04:00
Steven Perron
d693a83e36
Handle breaks from structured-ifs in DCE. (#1848)
* Handle breaks from structured-ifs in DCE.

dead code elimination assumes that are conditional branches except for
breaks and continues in loops will have an OpSelectionMerge before them.
That is not true when breaking out of a selection construct.

The fix is to look for breaks in selection constructs in the same place
we look for breaks and continues for loops.
2018-08-21 11:54:44 -04:00
Steven Perron
45c235d41f
Have dead-branch-elim handle conditional exits from selections. (#1850)
When dead-branch-elim folds a conditional branch, it also deletes the
OpSelectionMerge instruction.  If that construct contains a
conditional branch to the merge node, it will not have its own
OpSelectionMerge.  When the headers merge instruction is deleted, the
the inner conditional branch will no longer be legal.  It will be a
selection to a node that is not a merge node.

We fix this up by moving the OpSelectionMerge to a new location if it is
still needed.
2018-08-21 11:49:56 -04:00
Steven Perron
e065cc208f
Keep decorations when replacing loads in access-chain-convert. (#1829)
In local-access-chain-convert, we replace loads by load the entire
variable, then doing the extract.  The extract will have the same value
as the load.  However, if the load has a decoration on it, the
decoration is lost because we do not copy any them to the new id.

This is fixed by rewritting the load into the extract and keeping the
same result id.

This change has the effect that we do not call DCEInst on the loads
because the load is not being deleted, but replaced.  This could leave
OpAccessChain instructions around that are not used.  This is not a
problem for -O and -Os.  They run local_single_*_elim passes and then
dead code elimination.  The dce will remove the unused access chains,
and the load elimination passes work even if there are unused access
chains.  I have added test to them to ensure they will not loss
opportunities.

Fixes #1787.
2018-08-15 09:14:21 -04:00
dan sinclair
ef678672fb
Remove source/message.h (#1838)
The code in source/message was only used in a single set of tests to
format the output results. This CL changes the test to verify the
message instead of all the error values and removes the source/message
code.
2018-08-14 15:41:21 -04:00
dan sinclair
1963a2dbda
Use MakeUnique. (#1837)
This CL replaces instances of reset(new ..) with MakeUnique.
2018-08-14 15:01:50 -04:00
dan sinclair
1553025f4c
Move make_unique to source/util. (#1836)
This MakeUnique code is used in places other then source/opt so move it
to source/utils.
2018-08-14 12:44:54 -04:00
Steven Perron
5c8b4f5a1c
Validate the input to Optimizer::Run (#1799)
* Run the validator in the optimization fuzzers.

The optimizers assumes that the input to the optimizer is valid.  Since
the fuzzers do not check that the input is valid before passing the
spir-v to the optimizer, we are getting a few errors.

The solution is to run the validator in the optimizer to validate the
input.

For the legalization passes, we need to add an extra option to the
validator to accept certain types of variable pointers, even if the
capability is not given.  At the same time, we changed the option
"--legalize-hlsl" to relax the validator in the same way instead of
turning it off.
2018-08-08 11:16:19 -04:00
dan sinclair
05057c9846
Fixup readabilty/inheritance warnings (#1805)
This CL removes the un-needed virtual qualifiers from methods already
marked as override.
2018-08-07 09:10:03 -04:00
dan sinclair
eda2cfbe12
Cleanup includes. (#1795)
This Cl cleans up the include paths to be relative to the top level
directory. Various include-what-you-use fixes have been added.
2018-08-03 15:06:09 -04:00
dan sinclair
7861df9bb3
Remove using namespace commands. (#1794)
This CL removes the last two 'using namespace' commands.
2018-08-03 08:05:52 -04:00
dan sinclair
58a6876cee
Rewrite include guards (#1793)
This CL rewrites the include guards to make PRESUBMIT.py include guard
check happy.
2018-08-03 08:05:33 -04:00
Steven Perron
ce644d4a24
Update OpPhi instructions after splitting block. (#1783)
In the merge return pass, we will split a block, but not update the phi
instructions that reference the block.  Since the branch in the original
block is now part of the block with the new id, the phi nodes must be
updated.

This commit will change this.

I have also considered other places where an id of a basic block could
be referenced, and I don't think any of them need to change.

1) Branch and merge instructions: These jump to the start of the
original block, and so we want them to jump to the block that uses the
original id.  Nothing needs to change.

2) Names and decorations: I don't think it matters with block keeps the
name, and there are no decorations that apply to basic blocks.

Fixes #1736.
2018-08-02 11:02:50 -04:00
Steven Perron
c8c724cba7
Don't change decorations and names in merge return. (#1777)
When creating a new phi for a value in the function, merge return will
rewrite all uses of an id that are no longer dominated by its
definition.  Uses that are not in a basic block, like OpName or
decorations, are not dominated, but they should not be replaced.

Fixes #1736.
2018-08-01 13:47:09 -04:00
Alan Baker
755e5c9420 Transform to combine consecutive access chains
* Combines OpAccessChain, OpInBoundsAccessChain, OpPtrAccessChain and
OpInBoundsPtrAccessChain
* New folding rule to fold add with 0 for integers
 * Converts to a bitcast if the result type does not match the operand
 type
V
2018-07-31 13:42:47 -04:00
Diego Novillo
99fe61e724 Add API to create passes out of a list of command-line flags.
This re-implements the -Oconfig=<file> flag to use a new API that takes
a list of command-line flags representing optimization passes.

This moves the processing of flags that create new optimization passes
out of spirv-opt and into the library API.  Useful for other tools that
want to incorporate a facility similar to -Oconfig.

The main changes are:

1- Add a new public function Optimizer::RegisterPassesFromFlags. This
   takes a vector of strings.  Each string is assumed to have the form
   '--pass_name[=pass_args]'.  It creates and registers into the pass
   manager all the passes specified in the vector.  Each pass is
   validated internally.  Failure to create a pass instance causes the
   function to return false and a diagnostic is emitted to the
   registered message consumer.

2- Re-implements -Oconfig in spirv-opt to use the new API.
2018-07-27 15:10:08 -04:00
Alan Baker
b49f76fd62 Handle undef literal value in vector shuffle
Fixes #1731

* Updated folding rules related to vector shuffle to account for the
undef literal value:
 * FoldVectorShuffleFeedingShuffle
 * FoldVectorShuffleFeedingExtract
 * FoldVectorShuffleWithConstants
* These rules would commit memory violations due to treating the undef
literal value as an accessible composite component
2018-07-20 11:32:43 -04:00
Alan Baker
3c19651733 Add variable pointer support to IsValidBasePointer
Fixes #1729

* Adds supported opcodes to IsValidBasePointer() enable by
VariablePointers and VariablePointersStorageBuffer capabilities
 * Added tests
2018-07-19 14:43:59 -04:00
Alan Baker
28199b80b7 Fix block ordering in dead branch elim
Fixes #1727

* If the pass finds any dead branches it can optimize then at the end of
the pass it reorders basic blocks to ensure they satisfy block ordering
requirements
 * Added some new tests
* While investigating this issue, found and fixed a non-deterministic
ordering of dominators
 * Now the edges used to construct the dominator tree are sorted
 according to posorder traversal indices
2018-07-19 11:17:57 -04:00
Steven Perron
208921efe8 Fix finding constant with particular type. (#1724)
With current implementation, the constant manager does not keep around
two constant with the same value but different types when the types
hash to the same value. So when you start looking for that constant you
will get a constant with the wrong type back.

I've made a few changes to the constant manager to fix this.  First off,
I have changed the map from constant to ids to be an std::multimap.
This way a single constant can be mapped to mutiple ids each
representing a different type.

Then when asking for an id of a constant, we can search all of the ids
associated with that constant in order to find the one with the correct
type.
2018-07-16 12:36:53 -04:00
Steven Perron
95b4d47e34 Fix infinite loop while folding OpVectorShuffle (#1722)
When folding an OpVectorShuffle where the first operand is defined by
an OpVectorShuffle, is unused, and is equal to the second, we end up
with an infinite loop.  This is because we think we change the
instruction, but it does not actually change.  So we keep trying to
folding the same instruction.

This commit fixes up that specific issue.  When the operand is unused,
we replace it with Null.
2018-07-13 12:43:00 -04:00
Steven Perron
63c1d8fb15
Fix size error when folding vector shuffle. (#1721)
When folding a vector shuffle that feeds another vector shuffle causes
the size of the first operand to change, when other indices have to be
adjusted reletive to the new size.
2018-07-13 11:20:02 -04:00
dan sinclair
e477e7573e
Remove the module from opt::Function. (#1717)
The function class provides a {Set|Get}Parent call in order to provide
the context to the LoopDescriptor methods. This CL removes the module
from Function and provides the needed context directly to LoopDescriptor
on creation.
2018-07-12 14:42:05 -04:00
dan sinclair
a5e4a53217
Remove context() method from opt::Function (#1700)
This CL removes the context() method from opt::Function. In the places
where the context() was used we can retrieve, or provide, the context in
another fashion.
2018-07-12 10:16:15 -04:00
dan sinclair
f96b7f1cb9
use Pass::Run to set the context on each pass. (#1708)
Currently the IRContext is passed into the Pass::Process method. It is
then up to the individual pass to store the context into the context_
variable. This CL changes the Run method to store the context before
calling Process which no-longer receives the context as a parameter.
2018-07-12 09:08:45 -04:00
Steven Perron
e63551deac Add folding rule to merge a vector shuffle feeding another one. 2018-07-11 14:44:46 -04:00
dan sinclair
2cce2c5b97
Move tests into namespaces (#1689)
This CL moves the test into namespaces based on their directories.
2018-07-11 09:24:49 -04:00
dan sinclair
84846b7e76
Cleanup whitespace lint warnings. (#1690)
This CL cleans up the whitespace warnings and enables the check when
running 'git cl presubmit --all -uf'.
2018-07-10 13:09:46 -04:00
Sean Purcell
9532aede29 Fix unused param errors when Effcee not present 2018-07-10 11:57:42 -04:00
dan sinclair
a3e3869540
Convert validation to use libspriv::Instruction where possible. (#1663)
For the instructions which execute after the IdPass check we can provide
the Instruction instead of the spv_parsed_instruction_t. This
Instruction class provides a bit more context (like the source line)
that is not available from spv_parsed_instruction_t.
2018-07-10 10:57:52 -04:00
dan sinclair
e6b953361d
Move the ir namespace to opt. (#1680)
This CL moves the files in opt/ to consistenly be under the opt::
namespace. This frees up the ir:: namespace so it can be used to make a
shared ir represenation.
2018-07-09 11:32:29 -04:00
dan sinclair
3dad1cda11
Change libspirv to spvtools namespace (#1678)
This CL changes all of the libspirv namespace code to spvtools to match
the rest of the code base.
2018-07-07 09:38:00 -04:00
Steven Perron
a45d4cac61 Move folding routines into a class
The folding routines are currently global functions.  They also rely on
data in an std::map that holds the folding rules for each opcode.  This
causes that map to not have a clear owner, and therefore never gets
deleted.

There has been a request to delete this map.  To implement this, we will
create a InstructionFolder class that owns the maps.  The IRContext will
own the InstructionFolder instance.  Then the global functions will
become public memeber functions of the InstructionFolder.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1659.
2018-07-05 17:52:43 -04:00
Steven Perron
9ecbcf5fc8 Make sure the constant folder get the correct type.
There are a few locations where we need to handle duplicate types.  We
cannot merge them because they may be needed for reflection.  When this
happens we need do some extra lookups in the type manager.

The specific fixes are:

1) When generating a constant through `GetDefiningInstruction` accept
and use an id for the desired type of the constant.  This will make sure
you get the type that is needed.

2) In Private-to-local, make sure we to update the def-use chains when a
new pointer type is created.

3) In the type manager, make sure that `FindPointerToType` returns a
pointer that points to the given type and not a duplicate type.

4) In scalar replacment, make sure the null constants that are created
are the correct type.
2018-07-05 14:34:30 -04:00
Steven Perron
465f2815cb Revert change and stop running remove duplicates.
Revert "Don't merge types of resources"

This reverts commit f393b0e480, but leaves
the tests that were added.  Added new test. These test are the so that,
if someone tries the same change I made, they will see the test that
they need to handle.

Don't run remove duplicates in -O and -Os

Romve duplicates was run to help reduce compile time when looking for
types in the type manager.  I've run compile time test on three sets
of shaders, and the compile time does not seem to change.

It should be safe to remove it.
2018-06-29 14:09:44 -04:00
Steven Perron
2eb9bfb5b6 Remove stores of undef.
When storing an undef, any value is valid, including the one already in
that memory location.  So we can avoid the store.
2018-06-29 09:49:19 -04:00
Greg Roth
7cc9f36283 Add test for CFG alteration by compact IDs
The Compact IDs pass can corrupt the CFG, but first the CFG has to
be setup. To do this, a test that builds the CFG, then performs the
compact IDs pass, then checks context consistency.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1648
2018-06-28 08:48:34 -06:00
Steven Perron
f393b0e480 Don't merge types of resources
When doing reflection users care about the names of the variable, the
name of the type, and the name of the members.  Remove duplicates breaks
this because it removes the names one of the types when merging.

To fix this we have to keep the different types around for each
resource.  This commit adds code to remove duplicates to look for the
types uses to describe resources, and make sure they do not get merged.

However, allow merging of a type used in a resource with something not
used in a resource.  Was done when the non resource type came second.

This could have a negative effect on compile time, but it was not
expected to be much.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1372.
2018-06-27 13:57:07 -04:00
Alan Baker
0d43e10b4a Use type id when looking up vector type
Fixes #1634

* Vector components of composite constructs used wrong accessor
2018-06-25 09:47:29 -04:00
Corentin Wallez
ba602c9059 Add a WIP WebGPU environment. It disallows OpUndef
Add SPV_ENV_WEBGPU_0 for work-in-progress WebGPU.

val: Disallow OpUndef in WebGPU env

Silence unused variable warnings when !defined(SPIRV_EFFCE)

Limit visibility of validate_instruction.cpp's symbols
  Only InstructionPass needs to be visible so all other functions are put
  in an anonymous namespace inside the libspirv namespace.
2018-06-21 15:53:15 -04:00
Alan Baker
4f866abfd8 Validate static uses of interfaces
Fixes #1120

Checks that all static uses of the Input and Output variables are listed
as interfaces in each corresponding entry point declaration.
 * Changed validation state to track interface lists
 * updated many tests
* Modified validation state to store entry point names
 * Combined with interface list and called EntryPointDescription
 * Updated uses
* Changed interface validation error messages to output entry point name
in addtion to ID
2018-06-13 10:56:14 -04:00
David Neto
700ebd3442 Make fewer test executables
Try to reduce the amount of disk space used by especially by debug builds,
which may be contributing to AppVeyor failures.

Collapses tests in categories:
- validator
- loop optimizations
- dominator analysis
- linker

Contributes to #1615
2018-06-12 09:48:42 -04:00
Alan Baker
06de86863b Check for invalid branches into construct body.
Fixes #1281

* New structured cfg check: all non-construct header blocks'
predecessors must come from within the construct
* New function to calculate blocks in a construct

* Fixed a bug in BasicBlock type bitset

Relaxing check to not consider unreachable predecessors

* Fixing broken common uniform elim test
2018-06-11 19:23:44 -04:00
Steven Perron
fe2fbee294 Delete the insert-extract-elim pass.
Replaces anything that creates an insert-extract-elim pass and create
a simplifiation pass instead.  Then delete the implementation of the
pass.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
2018-06-01 10:13:39 -04:00
Steven Perron
9a008835f4 Add store for var initializer in inlining.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1591.
2018-06-01 09:44:42 -04:00
Steven Perron
93c4c184d5 Handle types with self references.
By using forward pointers, we are able to define a struct that has a
pointer to itself.  This could be directly or indirectly.  The current
implementation of the type manager did not handle this case.  There are
three changes that are made in this commit inorder to handle this case:

1) Change the handling of OpTypeForwardPointer

The current handling of OpTypeForwardsPointer is broken if there is a
reference to the pointer before the real definition.  When build the
type that contain the forward delared pointer, the type manager will ask
for the type for that ID, and will get a nullptr because it does not
exists.  This nullptr is not handleded very well.

The change is to keep track of the incomplete types the first time
through all of the types.  An incomplete type is a ForwardPointer or any
type that references an incomplete type.

Then we implement a second pass through the incomplete types that will
complete them.

2) Hashing types.

When hashing a type, we want to uses all of the subtypes as part of the
hash.  However, with types that reference them selves, this creates an
infinite recursion.  To get around this, we keep track of which types
have been seen on the path from the root type.  If we have see the
current type already then we can stop the recursion.

3) Comparing types.

In order to check if two types are the same, we must check that all of
their subtypes are the same as well.  This also causes an infinit
recursion.  The solution is to stop comparing the subtypes if we are
trying to compare two pointer types that we are already in the middle of
comparing.  The ideas is that if the two pointer are different, then in
progress compare will return false itself.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1578.
2018-05-30 15:48:38 -04:00
Steven Perron
745dd00af9 Fold FMix feeding Extract, and use the simplification pass.
We add a new rule to the folding rules to fold an FMix feeding an
extract when the alpha value for the element being extracted is either
0 or 1.  In those case, we can simple extract from one of the operands
to the FMix.

With that change the simplification pass completely subsumes the
insert-extract elimination pass.  So we remove the insert-extract
elimination passes and replce them with calls to the simplification
pass.

In a follow up PR, we should delete the insert-extract elimination pass.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
2018-05-25 14:42:59 -04:00
Steven Perron
a579e720a8 Remove the limit on struct size in SROA.
Removes the limit on scalar replacement for the lagalization passes.
This is done by adding an option to the pass (and command line option)
to set the limit on maximum size of the composite that scalar
replacement is willing to divide.

Fixes #1494.
2018-05-18 10:03:46 -04:00
Steven Perron
f1f7cc870e Get ADCE to handle OpCopyMemory
ADCE does not treat OpCopyMemory as an instruction that references
memory.  Because of that stores are removed that should not be.

This change teaches ADCE that OpCopyMemory and OpCopyMemorySize both
loads from and stores to memory.  This will keep other stores live when
needed, and will allows ADCE to remove OpCopyMemory instructions as
well.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1556.
2018-05-16 13:50:47 -04:00
Steven Perron
9b1a938ea1 SROA: Only create symbols that are loaded.
Currently in scalar replacement, we create a new variable for every
memeber of the composite being divided.  It is often overkill, because
not all of those members will be used.  This change will check which
elements are used and only create variable for the members that are
used.

This reduces the compile time for one set of shader from 248s to 165s.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
2018-05-16 10:48:25 -04:00
Steven Perron
0e1b7e5aef Fix getting operand without checking opcode.
Fixes https://github.com/KhronosGhttps://github.com/KhronosGroup/SPIRV-Tools/issues/1559roup/SPIRV-Tools/issues/1559.

There is an load of an operand of an instruction that was suppose to be
only for the OpCompositeExtract case.  However, an error caused it to
be loaded for every opcode, even those that do not have an operand in
that position.

We fix up that bug, and a couple other things noticed that the same
time.
2018-05-16 09:34:43 -04:00
alan-baker
18ad1be7f9 Fixing MacOS compiler error 2018-05-15 12:23:27 -04:00
Steven Perron
f46f2d3e5d Remove redundant stores.
The code patterns generated by DXC around function calls can cause many
store to be storing the same value that was just loaded from the same
location:

```
%10 = OpLoad %type %var
OpStore %var %10
```

We want to clean these up very early on because they can cause other
transformations to do a lot of work.  For the cases I see, they can be
removed during local-single-block-elim.

For one set of shaders the compile time goes from 248s to 182s.  A 26%
improvement.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
2018-05-15 10:24:05 -04:00
Steven Perron
af430ec822 Add pass to fold a load feeding an extract.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element.  This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.

However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.

Fixes #1547.
2018-05-14 15:40:34 -04:00
Steven Perron
804e8884c4 Fold fclamp feeding compare.
An FClamp instruction forces a values to be within a certain interval.
When the upper or lower bound of the FClamp is a constant and the value
being compared with is a constant, then in some case we can fold the
compared because the entire range is say less than the value.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1549.
2018-05-14 10:27:49 -04:00
Steven Perron
9ec3f81e5c Remove dead Workgroup variables in ADCE.
If there is a shader with a variable in the workgroup storage class that
is stored to, but not loadeds, then we know nothing will read those
loads.  It should be safe to remove them.

This is implemented in ADCE by treating workgroup variables the same
way that private variables are treated.

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1550.
2018-05-09 16:07:26 -04:00
Steven Perron
7d01643132 Allow hoisting code in if-conversion.
When doing if-conversion, we do not currently move code out of the side
nodes.  The reason for this is that it can increase the number of
instructions that get executed because both side nods will have to be
executed now.

In this commit, we add code to move an instruction, and all of the
instructions it depends on, out of a side node and into the header of
the selection construct.  However to keep the cost down, we only do it
when the two values in the OpPhi node compute the same value.  This way
we have to move only one of the instructions and the other becomes
unused most of the time.  So no real extra cost.

Makes the value number table an alalysis in the ir context.

Added more opcodes to list of code motion safe opcodes.

Fixes #1526.
2018-05-04 12:56:29 -04:00
Stephen McGroarty
1c2cbaf569 Add GetContinueBlock to loop class.
Previously, the loop class used the terms latch and continue block
interchangeably. This patch splits the two and corrects and tests some
uses of the old uses of GetLatchBlock.
2018-05-03 14:30:41 -04:00
Steven Perron
70bb3c1cc2 Fold divide and multiply by same value.
We want to fold code like (x*y)/x and other permutations of this.

Fixes #1531.
2018-05-02 10:18:37 -04:00
Toomas Remmelg
1dc2458060 Add a loop fusion pass.
This pass will look for adjacent loops that are compatible and legal to
be fused.

Loops are compatible if:

- they both have one induction variable
- they have the same upper and lower bounds
    - same initial value
    - same condition
- they have the same update step
- they are adjacent
- there are no break/continue in either of them

Fusion is legal if:

- fused loops do not have any dependencies with dependence distance
  greater than 0 that did not exist in the original loops.
- there are no function calls in the loops (could have side-effects)
- there are no barriers in the loops

It will fuse all such loops as long as the number of registers used for
the fused loop stays under the threshold defined by
max_registers_per_loop.
2018-05-01 15:40:37 -04:00
Stephen McGroarty
9a5dd6fe88 Support loop fission.
Adds support for spliting loops whose register pressure exceeds a user
provided level. This pass will split a loop into two or more loops given
that the loop is a top level loop and that spliting the loop is legal.
Control flow is left intact for dead code elimination to remove.

This pass is enabled with the --loop-fission flag to spirv-opt.
2018-05-01 15:15:10 -04:00
Steven Perron
9ba0879ddf Improve Vector DCE
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.
2018-04-30 11:55:50 -04:00
Steven Perron
a00a0a09ae Revert "Improvements to vector dce."
This reverts commit 2813722993.

A regression was found.  Undoing the change until it is fixed.
2018-04-27 10:33:19 -04:00
Alan Baker
4246abdc74 Fixes handling of kill and unreachable ops in inlining.
Fixes #1527

* Adds handling for copying OpKill and OpUnreachable and forces the
generation of a new basic block
* Adds tests to check
2018-04-27 09:42:37 -04:00
Steven Perron
e1bcd2b2d8 Fold OpVectorTimesScalar and OpPhi better.
If one of the operands to an OpVectorTimesScalar instruction is zero,
then the result will be the 0 vector. Currently we do not fold the
insturction unless both operands are constants. This change fixes that.

We also allow folding of OpPhi instructions where the incoming values
are either an OpUndef or the OpPhi instruction itself. As with other
cases, this can be simplified to the OpUndef.
2018-04-26 12:41:16 -04:00
Steven Perron
2813722993 Improvements to vector dce.
Track live scalars in VDCE as if they were single element vectors.

Handle the extended instructions for GLSL in VDCE.

Handle composite construct instructions in VDCE.

Fixes #1511.
2018-04-26 11:07:48 -04:00
Greg Fischer
268be6143d LocalSingleBlockElim: Add store-store elimination
Eliminate unused store to variable if followed by store to same
variable in same block.

Most significantly, this cleans up stores made unused by this pass.
These useless stores can inhibit subsequent optimizations, specifically
LocalSingleStoreElim. Eliminating them makes subsequent optimization more
effective.

The main effect of this pass is to simplify the work done by the SSA
rewriter.  It catches many local loads/stores that help speeding up the
work done by the main rewriter.
2018-04-25 10:30:18 -04:00
Steven Perron
2c0ce87210
Vector DCE (#1512)
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.

It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
2018-04-23 11:13:07 -04:00
David Neto
7a59283587 Another fix for old XCode: std::set explicit ctor in test code 2018-04-20 15:58:01 -04:00
Victor Lomuller
efc5061929 Dominator analysis interface clean.
Remove the CFG requirement when querying a dominator/post-dominator from an IRContext.

Updated all uses of the function and tests.
2018-04-20 15:41:59 -04:00
Jaebaek Seo
48802bad72 Constant folding for OpVectorTimesScalar 2018-04-20 13:43:04 -04:00
Victor Lomuller
0ec08c28c1 Add register liveness analysis.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.

It also includes utilities to simulate the register pressure for loop
fusion and fission.

The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
2018-04-20 09:45:15 -04:00
GregF
1c89da46ff Test/DependencyAnalysis: Fix uninitialized variables 2018-04-19 15:34:15 -04:00
Jaebaek Seo
430a29335e Fix broken pointer of CommonUniformElimPass 2018-04-19 09:36:10 -04:00
Steven Perron
c20a718e00 Rewrite local-single-store-elim to not create large data structures.
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large.  These large data structures
then take a very long time to be destroyed.  I've seen cases around 40%
if the time.

I've rewritten that algorithm to not use as much memory.  This give a
significant improvement when running a large number of shader through
DXC.

I've also made a small change to local-single-block-elim to delete the
loads that is has replaced.  That way local-single-store-elim will not
have to look at those.  local-single-store-elim now does the same thing.

The time for one set goes from 309s down to 126s.  For another set, the
time goes from 102s down to 88s.
2018-04-18 16:38:18 -04:00
Jaebaek Seo
0fa42996b5
Merge pull request #1461 from jaebaek/fnegate
Add constant folding for OpFNegate

Contributes to #709
2018-04-18 13:46:10 -04:00
Jaebaek Seo
3c5bd26668 Typo 2018-04-17 14:13:19 -04:00
Toomas Remmelg
0f335cf87e Add support for MIV and Delta test dependence analysis.
GCD MIV test as described in Chapter 3 of "Optimizing Compilers for
Modern Architectures: A Dependence-Based Approach" by Randy Allen, and
Ken Kennedy.

Delta test as described in Figure 3 of "Practical Dependence Testing" by
Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.
2018-04-17 13:57:02 -04:00
Jaebaek Seo
ff92339fff Format 2018-04-17 12:12:48 -04:00
Jaebaek Seo
d8b9306a4f Add more unit tests 2018-04-17 12:08:45 -04:00
Jaebaek Seo
79491259e0 Add constant folding for FNegate 2018-04-17 12:08:45 -04:00
David Neto
152b9a681e ADCE: Remove OpDecorateStringGOOGLE
Also fix a few failures to set "modified" status when removing
global values.

Add OpDecorateStringGOOGLE to decoration ordering

Fixes #1492
2018-04-17 10:24:30 -04:00
Victor Lomuller
10e5d7cf13 Add a loop peeling pass.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.

To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
2018-04-11 15:41:29 +01:00
Alexander Johnston
61b50b3bfa ZIV and SIV loop dependence analysis.
Provides functionality to perform ZIV and SIV dependency analysis tests
between a load and store within the same loop.

Dependency tests rely on scalar analysis to prove and disprove dependencies
with regard to the loop being analysed.

Based on the 1990 paper Practical Dependence Testing by Goff, Kennedy, Tseng

Adds support for marking loops in the loop nest as IRRELEVANT.
Loops are marked IRRELEVANT if the analysed instructions contain
no induction variables for the loops, i.e. the loops induction
variable is not relevent to the dependence of the store and load.
2018-04-11 09:32:42 -04:00
Steven Perron
53bc1623ec Fold OpDot
Adding three rules to fold OpDot (implemented as two).

- When an OpDot has two constants, then fold to the resulting const.

- When one of the inputs is the 0 vector, then fold to zero.

- When one of the inputs is a single 1 with 0s, then rewrite to an
OpCompositeExtract of the appropriate element.  This will help find
even more folding opportunities.

Contributes to #709.
2018-04-10 13:09:37 -04:00
GregF
6fbfe1c016 Fix SSA rewrite for nested loops.
From the test case, the slice of the CFG that is interesting for the bug
is

25
|
v
30
|
v
31<-+
|   |
v   |
34--+

1. In block 25, we have a Phi candidate for %f with arguments
   %47 = Phi[%float_0, %0]. This merges %float_0 and a yet unknown
   argument from the external loop backedge.
2. We are now processing block 34:
   i. The load %35 = OpLoad %f triggers a Phi candidate to be placed in
      block 31.
  ii. The Phi candidate %50 = Phi needs two arguments. The one coming
      from block 30 is %47. But the one coming from block 34 (which we
      are now processing and have marked sealed), finds %50 itself as
      the reaching def for %f.
3. This wrongfully marks %50 as a copy-of Phi, which ultimately makes
   both %47 and %50 copy-of Phis that get eliminated.
2018-04-06 15:17:52 -04:00
Steven Perron
742454968d OpName and decorations should not stop array copy prop. 2018-04-04 22:24:10 -04:00
Steven Perron
7c5d49bf2a Teach ADCE about OpImageTexelPointer
Currently OpImageTexelPointer operations are treat like a use of the
pointer, but it does
not look for the memory being referenced to make sure stores are not
removed.

This change teaches it so identify the memory being accessed, and
treats it as if that memory is loaded.

Fixes to #1445.
2018-04-04 13:45:29 -04:00
Steven Perron
c33af63264 Teach array copy propagation about OpImageTexelPointer.
OpImageTexelPointer acts like a special kind of load.  It is not an
array load, but it also cannot be removed the same way a regular
load can.  The type of propagation that needs to be done is similar
to what we do for arrays, so I want to merge that code into that
optmization.

Contributers to #1445.
2018-04-04 13:42:51 -04:00
Steven Perron
e64a4656b3 Teach the private to local about OpImageTexelPointer.
OpImageTexelPointer acts like a special kind of load.  It is still
safe to change the storage class of a variable used in a
OpImageTexalPointer instruction.

Contributes to #1445.
2018-04-04 13:42:35 -04:00
Neil Roberts
57a2441791 hex_float: Use max_digits10 for the float precision
CPPreference.com has this description of digits10:

“The value of std::numeric_limits<T>::digits10 is the number of
 base-10 digits that can be represented by the type T without change,
 that is, any number with this many significant decimal digits can be
 converted to a value of type T and back to decimal form, without
 change due to rounding or overflow.”

This means that any number with this many digits can be represented
accurately in the corresponding type. A change in any digit in a
number after that may or may not cause it a different bitwise
representation. Therefore this isn’t necessarily enough precision to
accurately represent the value in text. Instead we need max_digits10
which has the following description:

“The value of std::numeric_limits<T>::max_digits10 is the number of
 base-10 digits that are necessary to uniquely represent all distinct
 values of the type T, such as necessary for
 serialization/deserialization to text.”

The patch includes a test case in hex_float_test which tries to do a
round-robin conversion of a number that requires more than 6 decimal
places to be accurately represented. This would fail without the
patch.

Sadly this also breaks a bunch of other tests. Some of the tests in
hex_float_test use ldexp and then compare it with a value which is not
the same as the one returned by ldexp but instead is the value rounded
to 6 decimals. Others use values that are not evenly representable as
a binary floating fraction but then happened to generate the same
value when rounded to 6 decimals. Where the actual value didn’t seem
to matter these have been changed with different values that can be
represented as a binary fraction.
2018-04-03 12:53:10 -04:00
Lei Zhang
fc9f621e8b Add missing <iterator> header for std::back_inserter 2018-03-30 11:30:25 -04:00
Steven Perron
cbceeceab4 In copy-prop-arrays, indentify copies via OpCompositeInsert
When the original code copies an entire array or struct one element at a
time, this turns into a series of OpCompositeInsert instruction followed
by a store of the whole array.  We currently miss opportunities in copy
propagate arrays because we do not recognize this as a copy.

This commit adds code to copy propagate arrays to identify this code
pattern.

Also updates the performance passed to run array copy propagation.
2018-03-29 09:39:55 -04:00
Steven Perron
d8ca09821d Handle non-constant accesses in memory objects (copy prop arrays)
The first implementation of MemroyObject, which is used in copy
propagate arrays, forced the access chain to be like the access chains
in OpCompositeExtract.  This excluded the possibility of the memory
object from representing an array element that was extracted with a
variable index.   Looking at the code, that restriction is not
neccessary.  I also see some opportunities for doing this in some real
shaders.

Contributes to #1430.
2018-03-28 20:23:47 -04:00
Stephen McGroarty
ad7e4b8401 Initial patch for scalar evolution analysis
This patch adds support for the analysis of scalars in loops. It works
by traversing the defuse chain to build a DAG of scalar operations and
then simplifies the DAG by folding constants and grouping like terms.
It represents induction variables as recurrent expressions with respect
to a given loop and can simplify DAGs containing recurrent expression by
rewritting the entire DAG to be a recurrent expression with respect to
the same loop.
2018-03-28 16:34:23 -04:00
Alan Baker
97c8fdccd2 Adding OpPhi validation rules.
* Added tests
* Fixes SSA check for unreachable phi parents
* Fixes invalid cfg cleanup test
2018-03-27 17:26:26 -04:00
Steven Perron
5e07ab1358 Handle more cases in copy propagate arrays.
When we change the type of an object that gets stored, we do not want to
change the type of the memory location being stored to.  In order to
still be able to do the rewrite, we will decompose and rebuild the
object so it is the type that can be stored.

Fixes #1416.
2018-03-27 11:04:49 -04:00
Steven Perron
c4dc046399 Copy propagate arrays
The sprir-v generated from HLSL code contain many copyies of very large
arrays.  Not only are these time consumming, but they also cause
problems for drivers because they require too much space.

To work around this, we will implement an array copy propagation.  Note
that we will not implement a complete array data flow analysis in order
to implement this.  We will be looking for very simple cases:

1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.

The hard part is keeping all of the types correct.  We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.

Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent.  With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values.  See |Struct::GetExtraHashWords|.

Contributes to #1416.
2018-03-26 14:44:41 -04:00
Diego Novillo
735d8a579e SSA rewrite pass.
This pass replaces the load/store elimination passes.  It implements the
SSA re-writing algorithm proposed in

     Simple and Efficient Construction of Static Single Assignment Form.
     Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
     In: Jhala R., De Bosschere K. (eds)
     Compiler Construction. CC 2013.
     Lecture Notes in Computer Science, vol 7791.
     Springer, Berlin, Heidelberg

     https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6

In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.

When a target variable is loaded, it queries the variable's reaching
definition.  If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.

The algorithm avoids repeated lookups using memoization.

For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA.  That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
2018-03-20 20:56:55 -04:00
Victor Lomuller
bdf421cf40 Add loop peeling utility
The loop peeler util takes a loop as input and create a new one before.
The iterator of the duplicated loop then set to accommodate the number
of iteration required for the peeling.

The loop peeling pass that decided to do the peeling and profitability
analysis is left for a follow-up PR.
2018-03-20 10:21:10 -04:00