1. Set the debug scope and line information for the new replacement
instructions.
2. Replace DebugDeclare and DebugValue if their OpVariable or value
operands are replaced by scalars. It uses 'Indexes' operand of
DebugValue. For example,
struct S { int a; int b;}
S foo; // before scalar replacement
int foo_a; // after scalar replacement
int foo_b;
DebugDeclare %dbg_foo %foo %null_expr // before
DebugValue %dbg_foo %foo_a %Deref_expr 0 // after
DebugValue %dbg_foo %foo_b %Deref_expr 1 // means Value(foo.members[1]) == Deref(%foo_b)
Essentially, it marks all DebugInfo instructions in functions (and their operands) as live. It treats DebugDeclare and DebugValue with Deref as loads and so marks Stores of their variables as live.
It marks each DebugGlobalVariables as live except for its variable. After closure, it rechecks if the variable is live. If not, the DebugGlobalVariable instruction's variable operand is set to DebugInfoNone, per the DebugInfo spec.
This pass basically follows the same process as ssa-rewrite: it adds a DebugValue after each Store and removes the DebugDeclare or DebugValue Deref. It only does this if all instructions that are dependent on the Store are Loads and are replaced.
This commit lets the vector DCE pass preserve the OpenCL.DebugInfo.100
information properly. When the vector DCE pass determines the liveness
of instructions, the debug instructions must not affect the decision. In
addition, when it kills some instructions, it has to kill DebugValue
instructions that use the killed instructions. When it updates some
composite values to meaningful values (not undef), it has to remove
DebugValue because the value information becomes incorrect.
The decision to reduce the load must be not affected by debug
instructions. For example, even when a DebugValue references a
result id of a loaded composite value, this change lets the
reduce-load-size pass reduce the load if the full composite value is not
used anywhere other than the DebugValue.
When the pass replaces the local variable `OpVariable` ids to their
corresponding pointers, we have to update operands of DebugValue or
DebugDeclare instructions.
When there are multiple entries and the shader has a variable with
WorkGroup storage class, those multiple entry functions store values to
the variable. Since ADCE pass uses def-use chains to propagate the work
list, some of instructions in the work list are not actually a part of
the currently processed function. As a result, it adds instructions in
other functions and put them in |live_insts_|. However, it does not
have the control flow information for those instructions in other
functions i.e., |block2headerBranch_| and |header2nextHeaderBranch_|.
When it processes those instructions (they are added when it processes a
different function), it skips handling them because they are already in
|live_insts_| and does not check |block2headerBranch_| and
|header2nextHeaderBranch_|, which results in skipping some branches.
Even though those branches are live branches, it considers they are dead
branches.
For many spirv-opt passes such as simplify-instructions pass, we have to
correctly clear the OpenCL.DebugInfo.100 debug information for
KillInst() and ReplaceAllUses(). If we keep some debug information that
disappeared because of KillInst() and ReplaceAllUses(), adding new
DebugValue instructions based on the existing DebugDeclare information
will generate incorrect information. This CL update DebugInfoManager
and IRContext to correctly clear debug information.
* Support load and extract pattern in desc_sroa.
* Fix typo in comments.
* Load replacement var before use; and added test.
* fix formatting
* Address code review comments.
Add OpenCL.DebugInfo.100 `DebugValue` instructions for store
and phi instructions of local variables to provide the debugger with
the updated values of local variables correctly.
Handles the OpenCL100Debug extension in inlining. It preserves the information that is available while also adding the debug inlined at for all of the inlining that it does.
Reject folding comparisons with unfoldable types.
Fixes#3343
When CCP is evaluating an instruction, it was trying to fold a
comparison with 64 bit integers. This was causing a fold failure later
since the folder still cannot deal with 64 bit integers.
ssa-rewrite fails in `MemPass::GetPtr()` when the SPIR-V code contains
`OpLoad` for the result id of `OpConstantNull` because of the out of
index access for an operand to get the base address. This commit fixes
it.
Fixes#3344
In this PR, the classes that represent the adjust branch weights
transformation and fuzzer pass were implemented. This transformation
adjusts the branch weights of a OpBranchConditional instruction.
- No longer inline functions with early exits. Merge return can modify them so they can be inlined.
- Otherwise no functional change, should be just refactoring.
* Do merge return if the return is not at the end of the function.
We will remove the code in inlining to handle a return in the middle of
a function. To inline those functions, we need to run merge return to
move the return to the end of the function.
Generalizes the IsReadOnlyVariable() method, and related methods, so
that they can be used to ask whether pointer result ids are read-only.
Fixes#3324.
When moving blocks around, we ended up with a nullptr for a basic block,
and it was left in the list for a little bit. However, in that time, it
would end up being dereferenced while traversing the function.
To fix this, we delete it right away. This was found in an asan build
that runs our current tests. No new tests are needed, but I did add
extra check asan checks for our asan bot.
Many high-level languages like HLSL and GLSL generate termination
instructions such as return and branch from the actual part of the
high-level language code like return and if statements. This commit lets
IrLoader set `DebugScope` for termination instructions.
We need an analysis for OpenCL.DebugInfo.100 extension instructions such
as a map between function id and its DebugFunction. This commit add an
analysis for it.
* Preserve debug info in eliminate-dead-functions
The elimination of dead functions makes OpFunction operand of
DebugFunction invalid. This commit replaces the operand with
DebugInfoNone.
* Handle more cases in dead member elim
- Rewrite composite insert and extract operations on SpecConstnatOp.
- Leaves assert for Access chain instructions, which are only allowed
for kernels.
- Other operations do not require any extra code will no longer cause an
assert.
Fixes#3284.
Fixes#3282.
When DebugScope is given in SPIR-V, each instruction following the
DebugScope is from the lexical scope pointed by the DebugScope in
the high level language. We add DebugScope struction to keep the
scope information in Instruction class. When ir_loader loads
DebugScope/DebugNoScope, it keeps the scope information in
|last_dbg_scope_| and lets following instructions have that scope
information.
In terms of DebugDeclare/DebugValue, if it is in a function body
but outside of a basic block, we keep it in |debug_insts_in_header_|
of Function class. If it is in a basic block, we keep it as a normal
instruction i.e., in a instruction list of BasicBlock.
Create a pass to instrument OpDebugPrintf instructions. This pass replaces all OpDebugPrintf instructions with instructions to write a record containing the string id and the all specified values into a special printf output buffer (if space allows). This pass is designed to support the printf validation in the Vulkan validation layers.
Fixes#3210
This adds support for replacing TimeAMD with OpReadClockKHR. The scope
for OpReadClockKHR is fixed to be a subgroup because TimeAMD operates
only on subgroup.
* Implement constant folding for many transcendentals
This change adds support for folding of sin/cos/tan/asin/acos/atan,
exp/log/exp2/log2, sqrt, atan2 and pow.
The mechanism allows to use any C function to implement folding in the
future; for now I limited the actual additions to the most commonly used
intrinsics in the shaders.
Unary folder had to be tweaked to work with extended instructions - for
extended instructions, constants.size() == 2 and constants[0] ==
nullptr. This adjustment is similar to the one binary folder already
performs.
Fixes#1390.
* Fix Android build
On old versions of Android NDK, we don't get std::exp2/std::log2
because of partial C++11 support.
We do get ::exp2, but not ::log2 so we need to emulate that.
We must treat a branch to the merge node of a switch that is in the
header of a construct as a nested construced. The original merge
instruction is still needed in that case.
* Allow OpExtInst for DebugInfo between secion 9 and 10
Fixes#3086
* Handle spirv-opt errors on DebugInfo Ext
* Add IR Loader test
* Fix ir loader bug
* Handle DebugFunction/DebugTypeMember forward reference
* Add test cases (forward reference to function)
* Support old DebugInfo extension
* Validate local debug info out of function
As explained in #3118, spirv-opt merge-blocks pass causes a
spirv-val error when an OpBranch has an OpLine in front of it.
OpLoopMerge
OpBranch ; Will be killed by merge-blocks pass
OpLabel ; Will be killed by merge-blocks pass
OpLine ; will be placed between OpLoopMerge and OpBranch - error!
OpBranch
To fix this issue, this commit moves line info of OpBranch to
OpLoopMerge.
Fixes#3118
Add support for SPV_KHR_non_semantic_info
This entails a couple of changes:
- Allowing unknown OpExtInstImport that begin with the prefix `NonSemantic.`
- Allowing OpExtInst that reference any of those sets to contain unknown
ext inst instruction numbers, and assume the format is always a series of IDs
as guaranteed by the extension.
- Allowing those OpExtInst to appear in the types/variables/constants section.
- Not stripping OpString in the --strip-debug pass, since it may be referenced
by these non-semantic OpExtInsts.
- Stripping them instead in the --strip-reflect pass.
* Add adjacency validation of non-semantic OpExtInst
- We validate and test that OpExtInst cannot appear before or between
OpPhi instructions, or before/between OpFunctionParameter
instructions.
* Change non-semantic extinst type to single value
* Add helper function spvExtInstIsNonSemantic() which will check if the extinst
set is non-semantic or not, either the unknown generic value or any future
recognised non-semantic set.
* Add test of a complex non-semantic extinst
* Use DefUseManager in StripDebugInfoPass to strip some OpStrings
* Any OpString used by a non-semantic instruction cannot be stripped, all others
can so we search for uses to see if each string can be removed.
* We only do this if the non-semantic debug info extension is enabled, otherwise
all strings can be trivially removed.
* Silence -Winconsistent-missing-override in protobufs
* Make Instrumentation format version 2 the default (Step 1)
Add new interfaces without version number argument. Remove version 1
logic and tests. Version interfaces will be removed in step 2 after
layers have transitioned to new interface.
* Add error messages to InstrumentPass().
* Don't crash when folding construct of empty struct
An OpCompositeConstruct of an empty struct will be folded to a constant
under normal circumstances. However, if the id limit has been reached
and the constant cannot be generated, then other folding rules will be
tried.
These rules do not handle the case of an empty struct. We add allow it
to be handled.
Fixes http://crbug/1030194
* Changes based on the review.
Access chain indices are always interpreted as signed integers.
So use signed clamp instead of unsigned clamp. We must also
clamp to the max signed int for the index type.
Fixes#3072
* Validate that if a construct contains a header and it's merge is
reachable, the construct also contains the merge
* updated block merging to not merge into the continue
* update inlining to mark the original block of a single block loop as
the continue
* updated some tests
* remove dead code
* rename kBlockTypeHeader to kBlockTypeSelection for clarity
Wrap-opkill will create a new function, invalidating the id-to-func map.
The preserved analyses for the pass have been updated to reflect that.
Also adding consistency check for the id-to-func map. With this new
check, old tests identify this problem. No new tests are needed.
Fixes#3038
Implements the following simplifications:
(a - b) + b => a
(a * b) + (a * c) => a * (b + c)
Also adds logic to simplification to handle rules that create new operations
that might need simplification, such as the second rule above.
Only perform the second simplification if the multiplies have the add as their
only use. Otherwise this is a deoptimization of size and performance.
We have a check that ensures that the optimizer did not change the
binary when it says that it did not. However, when the binary is
converted back to a binary, we made a decision to remove OpNop
instructions. This means that any spv file that contains a NOP
originally will fail this check.
To get around this, we convert the module to a second binary that keeps
the OpNop instructions. That binary is compared against the original.
Fixes https://crbug.com/1010191
Added exports for libraries. External libraries that themselves use
libraries require all dependencies have exports, so not having exports can
cause major problems when used within other projects.
Install paths for exports are now placed in the proper directories expected
by Windows and *nix systems. Config files are generated as well, which
should work with CMake's find_package() function once installed.
We want to handle OpKill better. The wrap opkill causes lots of extra
code to be generated, even when they are not needed to avoid the main
problem: OpKill cannot be found directly in a continue construct.
This change will be more selective on which functions the OpKill will be
wrapped and inlining will avoid inlining.
Fixes#2912
* Add continue construct analysis to struct cfg analysis
Add the ability to identify which blocks are in the continue construct for a
loop, and to get functions that are called from those blocks, directly or
indirectly.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/2912.
There is nothing in the spir-v spec that says the last
instructions in a module cannot be OpLine or OpNoLine.
However, the code that parses the module will simply drop
these instructions.
We add code that will preserve these instructions.
Strip-debug-info is updated to remove these instructions.
Fixes https://crbug.com/1000689.
* Handle extract with no indexes
It is possible that OpCompositeExtract instructions will not have any
indexes. This is not handled well by scalar replacement and instruction
folding.
Fixes https://crbug.com/1006435
* Fix typo.
* Use OpReturn* in wrap-opkill
The warp-opkill pass is generating incorrect code. It is placing an
OpUnreachable at the end of a basic block, when the block can be
reached. We can't reach the end of the block, but we can reach the end.
Instead we will add a return instruction.
Fixes#2875.
The warp-opkill pass is generating incorrect code. It is placing an
OpUnreachable at the end of a basic block, when the block can be
reached. We can't reach the end of the block, but we can reach the end.
Instead we will add a return instruction.
Fixes#2875.
Many of the places in copy propagate arrays assumes that integer constant will be defined by an OpConstant instruction. That is not always true. We fix these spots by allowing for an OpConstantNull.
If an OpKill instruction is inlined into a continue construct, then the
spir-v is no longer valid. To avoid this issue, we do inline into an
OpKill at all. This method was chosen because it is difficult to keep
track of whether or not you are in a continue construct while changing
the function that is being inlined into. This will work well with wrap
OpKill because every will still be inlined except for the OpKill
instruction itself.
Fixes#2554Fixes#2433
This reverts commit aa9e8f5380.
* Handle id overflow in the ssa rewriter.
Remove LocalSSAElim pass at the same time. It does the same thing as the SSARewrite pass. Then even share almost all of the same code.
Fixes crbug.com/997246
The first pass applies the RelaxedPrecision decoration to all executable
instructions with float32 based type results. The second pass converts
all executable instructions with RelaxedPrecision result to the equivalent
float16 type, inserting converts where necessary.
Add the first steps to removing the AMD extension VK_AMD_shader_ballot.
Splitting up to make the PRs smaller.
Adding utilities to add capabilities and change the version of the
module.
Replaces the instructions:
OpGroupIAddNonUniformAMD = 5000
OpGroupFAddNonUniformAMD = 5001
OpGroupFMinNonUniformAMD = 5002
OpGroupUMinNonUniformAMD = 5003
OpGroupSMinNonUniformAMD = 5004
OpGroupFMaxNonUniformAMD = 5005
OpGroupUMaxNonUniformAMD = 5006
OpGroupSMaxNonUniformAMD = 5007
and extentend instructions
WriteInvocationAMD = 3
MbcntAMD = 4
Part of #2814
If they are not aliased, the function will always print the message:
"Binary unexpectedly changed despite optimizer saying there was no change"
Which is (usually) totally bogus.
Fixes#2798
* Refactor instruction folders
We want to refactor the instruction folder to allow different sets of
rules to be added to the instruction folder. We might want different
sets of rules in different circumstances.
We also need a way to add rules for extended instructions. Changes are
made to the FoldingRules class and ConstFoldingRules class to enable
that.
We added tests to check that we can fold extended instructions using the
new framework.
At the same time, I noticed that there were two tests that did not tests
what they were suppose to. They could not be easily salvaged. #2813 was
opened to track adding the new tests.
Now we need to handle id overflow when we overflow while replacing uses of the variable. While looking at this code, I noticed an error in the way we handle access chains that cannot be replaced because of overflow. Name it will make some change, and then give up by returning SuccessWithoutChange. But it was changed.
This is fixed up by returning Failure if we notice the error at the time of rewriting the users. This is for both id overflow or out-of-bounds accesses.
Code is added to "CheckUses" to remove variables that have out-of-bounds accesses from the candidate list, so we don't even try to rewrite its uses.
Fixes https://crbug.com/995032
If we run out of ids when creating a new variable, sroa does not recognize
the error, and continues doing work. This leads to segmentation faults.
Fixes https://crbug/969655
`#include <source/util/string_utils.h>` works only when we specify
`include_directories(${CMAKE_CURRENT_SOURCE_DIR}/)` in
cmake. It is hard to set the source directory as a include path
in some build systems e.g., bazel. Using the relative path easily
solves this issue. This commit uses
`#include "source/util/string_utils.h"` instead of
`#include <source/util/string_utils.h>`.
We are no able to inline OpKill instructions into a continue construct.
See #2433. However, we have to be able to inline to correctly do
legalization. This commit creates a pass that will wrap OpKill
instructions into a function of its own. That way we are able to inline
the rest of the code.
The follow up to this will be to not inline any function that contains
an OpKill.
Fixes#2726
This also fixes ADCE to not remove possibly needed OpTypeForwardPointer.
The bug, its fix and the corresponding test have a circular dependency
with the extension, so they are packaged together.
If a member of a struct has a relaxed precision, sroa will not split the
struct. This means we do not get all cases. This commit handles these
cases. The other part is that the decoration needs to be passed on to
the new variables.
Fixes#2786
Fixes#2768
* In scalar replacement, interpret access chain indexes as signed counts
* Use Constant::GetSignExtendedValue and Constant::GetZeroExtendedValue
where appropriate
* new tests
spirv-opt: Add --graphics-robust-access
Clamps access chain indices so they are always
in bounds.
Assumes:
- Logical addressing mode
- No runtime-array-descriptor-indexing
- No variable pointers
Adds stub code for clamping coordinate and samples
for OpImageTexelPointer.
Adds SinglePassRunAndFail optimizer test fixture.
Android.mk: add source/opt/graphics_robust_access_pass.cpp
Adds Constant::GetSignExtendedValue, Constant::GetZeroExtendedValue
This makes it symmetric with the result type of ...->element_type which
returns a const Type.
So now we can write code like this:
analysis::Vector v = ...
analysis::Vector(v->element_type(), 2);
This fixes#2608.
The original test case had an out-of-bounds reference that ended up
folding into OpCompositeExtract that was indexing right outside the
constant composite.
The returned constant would then cause a segfault during constant
propagation.
Fixes#2764
* Don't replace all uses when simplifying instructions, instead only
update non-debug, non-decoration uses
* added a test
* Add a new version of RAUW that takes a predicate to decide whether to
replace the use or not
* used in simplification pass
* Fix#2609 - Handle out-of-bounds scalar replacements.
When SROA tries to do a replacement for an OpAccessChain that is exactly
one element out of bounds, the code was trying to access its internal
array of replacements and segfaulting.
This protects the code from doing this, and it additionally fixes the
way SROA works by not returning failure when it refuses to do a
replacement. Instead of failing the optimization pass, SROA will now
simply refuse to do the replacement and keep going.
Additionally, this patch fixes the SROA logic to now return a proper status so we can
correctly state that the pass made no changes to the IR if it only found
invalid references.
Merge return expects unreachable merge block to look a certain way, and
unreachable continue blocks to look a certain way. What if an
unreachable block is both a merge and a continue? The continue is
suppose to take precedent, but merge-return implements it with the merge
taking precedent. This change flips that around.
Fixes#2746
* Process OpDecorateId in ADCE
When there is an OpDecorateId instruction that is live,
the ids that is references must be kept live. This change
adds them to the worklist.
I've also updated a validator check to allow OpDecorateId
to be able to apply to decoration groups.
Fixes#1759.
* Remove dead code.
In merge return, we need to know the original dominator for a block in order to
traverse code from the original dominator to the new dominator and add
appropriate Phi nodes. The current code gets this wrong because the dominator
tree is build as needed. The first time we get the immediate dominator for a
function we just built the dominator tree and it takes into account that a
block has been split. The second time it does not.
This inconsistency needs to be fixed. We do that by recording the original
dominator for all blocks at the start of the pass.
If we were to record just the basic block, that could change if the block is
split. We want to traverse the code in the body of the original dominator,
whatever block it ends up in. To make this easy to track, we not save the
terminator instruction to represent the original dominator.
Fixes#2745
When a phi candidate is marked as trivial, we are suppose to update all
of its uses to the reference the value that it is being folded to.
However, the code updates the uses misses `defs_at_block_`. So at a
later time, the id for the trivial phi can reemerge.
Fixes#2744
* Bindless Instrument: Make init check depend solely on input_init_enabled
Previously was dependent on presense of descriptor_indexing extension
in SPIR-V, but this missed some cases. Tests updated to refect this new
policy.
* Fix format.
* Fix bug in merge return
The merge return pass seems to assume that the only new edges in the cfg
are from return block to merge blocks. However, it is possible that a
merge block branches to a merge block when it did not before.
This change add a new variable to track all of the new edges. It also
renames some other variables and cleans us the code to make it a bit
easier to read.
Fixes#2702.
Dead branch elimination needs to know about the constructs that a block is contained it when determining what to do with its merge instruction. We currently fold branches in block as we see them, which is parent constructs before their children. This causes the struct cfg analysis to crash because it tries to get the parent construct for a block after the parent has been folded.
This can be fixed by folding the branch of the children before the parents.
Fixes#2667.
There are a couple spots where we are not looking at decorations when we should.
1. Value numbering is suppose to assign a different value number to ids if they have different decorations. However that is not being done for OpCopyObject and OpPhi.
1. Instruction simplification is propagating OpCopyObject instruction without checking for decorations. It should only do that if no decorations are being lost.
Add a new function to the decoration manager to check if the decorations of one id are a subset of the decorations of another.
Fixes#2715.
Inlining does not inline functions that have a single return that is in a loop. This is because the return cannot be replaced by a branch outside of the loop easily. Merge return knows how to rewrite the function so the return is replaced by a branch.
Fixes#2038.
It is illegal to inline an OpKill instruction into a continue construct because the continue header will no longer dominate the backedge.
This commit adds a check for this, and does not inline.
If we still want to be able to inline a function that contains an OpKill, we can add a new pass that will wrap OpKill instructions into its own function with just the single instruction.
I do not believe that this is a common case right now, so I will not do that yet.
Fixes#2433.
When working on descriptor indexing validation for compute shaders, the
gl_GlobalInvocationID builtin was being loaded as uint which would cause
compute shaders instrumented by the bindless check pass to have:
%83 = OpLoad %uint %gl_GlobalInvocationID
%84 = OpCompositeExtract %uint %83 0
%85 = OpCompositeExtract %uint %83 1
%86 = OpCompositeExtract %uint %83 2
which results in validation failures:
error: line 127: Reached non-composite type while indexes still remain
to be traversed.
%84 = OpCompositeExtract %uint %83 0
for trying to extract a uint from a uint.
When it's an OpConstant or OpSpecConstant, then the literal
values are compared. If the OpSpecConstant also has a SpecId
decoration, then that's also compared.
Otherwise, it's an OpSpecConstantOp and we only compare the
ID of the OpSpecConstantOp instruction itself.
Fixes#2649
* Types: Avoid comparing IDs for in Type::IsSameImpl
When linking, we end up with duplicate types for imported and exported
types, that needs to be removed. The current code would reject valid
import/export pairs of symbols due to IDs mismatch, even if the types or
constants behind those ID were the same.
Enabled remaining type_match_test
Fixes#2442
New version has additional word in stage-specific section. Also
some changes in content for tesselation and compute shaders. Either
version can be invoked at pass creation. This is done to ease integration
and updating of validation layers. Version 1 is deprecated and eventually
will go away.
Also sneaking in fix to version 1 compute shaders.
* Handle nested breaks from switches.
There was a recent decision made to allow branches to the merge node of
a switch even if the switch is not the first enclosing construct. They
can be generated by glslang from break statements in switches.
Dead branch elimination seems to be the only optimization that will
break because of this change, so I will update that optimizations.
The change made are:
- Track switches in structured cfg analysis.
- In Dead branch elimination:
- Look for nested breaks that will require a switch instruction.
- Rewrite, but don't delete, switchs that are required even if it
could be replaced by an unconditional branch.
- When looking for the first break, consider the merge of a switch
as well.
See #2612.
* Fix variable names and comments.
* Add tests for the struct cfg analysis and switches.
* Fix typos in comments.
In order to try to reduce code duplication and to be able
to fold more cases, we want to use the instruction folder
when folding an OpSpecConstantOp with constant operands.
A couple other changes are need to make this work. First
GetDefiningInstruction| in the constant manager is able
to handle |type_id| being logically equivalent to another
type, so we updated the interface, and removed the assert.
Some tests were also updated because we not generate
better code because constants are not duplicated as much
as before.
No need for new tests. The functionality of the instruction folder is
already tested. There are tests check that the instruction folder is
being used correctly for OpCompositeExtract and OpVectorShuffle in the
existing test cases.
Fixes#2585.
* Update memory model support for SPIR-V 1.4
Fixes#2552
* Upgrade memory model now supports two memory access operands for
OpCopyMemory*
* in all cases the pass will first generate two operands by either
adding them or copying
* updates accounts for multiple operands
* tests
There is a case where sroa is not handling id overflow gracefully. It
is handled and an error message is output when the ids overflow.
Fixes https://crbug.com/961030.
Fixes#2555
* Fix a bug in validation where interfaces were considered non-unique
between different entry points targeting the same function
* added a test
* Update private to local pass to remove localized private variables
from entry point interfaces
* added tests
Fixes#2551
* Add support for 1.4 entry point interface lists
* only input and output variables are automatically live
* can clean up interfaces after DCE
* added tests
* allow opt tests to specify a target environment
Add functionality to fix-storage-class so that it can fix up mismatched
data types for pointers as well.
Fixes bugs in when fixing up storage class.
Move GenerateCopy to the Pass class to be reused.
The spirv-opt change for #2535.
* Change implementation of post order CFG traversal
It seems like the recursion is going very deep, and causing some problem
is particular situations. I've reimplemented the CFG post order
traversal to not use recursion.
Fixes#2539.
There was a bit shift done on 32-bit values, but they should have been
done on 64-bit values. This is fixed. At the same time, uses of size_t
are repalaced by uint64_t to ensure these values are 64-bit.
A test case cannot be created because the code that was change is not
run at the moment since we do not split up vectors or matricies. I do
not want to delete the code because I like to experitment with it every
once in a while.
Fixes#2528.
WebGPU requires certain variables to be initialized, whereas there are
known issues with using initializers in Vulkan. This PR is the first
of three implementing a pass to decompose initialized variables into
a variable declaration followed by a store. This has been broken up
into multiple PRs, because there 3 distinct cases that need to be
handled, which require separate implementations.
This first PR implements the basic infrastructure that is needed, and
handling of Function storage class variables. Private and Output will
be handled in future PRs.
This is part of resolving #2388
In WebGPU, the component operand 0xFFFFFFFF is forbidden, but in
Vulkan it is used to indicate a value is undefined. When converting to
WebGPU, 0xFFFFFFFF needs to converted to a legal value, though the
specific one does not matter, since it was used to indicate an
undefined entry in the original code. Choosing to use 0, since the
operands are required to be on [0, N-1], so 0 is guaranteed to always
be valid.
Fixes#2349
When -Wformat-security is enabled, we are getting an error. I do not
claim to fully understand when the warning is triggered or not, but this
one can be avoided by calling "Log" instead of "Logf" because the
formating string is not needed.
Renames the existing flag '--webgpu-mode' to '--vulkan-to-webgpu' for
the Vulkan->WebGPU operation, and adds a new flag '--webgpu-to-vulkan'
for the WebGPU->Vulkan operation.
Currently '--webgpu-to-vulkan' doesn't have any passes associated with
it yet, but further patches will implement them.
Fixes#2495
* opt/ir_loader: Don't silently drop unknown instructions on the floor
Currently, if spirv-opt sees an instruction it does not know, it will
silently ignore it and move to the next one. This changes it
to be an error, as dropping it on the floor is likely to generate
invalid SPIR-V output.
* opt/optimizer: Complain a bit louder for unexpected binary changes
If a binary change happens despite a pass saying that the binaries
should be identical, this is indicative of a bug in the pass itself.
This does not change behavior for it to be an error, but simply emits a warning in this case.
This pass tries to fix validation error due to a mismatch of storage classes
in instructions. There is no guarantee that all such error will be fixed,
and it is possible that in fixing these errors, it could lead to other
errors.
Fixes#2430.
Fixes#2452
Swaps priority of handling unreachable merge and continues so that the
back-edge is retained in the case a block is both a loop continue and
loop merge
* Check var pointer capability in ADCE.
* Check var ptr capability for common uniform.
* Check var ptr capability in access chain convert.
Since we want this pass to run even if there are variable pointer on
storage buffers, we had to remove asserts that assumed there were no
variable pointers. The functions with the asserts will now work, it
becomes the responsibility of the callers to deal with the output as
appropriate.
* Single block elimination and variable pointers.
It seems like the code in local single block elimination is able to
handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed a
variable pointer are not candidates.
* Single store elimination and variable pointers.
It seems like the code in local single stroe elimination is able to
handle cases with variable pointers already. This is because the
function `FindSingleStoreAndCheckUses` ensures that variables that feed
a variable pointer are not candidates.
* SSA rewriter and variable pointers.
It seems like the code in the two passes that call the SSA rewriter are
able to handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed
a variable pointer are not candidates.
Fixes#2458.
Fixes#2456
* When eliminating a structured construct that has an unreachable merge,
replace that unreachable terminator with an appropriate return
* New tests
Fixes#2453
* Enable addition of OpPhi instructions when the loop has multiple
predecessors of the merge due to a break
* This can result in some values no longer dominating their uses
* Track return blocks in structured flow to produce OpPhis that have
multiple undef and non-undef arguments
* New tests to catch the bug
* When a block is predicated, mark the new body as a return if the old
block as already a return
* Fix#2478. The fix is to just not try to simplify such loops.
* Also added `BasicBlock::MergeBlockId()` and `BasicBlock::ContinueBlockId()`.
* Some minor changes to `structured_loop_to_selection_reduction_opportunity.cpp`.
* Added test.
If SPV_EXT_descriptor_indexing is enabled, add check that for a
descriptor-based reference, the descriptor is initialized. Initialization
data is stored in the debug input buffer, added to the length information
already there. This feature must be seperately enabled on the pass
creation routine. NOTE: Currently just supports image references; buffer
references are still TODO.
Adds an optimization pass to remove usages of AtomicCounterMemory
bit. This bit is ignored in Vulkan environments and outright forbidden
in WebGPU ones.
Fixes#2242
In constant propagation, decoration are transfered from the original
expression to the constant that will replace it. This can be wrong
because there are no decorations that apply to constants. We choose to
simply delete the decorations.
Fixes#2441
* Handle back edges better in dead branch elim.
Loop header must have exactly one back edge. Sometimes the branch
with the back edge can be folded. However, it should not be folded
if it removes the back edge.
The code to check this simply avoids folding the branch in the
continue block. That needs to be changed to not fold the back edge,
wherever it is.
At the same time, the branch can be folded if it folds to a branch to
the header, because the back edge will still exist.
Fixes#2391.
* Fix OpDot folding of half float vectors.
The code that folds OpDot does not handle half floats correctly. After
trying to multiple the first components, we get a nullptr because we
don't fold half float values. This nullptr gets passed to the code that
does the addition, and causes an assert.
Fixes#2405.
The types of input and output variables must match for the pipeline. We
cannot see the uses in all of the shader, so dead member
elimination cannot safely change the type of input and output variables.
Add a pass that looks for members of structs whose values do not affects
the output of the shader. Those members are then removed and just
treated like padding in the struct.
* Fixes#2358. Added to the reducer the ability to remove a function that is not directly called. Factored out some code from the optimizer to help with this.
* Fixes#2338. Added check for phi node before merging blocks.
* Added functionality to merge blocks A and B even when B starts with OpPhi instructions, by replacing uses of the OpPhi results with the definitions coming from A. Added some tests for this.
* Fixed assertion.
When looking at the uses of the result of an instruction, code sinking
assumes that all uses are in a basic block. However, this is not true
if there is a decoration or name for the result of that insturction.
This commit checks for this.
Fixes https://crbug.com/923243.
It is legal, but not generated by any SPIR-V producer: an OpCompositeExtract
with no indexes. This is essentially just a copy of the object, so we
treat them that way. We simply propagate the live variables of the
result to the operand.
Fixes https://crbug.com/919181.
It is legal, but not generated by any SPIR-V producer: an OpAccessChain
with no indexes. This is essentially just a copy of the pointer.
I have decided to treat it like an OpCopyObject. In CheckUses, we
return that it is not okay.
When looking at this I realized that we had code in GetUsedComponents
that cannot be reached. If there is a use in an OpCopyObject the it
will not call GetUsedComponents. I removed that dead code.
Fixes https://crbug.com/918311.
During unrolling a new loop is created, but its ownership is not clear
as it gets passed through the code. Changed something to unique_ptr to
make that clearer.
Fixes#2299.
Fixing other memory leaks at the same time.
Fixes#2296Fixes#2297
In C++, a bit shift of the same size as the type is undefined, but it is
defined in spir-v. When folding those cases, we have to be careful. We
cannot simply do the shift in C++.
Fixes https://crbug.com/917697.
Fixes#2138
* Modf and frexp are upgraded to use the struct version of the
instruction and generate an explicit store whose flags can be upgraded
separately
* Fixed major bug where availability and visibility were reversed for
non-copy memory instructions
* Fixed bug where availability and visibility scope operands were reversed for copy memory
* Upgraded all opt tests to use SPV_ENV_UNIVERSAL_1_3
* Upgrade tests moved into unified tests and removed standalone test
* Handle CompositeInsert with no indices in VDCE
In the spec, there it nothing that forces an OpCompositeInsert to have
an index, but VDCE assumes there is at least 1 in a couple places.
This commit updates VDCE to handle these cases.
Currently it is impossible to invalidate the constnat and type manager.
However, the compact ids pass changes the ids for the types and
constants, which makes them invalid. This change will make them
analyses that have to been explicitly marked as preserved by passes.
This will allow compact ids to invalidate them.
Fixes#2220.
* Added additional changes for the new AccelerationStructureNV type.
* Added additional changes for the new AccelerationStructureNV type. Change tabs to space...
* Added additional changes for the new accelerationStructureNV type -- add proper type name.
Fix TypeManager.TypeStrings test:
[----------] 29 tests from TypeManager
[ RUN ] TypeManager.TypeStrings
[ OK ] TypeManager.TypeStrings (7 ms)
When we are predicating the continue target for a loop, it can no longer
be the continue target because it will have a branch that exits the loop
and is not the bach edge. The continue target will have to be the
target of that branch that is still in the loop.
Fixes#2211.
The function `UpdatePhiNodes` was being called inconsistently. In one
case, the cfg had already been updated to include the new edge, and in
another place the cfg was not updated. This caused the function to
miss flagging a block as needing new phi nodes. I picked that the cfg
should not be updated before making the call. I documented it, and
change the call sites to match.
Fixes#2207.
We initially assumed that if the type manager returned the correct id
for the pointee type, that we would get the correct pointer type back,
but that is not true. See the unit test added with this commit. We
need to fall back to the linear search any time we are looking for a
pointer to a type that may not be unique.
At the same time, SROA considered an OpName on a variable to be a use of
the entire variable. That has been fixed.
Fixes#2209.
We currently place the load instructions at the start of the basic block
that dominates all of the loads. If that basic block contains OpPhi
instructions, then this will generate invalid code. We just need to
search for a location that comes after all of the OpPhi instructions.
Fixes#2204.
* Don't fold specialized branchs in loop unswitch
Folding branches can have a lot of special cases, and can be a little
error prone. So I only want it in one place. That will be in dead
branch elimination. I will change loop unswitching to set the branches
that were being folded to have a constant condition. Then subsequent
pass of dead branch elimination will be able to remove the code.
At the same time, I added a check that loop unswitching will not
unswitch a branch with a constant condition. It is not useful to do it
because dead branch elimination will simple fold the branch anyway.
Also it avoid an infinite loop that would other wise be introduced by my
first change.
Fixes#2203.
Loop unswitching is unswitching the conditional branch that creates the
back-edge. In the version of the loop, where the bachedge is not taken,
there is no back-edge. This is what causes the validator to complain.
The solution I will go with will be to now unswitch a condition with a
back-edge. At this time we do not now if loop unswitching is used. We do
not include it in the optimization sets provided, nor is it used in
glslang's set. When there are opportunities and no breaks from the loop,
the loop with either be a single iteration loop, or an infinite loop.
There is no performance advantage to performing loop unswitching in
either of those cases. If there is a break, maintaining structured
control flow will be tricky. Unless we see a clear advantage to handling
these case, I would go with the safer simpler solution.
Fixes#2201.
If there are multiple edges to a basic block, then the ssa rewriter will
create OpPhi instructions with duplicate entries. This is invalid, and
it is fixed in this commit.
Fixes#2202.
* Invalidate the decoration manager at the start of ADCE.
If the decoration manager is kept live the the contex will try to keep
it up to date. ADCE deals with group decorations by changing the
operands in |OpGroupDecorate| instructions directly without informing
the decoration manager. This puts it in an invalid state, which will
cause an error when the context tries to update it. To Avoid this
problem, we will invalidate the decoration manager upfront.
At the same time, the decoration manager is now considered when checking
the consistency of the decoration manager.
* Fix invalid OpPhi generated by merge-return.
When we create a new phi node for a value say %10, we have to replace
all of the uses of %10 that are no longer dominated by the def of %10
by the result id of the new phi. However, if the use is in a phi node,
it is possible that the bb contains the use is not dominated by either.
In this case, needs to be handled differently.
* Split loop headers before add a new branch to them.
In merge return, Phi node in loop header that are also merges for loop
do not get updated correctly. Those cases do not fit in with our
current analysis. Doing this will simplify the code by reducing the
number of cases that have to be handled.
Added documentation to the ir context to indicates that TakeNextId()
returns 0 when the max id is reached. TODOs were added to each call
sight so that we know where we have to start to handle this case.
Handle id overflow in |SplitLoopHeader|.
Handle id overflow in |GetOrCreatePreHeaderBlock|.
Handle failure to create preheader in LICM.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1841.
We currently simulate all shift operations when the two operand are
constants. The problem is that if the shift amount is larger than
32, the result is undefined.
I'm changing the folder to return 0 if the shift value is too high.
That way, we will have defined behaviour.
https://crbug.com/910937.
Upgrade to VulkanKHR memory model
* Converts Logical GLSL450 memory model to Logical VulkanKHR
* Adds extension and capability
* Removes deprecated decorations and replaces them with appropriate
flags on downstream instructions
* Support for Workgroup upgrades
* Support for copy memory
* Adding support for image functions
* Adding barrier upgrades and tests
* Use QueueFamilyKHR scope instead of device
* Move ProcessFunction* function from pass to the context.
There are a few functions that are used to traverse the call tree.
They currently live in the Pass class, but they have nothing to do with
a pass, and may be needed outside of a pass. They would be better in
the ir context, or in a specific call tree class if we ever have a need
for it.
* Don't inline recursive functions.
Inlining does not check if a function is recursive or not. This has
been fine as long as the shader was a Vulkan shader, which forbid
recursive functions. However, not all shaders are vulkan, so either
we limit inlining to Vulkan shaders or we teach it to look for recursive
functions.
I prefer to keep the passes as general as is reasonable. The change
does not require much new code in inlining and gives a reason to refactor
some other code.
The changes are to add a member function to the Function class that
checks if that function is recursive or not.
Then this is used in inlining to not inlining a function call if it calls
a recursive function.
* Add id to function analysis
There are a few places that build a map from ids to Function whose
result is that id. I decided to add an analysis to the context for this
to reduce that code, and simplify some of the functions.
* Add missing file.
* Added a reduction pass to replace ids with ids of the same type that dominate them.
* Introduce helper method for querying whether an operand type is an input id.
These are bookend passes designed to help preserve line information
across passes which delete, move and clone instructions. The propagation
pass attaches a debug line instruction to every instruction based on
SPIR-V line propagation rules. It should be performed before optimization.
The redundant line elimination pass eliminates all line instructions
which match the previous line instruction. This pass should be performed
at the end of optimization to reduce physical SPIR-V file size.
Fixes#2027.
The type manager in spirv-opt currently asserts if a function parameter
has type void. It is not exactly clear from the spec that this is
disallowed, even if it probably will be disallowed. In either case,
asserts should be used to verify assumptions that will actually make a
difference to the code. As far as the optimizer is concerned, a void
parameter does not matter. I don't see the point of the assert. I'll
just remove it and let the validator decide whether to accept it or not.
No test was added because it is not clear that it is legal, and should
not force us to accept it in the future unless the spec make it clear
that it is legal.
Fixes crbug.com/903088.
That function currently only handled OpPtrAccessChain if it was in the
middle of the chain, but not at the start. Fixing that up.
Fixes crbug.com/905271.
* Add base and core bindless validation instrumentation classes
* Fix formatting.
* Few more formatting fixes
* Fix build failure
* More build fixes
* Need to call non-const functions in order.
Specifically, these are functions which call TakeNextId(). These need to
be called in a specific order to guarantee that tests which do exact
compares will work across all platforms. c++ pretty much does not
guarantee order of evaluation of operands, so any such functions need to
be called separately in individual statements to guarantee order.
* More ordering.
* And more ordering.
* And more formatting.
* Attempt to fix NDK build
* Another attempt to address NDK build problem.
* One more attempt at NDK build failure
* Add instrument.hpp to BUILD.gn
* Some name improvement in instrument.hpp
* Change all types in instrument.hpp to int.
* Improve documentation in instrument.hpp
* Format fixes
* Comment clean up in instrument.hpp
* imageInst -> image_inst
* Fix GetLabel() issue.
If there is only 1 return and it is in a loop, then the function cannot be inlined.
Fix condition when inlined code needs one-trip loop wrapper. The dummy loop is needed when there is a return inside a selection construct. Even if there is only 1 return.
When looking for a break from a selection construct, we do not realize
that a jump to the continue target of a loop containing the selection
is a break. This causes and infinit loop, or possibly other failures.
Fixes#2004.
When looking for a break from a selection construct, we do not need to
look inside nested constructs. However, if a loop header has an
unconditional branch, then we enter the loop. Entering the loop causes
an infinite loop because we keep going through the loop.
The solution is to look for a merge block, if one exsits, even for block
terminated by an OpBranch.
Fixes#1979.
ADCE liveness algorithm should treat OpUnreachable at least like other
branch instructions. It was being treated as always live which was
preventing useless structured constructs from being eliminated.
OpUnreachable is generated by dead branch elimination which is now
being required by merge return, so this fix should accompany that
change.
We currently run merge-return on all functions, but
dead-branch-elimination only runs on function reachable from an entry
point or exported function. Since dead-branch-elimination is needed for
merge-return, they have to match.
Fixes#1976.
Was removing control structures which didn't have data dependency
with enclosed live loop and otherwise did not contain live code.
An example is a counting loop around a live loop.
Fixes#1967.
Consider atomics that load when analyzing live stores in ADCE.
Previously it asserted that the base of an OpImageTexelPointer should
be an image. It is actually a pointer to an image, so IsValidBasePointer
should suffice.
Merge return assumes that the only unreachable blocks are those needed
to keep the structured cfg valid. Even those must be essentially empty
blocks.
If this is not the case, we get unpredictable behaviour. This commit
add a check in merge return, and emits an error if it is not the case.
Added a pass of dead branch elimination before merge return in both the
performance and size passes. It is a precondition of merge return.
Fixes#1962.
The current implementation in the folder when seeing a division by zero
is to assert. In the release build, the compiler will attempt to
compute the value, which causes its own problems.
The solution I will go with is to fold the division, and just give it
the value of 0. The same goes for remainder and mod operations.
Fixes#1961.
The HlslCounterBufferGOOGLE that was introduced changed the OpDecorateId
so that is can now reference an id other than the target. If that other
id is used only in the decoration, then the definition of the id will be
removed because decoration do not count as real uses.
However, if the target of the decoration is still live the decoration
will not be removed. This leaves a reference to an id that is not
defined.
There are two solutions to consider. The first is that is the decoration
is kept, then the definition of the id should be kept live. Implementing
this change would be involved because the way ADCE handles decorations
will have to be reimplemented.
The other solution is to remove the decoration the id is otherwise dead.
This works for this specific case. Also this is the more desirable
behaviour in this case. The id will always be the id of a variable that
belongs to a descriptor set. If that variable is not bound and we do
not remove it, the driver will complain.
I chose to implement the second solution. The first will be left to when
a case for it comes up.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1885.
There are a few spots where copy propagate arrays is trying
to go from a Type to an id, but the type is not unique. When generating
code this pass needs specific ids, otherwise we get type mismatches.
However, the ambigous types means we can sometimes get the wrong type
and generate invalid code.
That code has been rewritten to not rely on the type manager, and just
look at the instructions instead.
I have opened https://github.com/KhronosGroup/SPIRV-Tools/issues/1939 to
try to get a way to make this more robust.
* Analyze uses for all instructions.
The def-use manager needs to fill in the `inst_to_used_ids_` field for
every instruction. This means we have to analyze the uses for every
instruction, even if they do not have any uses.
This mistake was not found earlier because there was a typo in the
equality check for def-use managers. No new tests are needed.
While looking into this I found redundant work in block merge. Cleaning
that up at the same time.
* Fix other transformations
Aggressive dead code elimination did not update the OpGroupDecorate
and the OpGroupMemberDecorate instructions properly when they are
updated. That is fixed.
Dead branch elimination did not analyze the OpUnreachable instructions
that is would add. That is taken care of.
In DecorationManager::RemoveDecorationsFrom, we do not remove the id
from a decoration group if the group has no decorations. This causes
problems because KillNamesAndDecorates is suppose to remove all
references to the id, but in this case, there is still a reference.
This is fixed by adding a special case.
Also, there is the possibility of a double free because
RemoveDecorationsFrom will delete the instructions defining |id| when
|id| is a decoration group. Later, KillInst would later write to memory
that has been deleted when trying to turn it into a Nop. To fix this,
we will only remove the decorations that use |id| and not its definition
in RemoveDecorationsFrom.
Add code to keep the def-use manger and the inst-to-block mapping up-to-date. This means we do not have to rebuild them later.
To make this work, we will have to have to find places to update the
def-use manager. Updating the def-use manager is not straight forward
because we are unrolling loops, and we have circular references.
This forces one pass to register all of the definitions. A second one
to analyze the uses. Also because there will be references to the new
instructions in the old code, we want to register the definitions of the
new instructions early, so we can update the uses of the older code as
we go along.
The inst-to-block mapping is not too difficult. It can be done as instructions are created.
Fixes#1928.
A limit of 0 for the scalar replacement options it used to indicate that
there is no limit. The current implementation does not allow 0. This
should be fixed.
It seems like the current implementation of KillNameAndDecorates does
not handle group decorations correctly. The id being removed is not
removed from the OpGroupDecorate instructions. Even worst, any
decorations that apply to that group are removed.
The solution is to use the function in the decoration manager that will
remove the decorations and update the instructions instead of doing the
work itself.
Adds unrolling to the legalization passes.
After enabling unrolling I found a bug when there is a self-referencing
phi node. That has been fixed.
The test that checks for that the order of optimizations is correct also
needed to be updated.
The current implementation of merge return can create bad, but correct,
code. When it is not in a loop construct, it will insert a lot of
extra branch around code. The potentially large number of branches are
bad. At the same time, it can separate code store to variables from
its uses hiding the fact that the store dominates the load.
This hurts the later analysis because the compiler thinks that multiple
values can reach a load, when there is really only 1. This poorer
analysis leads to missed optimizations.
The solution is to create a dummy loop around the entire body of the
function, then we can break from that loop with a single branch. Also
only new merge nodes would be those at the end of loops meaning that
most analysies will not be hurt.
Remove dead code for cases that are no longer possible.
It seems like some drivers expect there the be an OpSelectionMerge
before conditional branches, even if they are not strictly needed.
So we add them.