* Update to final ray tracing extensions
Drop Provisional from ray tracing enums
sed -ie 's/RayQueryProvisionalKHR/RayQueryKHR/g' **/*
sed -ie 's/RayTracingProvisionalKHR/RayTracingKHR/g' **/*
Add terminator support for SpvOpIgnoreIntersectionKHR and SpvOpTerminateRayKHR
Update deps for SPIRV-Headers
* Update capability dependencies for MeshShadingNV
Accommodate https://github.com/KhronosGroup/SPIRV-Headers/pull/180
MeshShadingNV: enables PrimitiveId, Layer, and ViewportIndex
Co-authored-by: Daniel Koch <dkoch@nvidia.com>
Fix buffer oob instrumentation for matrix refs.
Matrix stride decoration is not on matrix type but is a member decoration
on the enclosing struct type. Also correctly apply matrix stride depending
on row or column major.
spirv-opt has a bug that `DebugInfoManager::AddDebugValueWithIndex()` does not
preserve `Indexes` operands of
[DebugValue](https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.DebugInfo.100.html#DebugValue).
It has to preserve all of those `Indexes` operands, but it preserves only the first index
operand.
This PR removes `DebugInfoManager::AddDebugValueWithIndex()` and lets the spirv-opt
use `DebugInfoManager::AddDebugValueForDecl()`.
`DebugInfoManager::AddDebugValueForDecl()` preserves the Indexes operand correctly.
The front-end language compiler would simply emit DebugDeclare for
a variable when it is declared, which is effective through the variable's
scope. Since DebugDeclare only maps an OpVariable to a local variable,
the information can be removed when an optimization pass uses the
loaded value of the variable. DebugValue can be used to specify the
value of a variable. For each value update or phi instruction of a variable,
we can add DebugValue to help debugger inspect the variable at any
point of the program execution.
For example,
float a = 3;
... (complicated cfg) ...
foo(a); // <-- variable inspection: debugger can find DebugValue of `float a` in the nearest dominant
For the code with complicated CFG e.g., for-loop, if-statement, we
need help of ssa-rewrite to analyze the effective value of each variable
in each basic block.
If the value update of the variable happens only once and it dominates
all its uses, local-single-store-elim pass conducts the same value update
with ssa-rewrite and we have to let it add DebugValue for the value assignment.
One main issue is that we have to add DebugValue only when the value
update of a variable is visible to DebugDeclare. For example,
```
{ // scope1
%stack = OpVariable %ptr_int %int_3
{ // scope2
DebugDeclare %foo %stack <-- local variable "foo" in high-level language source code is declared as OpVariable "%stack"
// add DebugValue "foo = 3"
...
Store %stack %int_7 <-- foo = 7, add DebugValue "foo = 7"
...
// debugger can inspect the value of "foo"
}
Store %stack %int_11 <-- out of "scope2" i.e., scope of "foo". DO NOT add DebugValue "foo = 11"
}
```
However, the initalization of a variable is an exception.
For example, an argument passing of an inlined function must be done out of
the function's scope, but we must add a DebugValue for it.
```
// in HLSL
bar(float arg) { ... }
...
float foo = 3;
bar(foo);
// in SPIR-V
%arg = OpVariable
OpStore %arg %foo <-- Argument passing. Out of "float arg" scope, but we must add DebugValue for "float arg"
... body of function bar(float arg) ...
```
This PR handles the except case in local-single-store-elim pass. It adds
DebugValue for a store that is considered as an initialization.
The same exception handling code for ssa-rewrite is done by this commit: df4198e50e.
This fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/3873.
In the presence of variable pointers, the reaching definition may be
another pointer. For example, the following fragment:
%2 = OpVariable %_ptr_Input_float Input
%11 = OpVariable %_ptr_Function__ptr_Input_float Function
OpStore %11 %2
%12 = OpLoad %_ptr_Input_float %11
%13 = OpLoad %float %12
corresponds to the pseudo-code:
layout(location = 0) in flat float *%2
float %13;
float *%12;
float **%11;
*%11 = %2;
%12 = *%11;
%13 = *%12;
which ultimately, should correspond to:
%13 = *%2;
During rewriting, the pointer %12 is found to be replaceable by %2.
However, when processing the load %13 = *%12, the type of %12's reaching
definition is another float pointer (%2), instead of a float value.
When this happens, we need to continue looking up the reaching definition
chain until we get to a float value or a non-target var (i.e. a variable
that cannot be SSA replaced, like %2 in this case since it is a function
argument).
Removing PropagateLineInfoPass and RedundantLineInfoElimPass from
56d0f5035 makes unit tests of many open source projects fail.
It will happen before submitting this glslang PR
https://github.com/KhronosGroup/glslang/pull/2440. This commit will be
git-reverted after merging the glslang PR.
This commit add support for optimizer to not inline functions with DontInline control flag, so that the [noinline] attribute in HLSL will be useful in DXC SPIR-V generation.
This is part of work of github.com/microsoft/DirectXShaderCompiler/issues/3158
Based on the OpLine spec, an OpLine instruction must be applied to
the instructions physically following it up to the first occurrence
of the next end of block, the next OpLine instruction, or the next
OpNoLine instruction.
```
OpLine %file 0 0
OpNoLine
OpLine %file 1 1
OpStore %foo %int_1
%value = OpLoad %int %foo
OpLine %file 2 2
```
For the above code, the current spirv-opt keeps three line
instructions `OpLine %file 0 0`, `OpNoLine`, and `OpLine %file 1 1`
in `std::vector<Instruction> dbg_line_insts_` of Instruction class
for `OpStore %foo %int_1`. It does not put any line instruction to
`std::vector<Instruction> dbg_line_insts_` of
`%value = OpLoad %int %foo` even though `OpLine %file 1 1` must be
applied to `%value = OpLoad %int %foo` based on the spec.
This results in the missing line information for
`%value = OpLoad %int %foo` while each spirv-opt pass optimizes the
code. We have to put `OpLine %file 1 1` to
`std::vector<Instruction> dbg_line_insts_` of
both `%value = OpLoad %int %foo` and `OpStore %foo %int_1`.
This commit conducts the line instruction propagation and skips
emitting the eliminated line instructions at the end, which are the same
with PropagateLineInfoPass and RedundantLineInfoElimPass. This
commit removes PropagateLineInfoPass and RedundantLineInfoElimPass.
KhronosGroup/glslang#2440 is a related PR that stop using
PropagateLineInfoPass and RedundantLineInfoElimPass from glslang.
When the code in this PR applied, the glslang tests will pass.
If enabled the following targets will be created:
* `${SPIRV_TOOLS}-static` - `STATIC` library. Has full public symbol visibility.
* `${SPIRV_TOOLS}-shared` - `SHARED` library. Has default-hidden symbol visibility.
* `${SPIRV_TOOLS}` - will alias to one of above, based on BUILD_SHARED_LIBS.
If disabled the following targets will be created:
* `${SPIRV_TOOLS}` - either `STATIC` or `SHARED` based on the new `SPIRV_TOOLS_LIBRARY_TYPE` flag. Has full public symbol visibility.
* `${SPIRV_TOOLS}-shared` - `SHARED` library. Has default-hidden symbol visibility.
Defaults to `ON`, matching existing build behavior.
This flag can be used by package maintainers to ensure that all libraries are built as shared objects.
For some cases, we have DebugDecl invisible to a value assignment, but
the value assignment information is important i.e., debugger cannot inspect
the variable without the information. For example, a parameter of an inlined
function must have its value assignment i.e., argument passing out of its
function scope. If we simply remove DebugDecl because it is invisible to the
argument passing, we cannot inspec the variable.
This PR
- Adds DebugValue for DebugDecl invisible to a value assignment. We use
the value of the variable in the basic block that contains DebugDecl, which is
found by ssa-rewrite. If the value instruction does not dominate DebugDecl,
we use the value of the variable in the immediate dominator of the basic block.
- Checks the visibility of DebugDecl for Phi value assignment based on the
all value operands of the Phi. Since Phi just references multiple values from
multiple basic blocks, scopes of value operands must be regarded as the scope
of the Phi.
`DebugInfoManager::AddDebugValueIfVarDeclIsVisible()` adds
OpenCL.DebugInfo.100 DebugValue from DebugDeclare only when the
DebugDeclare is visible to the give scope. It helps us correctly
handle a reference variable e.g.,
{ // scope #1.
int foo = /* init */;
{ // scope #2.
int& bar = foo;
...
in the above code, we must not propagate DebugValue of `int& bar` for
store instructions in the scope #1 because it is alive only in
the scope #2.
We have an exception: If the given DebugDeclare is used for a function
parameter, `DebugInfoManager::AddDebugValueIfVarDeclIsVisible()` has
to always add DebugValue instruction regardless
of the scope. It is because the initializer (store instruction) for
the function parameter can be out of the function parameter's scope
(the function) in particular when the function was inlined.
Without this change, the function parameter value information always
disappears whenever we run the function inlining pass and
`DebugInfoManager::AddDebugValueIfVarDeclIsVisible()`.
1. DebugValue/DebugDeclare references of load/store must not change
the behaviors of the convert-local-access-chains pass
2. We have to properly set the scope and line information of new
instructions made by the convert-local-access-chains pass
In #3636, I missed that the instruction folder may create more than a
single constant per call. Since CCP was only checking whether one
constant had been created after folding, it was wrongly thinking that
the IR had not changed.
Fixes#3738.
This PR adds the NestingDepth function to StructuredCFGAnalysis.
This function, given a block id, returns the number of merge
constructs containing it.
This is needed by spirv-fuzz, but it makes sense to add it to
StructuredCFGAnalaysis, which contains related functionalities.
When we update OpenCL.DebugInfo.100 lexical scopes e.g., DebugFunction,
we have to replace DebugScope of each instruction that uses the lexical
scope correctly.
It is possible that the result of a void function call is used. In case
it is used, we need something that still defines its id after inlining.
We use an undef for that purpose.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/3704
CCP should mark IR changed if it created new constants.
This fixes#3636.
When CCP is simulating statements, it will sometimes successfully fold
an instruction, which laters switches to varying. The initial fold of
the instruction may generate a new constant K.
The problem we were running into is when K never gets propagated to the
IR. Its definition will still exist, so CCP should mark the IR modified
in this case.
In fixing this bug, I noticed that an existing test was suffering from
the same bug. The change also makes PassTest::SinglePassRunAndMatch()
return the result from the pass, so that we can check that the pass
marks the IR modified in this case.
In the existing code, ADCE pass does not check DebugScope of an
instruction when it checks the users of each instruction, which results
in removing OpenCL.Debug.100 instructions that are only used by
DebugScope. This commit lets ADCE pass add DebugScope of an instruction
to the live instruction set when the instruction is added to the live
instruction set.
* No longer blindly add global non-semantic info instructions to global
types and values
* functions now have a list of non-semantic instructions that succeed
them in the global scope
* global non-semantic instructions go in global types and values if
they appear before any function, otherwise they are attached to the
immediate function predecessor in the module
* changed ADCE to use the function removal utility
* Modified EliminateFunction to have special handling for non-semantic
instructions in the global scope
* non-semantic instructions are moved to an earlier function (or full
global set) if the function they are attached to is eliminated
* Added IRContext::KillNonSemanticInfo to remove the tree of
non-semantic instructions that use an instruction
* this is used in function elimination
* There is still significant work in the optimizer to handle
non-semantic instructions fully in the optimizer
For each local variable, ssa-rewrite should remove its DebugDeclare
if and only if it is replaced by any number of DebugValues for store
and phi instructions.
For example, when we have two variables `a` whose DebugDeclare
will be replaced to DebugValues by ssa-rewrite pass and `b` whose
DebugDeclare will not be replaced, we have to remove only DebugDeclare
for `a`, not `b`.
When we copy the loop body to unroll it, we have to copy its
instructions but DebugDeclare or DebugValue used for the declaration
i.e., DebugValue with Deref must not be copied and only the first block
can contain those instructions.
Rename the `${SPIRV_TOOLS}` target to `${SPIRV_TOOLS}-static` and alias `${SPIRV_TOOLS}` to either `${SPIRV_TOOLS}-static` or `${SPIRV_TOOLS}-shared` depending on `BUILD_SHARED_LIBS`.
Re-point all internal uses of `${SPIRV_TOOLS}` to `${SPIRV_TOOLS}-static`.
`${SPIRV_TOOLS}-static` is explicitly renamed to just `${SPIRV_TOOLS}` to ensure the name does not change from current behavior.
Build the `SPIRV-Tools-*` libraries as static, as this is what they always were.
Force the external targets `gmock` and `effcee` to be built statically. These either do not support being built as shared libraries, or require special flags.
Issue: #3482
1. Set the debug scope and line information for the new replacement
instructions.
2. Replace DebugDeclare and DebugValue if their OpVariable or value
operands are replaced by scalars. It uses 'Indexes' operand of
DebugValue. For example,
struct S { int a; int b;}
S foo; // before scalar replacement
int foo_a; // after scalar replacement
int foo_b;
DebugDeclare %dbg_foo %foo %null_expr // before
DebugValue %dbg_foo %foo_a %Deref_expr 0 // after
DebugValue %dbg_foo %foo_b %Deref_expr 1 // means Value(foo.members[1]) == Deref(%foo_b)
Essentially, it marks all DebugInfo instructions in functions (and their operands) as live. It treats DebugDeclare and DebugValue with Deref as loads and so marks Stores of their variables as live.
It marks each DebugGlobalVariables as live except for its variable. After closure, it rechecks if the variable is live. If not, the DebugGlobalVariable instruction's variable operand is set to DebugInfoNone, per the DebugInfo spec.
This pass basically follows the same process as ssa-rewrite: it adds a DebugValue after each Store and removes the DebugDeclare or DebugValue Deref. It only does this if all instructions that are dependent on the Store are Loads and are replaced.
This commit lets the vector DCE pass preserve the OpenCL.DebugInfo.100
information properly. When the vector DCE pass determines the liveness
of instructions, the debug instructions must not affect the decision. In
addition, when it kills some instructions, it has to kill DebugValue
instructions that use the killed instructions. When it updates some
composite values to meaningful values (not undef), it has to remove
DebugValue because the value information becomes incorrect.
The decision to reduce the load must be not affected by debug
instructions. For example, even when a DebugValue references a
result id of a loaded composite value, this change lets the
reduce-load-size pass reduce the load if the full composite value is not
used anywhere other than the DebugValue.
When the pass replaces the local variable `OpVariable` ids to their
corresponding pointers, we have to update operands of DebugValue or
DebugDeclare instructions.
When there are multiple entries and the shader has a variable with
WorkGroup storage class, those multiple entry functions store values to
the variable. Since ADCE pass uses def-use chains to propagate the work
list, some of instructions in the work list are not actually a part of
the currently processed function. As a result, it adds instructions in
other functions and put them in |live_insts_|. However, it does not
have the control flow information for those instructions in other
functions i.e., |block2headerBranch_| and |header2nextHeaderBranch_|.
When it processes those instructions (they are added when it processes a
different function), it skips handling them because they are already in
|live_insts_| and does not check |block2headerBranch_| and
|header2nextHeaderBranch_|, which results in skipping some branches.
Even though those branches are live branches, it considers they are dead
branches.
For many spirv-opt passes such as simplify-instructions pass, we have to
correctly clear the OpenCL.DebugInfo.100 debug information for
KillInst() and ReplaceAllUses(). If we keep some debug information that
disappeared because of KillInst() and ReplaceAllUses(), adding new
DebugValue instructions based on the existing DebugDeclare information
will generate incorrect information. This CL update DebugInfoManager
and IRContext to correctly clear debug information.
* Support load and extract pattern in desc_sroa.
* Fix typo in comments.
* Load replacement var before use; and added test.
* fix formatting
* Address code review comments.
Add OpenCL.DebugInfo.100 `DebugValue` instructions for store
and phi instructions of local variables to provide the debugger with
the updated values of local variables correctly.
Handles the OpenCL100Debug extension in inlining. It preserves the information that is available while also adding the debug inlined at for all of the inlining that it does.
Reject folding comparisons with unfoldable types.
Fixes#3343
When CCP is evaluating an instruction, it was trying to fold a
comparison with 64 bit integers. This was causing a fold failure later
since the folder still cannot deal with 64 bit integers.
ssa-rewrite fails in `MemPass::GetPtr()` when the SPIR-V code contains
`OpLoad` for the result id of `OpConstantNull` because of the out of
index access for an operand to get the base address. This commit fixes
it.
Fixes#3344
In this PR, the classes that represent the adjust branch weights
transformation and fuzzer pass were implemented. This transformation
adjusts the branch weights of a OpBranchConditional instruction.
- No longer inline functions with early exits. Merge return can modify them so they can be inlined.
- Otherwise no functional change, should be just refactoring.
* Do merge return if the return is not at the end of the function.
We will remove the code in inlining to handle a return in the middle of
a function. To inline those functions, we need to run merge return to
move the return to the end of the function.
Generalizes the IsReadOnlyVariable() method, and related methods, so
that they can be used to ask whether pointer result ids are read-only.
Fixes#3324.
When moving blocks around, we ended up with a nullptr for a basic block,
and it was left in the list for a little bit. However, in that time, it
would end up being dereferenced while traversing the function.
To fix this, we delete it right away. This was found in an asan build
that runs our current tests. No new tests are needed, but I did add
extra check asan checks for our asan bot.
Many high-level languages like HLSL and GLSL generate termination
instructions such as return and branch from the actual part of the
high-level language code like return and if statements. This commit lets
IrLoader set `DebugScope` for termination instructions.
We need an analysis for OpenCL.DebugInfo.100 extension instructions such
as a map between function id and its DebugFunction. This commit add an
analysis for it.
* Preserve debug info in eliminate-dead-functions
The elimination of dead functions makes OpFunction operand of
DebugFunction invalid. This commit replaces the operand with
DebugInfoNone.
* Handle more cases in dead member elim
- Rewrite composite insert and extract operations on SpecConstnatOp.
- Leaves assert for Access chain instructions, which are only allowed
for kernels.
- Other operations do not require any extra code will no longer cause an
assert.
Fixes#3284.
Fixes#3282.
When DebugScope is given in SPIR-V, each instruction following the
DebugScope is from the lexical scope pointed by the DebugScope in
the high level language. We add DebugScope struction to keep the
scope information in Instruction class. When ir_loader loads
DebugScope/DebugNoScope, it keeps the scope information in
|last_dbg_scope_| and lets following instructions have that scope
information.
In terms of DebugDeclare/DebugValue, if it is in a function body
but outside of a basic block, we keep it in |debug_insts_in_header_|
of Function class. If it is in a basic block, we keep it as a normal
instruction i.e., in a instruction list of BasicBlock.
Create a pass to instrument OpDebugPrintf instructions. This pass replaces all OpDebugPrintf instructions with instructions to write a record containing the string id and the all specified values into a special printf output buffer (if space allows). This pass is designed to support the printf validation in the Vulkan validation layers.
Fixes#3210
This adds support for replacing TimeAMD with OpReadClockKHR. The scope
for OpReadClockKHR is fixed to be a subgroup because TimeAMD operates
only on subgroup.
* Implement constant folding for many transcendentals
This change adds support for folding of sin/cos/tan/asin/acos/atan,
exp/log/exp2/log2, sqrt, atan2 and pow.
The mechanism allows to use any C function to implement folding in the
future; for now I limited the actual additions to the most commonly used
intrinsics in the shaders.
Unary folder had to be tweaked to work with extended instructions - for
extended instructions, constants.size() == 2 and constants[0] ==
nullptr. This adjustment is similar to the one binary folder already
performs.
Fixes#1390.
* Fix Android build
On old versions of Android NDK, we don't get std::exp2/std::log2
because of partial C++11 support.
We do get ::exp2, but not ::log2 so we need to emulate that.
We must treat a branch to the merge node of a switch that is in the
header of a construct as a nested construced. The original merge
instruction is still needed in that case.
* Allow OpExtInst for DebugInfo between secion 9 and 10
Fixes#3086
* Handle spirv-opt errors on DebugInfo Ext
* Add IR Loader test
* Fix ir loader bug
* Handle DebugFunction/DebugTypeMember forward reference
* Add test cases (forward reference to function)
* Support old DebugInfo extension
* Validate local debug info out of function
As explained in #3118, spirv-opt merge-blocks pass causes a
spirv-val error when an OpBranch has an OpLine in front of it.
OpLoopMerge
OpBranch ; Will be killed by merge-blocks pass
OpLabel ; Will be killed by merge-blocks pass
OpLine ; will be placed between OpLoopMerge and OpBranch - error!
OpBranch
To fix this issue, this commit moves line info of OpBranch to
OpLoopMerge.
Fixes#3118
Add support for SPV_KHR_non_semantic_info
This entails a couple of changes:
- Allowing unknown OpExtInstImport that begin with the prefix `NonSemantic.`
- Allowing OpExtInst that reference any of those sets to contain unknown
ext inst instruction numbers, and assume the format is always a series of IDs
as guaranteed by the extension.
- Allowing those OpExtInst to appear in the types/variables/constants section.
- Not stripping OpString in the --strip-debug pass, since it may be referenced
by these non-semantic OpExtInsts.
- Stripping them instead in the --strip-reflect pass.
* Add adjacency validation of non-semantic OpExtInst
- We validate and test that OpExtInst cannot appear before or between
OpPhi instructions, or before/between OpFunctionParameter
instructions.
* Change non-semantic extinst type to single value
* Add helper function spvExtInstIsNonSemantic() which will check if the extinst
set is non-semantic or not, either the unknown generic value or any future
recognised non-semantic set.
* Add test of a complex non-semantic extinst
* Use DefUseManager in StripDebugInfoPass to strip some OpStrings
* Any OpString used by a non-semantic instruction cannot be stripped, all others
can so we search for uses to see if each string can be removed.
* We only do this if the non-semantic debug info extension is enabled, otherwise
all strings can be trivially removed.
* Silence -Winconsistent-missing-override in protobufs
* Make Instrumentation format version 2 the default (Step 1)
Add new interfaces without version number argument. Remove version 1
logic and tests. Version interfaces will be removed in step 2 after
layers have transitioned to new interface.
* Add error messages to InstrumentPass().
* Don't crash when folding construct of empty struct
An OpCompositeConstruct of an empty struct will be folded to a constant
under normal circumstances. However, if the id limit has been reached
and the constant cannot be generated, then other folding rules will be
tried.
These rules do not handle the case of an empty struct. We add allow it
to be handled.
Fixes http://crbug/1030194
* Changes based on the review.
Access chain indices are always interpreted as signed integers.
So use signed clamp instead of unsigned clamp. We must also
clamp to the max signed int for the index type.
Fixes#3072
* Validate that if a construct contains a header and it's merge is
reachable, the construct also contains the merge
* updated block merging to not merge into the continue
* update inlining to mark the original block of a single block loop as
the continue
* updated some tests
* remove dead code
* rename kBlockTypeHeader to kBlockTypeSelection for clarity
Wrap-opkill will create a new function, invalidating the id-to-func map.
The preserved analyses for the pass have been updated to reflect that.
Also adding consistency check for the id-to-func map. With this new
check, old tests identify this problem. No new tests are needed.
Fixes#3038
Implements the following simplifications:
(a - b) + b => a
(a * b) + (a * c) => a * (b + c)
Also adds logic to simplification to handle rules that create new operations
that might need simplification, such as the second rule above.
Only perform the second simplification if the multiplies have the add as their
only use. Otherwise this is a deoptimization of size and performance.
We have a check that ensures that the optimizer did not change the
binary when it says that it did not. However, when the binary is
converted back to a binary, we made a decision to remove OpNop
instructions. This means that any spv file that contains a NOP
originally will fail this check.
To get around this, we convert the module to a second binary that keeps
the OpNop instructions. That binary is compared against the original.
Fixes https://crbug.com/1010191
Added exports for libraries. External libraries that themselves use
libraries require all dependencies have exports, so not having exports can
cause major problems when used within other projects.
Install paths for exports are now placed in the proper directories expected
by Windows and *nix systems. Config files are generated as well, which
should work with CMake's find_package() function once installed.
We want to handle OpKill better. The wrap opkill causes lots of extra
code to be generated, even when they are not needed to avoid the main
problem: OpKill cannot be found directly in a continue construct.
This change will be more selective on which functions the OpKill will be
wrapped and inlining will avoid inlining.
Fixes#2912
* Add continue construct analysis to struct cfg analysis
Add the ability to identify which blocks are in the continue construct for a
loop, and to get functions that are called from those blocks, directly or
indirectly.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/2912.
There is nothing in the spir-v spec that says the last
instructions in a module cannot be OpLine or OpNoLine.
However, the code that parses the module will simply drop
these instructions.
We add code that will preserve these instructions.
Strip-debug-info is updated to remove these instructions.
Fixes https://crbug.com/1000689.
* Handle extract with no indexes
It is possible that OpCompositeExtract instructions will not have any
indexes. This is not handled well by scalar replacement and instruction
folding.
Fixes https://crbug.com/1006435
* Fix typo.
* Use OpReturn* in wrap-opkill
The warp-opkill pass is generating incorrect code. It is placing an
OpUnreachable at the end of a basic block, when the block can be
reached. We can't reach the end of the block, but we can reach the end.
Instead we will add a return instruction.
Fixes#2875.
The warp-opkill pass is generating incorrect code. It is placing an
OpUnreachable at the end of a basic block, when the block can be
reached. We can't reach the end of the block, but we can reach the end.
Instead we will add a return instruction.
Fixes#2875.
Many of the places in copy propagate arrays assumes that integer constant will be defined by an OpConstant instruction. That is not always true. We fix these spots by allowing for an OpConstantNull.
If an OpKill instruction is inlined into a continue construct, then the
spir-v is no longer valid. To avoid this issue, we do inline into an
OpKill at all. This method was chosen because it is difficult to keep
track of whether or not you are in a continue construct while changing
the function that is being inlined into. This will work well with wrap
OpKill because every will still be inlined except for the OpKill
instruction itself.
Fixes#2554Fixes#2433
This reverts commit aa9e8f5380.
* Handle id overflow in the ssa rewriter.
Remove LocalSSAElim pass at the same time. It does the same thing as the SSARewrite pass. Then even share almost all of the same code.
Fixes crbug.com/997246
The first pass applies the RelaxedPrecision decoration to all executable
instructions with float32 based type results. The second pass converts
all executable instructions with RelaxedPrecision result to the equivalent
float16 type, inserting converts where necessary.
Add the first steps to removing the AMD extension VK_AMD_shader_ballot.
Splitting up to make the PRs smaller.
Adding utilities to add capabilities and change the version of the
module.
Replaces the instructions:
OpGroupIAddNonUniformAMD = 5000
OpGroupFAddNonUniformAMD = 5001
OpGroupFMinNonUniformAMD = 5002
OpGroupUMinNonUniformAMD = 5003
OpGroupSMinNonUniformAMD = 5004
OpGroupFMaxNonUniformAMD = 5005
OpGroupUMaxNonUniformAMD = 5006
OpGroupSMaxNonUniformAMD = 5007
and extentend instructions
WriteInvocationAMD = 3
MbcntAMD = 4
Part of #2814
If they are not aliased, the function will always print the message:
"Binary unexpectedly changed despite optimizer saying there was no change"
Which is (usually) totally bogus.
Fixes#2798
* Refactor instruction folders
We want to refactor the instruction folder to allow different sets of
rules to be added to the instruction folder. We might want different
sets of rules in different circumstances.
We also need a way to add rules for extended instructions. Changes are
made to the FoldingRules class and ConstFoldingRules class to enable
that.
We added tests to check that we can fold extended instructions using the
new framework.
At the same time, I noticed that there were two tests that did not tests
what they were suppose to. They could not be easily salvaged. #2813 was
opened to track adding the new tests.
Now we need to handle id overflow when we overflow while replacing uses of the variable. While looking at this code, I noticed an error in the way we handle access chains that cannot be replaced because of overflow. Name it will make some change, and then give up by returning SuccessWithoutChange. But it was changed.
This is fixed up by returning Failure if we notice the error at the time of rewriting the users. This is for both id overflow or out-of-bounds accesses.
Code is added to "CheckUses" to remove variables that have out-of-bounds accesses from the candidate list, so we don't even try to rewrite its uses.
Fixes https://crbug.com/995032
If we run out of ids when creating a new variable, sroa does not recognize
the error, and continues doing work. This leads to segmentation faults.
Fixes https://crbug/969655
`#include <source/util/string_utils.h>` works only when we specify
`include_directories(${CMAKE_CURRENT_SOURCE_DIR}/)` in
cmake. It is hard to set the source directory as a include path
in some build systems e.g., bazel. Using the relative path easily
solves this issue. This commit uses
`#include "source/util/string_utils.h"` instead of
`#include <source/util/string_utils.h>`.
We are no able to inline OpKill instructions into a continue construct.
See #2433. However, we have to be able to inline to correctly do
legalization. This commit creates a pass that will wrap OpKill
instructions into a function of its own. That way we are able to inline
the rest of the code.
The follow up to this will be to not inline any function that contains
an OpKill.
Fixes#2726
This also fixes ADCE to not remove possibly needed OpTypeForwardPointer.
The bug, its fix and the corresponding test have a circular dependency
with the extension, so they are packaged together.
If a member of a struct has a relaxed precision, sroa will not split the
struct. This means we do not get all cases. This commit handles these
cases. The other part is that the decoration needs to be passed on to
the new variables.
Fixes#2786
Fixes#2768
* In scalar replacement, interpret access chain indexes as signed counts
* Use Constant::GetSignExtendedValue and Constant::GetZeroExtendedValue
where appropriate
* new tests
spirv-opt: Add --graphics-robust-access
Clamps access chain indices so they are always
in bounds.
Assumes:
- Logical addressing mode
- No runtime-array-descriptor-indexing
- No variable pointers
Adds stub code for clamping coordinate and samples
for OpImageTexelPointer.
Adds SinglePassRunAndFail optimizer test fixture.
Android.mk: add source/opt/graphics_robust_access_pass.cpp
Adds Constant::GetSignExtendedValue, Constant::GetZeroExtendedValue
This makes it symmetric with the result type of ...->element_type which
returns a const Type.
So now we can write code like this:
analysis::Vector v = ...
analysis::Vector(v->element_type(), 2);
This fixes#2608.
The original test case had an out-of-bounds reference that ended up
folding into OpCompositeExtract that was indexing right outside the
constant composite.
The returned constant would then cause a segfault during constant
propagation.
Fixes#2764
* Don't replace all uses when simplifying instructions, instead only
update non-debug, non-decoration uses
* added a test
* Add a new version of RAUW that takes a predicate to decide whether to
replace the use or not
* used in simplification pass
* Fix#2609 - Handle out-of-bounds scalar replacements.
When SROA tries to do a replacement for an OpAccessChain that is exactly
one element out of bounds, the code was trying to access its internal
array of replacements and segfaulting.
This protects the code from doing this, and it additionally fixes the
way SROA works by not returning failure when it refuses to do a
replacement. Instead of failing the optimization pass, SROA will now
simply refuse to do the replacement and keep going.
Additionally, this patch fixes the SROA logic to now return a proper status so we can
correctly state that the pass made no changes to the IR if it only found
invalid references.
Merge return expects unreachable merge block to look a certain way, and
unreachable continue blocks to look a certain way. What if an
unreachable block is both a merge and a continue? The continue is
suppose to take precedent, but merge-return implements it with the merge
taking precedent. This change flips that around.
Fixes#2746
* Process OpDecorateId in ADCE
When there is an OpDecorateId instruction that is live,
the ids that is references must be kept live. This change
adds them to the worklist.
I've also updated a validator check to allow OpDecorateId
to be able to apply to decoration groups.
Fixes#1759.
* Remove dead code.
In merge return, we need to know the original dominator for a block in order to
traverse code from the original dominator to the new dominator and add
appropriate Phi nodes. The current code gets this wrong because the dominator
tree is build as needed. The first time we get the immediate dominator for a
function we just built the dominator tree and it takes into account that a
block has been split. The second time it does not.
This inconsistency needs to be fixed. We do that by recording the original
dominator for all blocks at the start of the pass.
If we were to record just the basic block, that could change if the block is
split. We want to traverse the code in the body of the original dominator,
whatever block it ends up in. To make this easy to track, we not save the
terminator instruction to represent the original dominator.
Fixes#2745
When a phi candidate is marked as trivial, we are suppose to update all
of its uses to the reference the value that it is being folded to.
However, the code updates the uses misses `defs_at_block_`. So at a
later time, the id for the trivial phi can reemerge.
Fixes#2744
* Bindless Instrument: Make init check depend solely on input_init_enabled
Previously was dependent on presense of descriptor_indexing extension
in SPIR-V, but this missed some cases. Tests updated to refect this new
policy.
* Fix format.
* Fix bug in merge return
The merge return pass seems to assume that the only new edges in the cfg
are from return block to merge blocks. However, it is possible that a
merge block branches to a merge block when it did not before.
This change add a new variable to track all of the new edges. It also
renames some other variables and cleans us the code to make it a bit
easier to read.
Fixes#2702.
Dead branch elimination needs to know about the constructs that a block is contained it when determining what to do with its merge instruction. We currently fold branches in block as we see them, which is parent constructs before their children. This causes the struct cfg analysis to crash because it tries to get the parent construct for a block after the parent has been folded.
This can be fixed by folding the branch of the children before the parents.
Fixes#2667.
There are a couple spots where we are not looking at decorations when we should.
1. Value numbering is suppose to assign a different value number to ids if they have different decorations. However that is not being done for OpCopyObject and OpPhi.
1. Instruction simplification is propagating OpCopyObject instruction without checking for decorations. It should only do that if no decorations are being lost.
Add a new function to the decoration manager to check if the decorations of one id are a subset of the decorations of another.
Fixes#2715.
Inlining does not inline functions that have a single return that is in a loop. This is because the return cannot be replaced by a branch outside of the loop easily. Merge return knows how to rewrite the function so the return is replaced by a branch.
Fixes#2038.
It is illegal to inline an OpKill instruction into a continue construct because the continue header will no longer dominate the backedge.
This commit adds a check for this, and does not inline.
If we still want to be able to inline a function that contains an OpKill, we can add a new pass that will wrap OpKill instructions into its own function with just the single instruction.
I do not believe that this is a common case right now, so I will not do that yet.
Fixes#2433.
When working on descriptor indexing validation for compute shaders, the
gl_GlobalInvocationID builtin was being loaded as uint which would cause
compute shaders instrumented by the bindless check pass to have:
%83 = OpLoad %uint %gl_GlobalInvocationID
%84 = OpCompositeExtract %uint %83 0
%85 = OpCompositeExtract %uint %83 1
%86 = OpCompositeExtract %uint %83 2
which results in validation failures:
error: line 127: Reached non-composite type while indexes still remain
to be traversed.
%84 = OpCompositeExtract %uint %83 0
for trying to extract a uint from a uint.
When it's an OpConstant or OpSpecConstant, then the literal
values are compared. If the OpSpecConstant also has a SpecId
decoration, then that's also compared.
Otherwise, it's an OpSpecConstantOp and we only compare the
ID of the OpSpecConstantOp instruction itself.
Fixes#2649
* Types: Avoid comparing IDs for in Type::IsSameImpl
When linking, we end up with duplicate types for imported and exported
types, that needs to be removed. The current code would reject valid
import/export pairs of symbols due to IDs mismatch, even if the types or
constants behind those ID were the same.
Enabled remaining type_match_test
Fixes#2442
New version has additional word in stage-specific section. Also
some changes in content for tesselation and compute shaders. Either
version can be invoked at pass creation. This is done to ease integration
and updating of validation layers. Version 1 is deprecated and eventually
will go away.
Also sneaking in fix to version 1 compute shaders.
* Handle nested breaks from switches.
There was a recent decision made to allow branches to the merge node of
a switch even if the switch is not the first enclosing construct. They
can be generated by glslang from break statements in switches.
Dead branch elimination seems to be the only optimization that will
break because of this change, so I will update that optimizations.
The change made are:
- Track switches in structured cfg analysis.
- In Dead branch elimination:
- Look for nested breaks that will require a switch instruction.
- Rewrite, but don't delete, switchs that are required even if it
could be replaced by an unconditional branch.
- When looking for the first break, consider the merge of a switch
as well.
See #2612.
* Fix variable names and comments.
* Add tests for the struct cfg analysis and switches.
* Fix typos in comments.
In order to try to reduce code duplication and to be able
to fold more cases, we want to use the instruction folder
when folding an OpSpecConstantOp with constant operands.
A couple other changes are need to make this work. First
GetDefiningInstruction| in the constant manager is able
to handle |type_id| being logically equivalent to another
type, so we updated the interface, and removed the assert.
Some tests were also updated because we not generate
better code because constants are not duplicated as much
as before.
No need for new tests. The functionality of the instruction folder is
already tested. There are tests check that the instruction folder is
being used correctly for OpCompositeExtract and OpVectorShuffle in the
existing test cases.
Fixes#2585.
* Update memory model support for SPIR-V 1.4
Fixes#2552
* Upgrade memory model now supports two memory access operands for
OpCopyMemory*
* in all cases the pass will first generate two operands by either
adding them or copying
* updates accounts for multiple operands
* tests
There is a case where sroa is not handling id overflow gracefully. It
is handled and an error message is output when the ids overflow.
Fixes https://crbug.com/961030.
Fixes#2555
* Fix a bug in validation where interfaces were considered non-unique
between different entry points targeting the same function
* added a test
* Update private to local pass to remove localized private variables
from entry point interfaces
* added tests
Fixes#2551
* Add support for 1.4 entry point interface lists
* only input and output variables are automatically live
* can clean up interfaces after DCE
* added tests
* allow opt tests to specify a target environment
Add functionality to fix-storage-class so that it can fix up mismatched
data types for pointers as well.
Fixes bugs in when fixing up storage class.
Move GenerateCopy to the Pass class to be reused.
The spirv-opt change for #2535.
* Change implementation of post order CFG traversal
It seems like the recursion is going very deep, and causing some problem
is particular situations. I've reimplemented the CFG post order
traversal to not use recursion.
Fixes#2539.
There was a bit shift done on 32-bit values, but they should have been
done on 64-bit values. This is fixed. At the same time, uses of size_t
are repalaced by uint64_t to ensure these values are 64-bit.
A test case cannot be created because the code that was change is not
run at the moment since we do not split up vectors or matricies. I do
not want to delete the code because I like to experitment with it every
once in a while.
Fixes#2528.
WebGPU requires certain variables to be initialized, whereas there are
known issues with using initializers in Vulkan. This PR is the first
of three implementing a pass to decompose initialized variables into
a variable declaration followed by a store. This has been broken up
into multiple PRs, because there 3 distinct cases that need to be
handled, which require separate implementations.
This first PR implements the basic infrastructure that is needed, and
handling of Function storage class variables. Private and Output will
be handled in future PRs.
This is part of resolving #2388
In WebGPU, the component operand 0xFFFFFFFF is forbidden, but in
Vulkan it is used to indicate a value is undefined. When converting to
WebGPU, 0xFFFFFFFF needs to converted to a legal value, though the
specific one does not matter, since it was used to indicate an
undefined entry in the original code. Choosing to use 0, since the
operands are required to be on [0, N-1], so 0 is guaranteed to always
be valid.
Fixes#2349
When -Wformat-security is enabled, we are getting an error. I do not
claim to fully understand when the warning is triggered or not, but this
one can be avoided by calling "Log" instead of "Logf" because the
formating string is not needed.
Renames the existing flag '--webgpu-mode' to '--vulkan-to-webgpu' for
the Vulkan->WebGPU operation, and adds a new flag '--webgpu-to-vulkan'
for the WebGPU->Vulkan operation.
Currently '--webgpu-to-vulkan' doesn't have any passes associated with
it yet, but further patches will implement them.
Fixes#2495
* opt/ir_loader: Don't silently drop unknown instructions on the floor
Currently, if spirv-opt sees an instruction it does not know, it will
silently ignore it and move to the next one. This changes it
to be an error, as dropping it on the floor is likely to generate
invalid SPIR-V output.
* opt/optimizer: Complain a bit louder for unexpected binary changes
If a binary change happens despite a pass saying that the binaries
should be identical, this is indicative of a bug in the pass itself.
This does not change behavior for it to be an error, but simply emits a warning in this case.
This pass tries to fix validation error due to a mismatch of storage classes
in instructions. There is no guarantee that all such error will be fixed,
and it is possible that in fixing these errors, it could lead to other
errors.
Fixes#2430.
Fixes#2452
Swaps priority of handling unreachable merge and continues so that the
back-edge is retained in the case a block is both a loop continue and
loop merge
* Check var pointer capability in ADCE.
* Check var ptr capability for common uniform.
* Check var ptr capability in access chain convert.
Since we want this pass to run even if there are variable pointer on
storage buffers, we had to remove asserts that assumed there were no
variable pointers. The functions with the asserts will now work, it
becomes the responsibility of the callers to deal with the output as
appropriate.
* Single block elimination and variable pointers.
It seems like the code in local single block elimination is able to
handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed a
variable pointer are not candidates.
* Single store elimination and variable pointers.
It seems like the code in local single stroe elimination is able to
handle cases with variable pointers already. This is because the
function `FindSingleStoreAndCheckUses` ensures that variables that feed
a variable pointer are not candidates.
* SSA rewriter and variable pointers.
It seems like the code in the two passes that call the SSA rewriter are
able to handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed
a variable pointer are not candidates.
Fixes#2458.
Fixes#2456
* When eliminating a structured construct that has an unreachable merge,
replace that unreachable terminator with an appropriate return
* New tests
Fixes#2453
* Enable addition of OpPhi instructions when the loop has multiple
predecessors of the merge due to a break
* This can result in some values no longer dominating their uses
* Track return blocks in structured flow to produce OpPhis that have
multiple undef and non-undef arguments
* New tests to catch the bug
* When a block is predicated, mark the new body as a return if the old
block as already a return
* Fix#2478. The fix is to just not try to simplify such loops.
* Also added `BasicBlock::MergeBlockId()` and `BasicBlock::ContinueBlockId()`.
* Some minor changes to `structured_loop_to_selection_reduction_opportunity.cpp`.
* Added test.