The existing spirv-opt `DebugInfoManager::AddDebugValueForDecl()` sets
the scope and line info of the newly added DebugValue from the scope and
line of the DebugDeclare. This is wrong: only a single DebugDeclare
exists for a variable's whole scope, whereas a DebugValue has to be
added at every place where the variable's value is updated. Therefore,
the scope and line of each DebugValue must be taken from the instruction
that updates the variable.
This bug makes
https://github.com/google/amber/blob/main/tests/cases/debugger_hlsl_shadowed_vars.amber
fail. This commit fixes the bug.
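A rough sketch of the intended behavior (all ids are illustrative):
```
        OpLine %src 5 0
%decl = OpExtInst %void %ext DebugDeclare %dbg_foo %stack %null_expr
        ...
        OpLine %src 12 0            ; line of the update
        OpStore %stack %int_7
%val  = OpExtInst %void %ext DebugValue %dbg_foo %int_7 %null_expr
        ; this DebugValue should take the scope and line of the store,
        ; not the scope and line of the DebugDeclare above
```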
The front-end language compiler simply emits a DebugDeclare for
a variable when it is declared, and that declaration is effective through
the variable's scope. Since DebugDeclare only maps an OpVariable to a
local variable, the mapping is lost when an optimization pass replaces
loads of the variable with the loaded value. DebugValue can instead be
used to specify the value of a variable directly. By adding a DebugValue
for each value update or phi instruction of a variable, we help the
debugger inspect the variable at any point of the program execution.
For example:
```
float a = 3;
... (complicated cfg) ...
foo(a); // <-- variable inspection: the debugger can find the nearest dominating DebugValue of `float a`
```
For code with a complicated CFG, e.g., for-loops or if-statements, we
need the help of ssa-rewrite to determine the effective value of each
variable in each basic block.
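For example (a rough sketch; ids are illustrative), when ssa-rewrite
merges the values of a variable coming from two branches into an OpPhi,
adding a DebugValue at the phi lets the debugger report the variable in
the merge block:
```
%merge = OpLabel
%a_val = OpPhi %float %float_3 %then_bb %a_else %else_bb
%dv    = OpExtInst %void %ext DebugValue %dbg_a %a_val %null_expr
```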
If the variable is updated only once and that update dominates all of
its uses, the local-single-store-elim pass performs the same value
propagation as ssa-rewrite, so we have to let it add a DebugValue for
the value assignment as well.
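A minimal sketch of that case (ids are illustrative):
```
%foo = OpVariable %ptr_int Function
       OpStore %foo %int_3          ; the only store; dominates every load
%dv  = OpExtInst %void %ext DebugValue %dbg_foo %int_3 %null_expr
%ld  = OpLoad %int %foo             ; uses of %ld are replaced with %int_3
```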
One main issue is that we must add a DebugValue only when the value
update of a variable is visible to its DebugDeclare, i.e., happens
within the variable's scope. For example,
```
{ // scope1
%stack = OpVariable %ptr_int %int_3
{ // scope2
DebugDeclare %foo %stack <-- local variable "foo" in high-level language source code is declared as OpVariable "%stack"
// add DebugValue "foo = 3"
...
Store %stack %int_7 <-- foo = 7, add DebugValue "foo = 7"
...
// debugger can inspect the value of "foo"
}
Store %stack %int_11 <-- out of "scope2" i.e., scope of "foo". DO NOT add DebugValue "foo = 11"
}
```
However, the initialization of a variable is an exception.
For example, the argument passing for an inlined function happens outside
the function's scope, but we must still add a DebugValue for it.
```
// in HLSL
bar(float arg) { ... }
...
float foo = 3;
bar(foo);
// in SPIR-V
%arg = OpVariable
OpStore %arg %foo <-- Argument passing. Out of "float arg" scope, but we must add DebugValue for "float arg"
... body of function bar(float arg) ...
```
This PR handles that exceptional case in the local-single-store-elim
pass: it adds a DebugValue for a store that is considered an
initialization. The same exception handling for ssa-rewrite was done by
commit df4198e50e.
In some cases, the DebugDecl is not visible from a value assignment, but
the assignment information is important, i.e., the debugger cannot
inspect the variable without it. For example, a parameter of an inlined
function receives its value assignment, i.e., the argument passing,
outside its function scope. If we simply remove the DebugDecl because it
is invisible to the argument passing, we cannot inspect the variable.
This PR
- Adds a DebugValue for a DebugDecl that is invisible to a value
assignment. We use the value of the variable in the basic block that
contains the DebugDecl, as found by ssa-rewrite. If that value
instruction does not dominate the DebugDecl, we use the value of the
variable in the immediate dominator of that basic block.
- Checks the visibility of the DebugDecl for a Phi value assignment
based on all of the Phi's value operands (see the sketch after this
list). Since a Phi just references values coming from multiple basic
blocks, the scopes of its value operands must be regarded as the scope
of the Phi.
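A rough sketch of the phi case (ids are illustrative):
```
%then  = OpLabel
%a     = OpIAdd %int %x %int_1      ; defined inside the variable's scope
         OpBranch %merge
%else  = OpLabel
%b     = OpIAdd %int %x %int_2      ; defined inside the variable's scope
         OpBranch %merge
%merge = OpLabel
%phi   = OpPhi %int %a %then %b %else
         ; visibility of the DebugDecl is checked against the scopes of
         ; %a and %b, not against the block containing the phi
%dv    = OpExtInst %void %ext DebugValue %dbg_foo %phi %null_expr
```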
`DebugInfoManager::AddDebugValueIfVarDeclIsVisible()` adds an
OpenCL.DebugInfo.100 DebugValue from a DebugDeclare only when the
DebugDeclare is visible to the given scope. This lets us correctly
handle a reference variable, e.g.,
```
{ // scope #1.
  int foo = /* init */;
  { // scope #2.
    int& bar = foo;
    ...
```
In the above code, we must not propagate a DebugValue of `int& bar` for
store instructions in scope #1, because `bar` is alive only in
scope #2.
We have one exception: if the given DebugDeclare is used for a function
parameter, `DebugInfoManager::AddDebugValueIfVarDeclIsVisible()` always
has to add the DebugValue instruction regardless of the scope. This is
because the initializer (store instruction) for the function parameter
can be outside the parameter's scope (the function), in particular when
the function has been inlined.
Without this change, the function parameter's value information always
disappears whenever we run the function inlining pass followed by
`DebugInfoManager::AddDebugValueIfVarDeclIsVisible()`.
This pass basically follows the same process as ssa-rewrite: it adds a
DebugValue after each Store and removes the DebugDeclare or DebugValue
Deref. It only does this if every instruction that depends on the Store
is a Load and is replaced.
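A before/after sketch of that rewrite (ids are illustrative):
```
; before
%d  = OpExtInst %void %ext DebugDeclare %dbg_foo %foo %null_expr
      OpStore %foo %int_7
%ld = OpLoad %int %foo              ; only dependent instruction; replaced by %int_7
; after
      OpStore %foo %int_7
%dv = OpExtInst %void %ext DebugValue %dbg_foo %int_7 %null_expr
```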
* Check var pointer capability in ADCE.
* Check var ptr capability for common uniform.
* Check var ptr capability in access chain convert.
Since we want this pass to run even if there are variable pointers on
storage buffers, we had to remove asserts that assumed there were no
variable pointers. The functions that had the asserts still work; it
becomes the responsibility of the callers to deal with the output as
appropriate.
* Single block elimination and variable pointers.
It seems like the code in local single block elimination is able to
handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed a
variable pointer are not candidates.
* Single store elimination and variable pointers.
It seems like the code in local single store elimination is able to
handle cases with variable pointers already. This is because the
function `FindSingleStoreAndCheckUses` ensures that variables that feed
a variable pointer are not candidates.
* SSA rewriter and variable pointers.
It seems like the code in the two passes that call the SSA rewriter are
able to handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed
a variable pointer are not candidates.
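For reference, this is roughly what a variable pointer on a storage
buffer looks like (ids are illustrative); variables whose uses feed such
a pointer are exactly what the checks above reject as candidates:
```
%pa  = OpAccessChain %ptr_sb_int %ssbo_a %int_0 %int_0
%pb  = OpAccessChain %ptr_sb_int %ssbo_b %int_0 %int_0
%sel = OpSelect %ptr_sb_int %cond %pa %pb   ; pointer chosen at run time
%ld  = OpLoad %int %sel
```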
Fixes #2458.
Fixes #2104.
* Checks the rules for logical addressing and variable pointers
* Has an out for relaxed logical pointers
* Updated PassFixture to expose validator options
* Enabled relaxed logical pointers for some tests
* New validator tests
In local-access-chain-convert, we replace a load by a load of the entire
variable followed by an extract. The extract has the same value
as the original load. However, if the load has a decoration on it, the
decoration was lost because we did not copy any decorations to the new
id. This is fixed by rewriting the load into the extract in place,
keeping the same result id.
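An illustrative before/after (ids are hypothetical):
```
; before
%ac = OpAccessChain %ptr_float %var %int_1
%ld = OpLoad %float %ac             ; %ld may carry a decoration, e.g. RelaxedPrecision
; after: the load is rewritten in place, so %ld keeps its id and decorations
%full = OpLoad %struct %var
%ld   = OpCompositeExtract %float %full 1
```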
This change means that we no longer call DCEInst on the loads,
because a load is not deleted but rewritten. This could leave
OpAccessChain instructions around that are not used. This is not a
problem for -O and -Os: they run the local_single_*_elim passes and then
dead code elimination. The DCE will remove the unused access chains,
and the load elimination passes work even if there are unused access
chains. I have added tests to them to ensure they will not lose
opportunities.
Fixes #1787.
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large. These large data structures
then take a very long time to be destroyed. I've seen cases where this
is around 40% of the time.
I've rewritten that algorithm to use less memory. This gives a
significant improvement when running a large number of shaders through
DXC.
I've also made a small change to local-single-block-elim to delete the
loads that it has replaced, so that local-single-store-elim will not
have to look at them. local-single-store-elim now does the same thing.
The time for one set goes from 309s down to 126s. For another set, the
time goes from 102s down to 88s.
The algorithm used in DCEInst to remove dead code is very slow. It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.
This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.
I tried a number of different orderings of the optimization passes, and
I believe this is the best I could find.
The results I have on 3 sets of shaders are:
Legalization:
Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26
Performance passes:
Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74
Size reduction passes:
Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71
Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions. That is why we don't see much gain there.
This function now checks for side-effects before adding operand
instructions to the dead instruction work list.
Because this fix puts more pressure on IsCombinatorInstruction() to
be correct, this commit adds all OpConstant* and OpType* instructions
to the combinator_ops_ set.
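A small sketch of the distinction (ids are illustrative):
```
%c   = OpFunctionCall %int %helper  ; potential side effects; not a combinator
%add = OpIAdd %int %c %int_1        ; %add becomes dead
; when %add is removed, %c must not be queued as dead code,
; whereas pure operands (OpConstant*, OpType* results) may be
```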
Fixes #1341.
A few optimizations are updated to handle code that is supposed to be
using the logical addressing mode, but still has variables that contain
pointers, as long as those pointers point to opaque objects. This is
called "relaxed logical addressing".
|Instruction::GetBaseAddress| will check that the pointers that are used
meet the relaxed logical addressing rules. Optimizations that now handle
relaxed logical addressing instead of logical addressing are:
- aggressive dead-code elimination
- local access chain convert
- local store elimination passes.
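An illustrative sketch of the pattern that relaxed logical addressing
permits (ids and types are hypothetical):
```
%img_ptr_t   = OpTypePointer UniformConstant %img_t
%local_ptr_t = OpTypePointer Function %img_ptr_t  ; pointer to pointer: not allowed
                                                  ; under strict logical addressing
%local       = OpVariable %local_ptr_t Function
               OpStore %local %texture            ; %texture points to an opaque image
%p           = OpLoad %img_ptr_t %local
%img         = OpLoad %img_t %p
```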
Re-formatted the source tree with the command:
$ /usr/bin/clang-format -style=file -i \
$(find include source tools test utils -name '*.cpp' -or -name '*.h')
This required a fix to source/val/decoration.h. It was not including
spirv.h, which broke builds when the #include headers were re-ordered by
clang-format.