convert-to-sampled-image pass converts images and/or samplers with
given pairs of descriptor set and binding to sampled image.
If a pair of an image and a sampler have the same pair of descriptor
set and binding that is one of the given pairs, they will be
converted to a sampled image. In addition, if only an image has the
descriptor set and binding that is one of the given pairs, it will
be converted to a sampled image as well.
For example, when we have
%a = OpLoad %type_2d_image %texture
%b = OpLoad %type_sampler %sampler
%combined = OpSampledImage %type_sampled_image %a %b
%value = OpImageSampleExplicitLod %v4float %combined ...
1. If %texture and %sampler have the same descriptor set and binding
%combine_texture_and_sampler = OpVaraible %ptr_type_sampled_image_Uniform
...
%combined = OpLoad %type_sampled_image %combine_texture_and_sampler
%value = OpImageSampleExplicitLod %v4float %combined ...
2. If %texture and %sampler have different pairs of descriptor set and binding
%a = OpLoad %type_sampled_image %texture
%extracted_image = OpImage %type_2d_image %a
%b = OpLoad %type_sampler %sampler
%combined = OpSampledImage %type_sampled_image %extracted_image %b
%value = OpImageSampleExplicitLod %v4float %combined ...
The new pass will removed interface variable on the OpEntryPoint instruction when they are not statically referenced in the call tree of the entry point.
It can be enabled on the command line using the options `remove-unused-interface-variables`.
This pass converts an internal form of GLSLstd450 Interpolate ops
to the externally valid form. The external form takes the lvalue
of the interpolant. The internal form can do a load of the interpolant.
The pass replaces the load with its pointer. The internal form is
generated by glslang and possibly other frontends for HLSL shaders.
The new pass is called as part of HLSL legalization after all
propagation is complete.
Also adds internal interpolate form to pre-legalization validation
Removing PropagateLineInfoPass and RedundantLineInfoElimPass from
56d0f5035 makes unit tests of many open source projects fail.
It will happen before submitting this glslang PR
https://github.com/KhronosGroup/glslang/pull/2440. This commit will be
git-reverted after merging the glslang PR.
Based on the OpLine spec, an OpLine instruction must be applied to
the instructions physically following it up to the first occurrence
of the next end of block, the next OpLine instruction, or the next
OpNoLine instruction.
```
OpLine %file 0 0
OpNoLine
OpLine %file 1 1
OpStore %foo %int_1
%value = OpLoad %int %foo
OpLine %file 2 2
```
For the above code, the current spirv-opt keeps three line
instructions `OpLine %file 0 0`, `OpNoLine`, and `OpLine %file 1 1`
in `std::vector<Instruction> dbg_line_insts_` of Instruction class
for `OpStore %foo %int_1`. It does not put any line instruction to
`std::vector<Instruction> dbg_line_insts_` of
`%value = OpLoad %int %foo` even though `OpLine %file 1 1` must be
applied to `%value = OpLoad %int %foo` based on the spec.
This results in the missing line information for
`%value = OpLoad %int %foo` while each spirv-opt pass optimizes the
code. We have to put `OpLine %file 1 1` to
`std::vector<Instruction> dbg_line_insts_` of
both `%value = OpLoad %int %foo` and `OpStore %foo %int_1`.
This commit conducts the line instruction propagation and skips
emitting the eliminated line instructions at the end, which are the same
with PropagateLineInfoPass and RedundantLineInfoElimPass. This
commit removes PropagateLineInfoPass and RedundantLineInfoElimPass.
KhronosGroup/glslang#2440 is a related PR that stop using
PropagateLineInfoPass and RedundantLineInfoElimPass from glslang.
When the code in this PR applied, the glslang tests will pass.
Create a pass to instrument OpDebugPrintf instructions. This pass replaces all OpDebugPrintf instructions with instructions to write a record containing the string id and the all specified values into a special printf output buffer (if space allows). This pass is designed to support the printf validation in the Vulkan validation layers.
Fixes#3210
* Handle id overflow in the ssa rewriter.
Remove LocalSSAElim pass at the same time. It does the same thing as the SSARewrite pass. Then even share almost all of the same code.
Fixes crbug.com/997246
The first pass applies the RelaxedPrecision decoration to all executable
instructions with float32 based type results. The second pass converts
all executable instructions with RelaxedPrecision result to the equivalent
float16 type, inserting converts where necessary.
Add the first steps to removing the AMD extension VK_AMD_shader_ballot.
Splitting up to make the PRs smaller.
Adding utilities to add capabilities and change the version of the
module.
Replaces the instructions:
OpGroupIAddNonUniformAMD = 5000
OpGroupFAddNonUniformAMD = 5001
OpGroupFMinNonUniformAMD = 5002
OpGroupUMinNonUniformAMD = 5003
OpGroupSMinNonUniformAMD = 5004
OpGroupFMaxNonUniformAMD = 5005
OpGroupUMaxNonUniformAMD = 5006
OpGroupSMaxNonUniformAMD = 5007
and extentend instructions
WriteInvocationAMD = 3
MbcntAMD = 4
Part of #2814
We are no able to inline OpKill instructions into a continue construct.
See #2433. However, we have to be able to inline to correctly do
legalization. This commit creates a pass that will wrap OpKill
instructions into a function of its own. That way we are able to inline
the rest of the code.
The follow up to this will be to not inline any function that contains
an OpKill.
Fixes#2726
spirv-opt: Add --graphics-robust-access
Clamps access chain indices so they are always
in bounds.
Assumes:
- Logical addressing mode
- No runtime-array-descriptor-indexing
- No variable pointers
Adds stub code for clamping coordinate and samples
for OpImageTexelPointer.
Adds SinglePassRunAndFail optimizer test fixture.
Android.mk: add source/opt/graphics_robust_access_pass.cpp
Adds Constant::GetSignExtendedValue, Constant::GetZeroExtendedValue
WebGPU requires certain variables to be initialized, whereas there are
known issues with using initializers in Vulkan. This PR is the first
of three implementing a pass to decompose initialized variables into
a variable declaration followed by a store. This has been broken up
into multiple PRs, because there 3 distinct cases that need to be
handled, which require separate implementations.
This first PR implements the basic infrastructure that is needed, and
handling of Function storage class variables. Private and Output will
be handled in future PRs.
This is part of resolving #2388
In WebGPU, the component operand 0xFFFFFFFF is forbidden, but in
Vulkan it is used to indicate a value is undefined. When converting to
WebGPU, 0xFFFFFFFF needs to converted to a legal value, though the
specific one does not matter, since it was used to indicate an
undefined entry in the original code. Choosing to use 0, since the
operands are required to be on [0, N-1], so 0 is guaranteed to always
be valid.
Fixes#2349
This pass tries to fix validation error due to a mismatch of storage classes
in instructions. There is no guarantee that all such error will be fixed,
and it is possible that in fixing these errors, it could lead to other
errors.
Fixes#2430.
Adds an optimization pass to remove usages of AtomicCounterMemory
bit. This bit is ignored in Vulkan environments and outright forbidden
in WebGPU ones.
Fixes#2242
Add a pass that looks for members of structs whose values do not affects
the output of the shader. Those members are then removed and just
treated like padding in the struct.
Upgrade to VulkanKHR memory model
* Converts Logical GLSL450 memory model to Logical VulkanKHR
* Adds extension and capability
* Removes deprecated decorations and replaces them with appropriate
flags on downstream instructions
* Support for Workgroup upgrades
* Support for copy memory
* Adding support for image functions
* Adding barrier upgrades and tests
* Use QueueFamilyKHR scope instead of device
These are bookend passes designed to help preserve line information
across passes which delete, move and clone instructions. The propagation
pass attaches a debug line instruction to every instruction based on
SPIR-V line propagation rules. It should be performed before optimization.
The redundant line elimination pass eliminates all line instructions
which match the previous line instruction. This pass should be performed
at the end of optimization to reduce physical SPIR-V file size.
Fixes#2027.
* Add base and core bindless validation instrumentation classes
* Fix formatting.
* Few more formatting fixes
* Fix build failure
* More build fixes
* Need to call non-const functions in order.
Specifically, these are functions which call TakeNextId(). These need to
be called in a specific order to guarantee that tests which do exact
compares will work across all platforms. c++ pretty much does not
guarantee order of evaluation of operands, so any such functions need to
be called separately in individual statements to guarantee order.
* More ordering.
* And more ordering.
* And more formatting.
* Attempt to fix NDK build
* Another attempt to address NDK build problem.
* One more attempt at NDK build failure
* Add instrument.hpp to BUILD.gn
* Some name improvement in instrument.hpp
* Change all types in instrument.hpp to int.
* Improve documentation in instrument.hpp
* Format fixes
* Comment clean up in instrument.hpp
* imageInst -> image_inst
* Fix GetLabel() issue.
* Create a new entry point for the optimizer
Creates a new struct to hold the options for the optimizer, and creates
an entry point that take the optimizer options as a parameter.
The old entry point that takes validator options are now deprecated.
The validator options will be one of the optimizer options.
Part of the optimizer options will also be the upper bound on the id bound.
* Add a command line option to set the max value for the id bound. The default is 0x3FFFFF.
* Modify `TakeNextIdBound` to return 0 when the limit is reached.
* Combines OpAccessChain, OpInBoundsAccessChain, OpPtrAccessChain and
OpInBoundsPtrAccessChain
* New folding rule to fold add with 0 for integers
* Converts to a bitcast if the result type does not match the operand
type
V
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element. This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.
However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.
Fixes#1547.
This pass will look for adjacent loops that are compatible and legal to
be fused.
Loops are compatible if:
- they both have one induction variable
- they have the same upper and lower bounds
- same initial value
- same condition
- they have the same update step
- they are adjacent
- there are no break/continue in either of them
Fusion is legal if:
- fused loops do not have any dependencies with dependence distance
greater than 0 that did not exist in the original loops.
- there are no function calls in the loops (could have side-effects)
- there are no barriers in the loops
It will fuse all such loops as long as the number of registers used for
the fused loop stays under the threshold defined by
max_registers_per_loop.
Adds support for spliting loops whose register pressure exceeds a user
provided level. This pass will split a loop into two or more loops given
that the loop is a top level loop and that spliting the loop is legal.
Control flow is left intact for dead code elimination to remove.
This pass is enabled with the --loop-fission flag to spirv-opt.
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.
It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.
To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
The sprir-v generated from HLSL code contain many copyies of very large
arrays. Not only are these time consumming, but they also cause
problems for drivers because they require too much space.
To work around this, we will implement an array copy propagation. Note
that we will not implement a complete array data flow analysis in order
to implement this. We will be looking for very simple cases:
1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.
The hard part is keeping all of the types correct. We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.
Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent. With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values. See |Struct::GetExtraHashWords|.
Contributes to #1416.
This pass replaces the load/store elimination passes. It implements the
SSA re-writing algorithm proposed in
Simple and Efficient Construction of Static Single Assignment Form.
Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
In: Jhala R., De Bosschere K. (eds)
Compiler Construction. CC 2013.
Lecture Notes in Computer Science, vol 7791.
Springer, Berlin, Heidelberg
https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6
In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.
When a target variable is loaded, it queries the variable's reaching
definition. If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.
The algorithm avoids repeated lookups using memoization.
For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA. That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
This patch adds initial support for loop unrolling in the form of a
series of utility classes which perform the unrolling. The pass can
be run with the command spirv-opt --loop-unroll. This will unroll
loops within the module which have the unroll hint set. The unroller
imposes a number of requirements on the loops it can unroll. These are
documented in the comments for the LoopUtils::CanPerformUnroll method in
loop_utils.h. Some of the restrictions will be lifted in future patches.
Creates a pass that will remove instructions that are invalid for the
current shader stage. For the instruction to be considered for replacement
1) The opcode must be valid for a shader modules.
2) The opcode must be invalid for the current shader stage.
3) All entry points to the module must be for the same shader stage.
4) The function containing the instruction must be reachable from an entry point.
Fixes#1247.
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
* does not perform code motion currently so the converted values must
dominate the join point (e.g. can't be defined in the branches)
* limited for now to two predecessors, but can be extended to handle
more cases
* Adding if conversion to -O and -Os
We have come across a driver bug where and OpUnreachable inside a loop
is causing the shader to go into an infinite loop. This commit will try
to avoid this bug by turning OpUnreachable instructions that are
contained in a loop into branches to the loop merge block.
This is not added to "-O" and "-Os" because it should only be used if
the driver being targeted has this problem.
Fixes#1209.
This implements the conditional constant propagation pass proposed in
Constant propagation with conditional branches,
Wegman and Zadeck, ACM TOPLAS 13(2):181-210.
The main logic resides in CCPPass::VisitInstruction. Instruction that
may produce a constant value are evaluated with the constant folder. If
they produce a new constant, the instruction is considered interesting.
Otherwise, it's considered varying (for unfoldable instructions) or
just not interesting (when not enough operands have a constant value).
The other main piece of logic is in CCPPass::VisitBranch. This
evaluates the selector of the branch. When it's found to be a known
value, it computes the destination basic block and sets it. This tells
the propagator which branches to follow.
The patch required extensions to the constant manager as well. Instead
of hashing the Constant pointers, this patch changes the constant pool
to hash the contents of the Constant. This allows the lookups to be
done using the actual values of the Constant, preventing duplicate
definitions.
When a private variable is used in a single function, it can be
converted to a function scope variable in that function. This adds a
pass that does that. The pass can be enabled using the option
`--private-to-local`.
This transformation allows other transformations to act on these
variables.
Also moved `FindPointerToType` from the inline class to the type manager.
types. This allows the lookup of type declaration ids from arbitrarily
constructed types. Users should be cautious when dealing with non-unique
types (structs and potentially pointers) to get the exact id if
necessary.
* Changed the spec composite constant folder to handle ambiguous composites
* Added functionality to create necessary instructions for a type
* Added ability to remove ids from the type manager