Renames the existing flag '--webgpu-mode' to '--vulkan-to-webgpu' for
the Vulkan->WebGPU operation, and adds a new flag '--webgpu-to-vulkan'
for the WebGPU->Vulkan operation.
Currently '--webgpu-to-vulkan' doesn't have any passes associated with
it yet, but further patches will implement them.
Fixes#2495
This pass tries to fix validation error due to a mismatch of storage classes
in instructions. There is no guarantee that all such error will be fixed,
and it is possible that in fixing these errors, it could lead to other
errors.
Fixes#2430.
Fixes#2452
Swaps priority of handling unreachable merge and continues so that the
back-edge is retained in the case a block is both a loop continue and
loop merge
* Check var pointer capability in ADCE.
* Check var ptr capability for common uniform.
* Check var ptr capability in access chain convert.
Since we want this pass to run even if there are variable pointer on
storage buffers, we had to remove asserts that assumed there were no
variable pointers. The functions with the asserts will now work, it
becomes the responsibility of the callers to deal with the output as
appropriate.
* Single block elimination and variable pointers.
It seems like the code in local single block elimination is able to
handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed a
variable pointer are not candidates.
* Single store elimination and variable pointers.
It seems like the code in local single stroe elimination is able to
handle cases with variable pointers already. This is because the
function `FindSingleStoreAndCheckUses` ensures that variables that feed
a variable pointer are not candidates.
* SSA rewriter and variable pointers.
It seems like the code in the two passes that call the SSA rewriter are
able to handle cases with variable pointers already. This is because the
function `HasOnlySupportedRefs` ensures that variables that feed
a variable pointer are not candidates.
Fixes#2458.
Fixes#2456
* When eliminating a structured construct that has an unreachable merge,
replace that unreachable terminator with an appropriate return
* New tests
Fixes#2488
* Validator doesn't identify back-edge of the loop, so the merge is
never set
* Construct::blocks() has safe uses of `merge` so the assert can be
removed
* Added a test
Fixes#2453
* Enable addition of OpPhi instructions when the loop has multiple
predecessors of the merge due to a break
* This can result in some values no longer dominating their uses
* Track return blocks in structured flow to produce OpPhis that have
multiple undef and non-undef arguments
* New tests to catch the bug
* When a block is predicated, mark the new body as a return if the old
block as already a return
* Fix#2478. The fix is to just not try to simplify such loops.
* Also added `BasicBlock::MergeBlockId()` and `BasicBlock::ContinueBlockId()`.
* Some minor changes to `structured_loop_to_selection_reduction_opportunity.cpp`.
* Added test.
Fix#2475. Fix#2476.
* Improve reducer algorithm: shrink granularity, remove an early return, no lazy initialization, notify pass if binary is interesting, add comments.
* Add fail-on-validation-error option to fail a reduction if an invalid state is reached; useful for tests.
* Set fail-on-validation-error in tests.
* Improve some documentation comments.
* Add Reducer::AddDefaultReductionPasses so tests (and other library consumers) can add the default reduction passes.
* Add CLIMessageConsumer in test_reduce so we can see messages for tricky tests.
* Remove test RemoveUnreferencedInstructionReductionPassTest_ApplyReduction because it was indirectly testing the reduction algorithm, not the RemoveUnreferencedInstruction pass.
* Tweak tests where needed.
Fix#2396
* Check that initial state is valid. Add kInitialStateInvalid.
* Fix RemoveOpnameAndRemoveUnreferenced test; turns out the original shader is invalid, but we never notice because we don't check this and the reduced shader is valid; fix original shader. Assert reduction status is kComplete.
* Always check return value from `Reducer::Run`.
* Change Reducer::Run to *not* immediately copy the input binary.
Changing the stored value for a sampled image consumer to be the
instruction instead of result ID, since not all instructions have
result IDs. Using result IDs led to a potential crash when using
OpReturnValue, which doesn't have a result ID. OpReturnValue is not a
legal consumer, but the validator needs to look at the instruction to
determine this, thus storing the pointer to the instruction, instead
of trying to fetch the pointer using the instruction.
Issue #1528 covers fixing the check.
Fixes#2463
If SPV_EXT_descriptor_indexing is enabled, add check that for a
descriptor-based reference, the descriptor is initialized. Initialization
data is stored in the debug input buffer, added to the length information
already there. This feature must be seperately enabled on the pass
creation routine. NOTE: Currently just supports image references; buffer
references are still TODO.
Adds an optimization pass to remove usages of AtomicCounterMemory
bit. This bit is ignored in Vulkan environments and outright forbidden
in WebGPU ones.
Fixes#2242
If relax-logical-pointer is enabled, this commit makes Validator
accept function param even when its Storage Class is different from
the expected one.
Related to #2423, #2430
In constant propagation, decoration are transfered from the original
expression to the constant that will replace it. This can be wrong
because there are no decorations that apply to constants. We choose to
simply delete the decorations.
Fixes#2441
In WebGPU all blocks are required to be reachable, unless they are one of two
specific degenerate cases for merge-block or continue-target. This PR adds in
checking for these conditions.
Fixes#2068
Updated script to work with python3 and python2.
Added required tools.
We added a section to the readme to mention the tools that are needed to
build and test spirv-tools. For the compiler, the compilers used by the
bots are mentioned.
The bots have been changed. The windows bots will not use python 3.6 for testing. The other bots will still use python 2.7. Both Python2 and Python3 will be tested.
Fixes#2407.
Fixes#1856.
* Handle back edges better in dead branch elim.
Loop header must have exactly one back edge. Sometimes the branch
with the back edge can be folded. However, it should not be folded
if it removes the back edge.
The code to check this simply avoids folding the branch in the
continue block. That needs to be changed to not fold the back edge,
wherever it is.
At the same time, the branch can be folded if it folds to a branch to
the header, because the back edge will still exist.
Fixes#2391.
In relaxed addressing mode, we want to accept non memory objects
because this is a very natural translation of hlsl. It should be fixed
by legalization by inlining the calls.
* Fix OpDot folding of half float vectors.
The code that folds OpDot does not handle half floats correctly. After
trying to multiple the first components, we get a nullptr because we
don't fold half float values. This nullptr gets passed to the code that
does the addition, and causes an assert.
Fixes#2405.
The types of input and output variables must match for the pipeline. We
cannot see the uses in all of the shader, so dead member
elimination cannot safely change the type of input and output variables.
Add a pass that looks for members of structs whose values do not affects
the output of the shader. Those members are then removed and just
treated like padding in the struct.
* Fixes#2358. Added to the reducer the ability to remove a function that is not directly called. Factored out some code from the optimizer to help with this.
Fixes#2120
Enhanced the reducer so that it can merge blocks together, leveraging the functionality extracted from the block_merge pass in the optimizer.
* Fixes#2338. Added check for phi node before merging blocks.
* Added functionality to merge blocks A and B even when B starts with OpPhi instructions, by replacing uses of the OpPhi results with the definitions coming from A. Added some tests for this.
* Fixed assertion.
* Remove use of deprecated googletest macro
INSTANTIATE_TEST_CASE_P has been deprecated. We need to use
INSTANTIATE_TEST_SUITE_P instead.
* Remove extra commas from test suites.
This CL adds in the specific checks required for WebGPU, enables
running the builtin checks for WebGPU, and refactors the existing
testing infrastructure to support testing the new checks.
This PR is part of resolving #2276
In a recent PR, we allowed a forward reference for the element type in
an array declaration. However, we do not have other check to make sure
the forward reference is a pointer type first reference in
OpTypeForawrdPointer. We add that check.
Fixes https://crbug.com/920074.
When looking at the uses of the result of an instruction, code sinking
assumes that all uses are in a basic block. However, this is not true
if there is a decoration or name for the result of that insturction.
This commit checks for this.
Fixes https://crbug.com/923243.
It is legal, but not generated by any SPIR-V producer: an OpCompositeExtract
with no indexes. This is essentially just a copy of the object, so we
treat them that way. We simply propagate the live variables of the
result to the operand.
Fixes https://crbug.com/919181.
It is legal, but not generated by any SPIR-V producer: an OpAccessChain
with no indexes. This is essentially just a copy of the pointer.
I have decided to treat it like an OpCopyObject. In CheckUses, we
return that it is not okay.
When looking at this I realized that we had code in GetUsedComponents
that cannot be reached. If there is a use in an OpCopyObject the it
will not call GetUsedComponents. I removed that dead code.
Fixes https://crbug.com/918311.
During unrolling a new loop is created, but its ownership is not clear
as it gets passed through the code. Changed something to unique_ptr to
make that clearer.
Fixes#2299.
Fixing other memory leaks at the same time.
Fixes#2296Fixes#2297
In C++, a bit shift of the same size as the type is undefined, but it is
defined in spir-v. When folding those cases, we have to be careful. We
cannot simply do the shift in C++.
Fixes https://crbug.com/917697.
Fixes#2138
* Modf and frexp are upgraded to use the struct version of the
instruction and generate an explicit store whose flags can be upgraded
separately
* Fixed major bug where availability and visibility were reversed for
non-copy memory instructions
* Fixed bug where availability and visibility scope operands were reversed for copy memory
* Upgraded all opt tests to use SPV_ENV_UNIVERSAL_1_3
* Upgrade tests moved into unified tests and removed standalone test
Broader check for ids that require a type
Fixes https://crbug.com/911700
* Adds a broader check for when id operands require a type
* updated a few tests
* added a test to catch the original issue
* Handle CompositeInsert with no indices in VDCE
In the spec, there it nothing that forces an OpCompositeInsert to have
an index, but VDCE assumes there is at least 1 in a couple places.
This commit updates VDCE to handle these cases.
* Added additional changes for the new AccelerationStructureNV type.
* Added NVIDIA ray tracing storage classes for checking in ValidateVariable.
* For NVIDIA ray tracing storage classes added test to load bool type (allowed) in new storage class.
This Cl changes the binary parser to keep track of the instruction count
being processed. The parser will then use that instruction number as the
error number, instead of the binary word.
This should make it easier to match the error up to what the
disassembler would output for the error.
Issue #2091
When processing options in a file, it does have access to the
ValidatorOptions and OptimizerOptions object, so options that change
those do not work. We just need to pass it in.
Fixes#2219.
* Added additional changes for the new AccelerationStructureNV type.
* Added additional changes for the new AccelerationStructureNV type. Change tabs to space...
* Added additional changes for the new accelerationStructureNV type -- add proper type name.
Fix TypeManager.TypeStrings test:
[----------] 29 tests from TypeManager
[ RUN ] TypeManager.TypeStrings
[ OK ] TypeManager.TypeStrings (7 ms)
When we are predicating the continue target for a loop, it can no longer
be the continue target because it will have a branch that exits the loop
and is not the bach edge. The continue target will have to be the
target of that branch that is still in the loop.
Fixes#2211.
The function `UpdatePhiNodes` was being called inconsistently. In one
case, the cfg had already been updated to include the new edge, and in
another place the cfg was not updated. This caused the function to
miss flagging a block as needing new phi nodes. I picked that the cfg
should not be updated before making the call. I documented it, and
change the call sites to match.
Fixes#2207.
We initially assumed that if the type manager returned the correct id
for the pointee type, that we would get the correct pointer type back,
but that is not true. See the unit test added with this commit. We
need to fall back to the linear search any time we are looking for a
pointer to a type that may not be unique.
At the same time, SROA considered an OpName on a variable to be a use of
the entire variable. That has been fixed.
Fixes#2209.
We currently place the load instructions at the start of the basic block
that dominates all of the loads. If that basic block contains OpPhi
instructions, then this will generate invalid code. We just need to
search for a location that comes after all of the OpPhi instructions.
Fixes#2204.
* Add OperandToUndefReductionPass.
Fixes#2115.
Also added some tests that are similar to those in OperandToConstantReductionPassTest.
In addition, refactor FindOrCreateGlobalUndef into reduction_util.cpp. Fixes#2184.
Removed many documentation comments that were identical or very similar to the overridden function's documentation comment.
* Don't fold specialized branchs in loop unswitch
Folding branches can have a lot of special cases, and can be a little
error prone. So I only want it in one place. That will be in dead
branch elimination. I will change loop unswitching to set the branches
that were being folded to have a constant condition. Then subsequent
pass of dead branch elimination will be able to remove the code.
At the same time, I added a check that loop unswitching will not
unswitch a branch with a constant condition. It is not useful to do it
because dead branch elimination will simple fold the branch anyway.
Also it avoid an infinite loop that would other wise be introduced by my
first change.
Fixes#2203.
Loop unswitching is unswitching the conditional branch that creates the
back-edge. In the version of the loop, where the bachedge is not taken,
there is no back-edge. This is what causes the validator to complain.
The solution I will go with will be to now unswitch a condition with a
back-edge. At this time we do not now if loop unswitching is used. We do
not include it in the optimization sets provided, nor is it used in
glslang's set. When there are opportunities and no breaks from the loop,
the loop with either be a single iteration loop, or an infinite loop.
There is no performance advantage to performing loop unswitching in
either of those cases. If there is a break, maintaining structured
control flow will be tricky. Unless we see a clear advantage to handling
these case, I would go with the safer simpler solution.
Fixes#2201.
If there are multiple edges to a basic block, then the ssa rewriter will
create OpPhi instructions with duplicate entries. This is invalid, and
it is fixed in this commit.
Fixes#2202.
* Run validator during reduction.
* Added functionality to validate modules after each reduction step, and some tests to check this is working. Also fixed an issue where reduction passes were not guaranteed to be executed at their minimum granularities.
There is inconsistencies between the different specs about whether or
not this capability is required/allowed, so tooling like glslang
currently ignores it. Once this is resolved the check and test can be
re-enabled.
* Invalidate the decoration manager at the start of ADCE.
If the decoration manager is kept live the the contex will try to keep
it up to date. ADCE deals with group decorations by changing the
operands in |OpGroupDecorate| instructions directly without informing
the decoration manager. This puts it in an invalid state, which will
cause an error when the context tries to update it. To Avoid this
problem, we will invalidate the decoration manager upfront.
At the same time, the decoration manager is now considered when checking
the consistency of the decoration manager.
Add a spirv-reduce pass which removes OpName and OpMemberName instructions.
This is useful to enable other reduction passes, e.g. RemoveUnreferencedInstruction may not be able to remove an instruction creating an id whose only usage is an OpName for this id.
* Fix invalid OpPhi generated by merge-return.
When we create a new phi node for a value say %10, we have to replace
all of the uses of %10 that are no longer dominated by the def of %10
by the result id of the new phi. However, if the use is in a phi node,
it is possible that the bb contains the use is not dominated by either.
In this case, needs to be handled differently.
* Split loop headers before add a new branch to them.
In merge return, Phi node in loop header that are also merges for loop
do not get updated correctly. Those cases do not fit in with our
current analysis. Doing this will simplify the code by reducing the
number of cases that have to be handled.
Added documentation to the ir context to indicates that TakeNextId()
returns 0 when the max id is reached. TODOs were added to each call
sight so that we know where we have to start to handle this case.
Handle id overflow in |SplitLoopHeader|.
Handle id overflow in |GetOrCreatePreHeaderBlock|.
Handle failure to create preheader in LICM.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1841.
* Only check for binding and descriptor set on variables that are
statically used by an entry point
* updated tests and added a couple new ones
* new method for collecting entry points that statically reference an
id
* Validate OpForwardPointer
The validator does not have a a check that OpForwardPointer is giving
a forward reference to a pointer type. We add that check.
https://crbug.com/910852
* Remove more specialized check.
There was a check that the forward pointer is actually a poiner type,
but it was only done if it was used in a struct. This was too specific.
Remove it in favour of the more general check that was added.
* Format
* Check the storage type in OpTypeForwardPointer
* Fix typo is test case epxected results.
We currently simulate all shift operations when the two operand are
constants. The problem is that if the shift amount is larger than
32, the result is undefined.
I'm changing the folder to return 0 if the shift value is too high.
That way, we will have defined behaviour.
https://crbug.com/910937.
Fixes#2147
* Checks that device scope is not used for availability and visibility
operations unless VulkanMemoryModelDeviceScopeKHR capability is present
* implemented for atomics, barriers and memory instructions currently
This CL changes the id/name output from the validator to always use a
consistent id[%name] style. This removes the need for getIdOrName. The
name lookup is changed to use the NameMapper so the output is consistent
with what the disassembler will produce.
Fixes#2137
* Validate uses of ids defined in unreachable blocks.
For some reason we do not make sure the uses of ids that are defined
in unreachable blocks are dominated by their def. This is causing
invalid code to pass the validator.
Fixes#2143
* Add test for unreachable code after a return.
We want to allow code like:
```
void foo() {
a = ...;
...
return; // for debugging
<use of a>;
...
}
```
I added a test to make sure that something like this is still accepted
by the validator.
* Add test for unreachable def used in phi.
Upgrade to VulkanKHR memory model
* Converts Logical GLSL450 memory model to Logical VulkanKHR
* Adds extension and capability
* Removes deprecated decorations and replaces them with appropriate
flags on downstream instructions
* Support for Workgroup upgrades
* Support for copy memory
* Adding support for image functions
* Adding barrier upgrades and tests
* Use QueueFamilyKHR scope instead of device
* Move ProcessFunction* function from pass to the context.
There are a few functions that are used to traverse the call tree.
They currently live in the Pass class, but they have nothing to do with
a pass, and may be needed outside of a pass. They would be better in
the ir context, or in a specific call tree class if we ever have a need
for it.
* Don't inline recursive functions.
Inlining does not check if a function is recursive or not. This has
been fine as long as the shader was a Vulkan shader, which forbid
recursive functions. However, not all shaders are vulkan, so either
we limit inlining to Vulkan shaders or we teach it to look for recursive
functions.
I prefer to keep the passes as general as is reasonable. The change
does not require much new code in inlining and gives a reason to refactor
some other code.
The changes are to add a member function to the Function class that
checks if that function is recursive or not.
Then this is used in inlining to not inlining a function call if it calls
a recursive function.
* Add id to function analysis
There are a few places that build a map from ids to Function whose
result is that id. I decided to add an analysis to the context for this
to reduce that code, and simplify some of the functions.
* Add missing file.
Fixes#2104
* Checks the rules for logical addressing and variable pointers
* Has an out for relaxed logical pointers
* Updated PassFixture to expose validator options
* enabled relaxed logical pointers for some tests
* New validator tests
Restrict capabilities to WebGPU spec
This covers whitelisting Matrix, Shader, Sampled1D, Image1D,
DerivativeControl, and ImageQuery. These are the allowed capabilities
that don't require an extension. Whitelisting VulkanMemoryModelKHR
will be handled by whitelisting its extension in a seperate patch.
Fixes#2101
* Added a reduction pass to replace ids with ids of the same type that dominate them.
* Introduce helper method for querying whether an operand type is an input id.
Make sure that initialized variable have correct storage class
For WebGPU and Vulkan environments, variables must have the storage
class; Output, Private, or Function, if they have an initializer.
Fixes#2071
Adding validation that the addressing declared by OpMemoryModel is
Logical and the memory model declared is VulkanKHR. Updating a bunch
of tests that were broken by this.
Fixes#2060
Check forbidden Annotation instructions for WebGPU env
From the WebGPU SPIR-V Execution Enviroment spec:
OpDecorationGroup, OpGroupDecorate, OpGroupMemberDecorate are not
allowed.
Fixes#2062
Validate that debugging instructions are not present for WebGPU
For WebGPU execution environments, check that all of the debug
instructions have already been stripped before validation.
Fixes#2063
We had a test that checks that a spirv binary where the depth of the
deepest nested construct is just below the default limit. This test
would take a long time causing a timeout in some builds. We are
questioning the value of the test since we already have a test that can
tell us if we are within a custom limit. We will remove the test.
Add tests for matrix type data rule validation
This covers the data rules for matrix types specified in the SPIR-V
spec, section 2.16.1, and the WebGPU SPIR-V Execution Environment
spec.
Fixes#2065Fixes#2080
Ban sequentially consistent with VulkanKHR
* Added validation check that SequentiallyConsistent memory semantics
are not used if the memory model is VulkanKHR
* Added tests
* Fixed a bug in evaluating constant 32-bit integers and updated some
handling to avoid inferring a value from a spec constant default
Remaining memory semantics validation
* Adds checks that OutputMemoryKHR, MakeAvailableKHR and MakeVisibleKHR
are only used if the VulkanMemoryModelKHR capabailty is present
* Added checks that MakeAvailableKHR requires release semantics
* Added checks that MakeVisibleKHR requires acquire semantics
* Added checks that MakeAvailableKHR and MakeVisibleKHR require a
storage class
Adds validator option to specify scalar block layout rules.
Both VK_KHR_relax_block_layout and VK_EXT_scalar_block_layout can be
enabled at the same time. But scalar block layout is as permissive
as relax block layout.
Also, scalar block layout does not require padding at the end of a
struct.
Add test for scalar layout testing ArrayStride 12 on array of vec3s
Cleanup: The internal getSize method does not need a round-up argument,
so remove it.
These are bookend passes designed to help preserve line information
across passes which delete, move and clone instructions. The propagation
pass attaches a debug line instruction to every instruction based on
SPIR-V line propagation rules. It should be performed before optimization.
The redundant line elimination pass eliminates all line instructions
which match the previous line instruction. This pass should be performed
at the end of optimization to reduce physical SPIR-V file size.
Fixes#2027.
From the Vulkan 1.1 spec 14.5.2:
Variables identified with the Uniform storage class are used to access
transparent buffer backed resources. Such variables must be typed as
OpTypeStruct, or an array of this type.
Fixes#1949
Validate variable types for UniformConstant storage in Vulkan (#2008)
From the Vulkan 1.1 spec 14.5.2:
Variables identified with the UniformConstant storage class are used
only as handles to refer to opaque resources. Such variables must be
typed as OpTypeImage, OpTypeSampler, OpTypeSampledImage, or an array
of one of these types.
Fixes#2008
That function currently only handled OpPtrAccessChain if it was in the
middle of the chain, but not at the start. Fixing that up.
Fixes crbug.com/905271.
The Vulkan specification does not permit use of the VertexId and
InstanceId BuiltIn decorations, so add a check to ensure they are not
being used when the target environment is Vulkan.
* Add base and core bindless validation instrumentation classes
* Fix formatting.
* Few more formatting fixes
* Fix build failure
* More build fixes
* Need to call non-const functions in order.
Specifically, these are functions which call TakeNextId(). These need to
be called in a specific order to guarantee that tests which do exact
compares will work across all platforms. c++ pretty much does not
guarantee order of evaluation of operands, so any such functions need to
be called separately in individual statements to guarantee order.
* More ordering.
* And more ordering.
* And more formatting.
* Attempt to fix NDK build
* Another attempt to address NDK build problem.
* One more attempt at NDK build failure
* Add instrument.hpp to BUILD.gn
* Some name improvement in instrument.hpp
* Change all types in instrument.hpp to int.
* Improve documentation in instrument.hpp
* Format fixes
* Comment clean up in instrument.hpp
* imageInst -> image_inst
* Fix GetLabel() issue.
If there is only 1 return and it is in a loop, then the function cannot be inlined.
Fix condition when inlined code needs one-trip loop wrapper. The dummy loop is needed when there is a return inside a selection construct. Even if there is only 1 return.
* Validate the id bound.
Validates that the id bound for the module is not larger than the max id
bound. Also adds an option to set the max id bound. Allows the
optimizer option to set the max id bound to also set the id bound for
the validation run done by the optimizer.
Fixes#2030.
This CL takes the various opt unit tests and makes a single executable
instead of one per test. This reduces the number of build targets by
~125 when building with ninja.
When looking for a break from a selection construct, we do not realize
that a jump to the continue target of a loop containing the selection
is a break. This causes and infinit loop, or possibly other failures.
Fixes#2004.
When looking for a break from a selection construct, we do not need to
look inside nested constructs. However, if a loop header has an
unconditional branch, then we enter the loop. Entering the loop causes
an infinite loop because we keep going through the loop.
The solution is to look for a merge block, if one exsits, even for block
terminated by an OpBranch.
Fixes#1979.
The SPV_KHR_8bit_storage extension does not permit 8-bit integers to be
cast directly to floating point types. We are seeing shaders in the
wild, being produced by toolchains like glslang, that are generating
invalid SPIR-V.
This change adds validation to check for the patterns not permitted, and
some tests that expose the failure.
ADCE liveness algorithm should treat OpUnreachable at least like other
branch instructions. It was being treated as always live which was
preventing useless structured constructs from being eliminated.
OpUnreachable is generated by dead branch elimination which is now
being required by merge return, so this fix should accompany that
change.
We currently run merge-return on all functions, but
dead-branch-elimination only runs on function reachable from an entry
point or exported function. Since dead-branch-elimination is needed for
merge-return, they have to match.
Fixes#1976.
In logical addressing mode, we are not allowed to generate variables
pointers. There is already a check for OpSelect. However, OpPhi
and OpPtrAccessChain are not checked to make sure it does not
generate an variable pointer. I've added those checks.
Fixes#1957.
Was removing control structures which didn't have data dependency
with enclosed live loop and otherwise did not contain live code.
An example is a counting loop around a live loop.
Fixes#1967.
* MakePointerVisibleKHR cannot be used with OpStore
* MakePointerAvailableKHR cannot be used with OpLoad
* MakePointerAvailableKHR and MakePointerVisibleKHR both require
NonPrivatePointerKHR
* NonPrivatePointerKHR is limited to a subset of storage classes
* many tests
* Validation checks for new image operands MakeTexelAvailableKHR and
MakeTexelVisibleKHR
* added tests
* Tests that NonPrivateTexelKHR is accepted for all image operands
Updating test environments
* fixed build errors
* changed image types for *FetchSuccess tests to use a type defined in
1.3 shader body
Merge return assumes that the only unreachable blocks are those needed
to keep the structured cfg valid. Even those must be essentially empty
blocks.
If this is not the case, we get unpredictable behaviour. This commit
add a check in merge return, and emits an error if it is not the case.
Added a pass of dead branch elimination before merge return in both the
performance and size passes. It is a precondition of merge return.
Fixes#1962.
The current implementation in the folder when seeing a division by zero
is to assert. In the release build, the compiler will attempt to
compute the value, which causes its own problems.
The solution I will go with is to fold the division, and just give it
the value of 0. The same goes for remainder and mod operations.
Fixes#1961.
This commit checks the following when Shader capability exists:
"The FPRoundingMode decoration can be applied only to a width-only
conversion instruction that is used as the Object operand of an
OpStore storing through a pointer to a 16-bit floating-point object
in the StorageBuffer, Uniform, PushConstant, Input, or Output
Storage Classes.".
The HlslCounterBufferGOOGLE that was introduced changed the OpDecorateId
so that is can now reference an id other than the target. If that other
id is used only in the decoration, then the definition of the id will be
removed because decoration do not count as real uses.
However, if the target of the decoration is still live the decoration
will not be removed. This leaves a reference to an id that is not
defined.
There are two solutions to consider. The first is that is the decoration
is kept, then the definition of the id should be kept live. Implementing
this change would be involved because the way ADCE handles decorations
will have to be reimplemented.
The other solution is to remove the decoration the id is otherwise dead.
This works for this specific case. Also this is the more desirable
behaviour in this case. The id will always be the id of a variable that
belongs to a descriptor set. If that variable is not bound and we do
not remove it, the driver will complain.
I chose to implement the second solution. The first will be left to when
a case for it comes up.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1885.
This commit will change the message for unknown extensions from an error
to a warning.
Code was added to limit the number of warning messages so that consummer
of the messages are not overwhelmed. This is standard practice in
compilers.
Many other issues were found at while looking into this. They have been
documented in #1950.
Fixes http://crbug.com/875547.
* Check rules from Execution Mode tables, 2.16.2 and the Vulkan
environment spec
* Allows MeshNV execution model with the following execution modes
* LocalSize, LocalSizeId, OutputPoints and OutputVertices
* Done to not break their validation
There are a few spots where copy propagate arrays is trying
to go from a Type to an id, but the type is not unique. When generating
code this pass needs specific ids, otherwise we get type mismatches.
However, the ambigous types means we can sometimes get the wrong type
and generate invalid code.
That code has been rewritten to not rely on the type manager, and just
look at the instructions instead.
I have opened https://github.com/KhronosGroup/SPIRV-Tools/issues/1939 to
try to get a way to make this more robust.
In DecorationManager::RemoveDecorationsFrom, we do not remove the id
from a decoration group if the group has no decorations. This causes
problems because KillNamesAndDecorates is suppose to remove all
references to the id, but in this case, there is still a reference.
This is fixed by adding a special case.
Also, there is the possibility of a double free because
RemoveDecorationsFrom will delete the instructions defining |id| when
|id| is a decoration group. Later, KillInst would later write to memory
that has been deleted when trying to turn it into a Nop. To fix this,
we will only remove the decorations that use |id| and not its definition
in RemoveDecorationsFrom.
OpPhi instruction must appear before all non-OpPhi instructions
except for OpLine. Without this commit, Validator does not check
the case that an OpPhi is preceeded by an OpLine and the OpLine is
preceeded by a non-OpPhi instruction that is not OpLine.
Checked all instructions whose object is OpTypeSampledImage or
OpTypeImage as suggested in #487. OpImageTexelPointer instruction
is missing and others look good. This commit adds only
OpImageTexelPointer.
This CL removes the use of SetContextMessageConsumer from the
binary_parse_test tests and creates a Context object and uses
SetMessageConsumer instead.
Instead of using the source/table.h methods, this CL switches the stats
tool to use the spvtools::Context class and assign the message consumer
through the public API.
A limit of 0 for the scalar replacement options it used to indicate that
there is no limit. The current implementation does not allow 0. This
should be fixed.
It seems like the current implementation of KillNameAndDecorates does
not handle group decorations correctly. The id being removed is not
removed from the OpGroupDecorate instructions. Even worst, any
decorations that apply to that group are removed.
The solution is to use the function in the decoration manager that will
remove the decorations and update the instructions instead of doing the
work itself.
Adds unrolling to the legalization passes.
After enabling unrolling I found a bug when there is a self-referencing
phi node. That has been fixed.
The test that checks for that the order of optimizations is correct also
needed to be updated.
The current implementation of merge return can create bad, but correct,
code. When it is not in a loop construct, it will insert a lot of
extra branch around code. The potentially large number of branches are
bad. At the same time, it can separate code store to variables from
its uses hiding the fact that the store dominates the load.
This hurts the later analysis because the compiler thinks that multiple
values can reach a load, when there is really only 1. This poorer
analysis leads to missed optimizations.
The solution is to create a dummy loop around the entire body of the
function, then we can break from that loop with a single branch. Also
only new merge nodes would be those at the end of loops meaning that
most analysies will not be hurt.
Remove dead code for cases that are no longer possible.
It seems like some drivers expect there the be an OpSelectionMerge
before conditional branches, even if they are not strictly needed.
So we add them.
* Create structed cfg analysis.
There are lots of optimization that have to traverse the CFG in a
structured order just because it wants to know which constructs a
basic block in contained in. This adds extra complexity to these
optimizations, for causes too much refactoring of older optimizations.
To help with this problem, I have written an analysis that can give this
information.
* Identify branches breaking from loops.
Dead branch elimination does a search for a conditional branch to the
end of the current selection construct. This search assumes that the
only way to leave the construct is through the merge node. But that is
not true. The code can jump to the merge node of a loop that contains
the construct.
The search needs to take this into consideration.
In merge blocks, we do not allow the merging of two blocks with merge
instructions. This is because if the two block are merged only 1 of
those instructions can exists. However, if the successor block is the
merge block of the predecessor, then we can delete the merge instruction
in the predecessor. In this case, we are able to merge the blocks.
* Create a new entry point for the optimizer
Creates a new struct to hold the options for the optimizer, and creates
an entry point that take the optimizer options as a parameter.
The old entry point that takes validator options are now deprecated.
The validator options will be one of the optimizer options.
Part of the optimizer options will also be the upper bound on the id bound.
* Add a command line option to set the max value for the id bound. The default is 0x3FFFFF.
* Modify `TakeNextIdBound` to return 0 when the limit is reached.
Support collapsed into one commit:
- Asm/Dis support for SPV_KHR_vulkan_memory_model
- Add Vulkan mem model image operands to switch
- Add TODO for source/validate_image.cpp
- val: Image operands NonPrivateTexelKHR, VolatileTexelKHR have no operands
This is required for memory model tests to pass SPIR-V validation.
- Round trip tests: Test new flags on OpCopyMemory*
This splits the spvtools_config into a public and private part to avoid
leaking internal bits to dependents. A new target is added for the
public headers so that "gn check" works for dependents.
Also formats test/fuzzers/BUILD.gn
* Validate all type ids.
The validator does not check if the type of an instruction is actually
a type unless the OpCode has a specific requirement. For example,
OpFAdd is checked, but OpUndef is not.
The commit add a generic check that if there is a type id then the id
defines a type.
http://crbug.com/876694
* Merge other checks for type into new one.
There are a couple check that the type id is a type for specific
opcodes. Those have been mereged into 1.
Small changes to other test cases to make them valid enough for the
purpose of the test.
In the specification of `OpTypeFunction`, it says
> OpFunction is the only valid use of OpTypeFunction.
This commit add a check in the validator for this rule.
A test started to fail because the new check happens before the check
the test case is testing. Updated the test case to still fail the
check it was suppose to fail originally.
http://crbug.com/874571
* Copy decorations when creating new ids.
When creating a new value based on an old value, we need to copy the
decorations to the new id. This change does this in 3 places:
1) The variable holding the return value of the function generated by
merge return should get decorations from the function.
2) The results of the OpPhi instructions should get decorations from the
variable they are replacing in the ssa writer.
3) In local access chain convert the intermediate struct (result of
OpCompositeInsert) generated for the store replacement should get its
decorations from the variable being stored to.
Fixes#1787.
If seems like at least 1 driver does not like a condition jump to the end
of a selection construct. We are generating these in the merge return
pass. This change stops merge return from generating this sequence.
Part of #1861.
When doing predicate blocks, we need to traverse every block in
structured order in order to keep track of which construct a block is
contained in. The standard way of traversing code in structured order
is to create a list with all of the nodes in order. However, when
predicating blocks, new blocks are created, and those blocks are missed.
This causes branches that go too far.
The solution is to update the order as new blocks are created. Since
we are using an std::list, we do not have to worry about invalidation of
iterators when changing the list.
* Split constant opcode validation out of idUsage and into
validate_constants.cpp
* minor style fixes
* reduced duplication
* fixed an issue with array sizing
* Refactor PredicateBlocks
Refactor PredicateBlocks so that we know which constructs a return
is contained in. Will be used later.
* Have PredicateBlocks jump the existing merge blocks.
In PredicateBlocks, we currently skip instructions with side effects,
but it still follows the same control flow (sort-of). This causes a
problem, when we are trying to predicate code in a loop. We skip all
of the code with side effects (IV increment), but still follow the
same control flow (jump back the start of the loop). This creates an
infinite loop because the code will keep jumping back to the start of
the loop without changing the values that effect the exit condition.
This is a large change to merge-return. When predicating a block that
is in a loop or merge construct, it will jump to the merge block of the
construct. Once out of all constructs we will generate code as we did
before.
* Handle breaks from structured-ifs in DCE.
dead code elimination assumes that are conditional branches except for
breaks and continues in loops will have an OpSelectionMerge before them.
That is not true when breaking out of a selection construct.
The fix is to look for breaks in selection constructs in the same place
we look for breaks and continues for loops.
When dead-branch-elim folds a conditional branch, it also deletes the
OpSelectionMerge instruction. If that construct contains a
conditional branch to the merge node, it will not have its own
OpSelectionMerge. When the headers merge instruction is deleted, the
the inner conditional branch will no longer be legal. It will be a
selection to a node that is not a merge node.
We fix this up by moving the OpSelectionMerge to a new location if it is
still needed.
This forks the testing harness from https://github.com/google/shaderc
to allow testing CLI tools.
New features needed for SPIRV-Tools include:
1- A new PlaceHolder subclass for spirv shaders. This place holder
calls spirv-as to convert assembly input into SPIRV bytecode. This is
required for most tools in SPIRV-Tools.
2- A minimal testing file for testing basic functionality of spirv-opt.
Add tests for all flags in spirv-opt.
1. Adds tests to check that known flags match the names that each pass
advertises.
2. Adds tests to check that -O, -Os and --legalize-hlsl schedule the
expected passes.
3. Adds more functionality to Expect classes to support regular
expression matching on stderr.
4. Add checks for integer arguments to optimization flags.
5. Fixes#1817 by modifying the parsing of integer arguments in
flags that take them.
6. Fixes -Oconfig file parsing (#1778). It reads every line of the file
into a string and then parses that string by tokenizing every group of
characters between whitespaces (using the standard cin reading
operator). This mimics shell command-line parsing, but it does not
support quoting (and I'm not planning to).
When doing the validator checks, an instruction is currently registered
at the end of IdPass. This creates an inconsistency. In IdPass, an
instruction that uses its own result will treat that use as a forward
reference. Then in the following passes it will not because the
definition can be found.
It seems best to update the state after all of the check have been done
for the current instruction. This makes it consistent for all of the
passes.
This makes a different when trying to verify OpTypeStruct.
Fixes https://crbug.com/874372.
In local-access-chain-convert, we replace loads by load the entire
variable, then doing the extract. The extract will have the same value
as the load. However, if the load has a decoration on it, the
decoration is lost because we do not copy any them to the new id.
This is fixed by rewritting the load into the extract and keeping the
same result id.
This change has the effect that we do not call DCEInst on the loads
because the load is not being deleted, but replaced. This could leave
OpAccessChain instructions around that are not used. This is not a
problem for -O and -Os. They run local_single_*_elim passes and then
dead code elimination. The dce will remove the unused access chains,
and the load elimination passes work even if there are unused access
chains. I have added test to them to ensure they will not loss
opportunities.
Fixes#1787.
The code in source/message was only used in a single set of tests to
format the output results. This CL changes the test to verify the
message instead of all the error values and removes the source/message
code.
* Moved function opcode validation out of idUsage and into new files
* minor style changes
* General opcode checking is in validate_function.cpp
* Execution limitation checking is in
validate_execution_limitations.cpp
* Execution limitations was split into a new pass as it requires other
validation to register those limitations first.
* Changed entry point validation to check storage class of variable
instead of pointer
* added a test
* Moved several checks after opcode validation
* These checks should be able to guarantee individual instructions are
ok
* Updated tests due to reordered checks
* Moved type instruction validation out of validation idUsage into a new
file
* Consolidate type unique pass into new file
* Removed one bad test
* Reworked validation ordering
* Run the validator in the optimization fuzzers.
The optimizers assumes that the input to the optimizer is valid. Since
the fuzzers do not check that the input is valid before passing the
spir-v to the optimizer, we are getting a few errors.
The solution is to run the validator in the optimizer to validate the
input.
For the legalization passes, we need to add an extra option to the
validator to accept certain types of variable pointers, even if the
capability is not given. At the same time, we changed the option
"--legalize-hlsl" to relax the validator in the same way instead of
turning it off.
Fixes#1800
* Refactored duplication of code between OpCopyMemory and
OpCopyMemorySized validation
* Fixed some bugs in OpCopyMemorySized validation
* Replaced asserts with checks
* Added new tests
* Replaced uses in opcode validation of current_function()
* Added non-const accessor to function lookup in ValidationState_t
* Updated a couple bad tests due to check reordering
1.
BUILD.gn: Don't use the extra Chromium clang warnings
Also removes the unused .gn secondary_sources.
2.
Move fuzzers in test/ instead of testing/
This frees up testing/ to be the git subtree of Chromium's src/testing/
that contains test.gni, gtest, gmock and libfuzzer
3.
DEPS: get the whole testing/ subtree of Chromium
4.
BUILD.gn: Simplify the standalone gtest targets
These targets definitions are inspired from ANGLE's and add a variable
that is the path of the googletest directory so that it can be made
overridable in future commits.
6.
BUILD.gn: Add overridable variables for deps dirs
This avoids hardcoded paths to dependencies that make it hard to
integrate SPIRV-Tools in other GN projects.
When validating a FunctionCall we can trigger an assert if we are not
currently within a function body. This CL adds verification that we are
within a function before attempting to add a function call.
Issue 1789.
In the merge return pass, we will split a block, but not update the phi
instructions that reference the block. Since the branch in the original
block is now part of the block with the new id, the phi nodes must be
updated.
This commit will change this.
I have also considered other places where an id of a basic block could
be referenced, and I don't think any of them need to change.
1) Branch and merge instructions: These jump to the start of the
original block, and so we want them to jump to the block that uses the
original id. Nothing needs to change.
2) Names and decorations: I don't think it matters with block keeps the
name, and there are no decorations that apply to basic blocks.
Fixes#1736.
Many of the files have using std::<foo> statements in them, but then the
use of <foo> will be inconsistently std::<foo> or <foo> scattered
through the file. This CL removes all of the using statements and
updates the code to have the required std:: prefix.
This CL removes the two diag() overloads and leaves only the version
which accepts an Instruction. This is safer as we never use the
implicit location from the validation state.
When creating a new phi for a value in the function, merge return will
rewrite all uses of an id that are no longer dominated by its
definition. Uses that are not in a basic block, like OpName or
decorations, are not dominated, but they should not be replaced.
Fixes#1736.
This CL updates the diag() calls in validate_cfg to provide the
associated instruction. This fixes a couple places where we output the
last line of the file instead of the instruction as the disassembly.
This CL changes validate.cpp to use diag providing an explicit
instruction. This changes the result of the function end checks to not
output a disassembly anymore as printing the last line of the module
didn't seem to make sense.
* Combines OpAccessChain, OpInBoundsAccessChain, OpPtrAccessChain and
OpInBoundsPtrAccessChain
* New folding rule to fold add with 0 for integers
* Converts to a bitcast if the result type does not match the operand
type
V
Currently, some instructions will be missing from the list of
ordered_instructions. This will cause issues due to the debug change
which passed the last instruction into subsequent passes.
This CL moves the addition to the ordered list out of the
RegisterInstruction method into AddOrderedInstruction. This method is
called first in ProcessInstruction and the CapabilitiesPass and IdPass
are updated to take an Instruction parameter.
This CL removes the redundant operator name from the error messages in
validate_composites. The operator will be printed on the next line with
the disassembly.
This CL splits the switch in ImagePass into individual validate
functions. The error messages have been updated to drop the
suffix/prefix of the opcode name since it will be displayed in the
disassembly.
This re-implements the -Oconfig=<file> flag to use a new API that takes
a list of command-line flags representing optimization passes.
This moves the processing of flags that create new optimization passes
out of spirv-opt and into the library API. Useful for other tools that
want to incorporate a facility similar to -Oconfig.
The main changes are:
1- Add a new public function Optimizer::RegisterPassesFromFlags. This
takes a vector of strings. Each string is assumed to have the form
'--pass_name[=pass_args]'. It creates and registers into the pass
manager all the passes specified in the vector. Each pass is
validated internally. Failure to create a pass instance causes the
function to return false and a diagnostic is emitted to the
registered message consumer.
2- Re-implements -Oconfig in spirv-opt to use the new API.
Fixes#1731
* Updated folding rules related to vector shuffle to account for the
undef literal value:
* FoldVectorShuffleFeedingShuffle
* FoldVectorShuffleFeedingExtract
* FoldVectorShuffleWithConstants
* These rules would commit memory violations due to treating the undef
literal value as an accessible composite component
Fixes#1727
* If the pass finds any dead branches it can optimize then at the end of
the pass it reorders basic blocks to ensure they satisfy block ordering
requirements
* Added some new tests
* While investigating this issue, found and fixed a non-deterministic
ordering of dominators
* Now the edges used to construct the dominator tree are sorted
according to posorder traversal indices
This CL updates the code to pull a valid instruction for the line number
when outputting a component error in OpVectorShuffle. The error line
isn't the best at this point as it points at the component, but it's
better then a -1 (turning to max<size_t>) that was being output.
The error messages has been updated to better reflect what the error is
attempting to say.
Issue 1719.
With current implementation, the constant manager does not keep around
two constant with the same value but different types when the types
hash to the same value. So when you start looking for that constant you
will get a constant with the wrong type back.
I've made a few changes to the constant manager to fix this. First off,
I have changed the map from constant to ids to be an std::multimap.
This way a single constant can be mapped to mutiple ids each
representing a different type.
Then when asking for an id of a constant, we can search all of the ids
associated with that constant in order to find the one with the correct
type.
When folding an OpVectorShuffle where the first operand is defined by
an OpVectorShuffle, is unused, and is equal to the second, we end up
with an infinite loop. This is because we think we change the
instruction, but it does not actually change. So we keep trying to
folding the same instruction.
This commit fixes up that specific issue. When the operand is unused,
we replace it with Null.
When folding a vector shuffle that feeds another vector shuffle causes
the size of the first operand to change, when other indices have to be
adjusted reletive to the new size.
The function class provides a {Set|Get}Parent call in order to provide
the context to the LoopDescriptor methods. This CL removes the module
from Function and provides the needed context directly to LoopDescriptor
on creation.
This CL removes the context() method from opt::Function. In the places
where the context() was used we can retrieve, or provide, the context in
another fashion.
Currently the IRContext is passed into the Pass::Process method. It is
then up to the individual pass to store the context into the context_
variable. This CL changes the Run method to store the context before
calling Process which no-longer receives the context as a parameter.
- Vulkan 1.0 uses strict layout rules
- Vulkan 1.0 with relaxed-block-layout validator option
enforces all rules except for the relaxation of vector
offset.
- Vulkan 1.1 and later always supports relaxed block layout
Add spot check tests for the relaxed-block-layout scenarios.
Fixes#1697
This CL moves the various validate files into the val/ directory with
the rest of the validation infrastructure. This matches how opt/ is
setup with the passes with the infrastructure.
Other environments do not.
Add tests for OpenGL 4.5 and SPIR-V universal 1.0 to ensure
they still check monotonic layout.
For universal 1.0, we're assuming it otherwise follows Vulkan
rules for block layout.
Fixes#1685
For the instructions which execute after the IdPass check we can provide
the Instruction instead of the spv_parsed_instruction_t. This
Instruction class provides a bit more context (like the source line)
that is not available from spv_parsed_instruction_t.
This CL moves the validation code to the val:: namespace. This makes it
clearer which instance of the Instruction and other classes are being
referred too.
This CL moves the files in opt/ to consistenly be under the opt::
namespace. This frees up the ir:: namespace so it can be used to make a
shared ir represenation.
Currently the utils/ folder uses both spvutils:: and spvtools::utils.
This CL changes the namespace to consistenly be spvtools::utils to match
the rest of the codebase.
Implement rules for row-major matrices
Use ArrayStride and MatrixStride to compute sizes
Propagate matrix stride and RowMajor/ColumnMajor through array members of structs.
Fixes#1637Fixes#1668
Fixes#1618.
Adds a check that validates acceptable exits from case constructs. Case
constructs may only exit to another case construct, the corresponding
merge, an outer loop continue or outer loop merge.
Fixes#1664 : PushConstant with Block follows storage buffer rules
PushConstant variables were being checked with block rules, which are
too strict.
Fixes#1606 : StorageBuffer with Block layout follows buffer rules
StorageBuffer variables were not being checked before.
Fix layout messages: say storage class and decoration
We need to provide more information about storage class and decoration.
The folding routines are currently global functions. They also rely on
data in an std::map that holds the folding rules for each opcode. This
causes that map to not have a clear owner, and therefore never gets
deleted.
There has been a request to delete this map. To implement this, we will
create a InstructionFolder class that owns the maps. The IRContext will
own the InstructionFolder instance. Then the global functions will
become public memeber functions of the InstructionFolder.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1659.
There are a few locations where we need to handle duplicate types. We
cannot merge them because they may be needed for reflection. When this
happens we need do some extra lookups in the type manager.
The specific fixes are:
1) When generating a constant through `GetDefiningInstruction` accept
and use an id for the desired type of the constant. This will make sure
you get the type that is needed.
2) In Private-to-local, make sure we to update the def-use chains when a
new pointer type is created.
3) In the type manager, make sure that `FindPointerToType` returns a
pointer that points to the given type and not a duplicate type.
4) In scalar replacment, make sure the null constants that are created
are the correct type.
- Add asm/dis test for SPV_KHR_8bit_storage
- validator: SPV_KHR_8bit_storage capabilities enable declaration of 8bit int
TODO:
- validator: ban arithmetic on 8bit unless Int8 is enabled
Covered by https://github.com/KhronosGroup/SPIRV-Tools/issues/1595
Produce better error diagnostics in the CFG validation.
This CL fixes up several issues with the diagnostic error line output
in the CFG validation code. For the cases where we can determine a
better line it has been output. For other cases, we removed the
diagnostic line and the error line number from the results.
Fixes#1657
Revert "Don't merge types of resources"
This reverts commit f393b0e480, but leaves
the tests that were added. Added new test. These test are the so that,
if someone tries the same change I made, they will see the test that
they need to handle.
Don't run remove duplicates in -O and -Os
Romve duplicates was run to help reduce compile time when looking for
types in the type manager. I've run compile time test on three sets
of shaders, and the compile time does not seem to change.
It should be safe to remove it.
The Compact IDs pass can corrupt the CFG, but first the CFG has to
be setup. To do this, a test that builds the CFG, then performs the
compact IDs pass, then checks context consistency.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1648
When doing reflection users care about the names of the variable, the
name of the type, and the name of the members. Remove duplicates breaks
this because it removes the names one of the types when merging.
To fix this we have to keep the different types around for each
resource. This commit adds code to remove duplicates to look for the
types uses to describe resources, and make sure they do not get merged.
However, allow merging of a type used in a resource with something not
used in a resource. Was done when the non resource type came second.
This could have a negative effect on compile time, but it was not
expected to be much.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1372.
Fixes#937
Stop std140/430 validation when runtime array is encountered.
Check for standard uniform/storage buffer layout instead of std140/430.
Added validator command line switch to skip block layout checking.
Validate structs decorated as Block/BufferBlock only when they
are used as variable with storage class of uniform or push
constant.
Expose --relax-block-layout to command line.
dneto0 modification:
- Use integer arithmetic instead of floor.
We need this to reduce the test time on Visual Studio debug builds.
AppVeyor times out at 300s for a process.
Artificially lower the local variable limit check for testing purposes.
We don't get much value from using the full 500K+ variables.
Use a limit of 5000 instead.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1632
Add SPV_ENV_WEBGPU_0 for work-in-progress WebGPU.
val: Disallow OpUndef in WebGPU env
Silence unused variable warnings when !defined(SPIRV_EFFCE)
Limit visibility of validate_instruction.cpp's symbols
Only InstructionPass needs to be visible so all other functions are put
in an anonymous namespace inside the libspirv namespace.
Also add a corresponding check for capabilities in the validator.
Update previously existing test cases where an instruction used to fail
assembling because of a version check, but now they succeed because the
instruction is also guarded by a capability. Now it should assemble.
Add tests to ensure that capabilities are checked appropriately.
The explicitly reserved instructions OpImageSparseSampleProj*
now assemble, but they fail validation.
Fixes#1624
[val] Add extra context to error messages.
This CL extends the error messages produced by the validator to output the
disassembly of the errored line.
The validation_id messages have also been updated to print the line number of
the error instead of the word number. Note, the error number is from the start
of the SPIR-V, it does not include any headers printed in the disassembled code.
Fixes#670, #1581
This CL reverts the revert of 'Disallow array-of-arrays with DescriptorSets when
validating." Other changes have been committed which should aleviate the
AppVeryor resource constraints.
This reverts commit f2c93c6e12.
This CL adds validation to disallow using an array-of-arrays when attached to a
DescriptorSet.
Fixes#1522
Validate Ids before DataRules.
The DataRule validators call FindDefs with the assumption that they
definitions being looked at can be found. This may not be true if we
have not validated identifiers first.
This CL flips the IdPass and DataRulesPass to fix this issue.
Fixes#491
* Basic blocks now have a link to the terminator
* Check all case sepecific rules
* Missing check for branching into the middle of a case (#1618)
Fixes#1120
Checks that all static uses of the Input and Output variables are listed
as interfaces in each corresponding entry point declaration.
* Changed validation state to track interface lists
* updated many tests
* Modified validation state to store entry point names
* Combined with interface list and called EntryPointDescription
* Updated uses
* Changed interface validation error messages to output entry point name
in addtion to ID
We replace the std::vector in the Operand class by a new class that does
a small size optimization. This helps improve compile time on Windows.
Tested on three sets of shaders. Trying various values for the small
vector. The optimal value for the operand class was 2. However, for
the Instruction class, using an std::vector was optimal. Size of "0"
means that an std::vector was used.
Instruction size
0 4 8
Operand Size
0 489 544 684
1 593 487
2 469 570
4 473
8 505
This is a single thread run of ~120 shaders. For the multithreaded run
the results were the similar. The basline time was ~62sec. The
optimal configuration was an 2 for the OperandData and an
std::vector for the OperandList with a compile time of ~38sec. Similar
expiriments were done with other sets of shaders. The compile time still
improved, but not as much.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1609.
Try to reduce the amount of disk space used by especially by debug builds,
which may be contributing to AppVeyor failures.
Collapses tests in categories:
- validator
- loop optimizations
- dominator analysis
- linker
Contributes to #1615
- Fix tests for basic group operations (e.g. Reduce) to allow for
new capabilities in SPIR-V 1.3 that enable them.
- Refactor operand capability check to avoid code duplication and
to put all checks that don't need table lookup before any table
lookup.
- Test round trip assembly/disassembly support for extension
SPV_NV_viewport_array2
- Test assembly and validation of decoration ViewportRelativeNV
Fixes#1596
Fixes#1281
* New structured cfg check: all non-construct header blocks'
predecessors must come from within the construct
* New function to calculate blocks in a construct
* Fixed a bug in BasicBlock type bitset
Relaxing check to not consider unreachable predecessors
* Fixing broken common uniform elim test
This CL updates the validate_id code to output the name of the object along with
the id number. There were a few instances which already output the name, this
just extends to all of them. Now, the output should say 123[obj] instead of just
123.
Issue #1581
* Disallow array-of-arrays with DescriptorSets when validating.
This CL adds validation to disallow using an array-of-arrays when attached to a
DescriptorSet.
Fixes#1522
By using forward pointers, we are able to define a struct that has a
pointer to itself. This could be directly or indirectly. The current
implementation of the type manager did not handle this case. There are
three changes that are made in this commit inorder to handle this case:
1) Change the handling of OpTypeForwardPointer
The current handling of OpTypeForwardsPointer is broken if there is a
reference to the pointer before the real definition. When build the
type that contain the forward delared pointer, the type manager will ask
for the type for that ID, and will get a nullptr because it does not
exists. This nullptr is not handleded very well.
The change is to keep track of the incomplete types the first time
through all of the types. An incomplete type is a ForwardPointer or any
type that references an incomplete type.
Then we implement a second pass through the incomplete types that will
complete them.
2) Hashing types.
When hashing a type, we want to uses all of the subtypes as part of the
hash. However, with types that reference them selves, this creates an
infinite recursion. To get around this, we keep track of which types
have been seen on the path from the root type. If we have see the
current type already then we can stop the recursion.
3) Comparing types.
In order to check if two types are the same, we must check that all of
their subtypes are the same as well. This also causes an infinit
recursion. The solution is to stop comparing the subtypes if we are
trying to compare two pointer types that we are already in the middle of
comparing. The ideas is that if the two pointer are different, then in
progress compare will return false itself.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1578.
We add a new rule to the folding rules to fold an FMix feeding an
extract when the alpha value for the element being extracted is either
0 or 1. In those case, we can simple extract from one of the operands
to the FMix.
With that change the simplification pass completely subsumes the
insert-extract elimination pass. So we remove the insert-extract
elimination passes and replce them with calls to the simplification
pass.
In a follow up PR, we should delete the insert-extract elimination pass.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.
According to the SPIR-V Spec, section 2.4 Logical Layout of a Module there
should be a single required OpMemoryModel instruction provided. This CL adds
validation that OpMemoryModel is provided to the SPIR-V validator.
Fixes#1207
Removes the limit on scalar replacement for the lagalization passes.
This is done by adding an option to the pass (and command line option)
to set the limit on maximum size of the composite that scalar
replacement is willing to divide.
Fixes#1494.
ADCE does not treat OpCopyMemory as an instruction that references
memory. Because of that stores are removed that should not be.
This change teaches ADCE that OpCopyMemory and OpCopyMemorySize both
loads from and stores to memory. This will keep other stores live when
needed, and will allows ADCE to remove OpCopyMemory instructions as
well.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1556.
SPV_EXT_shader_viewport_index_layer enables using ViewportIndex
and Layer in vertex and tessellation shaders.
Also, as per the Vulkan spec:
> The ViewportIndex decoration must be used only within vertex,
> tessellation evaluation, geometry, and fragment shaders.
> In a vertex, tessellation evaluation, or geometry shader, any
> variable decorated with ViewportIndex must be declared using
> the Output storage class.
> In a fragment shader, any variable decorated with ViewportIndex
> must be declared using the Input storage class.
Similarly for Layer.
Currently in scalar replacement, we create a new variable for every
memeber of the composite being divided. It is often overkill, because
not all of those members will be used. This change will check which
elements are used and only create variable for the members that are
used.
This reduces the compile time for one set of shader from 248s to 165s.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
The code patterns generated by DXC around function calls can cause many
store to be storing the same value that was just loaded from the same
location:
```
%10 = OpLoad %type %var
OpStore %var %10
```
We want to clean these up very early on because they can cause other
transformations to do a lot of work. For the cases I see, they can be
removed during local-single-block-elim.
For one set of shaders the compile time goes from 248s to 182s. A 26%
improvement.
Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element. This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.
However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.
Fixes#1547.
An FClamp instruction forces a values to be within a certain interval.
When the upper or lower bound of the FClamp is a constant and the value
being compared with is a constant, then in some case we can fold the
compared because the entire range is say less than the value.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1549.
If there is a shader with a variable in the workgroup storage class that
is stored to, but not loadeds, then we know nothing will read those
loads. It should be safe to remove them.
This is implemented in ADCE by treating workgroup variables the same
way that private variables are treated.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1550.
When doing if-conversion, we do not currently move code out of the side
nodes. The reason for this is that it can increase the number of
instructions that get executed because both side nods will have to be
executed now.
In this commit, we add code to move an instruction, and all of the
instructions it depends on, out of a side node and into the header of
the selection construct. However to keep the cost down, we only do it
when the two values in the OpPhi node compute the same value. This way
we have to move only one of the instructions and the other becomes
unused most of the time. So no real extra cost.
Makes the value number table an alalysis in the ir context.
Added more opcodes to list of code motion safe opcodes.
Fixes#1526.
Previously, the loop class used the terms latch and continue block
interchangeably. This patch splits the two and corrects and tests some
uses of the old uses of GetLatchBlock.
This pass will look for adjacent loops that are compatible and legal to
be fused.
Loops are compatible if:
- they both have one induction variable
- they have the same upper and lower bounds
- same initial value
- same condition
- they have the same update step
- they are adjacent
- there are no break/continue in either of them
Fusion is legal if:
- fused loops do not have any dependencies with dependence distance
greater than 0 that did not exist in the original loops.
- there are no function calls in the loops (could have side-effects)
- there are no barriers in the loops
It will fuse all such loops as long as the number of registers used for
the fused loop stays under the threshold defined by
max_registers_per_loop.
Adds support for spliting loops whose register pressure exceeds a user
provided level. This pass will split a loop into two or more loops given
that the loop is a top level loop and that spliting the loop is legal.
Control flow is left intact for dead code elimination to remove.
This pass is enabled with the --loop-fission flag to spirv-opt.
Track live scalars in VDCE as if they were single element vectors.
Handle the extended instructions for GLSL in VDCE.
Handle composite construct instructions in VDCE.
If one of the operands to an OpVectorTimesScalar instruction is zero,
then the result will be the 0 vector. Currently we do not fold the
insturction unless both operands are constants. This change fixes that.
We also allow folding of OpPhi instructions where the incoming values
are either an OpUndef or the OpPhi instruction itself. As with other
cases, this can be simplified to the OpUndef.
Track live scalars in VDCE as if they were single element vectors.
Handle the extended instructions for GLSL in VDCE.
Handle composite construct instructions in VDCE.
Fixes#1511.
Eliminate unused store to variable if followed by store to same
variable in same block.
Most significantly, this cleans up stores made unused by this pass.
These useless stores can inhibit subsequent optimizations, specifically
LocalSingleStoreElim. Eliminating them makes subsequent optimization more
effective.
The main effect of this pass is to simplify the work done by the SSA
rewriter. It catches many local loads/stores that help speeding up the
work done by the main rewriter.
Introduce a pass that does a DCE type analysis for vector elements
instead of the whole vector as a single element.
It will then rewrite instructions that are not used with something else.
For example, an instruction whose value are not used, even though it is
referenced, is replaced with an OpUndef.
For each function, the analysis determine which SSA registers are live
at the beginning of each basic block and which one are killed at
the end of the basic block.
It also includes utilities to simulate the register pressure for loop
fusion and fission.
The implementation is based on the paper "A non-iterative data-flow
algorithm for computing liveness sets in strict ssa programs" from
Boissinot et al.
* Adds new pass for validating non-uniform group instructions
* Currently on checks execution scope for Vulkan 1.1 and SPIR-V 1.3
* Added test framework
The local-single-store-elim algorithm is not fundamentally bad.
However, when there are a large number of variables, some of the
maps that are used can become very large. These large data structures
then take a very long time to be destroyed. I've seen cases around 40%
if the time.
I've rewritten that algorithm to not use as much memory. This give a
significant improvement when running a large number of shader through
DXC.
I've also made a small change to local-single-block-elim to delete the
loads that is has replaced. That way local-single-store-elim will not
have to look at those. local-single-store-elim now does the same thing.
The time for one set goes from 309s down to 126s. For another set, the
time goes from 102s down to 88s.
GCD MIV test as described in Chapter 3 of "Optimizing Compilers for
Modern Architectures: A Dependence-Based Approach" by Randy Allen, and
Ken Kennedy.
Delta test as described in Figure 3 of "Practical Dependence Testing" by
Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.
* Reworked how execution model limitations are checked
* Now OpFunction checks which entry points call it and checks its
registered limitations instead of building a call stack in the entry
point
* New tests
* Moving function to entry point mapping into VState
Relaxs checks for per-vertex builtin variables. If the builtin
decoration is applied to a variable, then those checks now allow a level
of arraying on the variable before checking the type consistency.
* Allows arrays of variables to be present for the per-vertex variables:
* Position
* PointSize
* ClipDistance
* CullDistance
* Updated tests
Add test for case where OpBranch branches to a value (a function value).
Previous tests only checked a label value (name of a block.).
Update validate_id.cpp to remove the TODO for OpBranch and say that it
is already checked in validate_cfg.cpp
The unordered_set in ADCE that holds all of the live instructions takes
a very long time to be destroyed. In some shaders, it takes over 40% of
the time.
If we look at the unique ids of the live instructions, I believe they
are dense enough make a simple bit vector a good choice for to hold that
data. When I check the density of the bit vector for larger shaders, we
are usually using less than 4 bytes per element in the vector, and
almost always less than 16.
So, in this commit, I introduce a simple bit vector class, and
use it in ADCE.
This help improve the compile time for some shaders on windows by the
40% mentioned above.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
For each loop in a function, the pass walks the loops from inner to outer most loop
and tries to peel loop for which a certain amount of iteration can be done before or after the loop.
To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.
Provides functionality to perform ZIV and SIV dependency analysis tests
between a load and store within the same loop.
Dependency tests rely on scalar analysis to prove and disprove dependencies
with regard to the loop being analysed.
Based on the 1990 paper Practical Dependence Testing by Goff, Kennedy, Tseng
Adds support for marking loops in the loop nest as IRRELEVANT.
Loops are marked IRRELEVANT if the analysed instructions contain
no induction variables for the loops, i.e. the loops induction
variable is not relevent to the dependence of the store and load.
Adding three rules to fold OpDot (implemented as two).
- When an OpDot has two constants, then fold to the resulting const.
- When one of the inputs is the 0 vector, then fold to zero.
- When one of the inputs is a single 1 with 0s, then rewrite to an
OpCompositeExtract of the appropriate element. This will help find
even more folding opportunities.
Contributes to #709.
According to Vulkan spec 1.1.72:
> The PrimitiveId decoration must be used only within fragment,
> tessellation control, tessellation evaluation, and geometry shaders.
> In a tessellation control or tessellation evaluation shader, any
> variable decorated with PrimitiveId must be declared using the Input
> storage class.
We were enforcing that PrimitiveId can only be used with Output
storage class for TCS and TES before.
From the test case, the slice of the CFG that is interesting for the bug
is
25
|
v
30
|
v
31<-+
| |
v |
34--+
1. In block 25, we have a Phi candidate for %f with arguments
%47 = Phi[%float_0, %0]. This merges %float_0 and a yet unknown
argument from the external loop backedge.
2. We are now processing block 34:
i. The load %35 = OpLoad %f triggers a Phi candidate to be placed in
block 31.
ii. The Phi candidate %50 = Phi needs two arguments. The one coming
from block 30 is %47. But the one coming from block 34 (which we
are now processing and have marked sealed), finds %50 itself as
the reaching def for %f.
3. This wrongfully marks %50 as a copy-of Phi, which ultimately makes
both %47 and %50 copy-of Phis that get eliminated.
Update grammar table generation:
- Get extensions from instructions, not just operand-kinds
- Don't explicitly list extensions that come from the SPIR-V core
grammar or from a KHR extended instruction set grammar.
This makes it easier to support new extensions since the recommended
extension strategy is to add instructions to the core grammar file.
Also, test the validator has trivial support for passing through
the extensions SPV_NV_shader_subgroup_partitioned and
SPV_EXT_descriptor_indexing.
Migrating to unified grammar means we sometimes have two fields
for a certain feature: version and extensions. It means the feature
in question can be used either in SPIR-V of advanced-enough
versions or in any SPIR-V with with the specified extensions.
Validator now respects the above rules.
At every definition of a builtin id, run at-reference-check rules on the
defining instruction as well.
Previosly the validation was missing the case when invalid storage class
was defined in the instruction which defines the built-in, and not in
the instruction which references the built-in.
Refactored validate built-ins to make
GetExecutionModels(entry_point)
and
GetExecutionModes(entry_point)
available in validation state.
Entry points are allowed to have multiple execution modes and execution
models.
Finished the last missing feature in Vulkan built-ins validation:
FragDepth requires DepthReplacing.
Currently OpImageTexelPointer operations are treat like a use of the
pointer, but it does
not look for the memory being referenced to make sure stores are not
removed.
This change teaches it so identify the memory being accessed, and
treats it as if that memory is loaded.
Fixes to #1445.
OpImageTexelPointer acts like a special kind of load. It is not an
array load, but it also cannot be removed the same way a regular
load can. The type of propagation that needs to be done is similar
to what we do for arrays, so I want to merge that code into that
optmization.
Contributers to #1445.
OpImageTexelPointer acts like a special kind of load. It is still
safe to change the storage class of a variable used in a
OpImageTexalPointer instruction.
Contributes to #1445.
CPPreference.com has this description of digits10:
“The value of std::numeric_limits<T>::digits10 is the number of
base-10 digits that can be represented by the type T without change,
that is, any number with this many significant decimal digits can be
converted to a value of type T and back to decimal form, without
change due to rounding or overflow.”
This means that any number with this many digits can be represented
accurately in the corresponding type. A change in any digit in a
number after that may or may not cause it a different bitwise
representation. Therefore this isn’t necessarily enough precision to
accurately represent the value in text. Instead we need max_digits10
which has the following description:
“The value of std::numeric_limits<T>::max_digits10 is the number of
base-10 digits that are necessary to uniquely represent all distinct
values of the type T, such as necessary for
serialization/deserialization to text.”
The patch includes a test case in hex_float_test which tries to do a
round-robin conversion of a number that requires more than 6 decimal
places to be accurately represented. This would fail without the
patch.
Sadly this also breaks a bunch of other tests. Some of the tests in
hex_float_test use ldexp and then compare it with a value which is not
the same as the one returned by ldexp but instead is the value rounded
to 6 decimals. Others use values that are not evenly representable as
a binary floating fraction but then happened to generate the same
value when rounded to 6 decimals. Where the actual value didn’t seem
to matter these have been changed with different values that can be
represented as a binary fraction.
When the original code copies an entire array or struct one element at a
time, this turns into a series of OpCompositeInsert instruction followed
by a store of the whole array. We currently miss opportunities in copy
propagate arrays because we do not recognize this as a copy.
This commit adds code to copy propagate arrays to identify this code
pattern.
Also updates the performance passed to run array copy propagation.
The first implementation of MemroyObject, which is used in copy
propagate arrays, forced the access chain to be like the access chains
in OpCompositeExtract. This excluded the possibility of the memory
object from representing an array element that was extracted with a
variable index. Looking at the code, that restriction is not
neccessary. I also see some opportunities for doing this in some real
shaders.
Contributes to #1430.
This patch adds support for the analysis of scalars in loops. It works
by traversing the defuse chain to build a DAG of scalar operations and
then simplifies the DAG by folding constants and grouping like terms.
It represents induction variables as recurrent expressions with respect
to a given loop and can simplify DAGs containing recurrent expression by
rewritting the entire DAG to be a recurrent expression with respect to
the same loop.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1427
Adjusting validation to the new rule:
"Before version 1.3, it is only valid to use this instruction with
TessellationControl, GLCompute, or Kernel execution models.
There is no such restriction starting with version 1.3."
Also fixed wrong version numbers in source/spirv_target_env.cpp.
When we change the type of an object that gets stored, we do not want to
change the type of the memory location being stored to. In order to
still be able to do the rewrite, we will decompose and rebuild the
object so it is the type that can be stored.
Fixes#1416.
The sprir-v generated from HLSL code contain many copyies of very large
arrays. Not only are these time consumming, but they also cause
problems for drivers because they require too much space.
To work around this, we will implement an array copy propagation. Note
that we will not implement a complete array data flow analysis in order
to implement this. We will be looking for very simple cases:
1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.
The hard part is keeping all of the types correct. We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.
Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent. With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values. See |Struct::GetExtraHashWords|.
Contributes to #1416.
Added a framework for validation of BuiltIn variables. The framework
allows implementation of flexible abstract rules which are required for
built-ins as the information (decoration, definition, reference) is not
in one place, but is scattered all over the module.
Validation rules are implemented as a map
id -> list<functor(instrution)>
Ids which are dependent on built-in types or objects receive a task
list, such as "this id cannot be referenced from function which is
called from entry point with execution model X; propagate this rule
to your descendants in the global scope".
Also refactored test/val/val_fixtures.
All built-ins covered by tests
This patch adds a new option --time-report to spirv-opt. For each pass
executed by spirv-opt, the flag prints resource utilization for the pass
(CPU time, wall time, RSS and page faults)
This fixes issue #1378
This pass replaces the load/store elimination passes. It implements the
SSA re-writing algorithm proposed in
Simple and Efficient Construction of Static Single Assignment Form.
Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
In: Jhala R., De Bosschere K. (eds)
Compiler Construction. CC 2013.
Lecture Notes in Computer Science, vol 7791.
Springer, Berlin, Heidelberg
https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6
In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.
When a target variable is loaded, it queries the variable's reaching
definition. If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.
The algorithm avoids repeated lookups using memoization.
For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA. That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
The loop peeler util takes a loop as input and create a new one before.
The iterator of the duplicated loop then set to accommodate the number
of iteration required for the peeling.
The loop peeling pass that decided to do the peeling and profitability
analysis is left for a follow-up PR.
We are seeing shaders that have multiple returns in a functions. These
functions must get inlined for legalization purposes; however, the
inliner does not know how to inline functions that have multiple
returns.
The solution we will go with it to improve the merge return pass to
handle structured control flow.
Note that the merge return pass will assume the cfg has been cleanedup
by dead branch elimination.
Fixes#857.
Previously we keep a separate static grammar table for opcodes/
operands per SPIR-V version. This commit changes that to use a
single unified static grammar table for opcodes/operands.
This essentially changes how grammar facts are queried against
a certain target environment. There are only limited filtering
according to the desired target environment; a symbol is
considered as available as long as:
1. The target environment satisfies the minimal requirement of
the symbol; or
2. There is at least one extension enabling this symbol.
Note that the second rule assumes the extension enabling the
symbol is indeed requested in the SPIR-V code; checking that
should be the validator's work.
Also fixed a few grammar related issues:
* Rounding mode capability requirements are moved to client APIs.
* Reserved symbols not available in any extension is no longer
recognized by assembler.
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
This reimplementation fixes several issues when removing decorations associated
to an ID (partially addresses #1174 and gives tools for fixing #898), as well
as making it easier to remove groups; a few additional tests have been added.
DecorationManager::RemoveDecoration() will still not delete dead decorations it
created, but I do not think it is its job either; given the following input
```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %2 Restrict
%2 = OpDecorationGroup
OpGroupDecorate %2 %1 %3
OpDecorate %4 Invariant
%4 = OpDecorationGroup
OpGroupDecorate %4 %2
%uint = OpTypeInt 32 0
%1 = OpVariable %uint Uniform
%3 = OpVariable %uint Uniform
```
which of the following two outputs would you expect RemoveDecoration(2) to produce:
```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
%uint = OpTypeInt 32 0
%1 = OpVariable %uint Uniform
%3 = OpVariable %uint Uniform
```
or
```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %4 Invariant
%4 = OpDecorationGroup
%uint = OpTypeInt 32 0
%1 = OpVariable %uint Uniform
%3 = OpVariable %uint Uniform
```
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/924
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1174
The default target is SPIR-V 1.3.
For example, spirv-as will generate a SPIR-V 1.3 binary by default.
Use command line option "--target-env spv1.0" if you want to make a SPIR-V
1.0 binary or validate against SPIR-V 1.0 rules.
Example:
# Generate a SPIR-V 1.0 binary instead of SPIR-V 1.3
spirv-as --target-env spv1.0 a.spvasm -o a.spv
spirv-as --target-env vulkan1.0 a.spvasm -o a.spv
# Validate as SPIR-V 1.0.
spirv-val --target-env spv1.0 a.spv
# Validate as Vulkan 1.0
spirv-val --target-env vulkan1.0 a.spv
The merging types we do not remove other information related to the
types. We simply leave it duplicated, and hope it is removed later.
This is what happens with decorations. They are removed in the next
phase of remove duplicates. However, for OpNames that is not the case.
We end up with two different names for the same id, which does not make
sense.
The solution is to remove the names and decorations for the type being
removed instead of rewriting them to refer to the other type.
Note that it is possible that if the first type does not have a name,
then the types will end up with no name. That is fine because the names
should not have any semantic significance anyway.
The was identified in issue #1372, but this does not fix that issue.
* Also mark function parameters as varying
* Conservatively mark assignment instructions as varying if any input is
varying after attempting to fold
* Added a test to catch this case
As per Vulkan spec, BuiltIn variables can't have Location or Component
decorations. On some drivers, these can lead to driver crashing when
compiling the shader pipeline; for example, NVidia/AMD desktop drivers:
https://github.com/KhronosGroup/glslang/issues/1182.
This change adds validation and tests to catch this.
* getFloatConstantKind() now handles OpConstantNull
* PerformOperation() now handles OpConstantNull for vectors
* Fixed some instances where we would attempt to merge a division by 0
* added tests
The algorithm used in DCEInst to remove dead code is very slow. It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.
This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.
A number of different iterations of the order of optimization, and I
believe this is the best I could find.
The results I have on 3 sets of shaders are:
Legalization:
Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26
Performance passes:
Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74
Size reduction passes:
Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71
Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions. That is why we don't see much gain there.
Adding basis of arithmetic merging
* Refactored constant collection in ConstantManager
* New rules:
* consecutive negates
* negate of arithmetic op with a constant
* consecutive muls
* reciprocal of div
* Removed IRContext::CanFoldFloatingPoint
* replaced by Instruction::IsFloatingPointFoldingAllowed
* Fixed some bad tests
* added some header comments
Added PerformIntegerOperation
* minor fixes to constants and tests
* fixed IntMultiplyBy1 to work with 64 bit ints
* added tests for integer mul merging
Adding test for vector integer multiply merging
Adding support for merging integer add and sub through negate
* Added tests
Adding rules to merge mult with preceding divide
* Has a couple tests, but needs more
* Added more comments
Fixed bug in integer division folding
* Will no longer merge through integer division if there would be a
remainder in the division
* Added a bunch more tests
Adding rules to merge divide and multiply through divide
* Improved comments
* Added tests
Adding rules to handle mul or div of a negation
* Added tests
Changes for review
* Early exit if no constants are involved in more functions
* fixed some comments
* removed unused declaration
* clarified some logic
Adding new rules for add and subtract
* Fold adds of adds, subtracts or negates
* Fold subtracts of adds, subtracts or negates
* Added tests
This change makes the IR builder use the type manager to generate
OpTypeInts when creating OpConstants. This avoids dangling references
being stored by the created OpConstants.
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
Fixes a bug at the same time. In `UpdateDefUse`, if the definition
already exists, we are not suppose to analyse it again. When you do
the entries for the definition are deleted, and we don't want that.
The check for this was wrong.
This function now checks for side-effects before adding operand
instructions to the dead instruction work list.
Because this fix puts more pressure on IsCombinatorInstruction() to
be correct, this commit adds all OpConstant* and OpType* instructions
to combinator_ops_ set.
Fixes#1341.
This change implements instruction folding for arithmetic operations
that are redundant, specifically:
x + 0 = 0 + x = x
x - 0 = x
0 - x = -x
x * 0 = 0 * x = 0
x * 1 = 1 * x = x
0 / x = 0
x / 1 = x
mix(a, b, 0) = a
mix(a, b, 1) = b
Cache ExtInst import id in feature manager
This allows us to avoid string lookups during optimization; for now we
just cache GLSL std450 import id but I can imagine caching more sets as
they become utilized by the optimizer.
Add tests for add/sub/mul/div/mix folding
The tests cover scalar float/double cases, and some vector cases.
Since most of the code for floating point folding is shared, the tests
for vector folding are not as exhaustive as scalar.
To test sub->negate folding I had to implement a custom fixture.
I mixed up two cases when folding an OpCompositeExtract that is feed by
and OpCompositeInsert. The specific cases are demonstracted in the new
test. I mixed up the conditions for the cases, and treated one like the
other.
Fixes#1323.
* Now track propagation status and assert on bad statuses
* Added helper methods to access instruction propagation status
* Modified the phi meet operator to properly reflect the paper it is
based on
* Modified SSA edge addition so that all edge are added, but only on
state changes
* Fixed a bug in instruction simulation where interesting conditional
branches would not mark the interesting edge as executed
* Added a test to catch this bug
* Added an ostream operator for SSAPropagator::PropStatus
This change handles all 6 regular comparison types in two variations,
ordered (true if values are ordered *and* comparison is true) and
unordered (true if values are unordered *or* comparison is true).
Ordered comparison matches the default floating-point behavior on host
but we use std::isnan to check ordering explicitly anyway.
This change also slightly reworks the floating-point folding support
code to make it possible to define a folding operation that returns
boolean instead of floating point.
These tests exhaustively test ordered/unordered comparisons for
float/double.
Since for NaN inputs the comparison result doesn't depend on the
comparison function, we just test == and !=; NaN inputs result in true
unordered comparisons and false ordered comparisons.
In dead branch elimination, we already recognize unreachable continue
blocks, and update OpPhi instruction accordingly. This change adds an
extra check: if the head block has exactly 1 other incoming edge, then
replace the OpPhi with the value from that edge.
Fixes#1314.
unordered_map is not POD. Using it as static may cause problems
when operator new() and operator delete() is customized.
Also changed some function signatures to use const char* instead
of std::string, which will give caller the flexibility to avoid
creating a std::string.
We can fold OpSelect into one of the operands in two cases:
- condition is constant
- both results are the same
Even if the original shader doesn't have either of these, if-conversion
pass sometimes ends up generating instructions like
%7127 = OpSelect %int %3220 %7058 %7058
And this optimization cleans them up.
This patch adds initial support for loop unrolling in the form of a
series of utility classes which perform the unrolling. The pass can
be run with the command spirv-opt --loop-unroll. This will unroll
loops within the module which have the unroll hint set. The unroller
imposes a number of requirements on the loops it can unroll. These are
documented in the comments for the LoopUtils::CanPerformUnroll method in
loop_utils.h. Some of the restrictions will be lifted in future patches.
Implementation of the simplification pass.
- Create pass that calls the instruction folder on each instruction and
propagate instructions that fold to a copy. This will do copy
propagation as well.
- Did not use the propagator engine because I want to modify the instruction
as we go along.
- Change folding to not allocate new instructions, but make changes in
place. This change had a big impact on compile time.
- Add simplification pass to the legalization passes in place of
insert-extract elimination.
- Added test cases for new folding rules.
- Added tests for the simplification pass
- Added a method to the CFG to apply a function to the basic blocks in
reverse post order.
Contributes to #1164.
Add pkg-config file for shared libraries
Properly build SPIRV-Tools DLL
Test C interface with shared library
Set PATH to shared library file for c_interface_shared test
Otherwise, the test won't find SPIRV-Tools-shared.dll.
Do not use private functions when testing with shared library
Make all symbols hidden by default for shared library target
* Added TypeManager::RebuildType
* rebuilds the type and its constituent types in terms of memory owned
by the manager.
* Used by TypeManager::RegisterType to properly allocate memory
* Adding an unit test to expose the issue
* Added some tests to provide coverage of RebuildType
* Added an accessor to the target pointer for a forward pointer
Create the folding engine that will
1) attempt to fold an instruction.
2) iterates on the folding so small folding rules can be easily combined.
3) insert new instructions when needed.
I've added the minimum number of rules needed to test the features above.
This patch adds LoopUtils class to handle some loop related transformations. For now it has 2 transformations that simplifies other transformations such as loop unroll or unswitch:
- Dedicate exit blocks: this ensure that all exit basic block
(out-of-loop basic blocks that have a predecessor in the loop)
have all their predecessors in the loop;
- Loop Closed SSA (LCSSA): this ensure that all definitions in a loop are used inside the loop
or in a phi instruction in an exit basic block.
It also adds the following capabilities:
- Loop::IsLCSSA to test if the loop is in a LCSSA form
- Loop::GetOrCreatePreHeaderBlock that can build a loop preheader if required;
- New methods to allow on the fly updates of the loop descriptors.
- New methods to allow on the fly updates of the CFG analysis.
- Instruction::SetOperand to allow expression of the index relative to Instruction::NumOperands (to be compatible with the index returned by DefUseManager::ForEachUse)
Creates a pass that will remove instructions that are invalid for the
current shader stage. For the instruction to be considered for replacement
1) The opcode must be valid for a shader modules.
2) The opcode must be invalid for the current shader stage.
3) All entry points to the module must be for the same shader stage.
4) The function containing the instruction must be reachable from an entry point.
Fixes#1247.
* Had to remove templating from InstructionBuilder as a result
* now preserved analyses are specified as a constructor argument
* updated tests and uses
* changed static_assert to a runtime assert
* this should probably get further changes in the future
* When handling unreachable merges and continues, do not optimize to the
same IR
* pass did not check whether the unreachable blocks were in the
optimized form before transforming them
* added a test to catch this issue
* Should handle all possibilities
* Stricter checks for what is disallowed:
* header and header
* merge and merge
* Allow header and merge blocks to be merged
* Erases the structured control declaration if merging header and
merge blocks together.
* If the dead branch elim is performed on a module without structured
control flow, the OpSelectionMerge may not be present
* Add a check for pointer validity before dereferencing
* Added a test to catch the bug
* Forces traversal of phis if the def has changed to varying
* Mark a phi as varying if all incoming values are varying
* added a test to catch the bug
This adds Dead Insert Elimination to the end of the
--eliminate-insert-extract pass. See the new tests for examples of code
that will benefit.
Essentially, this removes OpCompositeInsert instructions which are not
used, either because there is no instruction which uses the value at the
index it is inserted, or because a subsequent insert intercepts any such
use.
This code has been seen to remove significant amounts of dead code from
real-life HLSL shaders being ported to Vulkan. In fact, it is needed to
remove dead texture samples which cause Vulkan validation layer errors
(unbound textures and samplers) if not removed . Such DCE is thus
required for fxc equivalence and legalization.
This analysis operates across "chains" of Inserts which can also contain
Phi instructions.
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
* does not perform code motion currently so the converted values must
dominate the join point (e.g. can't be defined in the branches)
* limited for now to two predecessors, but can be extended to handle
more cases
* Adding if conversion to -O and -Os
Ban floating point case for OpAtomicLoad, OpAtomicExchange,
OpAtomicCompareExchange. In graphics (Shader) environments, these
instructions only operate on scalar integers. Ban the floating point
case. OpenCL supports atomic_float.
Implemented Vulkan-specific rules:
- OpTypeImage must declare a scalar 32-bit float or 32-bit integer type
for the “Sampled Type”.
- OpSampledImage must only consume an “Image” operand whose type has its
“Sampled” operand set to 1.
The current folding routines have a very cumbersome interface, make them
harder to use, and not a obvious how to extend.
This change is to create a new interface for the folding routines, and
show how it can be used by calling it from CCP.
This does not make a significant change to the behaviour of CCP. In
general it should produce the same code as before; however it is
possible that an instruction that takes 32-bit integers as inputs and
the result is not a 32-bit integer or bool will not be folded as before.
It seems like andriod has a problem with INT32_MAX and the like. I'll
explicitly define those if the are not already defined.
The class factorize the instruction building process.
Def-use manager analysis can be updated on the fly to maintain coherency.
To be updated to take into account more analysis.
* AddToWorklist can now be called unconditionally
* It will only add instructions that have not already been marked as
live
* Fixes a case where a merge was not added to the worklist because the
branch was already marked as live
* Added two similar tests that fail without the fix
We have come across a driver bug where and OpUnreachable inside a loop
is causing the shader to go into an infinite loop. This commit will try
to avoid this bug by turning OpUnreachable instructions that are
contained in a loop into branches to the loop merge block.
This is not added to "-O" and "-Os" because it should only be used if
the driver being targeted has this problem.
Fixes#1209.
At the moment specialization constants look like constants to ccp. This
causes a problem because they are handled differently by the constant
manager.
I choose to simply skip over them, and not try to add them to the value
table. We can do specialization before ccp if we want to be able to
propagate these values.
Fixes#1199.
The current code expects the users of the constant manager to initialize
it with all of the constants in the module. The problem is that you do
not want to redo the work multiple times. So I decided to move that
code to the constructor of the constant manager. This way it will
always be initialized on first use.
I also removed an assert that expects all constant instructions to be
successfully mapped. This is because not all OpConstant* instruction
can map to a constant, and neither do the OpSpecConstant* instructions.
The real problem is that an OpConstantComposite can contain a member
that is OpUndef. I tried to treat OpUndef like OpConstantNull, but this
failed because an OpSpecConstantComposite with an OpUndef cannot be
changed to an OpConstantComposite. Since I feel this case will not be
common, I decided to not complicate the code.
Fixes#1193.
* Added for Instruction, BasicBlock, Function and Module
* Uses new disassembly functionality that can disassemble individual
instructions
* For debug use only (no caching is done)
* Each output converts module to binary, parses and outputs an
individual instruction
* Added a test for whole module output
* Disabling Microsoft checked iterator warnings
* Updated check_copyright.py to accept 2018
* Changed MemPass::InsertPhiInstructions to set basic blocks for new
phis
* Local SSA elim now maintains instr to block mapping
* Added a test and confirmed it fails without the updated phis
* IRContext::set_instr_block no longer builds the map if the analysis is
invalid
* Added instruction to block mapping verification to
IRContext::IsConsistent()
This improves Extract replacement to continue through VectorShuffle.
It will also handle Mix with 0.0 or 1.0 in the a-value of the desired
component.
To facilitate optimization of VectorShuffle, the algorithm was refactored
to pass around the indices of the extract in a vector rather than pass the
extract instruction itself. This allows the indices to be modified as the
algorithm progresses.
Modified ADCE to remove dead globals.
* Entry point and execution mode instructions are marked as alive
* Reachable functions and their parameters are marked as alive
* Instruction deletion now deferred until the end of the pass
* Eliminated dead insts set, added IsDead to calculate that value
instead
* Ported applicable dead variable elimination tests
* Ported dead constant elim tests
Added dead function elimination to ADCE
* ported dead function elim tests
Added handling of decoration groups in ADCE
* Uses a custom sorter to traverse decorations in a specific order
* Simplifies necessary checks
Updated -O and -Os pass lists.
Pass now paints live blocks and fixes constant branches and switches as
it goes. No longer requires structured control flow. It also removes
unreachable blocks as a side effect. It fixes the IR (phis) before doing
any code removal (other than terminator changes).
Added several unit tests for updated/new functionality.
Does not remove dead edge from a phi node:
* Checks that incoming edges are live in order to retain them
* Added BasicBlock::IsSuccessor
* added test
Fixing phi updates in the presence of extra backedge blocks
* Added tests to catch bug
Reworked how phis are updated
* Instead of creating a new Phi and RAUW'ing the old phi with it, I now
replace the phi operands, but maintain the def/use manager correctly.
For unreachable merge:
* When considering unreachable continue blocks the code now properly
checks whether the incoming edge will continue to be live.
Major refactoring for review
* Broke into 4 major functions
* marking live blocks
* marking structured targets
* fixing phis
* deleting blocks
This fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1143.
When an instruction transitions from constant to bottom (varying) in the
lattice, we were telling the propagator that the instruction was
varying, but never updating the actual value in the values table.
This led to incorrect value substitutions at the end of propagation.
The patch also re-enables CCP in -O and -Os.
In HLSL structured buffer legalization, pointer to pointer types
are emitted to indicate a structured buffer variable should be
treated as an alias of some other variable. We need an option to
relax the check of pointer types in logical addressing mode to
catch other validation errors.
Add post-order tree iterator.
Add DominatorTreeNode extensions:
- Add begin/end methods to do pre-order and post-order tree traversal from a given DominatorTreeNode
Add DominatorTree extensions:
- Add begin/end methods to do pre-order and post-order tree traversal
- Tree traversal ignore by default the pseudo entry block
- Retrieve a DominatorTreeNode from a basic block
Add loop descriptor:
- Add a LoopDescriptor class to register all loops in a given function.
- Add a Loop class to describe a loop:
- Loop parent
- Nested loops
- Loop depth
- Loop header, merge, continue and preheader
- Basic blocks that belong to the loop
Correct a bug that forced dominator tree to be constantly rebuilt.
Turn `Linker::Link()` into free functions
As very little information was kept in the Linker class, we can get rid
of the whole class and have the `Link()` as free functions instead; the
environment target as well as the consumer are passed along through an
`spv_context` object.
The resulting linked_binary is passed as a pointer rather than a
reference to follow the Google C++ Style guidelines.
Addresses remaining comments from
https://github.com/KhronosGroup/SPIRV-Tools/pull/693 about the SPIR-V
linker.
Fix variable naming in the linker
Some of the variables were using mixed case, which did not follow the
Google C++ Style guidelines.
Linker: Use EXPECT_EQ when possible and update some test
* Replace occurrences of ASSERT_EQ by EXPECT_EQ when possible;
* Reformulated some of the error messages;
* Added the symbol name in the error message when there is a type or
decoration mismatch between the imported and exported declarations.
Opt: List all duplicates removed by RemoveDuplicatePass in the header
Opt: Make the const version of GetLabelInst() return a pointer
For consistency with the non-const version, as well as other similar
functions.
Opt: Rename function_end to EndInst()
As pointed out by dneto0 the previous name was quite confusing and could
be mistaken with a function returning an end iterator.
Also change the return type of the const version to a pointer rather
than a reference, for consistency.
Opt: Add performance comment to RemoveDuplicateTypes and decorations
This comment was requested during the review of
https://github.com/KhronosGroup/SPIRV-Tools/pull/693.
Opt: Add comments and fix variable naming in RemoveDuplicatePass
* Add missing comments to private functions;
* Rename variables that were using mixed case;
* Add TODO for moving AreTypesEqual out.
Linker: Remove commented out code and add TODOs
Linker: Merged together strings that were too much splitted
Implement a C++ RAII wrapper around spv_context
In value numbering, we treat loads and stores of images, ie OpImageLoad,
as a memory operation where it is interested in the "base address" of
the instruction. In those cases, it is an image instruction.
The problem is that `Instruction::GetBaseAddress()` does not account for
the image instructions, so the assert at the end to make sure it found
a valid base address for its addressing mode fails.
The solution is to look at the load/store instruction to determine how
the assertion should be done.
Fixes#1160.
This fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1159. I
had missed a nuance in the original algorithm. When simulating Phi
instructions, the SSA edges out of a Phi instruction should never be
added to the list of edges to simulate.
Phi instructions can be in SSA def-use cycles with other Phi
instructions. This was causing the propagator to fall into an infinite
loop when the same def-use edge kept being added to the queue.
The original algorithm in the paper specifically separates the visit of
a Phi instruction vs the visit of a regular instruction. This fix makes
the implementation match the original algorithm.
1. Added OpCompositeExtract/Insert out of bounds checks where possible
(everything except RuntimeArray)
2. Moved validation of OpCompositeExtract/Insert from validate_id.cpp to
validate_composites.cpp.
In CCP we should not need to insert Phi nodes because CCP never looks at
loads/stores. This required adjusting two tests that relied on Phi
instructions being inserted. I changed the tests to have the Phi
instructions pre-inserted.
I also added a new test to make sure that CCP does not try to look
through stores and loads.
Finally, given that CCP does not handle loads/stores, it's better to run
mem2reg before it. I've changed the -O/-Os schedules to run local
multi-store elimination before CCP.
Although this is just an efficiency fix for CCP, it is
also working around a bug in Phi insertion. When Phi instructions are
inserted, they are never associated a basic block. This causes a
segfault when the propagator tries to lookup CFG edges when analyzing
Phi instructions.
Add grammar file for DebugInfo extended instruction set
- Each new operand enum kind in extinst.debuginfo.grammar.json maps
to a new value in spv_operand_type_t.
- Add new concrete enum operand types for DebugInfo
Generate a C header for the DebugInfo extended instruction set
Add table lookup of DebugInfo extended instrutions
Handle the debug info operand types in binary parser,
disassembler, and assembler.
Add DebugInfo round trip tests for assembler, disassembler
Android.mk: Support DebugInfo extended instruction set
The extinst.debuginfo.grammar.json file is currently part of
SPIRV-Tools source.
It contributes operand type enums, so it has to be processed
along with the core grammar files.
We also generate a C header DebugInfo.h.
Add necessary grammar file processing to Android.mk.
This implements the conditional constant propagation pass proposed in
Constant propagation with conditional branches,
Wegman and Zadeck, ACM TOPLAS 13(2):181-210.
The main logic resides in CCPPass::VisitInstruction. Instruction that
may produce a constant value are evaluated with the constant folder. If
they produce a new constant, the instruction is considered interesting.
Otherwise, it's considered varying (for unfoldable instructions) or
just not interesting (when not enough operands have a constant value).
The other main piece of logic is in CCPPass::VisitBranch. This
evaluates the selector of the branch. When it's found to be a known
value, it computes the destination basic block and sets it. This tells
the propagator which branches to follow.
The patch required extensions to the constant manager as well. Instead
of hashing the Constant pointers, this patch changes the constant pool
to hash the contents of the Constant. This allows the lookups to be
done using the actual values of the Constant, preventing duplicate
definitions.
In order to keep track of all of the implicit capabilities as well as
the explicit ones, we will add them all to the feature manager. That is
the object that needs to be queried when checking if a capability is
enabled.
The name of the "HasCapability" function in the module was changed to
make it more obvious that it does not check for implied capabilities.
Keep an spv_context and AssemblyGrammar in IRContext
* changed the way duplicate types are removed to stop copying
instructions
* Reworked RemoveDuplicatesPass::AreTypesSame to use type manager and
type equality
* Reworked TypeManager memory management to store a pool of unique
pointers of types
* removed unique pointers from id map
* fixed instances where free'd memory could be accessed
A few optimizations are updates to handle code that is suppose to be
using the logical addressing mode, but still has variables that contain
pointers as long as the pointer are to opaque objects. This is called
"relaxed logical addressing".
|Instruction::GetBaseAddress| will check that pointers that are use meet
the relaxed logical addressing rules. Optimization that now handle
relaxed logical addressing instead of logical addressing are:
- aggressive dead-code elimination
- local access chain convert
- local store elimination passes.
When a private variable is used in a single function, it can be
converted to a function scope variable in that function. This adds a
pass that does that. The pass can be enabled using the option
`--private-to-local`.
This transformation allows other transformations to act on these
variables.
Also moved `FindPointerToType` from the inline class to the type manager.
- Test validation success for OpEmitVertex OpEndPrimitive
- Test missing capabilities for primitive instructions
- Primitive instructions require Geometry execution model
@ehsannas had filed an issue against SPIR-V spec, concerning
Image Operands section (3.14):
Sample
A following operand is the sample number of the sample to use. Only
valid with OpImageFetch, OpImageRead, and OpImageWrite.
Relaxing the check to allow OpImageSparseRead and
OpImageSparseFetch to fix failing tests.
types. This allows the lookup of type declaration ids from arbitrarily
constructed types. Users should be cautious when dealing with non-unique
types (structs and potentially pointers) to get the exact id if
necessary.
* Changed the spec composite constant folder to handle ambiguous composites
* Added functionality to create necessary instructions for a type
* Added ability to remove ids from the type manager
This fixes issue #1075
- Mark continue when conditional branch with merge block.
Only mark if merge block is not continue block.
- Handle conditional branch break with preceding merge
Inlining is not setting the parent (function) for each basic block.
This can cause problems for later optimizations. The solution is to set
the parent for each new block just before it is linked into the
function.
include: Add target environment enums for OpenCL 1.2 and 2.0
Validator: Validate OpenCL capabilities
Update validate capabilities to handle embedded profiles
Add test for OpenCL capabilities validation
Update messages to mention the OpenCL profile used
Re-format val_capability_test.cpp
Adds a scalar replacement pass. The pass considers all function scope
variables of composite type. If there are accesses to individual
elements (and it is legal) the pass replaces the variable with a
variable for each composite element and updates all the uses.
Added the pass to -O
Added NumUses and NumUsers to DefUseManager
Added some helper methods for the inst to block mapping in context
Added some helper methods for specific constant types
No longer generate duplicate pointer types.
* Now searches for an existing pointer of the appropriate type instead
of failing validation
* Fixed spec constant extracts
* Addressed changes for review
* Changed RunSinglePassAndMatch to be able to run validation
* current users do not enable it
Added handling of acceptable decorations.
* Decorations are also transfered where appropriate
Refactored extension checking into FeatureManager
* Context now owns a feature manager
* consciously NOT an analysis
* added some test
* fixed some minor issues related to decorates
* added some decorate related tests for scalar replacement
Adds a pass that looks for redundant instruction in a function, and
removes them. The algorithm is a hash table based value numbering
algorithm that traverses the dominator tree.
This pass removes completely redundant instructions, not partially
redundant ones.
Currently when inlining a call, the name and decorations for the result of the
call is not deleted. This should be changed. Added a test for this as well.
This fixes issue #622.
Support for dominator and post dominator analysis on ir::Functions. This patch contains a DominatorTree class for building the tree and DominatorAnalysis and DominatorAnalysisPass classes for interfacing and caching the built trees.
The current method of removing an instruction is to call ToNop. The
problem with this is that it leaves around an instruction that later
passes will look at. We should just delete the instruction.
In MemPass there is a utility routine called DCEInst. It can delete
essentially any instruction, which can invalidate pointers now that they
are actually deleted. The interface was changed to add a call back that
can be used to update any local data structures that contain
ir::Intruction*.
Computing the value numbers on demand, as we do now, can lead to
different results depending on the order in which the users asks for
the value numbers. To make things more stable, we compute them ahead
of time.
This needs custom code since the rules from the extension
are not encoded in the grammar.
Changes are:
- The new group instructions don't require Group capability
when the extension is declared.
- The Reduce, InclusiveScan, ExclusiveScan normally require the Kernel
capability, but don't when the extension is declared.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/991
This class implements a generic value propagation algorithm based on the
conditional constant propagation algorithm proposed in
Constant propagation with conditional branches,
Wegman and Zadeck, ACM TOPLAS 13(2):181-210.
The implementation is based on
A Propagation Engine for GCC
Diego Novillo, GCC Summit 2005
http://ols.fedoraproject.org/GCC/Reprints-2005/novillo-Reprint.pdf
The purpose of this implementation is to act as a common framework for any
transformation that needs to propagate values from statements producing new
values to statements using those values.
Re-formatted the source tree with the command:
$ /usr/bin/clang-format -style=file -i \
$(find include source tools test utils -name '*.cpp' -or -name '*.h')
This required a fix to source/val/decoration.h. It was not including
spirv.h, which broke builds when the #include headers were re-ordered by
clang-format.
Removed the check that result type of OpImageRead should be a vector4.
Will reenable/adapt once the spec is clarified on what the right
dimension should be.
Replaced representation of uses
* Changed uses from unordered_map<uint32_t, UseList> to
set<pairInstruction*, Instruction*>>
* Replaced GetUses with ForEachUser and ForEachUse functions
* updated passes to use new functions
* partially updated tests
* lots of cleanup still todo
Adding an unique id to Instruction generated by IRContext
Each instruction is given an unique id that can be used for ordering
purposes. The ids are generated via the IRContext.
Major changes:
* Instructions now contain a uint32_t for unique id and a cached context
pointer
* Most constructors have been modified to take a context as input
* unfortunately I cannot remove the default and copy constructors, but
developers should avoid these
* Added accessors to parents of basic block and function
* Removed the copy constructors for BasicBlock and Function and replaced
them with Clone functions
* Reworked BuildModule to return an IRContext owning the built module
* Since all instructions require a context, the context now becomes the
basic unit for IR
* Added a constructor to context to create an owned module internally
* Replaced uses of Instruction's copy constructor with Clone whereever I
found them
* Reworked the linker functionality to perform clones into a different
context instead of moves
* Updated many tests to be consistent with the above changes
* Still need to add new tests to cover added functionality
* Added comparison operators to Instruction
Adding tests for Instruction, IRContext and IR loading
Fixed some header comments for BuildModule
Fixes to get tests passing again
* Reordered two linker steps to avoid use/def problems
* Fixed def/use manager uses in merge return pass
* Added early return for GetAnnotations
* Changed uses of Instruction::ToNop in passes to IRContext::KillInst
Simplifying the uses for some contexts in passes
Creates a pass that removes redundant instructions within the same basic
block. This will be implemented using a hash based value numbering
algorithm.
Added a number of functions that check for the Vulkan descriptor types.
These are used to determine if we are variables are read-only or not.
Implemented a function to check if loads and variables are read-only.
Implemented kernel specific and shader specific versions.
A big change is that the Combinator analysis in ADCE is factored out
into the IRContext as an analysis. This was done because it is being
reused in the value number table.
Add new "short descriptor" algorithm to MARK-V codec.
Add three shader compression models:
lite - fast, poor compression
mid - balanced
max - best compression
Each instruction is given an unique id that can be used for ordering
purposes. The ids are generated via the IRContext.
Major changes:
* Instructions now contain a uint32_t for unique id and a cached context
pointer
* Most constructors have been modified to take a context as input
* unfortunately I cannot remove the default and copy constructors, but
developers should avoid these
* Added accessors to parents of basic block and function
* Removed the copy constructors for BasicBlock and Function and replaced
them with Clone functions
* Reworked BuildModule to return an IRContext owning the built module
* Since all instructions require a context, the context now becomes the
basic unit for IR
* Added a constructor to context to create an owned module internally
* Replaced uses of Instruction's copy constructor with Clone whereever I
found them
* Reworked the linker functionality to perform clones into a different
context instead of moves
* Updated many tests to be consistent with the above changes
* Still need to add new tests to cover added functionality
* Added comparison operators to Instruction
* Added an internal option to LinkerOptions to verify merged ids are
unique
* Added a test for the linker to verify merged ids are unique
* Updated MergeReturnPass to supply a context
* Updated DecorationManager to supply a context for cloned decorations
* Reworked several portions of the def use tests in anticipation of next
set of changes
If SPIRV-Tools is used as an external project and have
googletest being kept in the same directory as it, we
won't have gmock-matchers.h in external/. This will
result in a compilation error.
Use gmock.h instead.
To make the decoration manger available everywhere, and to reduce the
number of times it needs to be build, I add one the IRContext.
As the same time, I move code that modifies decoration instruction into
the IRContext from mempass and the decoration manager. This will make
it easier to keep everything up to date.
This should take care of issue #928.
Works with current DefUseManager infrastructure.
Added merge return to the standard opts.
Added validation to passes.
Disabled pass for shader capabilty.
This analysis builds a map from instructions to the basic block that
contains them. It is accessed via get_instr_block(). Once built, it is kept
up-to-date by the IRContext, as long as instructions are removed via
KillInst.
I have not yet marked passes that preserve this analysis. I will do it
in a separate change.
Other changes:
- Add documentation about analysis values requirement to be powers of 2.
- Force a re-build of the def-use manager in tests.
- Fix AllPreserveFirstOnlyAfterPassWithChange to use the
DummyPassPreservesFirst pass.
- Fix sentinel value for IRContext::Analysis enum.
- Fix logic for checking if the instr<->block mapping is valid in KillInst.
Fixes issue #728. Currently the inliner is not generating decorations for
inlined code which corresponds to function code which has decorations. An
example of decorations that are relevant: RelaxedPrecision, NoContraction.
The solution is to replicate the decoration during inlining.
Add Effcee as an optional dependency for use in tests. In future it will
be a required dependency.
Effcee is a stateful pattern matcher that has much of the functionality
of LLVM's FileCheck, except in library form. Effcee makes it much easier
to write tests for optimization passes.
Demonstrate its use in a test for the strength-reduction pass.
Update README.md with example commands of how to get sources.
Update Appveyor and Travis-CI build rules.
Also: Include test libraries if not SPIRV_SKIP_TESTS
- SPIRV_SKIP_TESTS is implied by SPIRV_SKIP_EXECUTABLES
This change will move the instances of the def-use manager to the
IRContext. This allows it to persists across optimization, and does
not have to be rebuilt multiple times.
Added test to ensure that the IRContext is validating and invalidating
the analyses correctly.
This is the first part of adding the IRContext. This class is meant to
hold the extra data that is build on top of the module that it
owns.
The first part will simply create the IRContext class and get it passed
to the passes in place of the module. For now it does not have any
functionality of its own, but it acts more as a wrapper for the module.
The functions that I added to the IRContext are those that either
traverse the headers or add to them. I did this because we may decide
to have other ways of dealing with these sections (for example adding a
type pool, or use the decoration manager).
I also added the function that add to the header because the IRContext
needs to know when an instruction is added to update other data
structures appropriately.
Note that there is still lots of work that needs to be done. There are
still many places that change the module, and do not inform the context.
That will be the next step.
Mark structured conditional branches live only if one or more instructions
in their associated construct is marked live. After closure, replace dead
structured conditional branches with a branch to its merge and remove
dead blocks.
ADCE: Dead If Elim: Remove duplicate StructuredOrder code
Also generalize ComputeStructuredOrder so that the caller can specify the
root block for the order. Phi insertion uses pseudo_entry_block and adce and
dead branch elim use the first block of the function.
ADCE: Dead If Elim: Pull redundant code out of InsertPhiInstructions
ADCE: Dead If Elim: Encapsulate CFG Cleanup Initialization
ADCE: Dead If Elim: Remove redundant code from ADCE initialization
ADCE: Dead If: Use CFGCleanup to eliminate newly dead blocks
Moved bulk of CFG Cleanup code into MemPass.
There are a number of users of spriv-opt that are hitting errors
because of stores with different types. In general, this is wrong, but,
in these cases, the types are the exact same except for decorations.
The options is "--relax-store-struct", and it can be used with the
validator or the optimizer.
We assume that if layout information is missing it is consistent. For
example if one struct has a offset of one of its members, and the other
one does not, we will still consider them as being layout compatible.
The problem will be if both struct has and offset decoration for
corresponding members, and the offset are different.
This change will replace a number of the
std::vector<std::unique_ptr<Instruction>> member of the module to
InstructionList. This is for consistency and to make it easier to
delete instructions that are no longer needed.
Function static non-POD data causes problems with DLL lifetime.
This pull request turns all static info tables into strict POD
tables. Specifically, the capabilities/extensions field of
opcode/operand/extended-instruction table are turned into two
fields, one for the count and the other a pointer to an array of
capabilities/extensions. CapabilitySet/EnumSet are not used in
the static table anymore, but they are still used for checking
inclusion by constructing on the fly, which should be cheap for
the majority cases.
Also moves all these tables into the global namespace to avoid
C++11 function static thread-safe initialization overhead.
Markv codec now receives two optional callbacks:
LogConsumer for internal codec logging
DebugConsumer for testing if encoding->decoding produces the original
results.
There does not seem to be any pass that remove global variables. I
think we could use one. This pass will look specifically for global
variables that are not referenced and are not exported. Any decoration
associated with the variable will also be removed. However, this could
cause types or constants to become unreferenced. They will not be
removed. Another pass will have to be called to remove those.
The pass checks correctness of operands of instruction in opcode range
OpConvertFToU - OpBitset.
Disabled invalid tests
Disabled UConvert validation until Vulkan CTS can catch up.
Add validate_conversion to Android.mk
Also remove duplicate entry in CMakeLists.txt.
This is the first step in replacing the std::vector of Instruction
pointers to using and intrusive linked list.
To this end, we created the InstructionList class. It inherites from
the IntrusiveList class, but add the extra concept of ownership. An
InstructionList owns the instruction that are in it. This is to be
consistent with the current ownership rules where the vector owns the
instruction that are in it.
The other larger change is that the inst_ member of the BasicBlock class
was changed to using the InstructionList class.
Added test for the InsertBefore functions, and making sure that the
InstructionList destructor will delete the elements that it contains.
I've also add extra comments to explain ownership a little better.
- Adds a new pass CFGCleanupPass. This serves as an umbrella pass to
remove unnecessary cruft from a CFG.
- Currently, the only cleanup operation done is the removal of
unreachable basic blocks.
- Adds unit tests.
- Adds a flag to spirvopt to execute the pass (--cfg-cleanup).
- switched from C to C++
- moved MARK-V model creation from backend to frontend
- The same MARK-V model object can be used to encode/decode multiple
files
- Added MARK-V model factory (currently only one option)
- Added --validate option to spirv-markv (run validation while
encoding/decoding)
This commit is the initial implementation of the intrusive linked list
class. It includes the implementation in the header files, and unit
test.
The iterators are circular: incrementing end() gives begin() and
decrementing begin() gives end(). Also made it valid to
decrement end().
Expliticly defines move constructor and move assignment
- Visual Studio 2013 does not implicitly generate the move constructor or
move assignments. So they need to be explicit, otherwise it will try to
use the copy constructor, which we explicitly deleted.
- Can't use "= default" either.
Seems like VS2013 does not support explicitly using the default move
constructors and move assignments, so I wrote them out.
Expands dead branch elimination to eliminate dead switch cases. It also
changes dbe to eliminate orphaned merge blocks and recursively eliminate
any blocks thereby orphaned.
Add extra iterators for ir::Module's sections
Add extra getters to ir::Function
Add a const version of BasicBlock::GetLabelInst()
Use the max of all inputs' version as version
Split debug in debug1 and debug2
- Debug1 instructions have to be placed before debug2 instructions.
Error out if different addressing or memory models are found
Exit early if no binaries were given
Error out if entry points are redeclared
Implement copy ctors for Function and BasicBlock
- Visual Studio ends up generating copy constructors that call deleted
functions while compiling the linker code, while GCC and clang do not.
So explicitly write those functions to avoid Visual Studio messing up.
Move removing duplicate capabilities to its own pass
Add functions running on all IDs present in an instruction
Remove duplicate SpvOpExtInstImport
Give default options value for link functions
Remove linkage capability if not making a library
Check types before allowing to link
Detect if two types/variables/functions have different decorations
Remove decorations of imported variables/functions and their types
Add a DecorationManager
Add a method for removing all decorations of id
Add methods for removing operands from instructions
Error out if one of the modules has a non-zero schema
Update README.md to talk about the linker
Do not freak out if an imported built-in variable has no export
Creates a pass called eliminate dead functions that looks for functions
that could never be called, and deletes them from the module.
To support this change a new function was added to the Pass class to
traverse the call trees from diffent starting points.
Includes a test to ensure that annotations are removed when deleting a
dead function. They were not, so fixed that up as well.
Did some cleanup of the assembly for the test in pass_test.cpp. Trying
to make them smaller and easier to read.
Create a new optimization pass, strength reduction, which will replace
integer multiplication by a constant power of 2 with an equivalent bit
shift. More changes could be added later.
- Does not duplicate constants
- Adds vector |Concat| utility function to a common test header.
Includes:
- Multi-sequence move-to-front
- Coding by id descriptor
- Statistical coding of non-id words
- Joint coding of opcode and num_operands
Removed explicit form Huffman codec constructor
- The standard use case for it is to be constructed from initializer list.
Using serialization for Huffman codecs
This adapts the fix for the single-block loop. Split the loop like
before. But when we move the OpLoopMerge back to the loop header,
redirect the continue target only when the original loop was a single
block loop.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/800
Use the list from the SPIR-V registry page. Also, capture it as
a string so it's much easier to update via copy-paste.
The validator will accept modules that declare these known
extensions. However, we might not know about new tokens or
instructions declared in them. For that we need grammar updates
applied to SPIRV-Headers.
If the caller block is a single-block loop and inlining will
replace the caller block by several blocks, then:
- The original OpLoopMerge instruction will end up in the *last*
such block. That's the wrong place to put it.
- Move it back to the end of the first block.
- Update its Continue Target ID to point to the last block
We also have to take care of cases where the inlined code
begins with a structured header block. In this case
we need to ensure the restored OpLoopMerge does not appear
in the same block as the merge instruction from the callee's
first block.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/787
- DeadBranchElim: Make sure to mark orphan'd merge blocks and continue
targets as live.
- Add test with loop in dead branch
- Add test that orphan'd merge block is handled.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/776
Bit stream writer was manifesting incorrect behaviour when the following
two conditions were met:
- writer was on 64-bit word boundary
- WriteBits was invoked with num_bits=0 (can happen when a Huffman codec has only one
value)
The bug was causing very rare sporadic corruption which was detected by
tests after a random experimental change in MARK-V model.
Only inline calls to functions with opaque params or return
TODO: Handle parameter type or return type where the opqaue
type is buried within an array.
Includes code to deal correctly with OpFunctionParameter. This
is needed by opaque propagation which may not exhaustively inline
entry point functions.
Adds ProcessEntryPointCallTree: a method to do work on the
functions in the entry point call trees in a deterministic order.
Refactored the Huffman codec implementation and added ability to
serialize to C++-like text format. This would reduce the time-complexity
if loading hard-coded codecs.
Id descriptors are computed as a recursive hash of all instructions used
to define an id. Descriptors are invarint of actual id values and
the similar code in different files would produce the same descriptors.
Multiple ids can have the same descriptor. For example
%1 = OpConstant %u32 1
%2 = OpConstant %u32 1
would produce two ids with the same descriptor. But
%3 = OpConstant %s32 1
%4 = OpConstant %u32 2
would have descriptors different from %1 and %2.
Descriptors will be used as handles of move-to-front sequences in SPIR-V
compression.
ADCE will now generate correct code in the presence of function calls.
This is needed for opaque type optimization needed by glslang. Currently
all function calls are marked as live. TODO: mark calls live only if they
write a non-local.
- UniformElim: Only process reachable blocks
- UniformElim: Don't reuse loads of samplers and images across blocks.
Added a second phase which only reuses loads within a block for samplers
and images.
- UniformElim: Upgrade CopyObject skipping in GetPtr
- UniformElim: Add extensions whitelist
Currently disallowing SPV_KHR_variable_pointers because it doesn't
handle extended pointer forms.
- UniformElim: Do not process shaders with GroupDecorate
- UniformElim: Bail on shaders with non-32-bit ints.
- UniformElim: Document support for only single index and add TODO.
Add MultiMoveToFront class which supports multiple move-to-front
sequences and allows to promote value in all sequences at once.
Added caching for last accessed sequence handle and last accessed value
in each sequence.
Currently only SPV_KHR_variable_pointers is disallowed in passes which
do pointer analysis. Positive and negative tests of the general extensions
mechanism were added to aggressive_dce but cover all passes.
Create aggressive dead code elimination pass
This pass eliminates unused code from functions. In addition,
it detects and eliminates code which may have spurious uses but which do
not contribute to the output of the function. The most common cause of
such code sequences is summations in loops whose result is no longer used
due to dead code elimination. This optimization has additional compile
time cost over standard dead code elimination.
This pass only processes entry point functions. It also only processes
shaders with logical addressing. It currently will not process functions
with function calls. It currently only supports the GLSL.std.450 extended
instruction set. It currently does not support any extensions.
This pass will be made more effective by first running passes that remove
dead control flow and inlines function calls.
This pass can be especially useful after running Local Access Chain
Conversion, which tends to cause cycles of dead code to be left after
Store/Load elimination passes are completed. These cycles cannot be
eliminated with standard dead code elimination.
Additionally: This transform uses a whitelist of instructions that it
knows do have side effects, (a.k.a. combinators). It assumes other
instructions have side effects: it will not remove them, and assumes
they have side effects via their ID operands.
A SSA local variable load/store elimination pass.
For every entry point function, eliminate all loads and stores of function
scope variables only referenced with non-access-chain loads and stores.
Eliminate the variables as well.
The presence of access chain references and function calls can inhibit
the above optimization.
Only shader modules with logical addressing are currently processed.
Currently modules with any extensions enabled are not processed. This
is left for future work.
This pass is most effective if preceeded by Inlining and
LocalAccessChainConvert. LocalSingleStoreElim and LocalSingleBlockElim
will reduce the work that this pass has to do.
Fixes Instruction::ForEachInId so it covers
SPV_OPERAND_TYPE_MEMORY_SEMANTICS_ID and SPV_OPERAND_TYPE_SCOPE_ID.
Future proof a bit by using the common spvIsIdType routine.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/697
Fixed width encoding is intended to be used for small unsigned integers
when the upper bound is known both to the encoder and the decoder
(for example move-to-front rank).
Command line application is located at tools/spirv-markv
API at include/spirv-tools/markv.h
At the moment only very basic compression is implemented, mostly varint.
Scope of supported SPIR-V opcodes is also limited.
Using a simple move-to-front implementation instead of encoding mapped
ids.
Work in progress:
- Does not cover all of SPIR-V
- Does not promise compatibility of compression/decompression across
different versions of the code.
The implementation is based on AVL and order statistic tree.
It accepts all kinds of values and the implementation
doesn't expect the behaviour to be consistent with id coding.
Intended by SPIR-V compression algorithms.
Added data structure to SpirvStats which is used to collect statistics
on opcodes following other opcodes.
Added a simple analysis print-out to spirv-stats.
If the variable_pointer extension is used:
* OpLoad's pointer argument may be the result of any of the following:
* OpSelect
* OpPhi
* OpFunctionCall
* OpPtrAccessChain
* OpCopyObject
* OpLoad
* OpConstantNull
* Return value of a function may be a pointer.
* It is valid to use a pointer as the return value of a function.
* OpStore should allow a variable pointer argument.
Add --flatten-decorations to spirv-opt
Flattens decoration groups. That is, replace OpDecorationGroup
and its uses in OpGroupDecorate and OpGroupMemberDecorate with
ordinary OpDecorate and OpMemberDecorate instructions.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/602
The spvtools::Optimizer::Run method should also write the output binary
if optimization succeeds without changes but the output binary vector
does not have exactly the same contents as the input binary.
We have to check both the base pointer of the storage and the size of
the vector
Added a test for this too.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/611
Supported in assembler, disassembler, and binary parser.
The validator does not check SPV_AMD_gcn_shader validation rules
beyond parsing the extension.
Adds generic support for generating instruction tables for vendor
extensions.
Adds generic support for extensions the validator should recognize
(but not check) but which aren't derived from the SPIR-V core
grammar file.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/594
The SPIR-V core grammar file in a recent SPIRV-Headers
applied the fix from Rev 7 of SPV_KHR_16bit_storage:
FPRoundingMode enums are now enabled by the capabilities
introduced by that extension.
Update the SPIRV-Tools tests accordingly.
Autogenerating the following code:
- extension enum
- extension-to-string
- string-to-extension
- capability-to-string
Capability mapping table will not compile if incomplete.
TODO: Use "spirv-latest-version.h" instead of 1.1.
Added function to generate capability tables for tests.
Known extensions are saved in validation state. Unknown extension
produce a dignostic message, but do not fail the validation.
Moved extension definitions to their own file.
The assembler assigns ID numbers sequentially, so it's confusing
to have a %1 in the source assembly when it isn't the first mentioned
ID. Rewrite the ID names to avoid this situation in a few cases.
From the SPIR-V Spec 2.16.1:
A function declaration (an OpFunction with no basic blocks), must have
a Linkage Attributes Decoration with the Import Linkage Type.
A function definition (an OpFunction with basic blocks) cannot be
decorated with the Import Linkage Type.
The limit for the number of struct members is parameterized using
command line options.
Add --max-struct-depth command line option.
Add --max-switch-branches command line option.
Add --max-function-args command line option.
Add --max-control-flow-nesting-depth option.
Add --max-access-chain-indexes option.
If a merge block is reachable, then it must be *strictly* dominated
by its header. Until now we've allowed the header and the merge
block to be the same.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/551
Also: Use dominates and postdominates methods on BasicBlock to
improve readability.
It is best to check the error messages of unit tests that fail
validation. This will ensure that a validation failure is due to what we
expect and not due to some secondary reason.
Updating SPIR-V Validator unit tests with error message checks.
When applied to a structure-type member, all members of that structure
type must also be decorated with BuiltIn. (No allowed mixing of built-in
variables and non-built-in variables within a single structure.)
When applied to a structure-type member, that structure type cannot be
contained as a member of another structure type.
There is at most one object per Storage Class that can contain a
structure type containing members decorated with BuiltIn, consumed per
entry-point.
It is acceptable for OpAccessChain, OpInBoundsAccessChain,
OpPtrAccessChain, OpInBoundsPtrAccessChain, OpCompositeInsert, and
OpCompositeExtract to not take any indexes as arguments. In such cases,
no indexing will be done on the Base pointer/composite.
Added a new file where all the decoration validation can be performed.
In this change the SPIRV Spec Section 2.16.1 is implemented:
"It is illegal to initialize an imported variable. This means
that a module-scope OpVariable with initialization value cannot be
marked with the Import Linkage Type."
Also added unit tests.
* Added the decoration class as well as the code that registers the
decorations for each <id> and also decorations for struct members.
* Added unit tests for decorations in ValidationState as well as
decoration id tests.
According to the SPIRV Spec (2.16.1):
* There is at least one OpEntryPoint instruction, unless the Linkage
capability is being used.
* No function can be targeted by both an OpEntryPoint instruction and an
OpFunctionCall instruction.
Also updated unit tests to includ OpEntryPoint.
We are adding a new API which can be called to run the SPIR-V validator,
and retrieve the ValidationState_t object. This is very useful for
unit testing.
I have also added basic unit tests that demonstrate usage of this flow
and ease of use to verify correctness.
The validity of each command is checked based on the descripton in
SPIR-V Spec Section 3.32.12 (Composite Instructions).
Also checked that the number of indexes passed to these commands does
not exceed the limit described in 2.17 (Universal Limits).
Also added unit tests for each one.
entry_block_to_construct_ maps an entry block to its construct. The key
in this map (the entry block) is not unique, and therefore the entry for
the continue construct gets overwritten when the selection construct is
discovered.
Since a given block may be the entry block of different types of
constructs, the (basic_block, construct_type) pair should be able to
uniquely identify the construct.
Adds test:
- In this test, a basic block is the entry block of a continue construct
as well as the entry block of a selection construct.
It can be shown that this unit test would crash without the fix in this
PR and passes with the fix in this PR.
Validation for OpPtrAccessChain is similar to OpAccessChain with the
following difference: OpPtrAccessChain takes an extra argument (word 4)
which is the Element <id> argument.
Validation for OpInBoundsPtrAccessChain is also similar to OpPtrAccessChain.
Also added tests for all access chain instructions:
Modified the existing parameterized tests to accommodate OpPtrAccessChain and
OpInBoundsPtrAccessChain.
Also fixed a typo in previous commits.
Using parameterized unit tests to avoid duplicate code that runs the
tests of OpAccessChain and OpInBoundsAccessChain.
This is also a steppingstone to adding tests for OpPtrAccessChain and
OpInBoundsPtrAccessChain.
According to Section 2.17 (Universal Limits) of the SPIR-V Spec, the
control flow nesting depth may not be larger than 1023.
This is checked only when we are required to have structured
control flow. Otherwise it's not clear how to compute control
flow nesting depth.
- Parse CHANGES file with Universal Python line endings in case
the source tree was checked out with Windows line endings.
- Use our own clone of strnlen_s which might not be available
everywhere.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/508
According to sectin 2.17 in SPIR-V Spec, the structure nesting depth may
not be larger than 255. This is interpreted as structures nested in
structures. The code does not look into arrays or follow pointers to see
if it reaches a structure downstream.
Use memoization to avoid exponential runtime.
The validation code for OpAccessChain was missing OpTypeRuntimeArray as
a possible type that can be indexed into.
This was caught by running the validator on VKCTS.
Also adding unit tests for it.
* Result Type must be an OpTypePointer. Its Type operand must be the
type reached by walking the Base’s type hierarchy down to the last
provided index in Indexes, and its Storage Class operand must be the
same as the Storage Class of Base.
* Base must be a pointer, pointing to the base of a composite object.
* Indexes walk the type hierarchy to the desired depth, potentially down
to scalar granularity. The first index in Indexes will select the
top-level member/element/component/element of the base composite. All
composite constituents use zero-based numbering, as described by their
OpType... instruction. The second index will apply similarly to that
result, and so on. Once any non-composite type is reached, there must
be no remaining (unused) indexes. Each of the Indexes must:
- be a scalar integer type,
- be an OpConstant when indexing into a structure.
* Check for the case where no indexes are passed to OpAccessChain.
Minor improvements based on code review.
According to the Universal Limits section of the SPIR-V Spec (2.17), the
number of global variables may not exceed 65,535 and the number of local
variables may not exceed 524,287.
Also added unit tests for each one.
According to the SPIR-V spec (section 2.17: Universal Limits), the
OpTypeFunction instruction may not take more than 255 arguments for the
function. Also added unit tests for it.
The number of (literal, label) pairs passed to OpSwitch may not exceed
16,383. Added code to validate this and added unit tests for it.
Also fixed a typo in another validor error message.
This is described in Section 2.17 of the SPIR-V Spec.
* Updated existing unit test 'SemanticsIdIsAnIdNotALiteral' to pass by
manipulating the ID bound in its binary header.
* Fixed boundary check in the code.
* Added unit test to check the case that the largest ID is equal to the
ID bound.
This change implements the validation for usages of OpSampledImage
instruction as described in the Data Rules section of the Universal
Validation Rules of the SPIR-V Spec.
SpecConstantComposite may specialize to a vector, matrix, array, or
struct. In each case, the number of components and type of components
that are being specialized to must match the expected result type.
Removed use of macros in these tests.
Now using the spvValidateBase class. Using CompileSuccessfully(), and
ValidateInstructions() to compile to binary and run the validator. Also
using getDiagnosticString() to check the proper error message string.
All the heavy lifting is done in ValidateBase class.
According to the Data Rules section of 2.16.1. Universal Validation
Rules of the SPIR-V Spec:
Forward reference operands in an OpTypeStruct
* must be later declared with OpTypePointer
* the type pointed to must be an OpTypeStruct
* had an earlier OpTypeForwardPointer forward reference to the same <id>
These rules are under "Data Rules" in 2.16.1 (Universal Validation
Rules) part of the SPIR-V 1.1 Specification document:
* Scalar floating-point types can be parameterized only as 32 bit, plus
any additional sizes enabled by capabilities.
* Scalar integer types can be parameterized only as 32 bit, plus any
additional sizes enabled by capabilities.
* Vector types can only be parameterized with numerical types or the
OpTypeBool type.
* Matrix types can only be parameterized with floating-point types.
* Matrix types can only be parameterized as having only 2, 3, or 4
columns.
* Specialization constants (see Specialization) are limited to integers,
Booleans, floating-point numbers, and vectors of these.
Number of components in a vector can be 2 or 3 or 4. If Vector16
capability is used, 8 and 16 components are also allowed.
Also added unit tests for vector data rule.
* Allows OpTypeForwardPointer to reference IDs not yet declared in
the module
* Allows OpTypeStruct to reference IDs not yet declared in
the module
Possible Issue: OpTypeStruct should only allow forward references
if the ID is a pointer that is referenced by a forward pointer. Need
Type support in Validator which is currently a work in progress.
There is no difference between the previous IgnoreMessage() function
and a null std::function, from functionality's perspective.
The user can set nullptr as the MessageConsumer, so need to guard
against nullptr before calling the consumer anyway. It's better
we use it internally so that it may expose problems by us instead
of the user.
Default-constructed Pass/PassManager will have a MessageConsumer
which ignores all messages. SetMessageConsumer() should be called
to supply a meaningful MessageConsumer.
Requires use of SPIRV-Headers that has support
for SPV_KHR_shader_ballot.
Adds assembler, disassembler, binary parser support.
Adds general support for allowing an operand to be
only enabled by a set of extensions.
TODO: Validator support for extension checking.
* Use PIMPL idiom in the C++ interface.
* Clean up interface for assembling and disassembling.
* Add validation into C++ interface.
* Add more tests for the C++ interface.
Add the following macros for logging purpose:
* SPIRV_ASSERT
* SPIRV_DEBUG
* SPIRV_UNIMPLEMENTED
* SPIRV_UNREACHABLE
The last two is always turned on, while the first two can only
be turned on in debug build.
Every time an event happens in the library that the user should be
aware of, the callback will be invoked.
The existing diagnostic mechanism is hijacked internally by a
callback that creates an diagnostic object each time an event
happens.
Defer removal of a Phi's result id from the undefined-forward-reference
set until after you've scanned the arguments. The reordering is only
significant for Phi.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/415
ParseNumber(): Returns false if the given string is a nullptr.
ParseAndEncodeXXXX(): Returns kInvalidText and populate error message:
"The given text is a nullptr", if the givne string is a nullptr.
The pass instance is constructed with a map from spec id (uint32_t) to
default values in string format. The default value strings will be
parsed to numbers according to the target spec constant type.
If the Spec Id decoration is found to be applied on multiple different
target ids, that decoration instruction (OpDecorate or OpGroupDecorate)
will be skipped. But other decoration instrucitons may still be
processed.
Pull out the number parsing logic from
AssemblyContext::binaryEncodeNumericLiteral() to utilities.
The new utility function: `ParseAndEncodeNumber()` now accepts:
* number text to parse
* number type
* a emit function, which is a function which will be called with each
parsed uint32 word.
* a pointer to std::string to be overwritten with error messages.
(pass nullptr if expect no error message)
and returns:
* an enum result type to indicate the status
Type/Structs moved to utility:
* template<typename T> class ClampToZeroIfUnsignedType
New type:
* enum EncodeNumberStatus: success or error code
* NumberType: hold the number type information for the number to be parsed.
* several helper functions are also added for NumberType.
Functions moved to utility:
* Helpers:
* template<typename T> checkRangeAndIfHexThenSignExtend() -> CheckRangeAndIfHex....()
* Interfaces:
* template<typename T> parseNumber() -> ParseNumber()
* binaryEncodeIntegerLiteral() -> ParseAndEncodeIntegerNumber()
* binaryEncodeFloatingPointLiteral() -> ParseAndEncodeFloatingPointNumber()
* binaryEncodeNumericLiteral() -> ParseAndEncodeNumber()
Tests added/moved to test/ParseNumber.cpp, including tests for:
* ParseNumber(): This is moved from TextToBinary.cpp to ParseNumber.cpp
* ParseAndEncodeIntegerNumber(): New added
* ParseAndEncodeFloatingPointNumber(): New added
* ParseAndEncodeNumber(): New added
Note that the error messages are kept almost the same as before, but
they may be inappropriate for an utility function. Those will be fixed
in another CL.
De-duplicate constants and unifies the uses of constants for a SPIR-V
module. If two constants are defined exactly the same, only one of them
will be kept and all the uses of the removed constant will be redirected
to the kept one.
This pass handles normal constants (defined with
OpConstant{|True|False|Composite}), some spec constants (those defined
with OpSpecConstant{Op|Composite}) and null constants (defined with
OpConstantNull).
There are several cases not handled by this pass:
1) If there are decorations for the result id of a constant defining
instruction, that instruction will not be processed. This means the
instruction won't be used to replace other instructions and other
instructions won't be used to replace it either.
2) This pass does not unify null constants (defined with
OpConstantNull instruction) with their equivalent zero-valued normal
constants (defined with OpConstant{|False|Composite} with zero as the
operand values or component values).
Also removed the default argument value of `skip_nop` for function
`SinglePassRunAndCheck()` and `SinglePassRunAndDisassemble()`. This is
required to support variadic arguments.
Use libspirv::CapabilitySet instead of a 64-bit mask.
Remove dead function spvOpcodeRequiresCapability and its tests.
The JSON grammar parser is simplified since it just writes the
list of capabilities as a braced list, and takes advantage of
the CapabilitySet intializer-list constructor.
For the spec constants defined by OpSpecConstantOp and
OpSpecContantComposite, if all of their operands are constants with
determined values (normal constants whose values are fixed), calculate
the correct values of the spec constants and re-define them as normal
constants.
In short, this pass replaces all the spec constants defined by
OpSpecContantOp and OpSpecConstantComposite with normal constants when
possible. So far not all valid operations of OpSpecConstantOp are
supported, we have several constriction here:
1) Only 32-bit integer and boolean (both scalar and vector) are
supported for any arithmetic operations. Integers in other width (like
64-bit) are not supported.
2) OpSConvert, OpFConvert, OpQuantizeToF16, and all the
operations under Kernel capability, are not supported.
3) OpCompositeInsert is not supported.
Note that this pass does not unify normal constants. This means it is
possible to have new generatd constants defining the same values.
This lets us write smaller test cases with the IrLoader, avoiding
boilerplate for function begin/end, and basic block begin/end.
Also ForEachInst is more forgiving of cases where a basic block
doesn't have a label, and when a function doesn't have a defining
or end instruction.
Also:
- Add const forms of ForEachInst
- Rewrite Module::ToBinary in terms of ForEachInst
- Add Instruction::ToBinaryWithoutAttachedDebugInsts
- Delete the ToBinary method on Function, BasicBlock, and Instruction
since it can now be implemented with ForEachInst in a less confusing
way, e.g. without recursion.
- Preserve debug line instructions on OpFunctionEnd (and store that
instruction as a unique-pointer, for regularity).
* Fix the behavior when analyzing an individual instruction:
* exisiting instruction:
Clear the original records and re-analyze it as a new instruction.
* new instruction with exisiting result id:
Clear the original records of the exisiting result id. This means
the records of the analyzed result-id-defining instruction will be
overwritten by the record of the new instruction with the same
result id.
* new instruction with new result id or without result id:
Just update the internal records to incorperate the new
instruction.
* Add tests for analyzing individual instruction w/o an exisiting module.
* Refactor ClearInst() implementation
* Remove ClearDef() function.
* Fixed a bug in DefUseManager::ReplaceAllUsesWith() that OpName
instruction may trigger the assertion incorrectly.
* update the blurbs for EraseUseRecordsOfOperandIds()
By deriving from std::iterator, iterator_traits will be properly
set up for our custom iterator type, thus we can use algorithms
from STL with our custom iterators.
Previously we use vectors of objects and move semantics to handle
ownership. That approach has the flaw that inserting an object into
the middle of a vector, which may trigger a vector reallocation,
can invalidate some addresses taken from instructions.
Now the in-memory representation internally uses vector of unique
pointers to handle ownership. Since objects are explicitly heap-
allocated now, pointers to them won't be invalidated by vector
resizing anymore.
The NEW behavior is to not dereference variables or interpret keywords
that have been quoted or bracketed.
For more information, see
https://cmake.org/cmake/help/v3.1/policy/CMP0054.html.
This is to suppress a warning when using CMake 3.1.3+.
- Find unreachable continue targets. Look for back edges
with a DFS traversal separate from the dominance traversals,
where we count the OpLoopMerge from the header to the continue
target as an edge in the graph.
- It's ok for a loop to have multiple back edges, provided
they are all from the same block, and we call that the latch block.
This may require a clarification/fix in the SPIR-V spec.
- Compute postdominance correctly for infinite loop:
Bias *predecessor* traversal root finding so that you use
a later block in the original list. This ensures that
for certain simple infinite loops in the CFG where neither
block branches to a node without successors, that we'll
compute the loop header as dominating the latch block, and the
latch block as postdominating the loop header.
Fixes dominance calculation when there is a forward arc from an
unreachable block A to a reachable block B. Before this fix, we would
say that B is not dominated by the graph entry node, and instead say
that the immediate dominator of B is the psuedo-entry node of the
augmented CFG.
The fix:
- Dominance is defined in terms of a traversal from the entry block
of the CFG. So the forward DFS should start from the function
entry block, not the pseudo-entry-block.
- When following edges backward during dominance calculations, only go to
nodes that are actually reachable in the forward traversal.
Important: the sense of reachability flips around when computing
post-dominance.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/297
AssemblyBuilder contains boilplates.
Adds OpName instructions for all added defining instructions.
Adds OpDecorate SpecId for all spec constants added with OpSpecConstant,
OpSpecConstantTrue and OpSpecConstantFalse instructions.
The def-use dominance checker doesn't have enough info to know
that a particular use is in an OpPhi, so skip tracking those uses
for now. Add a TODO to do a proper OpPhi variable-argument check
in the future.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/286
Ensure the dominance calculation visits all nodes in the CFG.
The successor list of the pseudo-entry node is augmented with
a single node in each cycle that otherwise would not be visited.
Similarly, the predecssors list of the pseduo-exit node is augmented
with the a single node in each cycle that otherwise would not
be visited.
Pulls DepthFirstSearch out so it's accessible outside of the dominator
calculation.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/279
Add a pass to freeze spec constants to their default values. This pass does
not fold the frozen spec constants and does not handle SpecConstantOp
instructions and SpecConstantComposite instructions.
* Creates an ID class which manages definition and use of IDs
* Moved tracking code from validate.cpp to validate_id.cpp
* Rename and combine SsaPass and ProcessIds into IdPass
* Remove module dependency in Function
* The `Input` StorageClass doesn't require the `Shader` capability
anymore.
* The `Sampled1D` and `SampledBuffer` capabilities don't require
the `Shader` capability anymore. So they do not indirectly
depend on the `Matrix` capability. So are the `Image1D` and
`ImageBuffer` capabilities, which depend on `Sampled1D` and
`SampledBuffer`.
A new GLSL grammar file is uploaded for SPIR-V 1.1, but it's the
same as the existing one for SPIR-V 1.0.
Now tracking commit 3814effb879ab5a98a7b9288a4b4c7849d2bc8ac in
SPIRV-Headers.
For DependencyInfinite and DependencyLength, test
that they don't require a capability to be turned on.
Also, that they are assembled, binary parsed, and disassembled
correctly.
Works around issue 248 by weakening the test:
https://github.com/KhronosGroup/SPIRV-Tools/issues/248
The validator should try to track (32-bit) constant values, and then
for capability checks on IDs, check the referenced value, not the
raw ID number.
For dominance calculations we use an "augmented" CFG
where we always add a pseudo-entry node that is the predecessor
in the augmented CFG to any nodes that have no predecessors in the
regular CFG. Similarly, we add a pseudo-exit node that is the
predecessor in the augmented CFG that is a successor to any
node that has no successors in the regular CFG.
Pseudo entry and exit blocks live in the Function object.
Fixes a subtle problem where we were implicitly creating
the block_details for the pseudo-exit node since it didn't
appear in the idoms map, and yet we referenced it. In such a case the
contents of the block details could be garbage, or zero-initialized.
That sometimes caused incorrect calculation of immediate dominators
and post-dominators. For example, on a debug build where the details
could be zero-initialized, the dominator of an unreachable block would
be given as the pseudo-exit node. Bizarre.
Also, enforce the rule that you must have an OpFunctionEnd to close off
the last function.
The operands following the extended instruction literal
number are determined by the extended instruction itself.
So drop the zero-or-more IdRef pattern at the end of OpExtInst.
It's arguable whether this should actually be a grammar fix. I've
chosen to patch this in SPIRV-Tools instead of in the grammar file.
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/233
Also fix two test cases for OpenCL extended instructions. These
errors of supplying too many operands are now detected.
* ValidationState_t and idUsage now store the addressing model and memory model of the SPIR-V module (this is necessary for certain instructions that need different checks depending on if the logical or physical addressing model is used)
* removed SpvOpPtrAccessChain and SpvOpInBoundsPtrAccessChain from spvOpcodeIsPointer again as these are disallowed in logical addressing mode and only allowed in physical addressing mode (which doesn't use/need spvOpcodeIsPointer in the first place)
* added SpvOpImageTexelPointer and SpvOpCopyObject to spvOpcodeIsPointer
* OpLoad/OpStore now only check if the used pointer operand originated from a valid pointer producing opcode in logical addressing mode (as per 2.16.1)
* moved bitcast pointer tests to the kernel / physical addressing model part (+cleanup)
* renamed spvOpcodeIsPointer to spvOpcodeReturnsLogicalPointer to clarify this function is only meant to be used with the logical addressing model
Refactor the ValidateCapability test fixture.
Explain the meaning of test parameters. Factor out methods for
convenience and readability. DRY v1.0 and v1.1 tests.
Add a high level version number for SPIRV-Tools, beginning
with v2016.0-dev. The README describes the format of the
version number.
The high level version number is extracted from the CHANGES
file. That works around:
- stale-bait for when we don't add tags to the repository
- our inability to add tags to the repository
Option --version causes spirv-as, spirv-dis, and spirv-val to
show the high level version number.
Add spvSoftwareVersionString to return the C-string for
the high level version number.
Add spvSoftwareVersionDetailsString() so that clients can get
more information if they want to.
Also allows us to clean up the uses in the tool executables files,
so now only one file includes build-version.inc.
Move the update-build-version logic to the only
CMakeLists file that needs it.
The update build version script takes a new argument
to name the output file.
Introduced in v1.1, SubgroupDispatch adds the following:
- two new execution modes
- one new capability
- two new opcodes
Extend ValidateBase methods to take a spv_target_env. Replace the
context_ member with ScopedContext inside the said methods. Give
ScopedContext wider visibility by moving it outside
TextToBinaryTestBase.
Add test for named-barrier instructions and capability.
Add spv_target_env as an optional argument to CompileSuccessfully() and
CompileFailure(). Currently defaults to UNIVERSAL_1_0, though that
could change in the future.
Make spv_context a local variable in test methods instead of a
TextToBinaryTestBase member. Introduce ScopedContext to make temp
contexts easier.
For fulfilling this purpose, the |opcode| field in the
|spv_parsed_instruction_t| struct is changed to of type uint16_t.
Also add functions to query the information of a given SPIR-V
target environment.
This patch uses a Python script to parse the JSON grammar file to
generate the opcode table and operand kind tables.
Now we don't need to do the post-processing (from OperandClass
to spv_operand_type_t) and copying of the opcode info table is
not required anymore!
Previously, the grammar allowed many execution modes for a single
OpExecutionMode instruction.
Removes the variable- and optional- execution mode operand type
enum values.
Issue found by antiagainst@