* Avoid use of type manager in extact->construct folding
When dealing with structs the type manager merge two different structs
into a single entry if they have all of the same decorations and
element types. This is because they hash to the same value in the hash
table. This can cause problems if you need to get the id of a type from
the type manager because you could get either one. In this case, it
returns the wrong one.
The fix avoids using the type manager in one place. I have not
looked closely at other places the type manager is used to make
sure it is used safely everywhere.
Fixes#5624
* Remove use of TypeManager::GetId
This removes a use of TypeManager::GetId by keeping the id around. This
avoid a potential problem if the type manager gets confused. These types
of bugs are hard to generate test cases for, so I do not have a test.
However, existing tests make sure that do not regress.
* Basic support for SPV_EXT_replicated_composites
Validation will follow as a separate PR (still need to write a test suite)
Change-Id: Ic95fa6ce39d32f5ac2787bc38dba2748c9cc58f7
Signed-off-by: Kevin Petit <kevin.petit@arm.com>
* Update SPIRV-Headers
Change-Id: I6c0df248d99c13b49d78528d035a4222027c0232
---------
Signed-off-by: Kevin Petit <kevin.petit@arm.com>
This removes the folding rules added in #4783 and #4808. They lead to
poor code generation on Adreno devices when 16-bit floating point values
were used. Since this change is transformation is suppose to be neutral,
there is no general reason to continue doing it.
I have talked to the owners of SwiftShader, and they do not mind if the
transform is removed. They were the ones the requested the change in the
first place.
Fixes#5658
The Scope operand of `OpReadClockKHR` was always validated using the
Vulkan environment rules, which only allow `Subgroup` or `Device`.
For the OpenCL environment, `Workgroup` is also a valid Scope, so
`Workgroup` should not be rejected in the universal environment.
Guard the existing Scope check behind `spvIsVulkanEnv` and add a new
Scope check for the OpenCL environment.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Check for matrix decorations on arrays of matrices
* MatrixStide, RowMajor and ColMajor can be applied to matrix or
arrays of matrix members
* Check that matrix stride satisfies alignment in arrays
Reject `OpCooperativeMatrixStoreKHR` with a `MakePointerVisibleKHR`
MemoryAccess operand, as `MakePointerVisibleKHR` is not supposed to be
used with store operations.
The `CoopMatKHRStoreMemoryAccessFail` test failed to catch this
because it used the helper function `GenCoopMatLoadStoreShader` which
generates `...NV` instead of `...KHR` instructions. Add a new helper
function to generate similar shaders for the KHR extension, as the NV
and KHR extensions have various subtle differences that makes
parameterizing the original helper function non-trivial.
Signed-off-by: Sven van Haastregt <sven.vanhaastregt@arm.com>
* Disallow duplicate decorations generally
* Only FuncParamAttr and UserSemantic can be applied to the same target
multiple times
* Unchecked: completely duplicate UserSemantic and FuncParamAttr
* Disallow duplicate execution modes generally
* Exceptions for float controls, float controls2 and some intel
execution modes
* Fix invalid fuzzer transforms
This fixes the problem reported in #5623 using the observation that if
we are re-building a type that already exists in the type pool, we
should just return that type.
This makes type rebuilding more efficient, and it also prevents the
type builder from getting itself into infinite recursion (as reported in
this issue).
In fixing this, I found a couple of other bugs in the type builder:
- When rebuilding an Array type, we were not re-building the element
type. This caused stale type references in the rebuilt type.
- This bug had not been caught by the test, because the test itself had
a bug in it: the test was rebuilding types on top of the same ID (the
ID counter was never incremented).
Initially, the bug in the test caused a failure with the new logic in
the builder because we now return types from the pool directly, which
causes a failure when two incompatible types are registered under the
same ID.
Fixing that issue in the test exposed another bug in the rebuilder: we
were not re-building the element type for Array types. This was causing
a stale type reference inside Array types which was later caught by the
type removal logic in the test.
* [OPT] Identify arrays with unknown length in copy prop arrays
The code in copy propagate arrays assumes that the length of an
OpTypeArray is known at compile time, but that is not true when the size
is an OpSpecConstant. We try to fix that assumption.
Fixes https://crbug.com/oss-fuzz/66634
We are removing Kernel from the image channel order and image
channel data type enums because Kernel is already required
transitively, so we need to update the tests to match.
* [OPT] Use new instruction folder for for all opcodes in spec consti folding
When folding and OpSpecConstantOp, we use the new instruction folder for
a small number of opcodes. This enable the new instruction folder for
all opcodes and uses the old one as a fall back. This allows us to
remove some code from the older folder that is now covered by the new
one.
Fixes#5499
Adds folding rules that will fold basic artimetic for signed and
unsigned integers of all sizes, including 64-bit.
Also folds OpSConvert and OpUConvert.
We repeat basically the same code multiple times in the different types of
folding tests. This commit adds a function that builds the module, finds the
instruction to fold, and folds it. Doing the routine checks at the same
time. We also have a couple generic functions for checking that an
instruction is a constant with the expected value.
The extension SPV_KHR_maximal_reconvergence adds more constraints
around the merge blocks, and how the control flow can be altered.
The one we address here is explained in the following part of the spec:
Note: This means that the instructions in a break block will execute as if
they were still diverged according to the loop iteration. This restricts
potential transformations an implementation may perform on the IR to match
shader author expectations. Similarly, instructions in the loop construct
cannot be moved into the continue construct unless it can be proven that
invocations are always converged.
Until the optimizer is clever enough to determine if the invocation
have already converged, we shall not meld a block which branches to a
merge block into it, as it might move some instructions outside of the
convergence region.
This behavior being only required with the extension, this commit
behavior change is gated by the extension.
This means using wave operations without the maximal reconvergence
extension might lead to undefined behaviors.
Co-authored-by: Natalie Chouinard <chouinard.nm@gmail.com>
Add OpEntryPoint to the list of instructions processed by the
DescriptorScalarReplacement pass. This is necessary for SPIR-V 1.4 and
above where global variables must be included in the interface.
Fixesmicrosoft/DirectXShaderCompiler#5962
Without this (or similar filtering), the `spirv-tools_expect_unittests` and `spirv-tools_spirv_test_framework_unittests` Python tests at `test/tools/` get defined even when `SPIRV_SKIP_TESTS` is set.
Add the VulkanMemoryModelDeviceScope capability to the capability
trimming pass. According the the spec, "If the Vulkan memory model is
declared and any instruction uses Device scope, the
VulkanMemoryModelDeviceScope capability must be declared." Since this
case, based on the type of an operand, is not covered by the JSON
grammar, it is added explicitly.
The PhysicalStorageBufferAddresses capability can now be
trimmed. From the spec, it seems any instruction enabled by this
required some operand to have the PhysicalStorageBuffer storage class.
This means checking the storage class is enough.
Now, because the pass uses the grammar, we don't need to add any
new logic.
Signed-off-by: Nathan Gauër <brioche@google.com>
The StorageImageReadWithoutFormat capability is only required when
an image type with the format set to Unknown is used with some specific
OpImageRead or OpImageSparseRead instructions.
This patch adds the required code to the capability trimming pass to
remove the StorageImageReadWithoutFormat capability when not required.
Signed-off-by: Nathan Gauër <brioche@google.com>
The function that get the number of elements in a composite variable
returns an incorrect values for the arrays. This is fixed, so that it
returns the correct number of elements for arrays where the number of
elements is represented as a 32-bit integer and is known at compile
time.
Fixes#4953
* Remove references to __FILE__
Uses of `__FILE__` leak the directory structure of the machine used to
build because it adds a string to the string table with the full path
name. I've removed the uses that show up in the release builds.
Fixes#5416
The SPIR-V specification allows any scalar integer type as an index. DXC
usually emits indexes as 32-bit integer types, however, in some cases it
is possible to make it emit 64-bit indexes instead (as in
https://github.com/microsoft/DirectXShaderCompiler/issues/5638).
* Add ComputeDerivativeGroup*NV capabilities to trim capabilities pass.
* Add SPV_NV_compute_shader_derivatives to allow lists
No tests needed for this. The code path is well tested. Just adding new
data.
Add a new legalization pass to dedupe invocation interlock instructions
DXC will be adding support for HLSL's rasterizer ordered views by using
the SPV_EXT_fragment_shader_interlock_extension. That extension
stipulates that if an entry point has an interlock ordering execution
mode, it must dynamically execute OpBeginInvocationInterlockEXT and
OpEndInvocationInterlockEXT, in that order, exactly once. This would be
difficult to determine in DXC's SPIR-V backend, so instead we will emit
these instructions potentially multiple times, and use this legalization
pass to ensure that the final SPIR-V follows the specification.
This PR uses data-flow analysis to determine where to place begin and
end instructions; in essence, determining whether a block contains or is
preceded by a begin instruction is similar to a specialized case of a
reaching definitions analysis, where we have only a single definition,
such as `bool has_begun = false`. For this simpler case, we can compute
the set of blocks using BFS to determine the reachability of the begin
instruction.
We need to do this for both begin and end instructions, so I have
generalized portions of the code to run both forward and backward over
the CFG for each respective case.
These functions are getting far too complicated to code in SPIRV-Tools
C++. Replace them with import stubs so that the real implementations
can live in Vulkan-ValidationLayers where they belong.
VVL will need to define these functions in spirv and link them to the
instrumented version of the user's shader.
From here on out, VVL can redefine the functions and any data they use
without updating SPIRV-Tools. Changing the function declarations will
still require both VVL and SPIRV-Tools to be updated in lock step.
From the Capability's text in the SPIRV spec:
```
An MS operand in OpTypeImage indicates multisampled, used with an
OpTypeImage having Sampled == 2 and Arrayed == 1.
```
Adding this logic to the capability trimming pass.
Adds the RayTracingKHR and RayQueryKHR capabilities to
the supported capabilities list (this includes the linked extension).
(NV and KHR capabilities/extensions shared the same IDs, so it also
works for NV flavors of those).
Currently spirv-link fails if all input files don't use the same
SPIR-V version. Add an option to instead use the highest input
version as the output version. Note that if one of the 'old'
input files uses an opcode that is deprecated in the 'new'
version, the output spirv will be invalid.
Some operands are not simple values, but bitmasks.
The lookup in the table for required decomposing the mask into
single values.
This commit adds support for such operands, like MinLod|Offset.
A token is allowed to parse even when it's from the wrong
version, or is not enabled by a capability or extension.
This allows more modules to parse.
Version/capability/extension checking is fully moved to
validation instead.
Fixes: #5364
* opt: fix StorageInputOutput16 trimming.
While integrating this pass into DXC, I found a lot of missing
cases. This PR fixes a few issues centered around this capability
while laying out fondations for more fixes.
1. The grammar can define extensions in operand & opcode tables.
- opcode can rely on common capabilities, but require a new
extension.
- opcode can also rely on a capability which requires an extension.
Sometimes, the extension is listed twice, in the opcode, and
capability. But this redundancy is not guaranteed.
2. minVersion check. The condition was flipped: we added the extension
when the minVersion was less than current.
Didn't noticed the issue as I only tests on the default env.
3. Capability/Extension instructions were not ignored.
- `OpCapability Foo` will require the `Foo` capability.
- it doesn't mean the module requires the `Foo` capability.
Same for extensions.
This commit adds disabled tests, for fixes which are too large to
be brought into this already large PR.
Multiple calls to this function were causing vkCreateGraphicsPipelines
to be 3x slower on some driver. I suspect this was because each
call had to be inlined separately which bloated the code and caused
more work in the driver's SPIRV -> native instruction compilation.
This commit adds a new optimization which tries to remove unnecessary
capabilities from a SPIR-V module.
When compiling a SPIR-V module, you may have some dead-code using
features gated by a capability.
DCE will remove this code, but the capability will remain. This means
your module would still require some capability, even if it doesn't
require it. Calling this pass on your module would remove obsolete
capabilities.
This pass wouldn't be enabled by default, and would only be usable
from the API (at least for now).
NOTE: this commit only adds the basic skeleton/structure, and
doesn't mark as supported many capabilities it could support.
I'll add them as supported as I write tests.
Signed-off-by: Nathan Gauër <brioche@google.com>
The code currently tries to get the value of the floating point constant
to see if it is -0.0. However, we are not able to get the value for
16-bit floating point value, and we hit an assert.
To avoid this, we add an early check for the width to make sure it is
either 32 or 64.
Fixes https://github.com/microsoft/DirectXShaderCompiler/issues/5413.
Expanding a bit the EnumSet API to have iterator-based
insert and constructors (like the STL).
This is also a pre-requisite from the capability-trimming pass as
it allows to build a const set from a constexpr std::array easily.
Signed-off-by: Nathan Gauër <brioche@google.com>
GetCapabilities returned a const*, and GetExtensions did not exist.
This commit adds GetExtensions, and changes the return value to
be a const&.
This commit also removes the overload to GetCapabilities which returns
a mutable set, as it is unused.
Signed-off-by: Nathan Gauër <brioche@google.com>
When using PhysicalStorageBuffer it is possible for a function to
return a pointer type. This was not being handled correctly in
`GetLoadedVariablesFromFunctionCall` in the DCE pass because
`IsPtr` returns the wrong result.
Fixes#5270.
* NFC: makes the FeatureManager immutable for users
The FeatureManager contains some internal state, like
a set of capabilities and extensions. Those are derived
from the module.
Before this commit, the FeatureManager exposed Remove* functions
which could unsync the reported extensions/capabilities from
the truth: the module.
The only valid usecase to remove items directly from the FeatureManager
is by the context itself, when an instruction is killed:
instead of running the whole an analysis, we remove the single outdated
item.
The was 2 users who mutated its state:
- one to invalidate the manager. Moved to call a reset function.
- one who removed an extension from the feature manager after removing
it from the module. This logic has been moved to the context, who
now handles the extension removal itself.
Signed-off-by: Nathan Gauër <brioche@google.com>
* clang-format
* add RemoveCapability since the fuzztests are using it
* add tests
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
The iterator class was initialized by setting the offset
and bucket to 0. Big oversight: what if the first enum is
not valid? Then `*iterator->begin()` would return the wrong
value.
Because the first capacity is Matrix, this bug was not visible by
any SPIRV test.
And this specific case wasn't tested correctly in the new enumset tests.
Signed-off-by: Nathan Gauër <brioche@google.com>
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
This avoids errors like this from instrumenting vertex shaders:
error: 165: Expected Constituents to be scalars or vectors of the
same type as Result Type components
%195 = OpCompositeConstruct %v4uint %uint_0 %191 %194 %uint_0
This commit adds forward iterator, and renames functions to
it matches the std::unordered_set/std::set better.
This goes against the SPIR-V coding style, but might be better in
the long run, especially when this set is used along real STL
sets.
(Right now, they are not compatible, and requires 2 syntaxes).
This container could in theory handle bidirectional
iterator, but for now, only forward seemed required for
our use-cases.
Signed-off-by: Nathan Gauër <brioche@google.com>
The current EnumSet implementation is only efficient for enums with
values < than 64. The reason is the first 63 values are stored as a
bitmask in a 64 bit unsigned integer, and the other values are stored
in a std::set.
For small enums, this is fine (most SPIR-V enums have IDs < than 64),
but performance starts to drop with larger enums (Capabilities,
opcodes).
Design considerations:
----------------------
This PR changes the internal behavior of the EnumSet to handle enums
with arbitrary values while staying performant.
The idea is to extend the 64-bits buckets sparsely:
- each bucket can store 64 value, starting from a multiplier of 64.
This could be considered as a hashset with linear probing.
- For small enums, there is a slight memory overhead due to the bucket
storage, but lookup is still constant.
- For linearly distributed values, lookup is constant.
- Worse case for storage are for enums with values which are multiples of 64.
But lookup is constant.
- Worse case for lookup are enums with a lot of small ranges scattered in
the space (requires linear probing).
For enums like capabilities/opcodes, this bucketing is useful as values
are usually scatters in distinct, but almost contiguous blocks.
(vendors usually have allocated ranges, like [5000;5500], while [1000;5000]
is mostly unused).
Benchmarking:
-------------
Benchmarking was done in 2 ways:
- a benchmark built for the occasion, which only measure the EnumSet
performance.
- SPIRV-Tools tests, to measure a more realist scenario.
Running SPIR-V tests with both implementations shows the same
performance (delta < noise). So seems like we have no regressions.
This method is noisy by nature (I/O, etc), but the most representative
of a real-life scenario.
Protocol:
- run spirv-tests with no stdout using perf, multiple times.
Result:
- measure noise is larger than the observed difference.
The custom benchmark was testing EnumSet interfaces using SPIRV enums.
Doing thousand of insertion/deletion/lookup, with 2 kind of scenarios:
- add once, lookup many times.
- add/delete/loopkup many time.
For small enums, results are similar (delta < noise). Seems relevant
with the previously observed results as most SPIRV enums are small, and
SPIRV-Tools is not doing that many intensive operations on EnumSets.
Performance on large enums (opcode/capabilities) shows an improvement:
+-----------------------------+---------+---------+---------+
| Metric | Old | New | Delta % |
+-----------------------------+---------+---------+---------+
| Execution time | 27s | 7s | -72% |
| Instruction count | 174b | 129b | -25% |
| Branch count | 28b | 33b | +17% |
| Branch miss | 490m | 26m | -94% |
| Cache-misses | 149k | 26k | -82% |
+-----------------------------+---------+---------+---------+
Future work
-----------
This was by-design an NFC change to compare apples-to-apples.
The next PR aims to add STL-like iterators to the EnumSet to allow
using it with STL algorithms, and range-based for loops.
Signed-off-by: Nathan Gauër <brioche@google.com>
* Validate layouts for PhysicalStorageBuffer pointers
Fixes#5282
* These pointers may not orginate from a variable so standard layout
validation misses them
* Now checks every instructions that results in a physical storage
buffer pointer
* May not start from a Block-decorated struct so that part is fudged
with a valid layout
* formatting
* SPV_KHR_cooperative_matrix
* Update DEPS with headers
* Update according to review recommendations
* Bugfix and formatting
* Formatting missed or damaged by VS2022
Simplify what we add to user code by moving most of it into a function
that checks both that the descriptor index is in bounds and the
initialization state. Move error logging into this function as
well.
Remove many options to turn off parts of the instrumentation,
because there were far too many permutations to keep working and
test properly.
Combine Buffer and TexBuffer error checking. This requires that VVL
set the length of TexBuffers in the descriptor input state, rather
than relying on the instrumentation code to call OpImageQuerySize.
Since the error log includes the descriptor set and binding numbers
we can use a single OOB error code rather than having 4 per-type
error codes.
Since the error codes are getting renumbered, make them start at 1
rather than 0 so it is easier to determine if the error code was
actually set by the instrumentation.
From the 3.27 release notes:
The FindPythonInterp and FindPythonLibs modules, which have been
deprecated since CMake 3.12, have been removed by policy CMP0148.
Port projects to FindPython3, FindPython2, or FindPython.
closes#4145
The length of a uvec3 was assumed to be 16 bytes, but it is 12.
Sometimes the stride might be 16 bytes though, which is probably the
source of the confusion.
Redo structure length to be the offset + length of the last member.
Add tests to cover arrays of uvec3s and uvec3 struct members.
Fixes https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/5691
If an id in one module is not defined by any instruction, don't bother
matching it with an id in the other module, as this disturbs the
reported id bound, resulting in spurious differences.
Fixes#5260.