Multiple calls to this function were causing vkCreateGraphicsPipelines
to be 3x slower on some driver. I suspect this was because each
call had to be inlined separately which bloated the code and caused
more work in the driver's SPIRV -> native instruction compilation.
This commit adds a new optimization which tries to remove unnecessary
capabilities from a SPIR-V module.
When compiling a SPIR-V module, you may have some dead-code using
features gated by a capability.
DCE will remove this code, but the capability will remain. This means
your module would still require some capability, even if it doesn't
require it. Calling this pass on your module would remove obsolete
capabilities.
This pass wouldn't be enabled by default, and would only be usable
from the API (at least for now).
NOTE: this commit only adds the basic skeleton/structure, and
doesn't mark as supported many capabilities it could support.
I'll add them as supported as I write tests.
Signed-off-by: Nathan Gauër <brioche@google.com>
The code currently tries to get the value of the floating point constant
to see if it is -0.0. However, we are not able to get the value for
16-bit floating point value, and we hit an assert.
To avoid this, we add an early check for the width to make sure it is
either 32 or 64.
Fixes https://github.com/microsoft/DirectXShaderCompiler/issues/5413.
Expanding a bit the EnumSet API to have iterator-based
insert and constructors (like the STL).
This is also a pre-requisite from the capability-trimming pass as
it allows to build a const set from a constexpr std::array easily.
Signed-off-by: Nathan Gauër <brioche@google.com>
GetCapabilities returned a const*, and GetExtensions did not exist.
This commit adds GetExtensions, and changes the return value to
be a const&.
This commit also removes the overload to GetCapabilities which returns
a mutable set, as it is unused.
Signed-off-by: Nathan Gauër <brioche@google.com>
When using PhysicalStorageBuffer it is possible for a function to
return a pointer type. This was not being handled correctly in
`GetLoadedVariablesFromFunctionCall` in the DCE pass because
`IsPtr` returns the wrong result.
Fixes#5270.
* NFC: makes the FeatureManager immutable for users
The FeatureManager contains some internal state, like
a set of capabilities and extensions. Those are derived
from the module.
Before this commit, the FeatureManager exposed Remove* functions
which could unsync the reported extensions/capabilities from
the truth: the module.
The only valid usecase to remove items directly from the FeatureManager
is by the context itself, when an instruction is killed:
instead of running the whole an analysis, we remove the single outdated
item.
The was 2 users who mutated its state:
- one to invalidate the manager. Moved to call a reset function.
- one who removed an extension from the feature manager after removing
it from the module. This logic has been moved to the context, who
now handles the extension removal itself.
Signed-off-by: Nathan Gauër <brioche@google.com>
* clang-format
* add RemoveCapability since the fuzztests are using it
* add tests
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
The iterator class was initialized by setting the offset
and bucket to 0. Big oversight: what if the first enum is
not valid? Then `*iterator->begin()` would return the wrong
value.
Because the first capacity is Matrix, this bug was not visible by
any SPIRV test.
And this specific case wasn't tested correctly in the new enumset tests.
Signed-off-by: Nathan Gauër <brioche@google.com>
---------
Signed-off-by: Nathan Gauër <brioche@google.com>
This avoids errors like this from instrumenting vertex shaders:
error: 165: Expected Constituents to be scalars or vectors of the
same type as Result Type components
%195 = OpCompositeConstruct %v4uint %uint_0 %191 %194 %uint_0
This commit adds forward iterator, and renames functions to
it matches the std::unordered_set/std::set better.
This goes against the SPIR-V coding style, but might be better in
the long run, especially when this set is used along real STL
sets.
(Right now, they are not compatible, and requires 2 syntaxes).
This container could in theory handle bidirectional
iterator, but for now, only forward seemed required for
our use-cases.
Signed-off-by: Nathan Gauër <brioche@google.com>
The current EnumSet implementation is only efficient for enums with
values < than 64. The reason is the first 63 values are stored as a
bitmask in a 64 bit unsigned integer, and the other values are stored
in a std::set.
For small enums, this is fine (most SPIR-V enums have IDs < than 64),
but performance starts to drop with larger enums (Capabilities,
opcodes).
Design considerations:
----------------------
This PR changes the internal behavior of the EnumSet to handle enums
with arbitrary values while staying performant.
The idea is to extend the 64-bits buckets sparsely:
- each bucket can store 64 value, starting from a multiplier of 64.
This could be considered as a hashset with linear probing.
- For small enums, there is a slight memory overhead due to the bucket
storage, but lookup is still constant.
- For linearly distributed values, lookup is constant.
- Worse case for storage are for enums with values which are multiples of 64.
But lookup is constant.
- Worse case for lookup are enums with a lot of small ranges scattered in
the space (requires linear probing).
For enums like capabilities/opcodes, this bucketing is useful as values
are usually scatters in distinct, but almost contiguous blocks.
(vendors usually have allocated ranges, like [5000;5500], while [1000;5000]
is mostly unused).
Benchmarking:
-------------
Benchmarking was done in 2 ways:
- a benchmark built for the occasion, which only measure the EnumSet
performance.
- SPIRV-Tools tests, to measure a more realist scenario.
Running SPIR-V tests with both implementations shows the same
performance (delta < noise). So seems like we have no regressions.
This method is noisy by nature (I/O, etc), but the most representative
of a real-life scenario.
Protocol:
- run spirv-tests with no stdout using perf, multiple times.
Result:
- measure noise is larger than the observed difference.
The custom benchmark was testing EnumSet interfaces using SPIRV enums.
Doing thousand of insertion/deletion/lookup, with 2 kind of scenarios:
- add once, lookup many times.
- add/delete/loopkup many time.
For small enums, results are similar (delta < noise). Seems relevant
with the previously observed results as most SPIRV enums are small, and
SPIRV-Tools is not doing that many intensive operations on EnumSets.
Performance on large enums (opcode/capabilities) shows an improvement:
+-----------------------------+---------+---------+---------+
| Metric | Old | New | Delta % |
+-----------------------------+---------+---------+---------+
| Execution time | 27s | 7s | -72% |
| Instruction count | 174b | 129b | -25% |
| Branch count | 28b | 33b | +17% |
| Branch miss | 490m | 26m | -94% |
| Cache-misses | 149k | 26k | -82% |
+-----------------------------+---------+---------+---------+
Future work
-----------
This was by-design an NFC change to compare apples-to-apples.
The next PR aims to add STL-like iterators to the EnumSet to allow
using it with STL algorithms, and range-based for loops.
Signed-off-by: Nathan Gauër <brioche@google.com>
* Validate layouts for PhysicalStorageBuffer pointers
Fixes#5282
* These pointers may not orginate from a variable so standard layout
validation misses them
* Now checks every instructions that results in a physical storage
buffer pointer
* May not start from a Block-decorated struct so that part is fudged
with a valid layout
* formatting
* SPV_KHR_cooperative_matrix
* Update DEPS with headers
* Update according to review recommendations
* Bugfix and formatting
* Formatting missed or damaged by VS2022
Simplify what we add to user code by moving most of it into a function
that checks both that the descriptor index is in bounds and the
initialization state. Move error logging into this function as
well.
Remove many options to turn off parts of the instrumentation,
because there were far too many permutations to keep working and
test properly.
Combine Buffer and TexBuffer error checking. This requires that VVL
set the length of TexBuffers in the descriptor input state, rather
than relying on the instrumentation code to call OpImageQuerySize.
Since the error log includes the descriptor set and binding numbers
we can use a single OOB error code rather than having 4 per-type
error codes.
Since the error codes are getting renumbered, make them start at 1
rather than 0 so it is easier to determine if the error code was
actually set by the instrumentation.
From the 3.27 release notes:
The FindPythonInterp and FindPythonLibs modules, which have been
deprecated since CMake 3.12, have been removed by policy CMP0148.
Port projects to FindPython3, FindPython2, or FindPython.
closes#4145
The length of a uvec3 was assumed to be 16 bytes, but it is 12.
Sometimes the stride might be 16 bytes though, which is probably the
source of the confusion.
Redo structure length to be the offset + length of the last member.
Add tests to cover arrays of uvec3s and uvec3 struct members.
Fixes https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/5691
If an id in one module is not defined by any instruction, don't bother
matching it with an id in the other module, as this disturbs the
reported id bound, resulting in spurious differences.
Fixes#5260.
Follow up to #5249
* glslang tests that physical storage buffer pointers can be used as
varyings in shaders
* Allow physical storage buffer pointers in IO interfaces as a 64-bit
type
- Consider prior type pairings when attempting to pair function
parameters by type.
- Pair all parameters that have matching types, not just the first.
- Update diff tests.
Fixes#5218.
* Check if const is zero before getting components.
Two folding rules try to cast a constant to a MatrixConstant before
checking if it is a Null constant. This leads to the null pointer being
dereferneced. The solution is to move the check for zero earlier.
Fixes https://github.com/microsoft/DirectXShaderCompiler/issues/5063
We want to be able to apply scalar replacement on variables that have
the AliasPointer and RestrictPointer decorations.
This exposed a bug that needs to be fixed as well.
Scalar replacement sometimes uses the type manager to get the type id for the
variables it is creating. The variable type is a pointer to a pointee
type. Currently, scalar replacement uses the type manager when only if
the pointee type has to be unique in the module. This is done to try to avoid the case where two type hash to the same
value in the type manager, and it returns the wrong one.
However, this check is not the correct check. Pointer types still have to be
unique in the spir-v module. However, two unique pointer types can hash
to the same value if their pointee types are isomorphic. For example,
%s1 = OpTypeStruct %int
%s2 = OpTypeStruct %int
; %p1 and %p2 will hash to the same value even though they are still
; considered "unique".
%p1 = OpTypePointer Function %s1
%p2 = OpTypePointer Function %s2
To fix this, we now use FindPointerToType, and we modified TypeManager::IsUnique to refer to the whether or not a type will hash to a unique value and say that pointers are not unique.
Fixes#5196
Split per-DescriptorSet state into separate memory blocks
which are accessed via an array of buffer device addresses.
This is being done to make it easier to update state for a
single DescriptorSet without rebuilding the old giant flat
buffer.
The new data format is documented as comments in
include/spirv-tools/instrument.hpp
* Update protobuf to v21.12
We need to update because the current version was not buliding with gcc 12.2. I
could not move upda to v22.x because there was an odd use of some defines that
were causing failures.
* Disable clang warnings for protobuf headers
The control barrier instruction was allowed in a limiteted set of shader types.
Part of the HLSL legalization, we use to remove the instructions when it was is
a shader in which it was not allowed. As of spv1.3 that restriction is not long
there.
This change modifies replaced invalid opc to no longer remove it.
Fixes#4999.
spirv_target_env.cpp uses std::pair, which is defined in utility.
Right now, this compiles because utility is provided via transitive
includes in other C++ standard library headers provided by the
implementation. However, these transitive includes are not guaranteed to
exist, and won't exist in certain cases (e.g. compiling against LLVM's
libc++ with modules enabled.)
GetExtractOperandsForElementOfCompositeConstruct() states "Returns the
empty vector if |result_index| is out-of-bounds", but violates that
contract for non-vector result types.
Adding a new type instruction required making the same change in two
places.
Also remove IsCompileTimeConstantInst that was used in a single place
and replace its use by an expression that better conveys the intent.
Change-Id: I49330b74bd34a35db6369c438c053224805c18e0
Signed-off-by: Kevin Petit <kevin.petit@arm.com>
This commit adds a C++ wrapper above the current spvBinaryParse
function. I tried to match it 1:1, except for 2 things:
- std::function<>& are used. No more function pointers, allowing
context capture.
- spv_result_t replaced with a boolean, to match other C++ apis.
Callbacks still return a spv_result_t because the underlying implem
relies on that. The convertion from spv_result_t to boolean is only done
at the boundary.
Signed-off-by: Nathan Gauër <brioche@google.com>
* Update fuzz tests to not use invalid combinations of LoopMerge + BranchConditional
New IDs were selected to make clear the transformation being done here, instead
of reordering all IDs in between or refactoring the SPIR-V in the test.
* spirv-val: Conditional Branch without an exit is invalid in loop header
From 2.16.2, for CFG:
Selections must be structured. That is, an OpSelectionMerge
instruction is required to precede:
- an OpSwitch instruction
- an OpBranchConditional instruction that has different True Label
and False Label operands where neither are declared merge blocks
or Continue Targets.
* Fix null pointer in FoldInsertWithConstants.
Struct types are not supported in constant folding yet.
* Added 'Test case 16' to fold_test.
Tests OpCompositeInsert not to be folded on a struct type.
Contributes to https://github.com/KhronosGroup/glslang/issues/2439
* When OpTypeStruct is used in Vulkan env and its last member
is a RuntimeArray, check if the struct is decorated with
Block or BufferBlock, as required by VUID-...-04680.
-Make more use of InstructionBuilder instruction helper methods
-Use MakeUnique<>() rather than new
-Add InstrumentPass::GenReadFunctionCall() which optimizes function
calls in a loop with constant arguments and no side effects.
This is a prepatory change for future work on the instrumentation
code which will add more generated functions.
The CHANGES file was an alternative source of truth that was trying
to duplicate/replace the git history truth.
This allow us to change the way we handle releases so we don't have to make
sure our CHANGES PR are linked to the tag and tested PR, simplifying the
process.
Kokoro clones repos with a different user used to run the build steps,
meaning if some git command must be run at build time, they will fail
because of this dubious ownership issue.
Running some git commands makes only sense with history, so changing
checkout depth so we can run them and get the true result.
This is a known Kororo issue.
Fixing this is required to generate the version file using git history.
Signed-off-by: Nathan Gauër <brioche@google.com>
Signed-off-by: Nathan Gauër <brioche@google.com>
Kokoro clones repos with a different user used to run the build steps,
meaning if some git command must be run at build time, they will fail
because of this dubious ownership issue.
Running some git commands makes only sense with history, so changing
checkout depth so we can run them and get the true result.
This is a known Kororo issue.
Fixing this is required to generate the version file using git history.
Signed-off-by: Nathan Gauër <brioche@google.com>
Avoid using OpConstantNull with types that do not allow it.
Update existing tests for slight changes in code generation.
Add new tests based on the Vulkan Validation layer test case
that exposed this problem.
* Validate version 5 of clspv reflection
* Add validation for instructions in versions 2 through 5
* Change the instruction reported for remaining indices on an access
chain to the access chain itself for clarity
* update spirv-headers
* Update minimum required CMake to 3.17.2
- For Wasm build, update to emscripten/emsdk:3.1.28 which has 3.22.1
- Move the docker-compose.yml down to the source/wasm directory.
Fixes: #5040
* Fix working directory for invocation of wasm build
When the parser saw more significant hex digits than fit in
the target type, it would compute a nonsensical shift amount, resulting
in undefined behaviour.
Now, drop the excess bits, effectively truncating the significand.
Also guard against overflow of the exponent in the extraordinary (and untested)
case where we see more than, for example, 2**(32-4+1) significant hex digits
for a 32-bit float, or 2**(16-4+1) significant hex digits for a 16-bit
float.
Also guard against overflow of the indexing counting the number of
significant bits. When that would occur silently drop any further
significant bits. (Untested)
Avoid hex floats in C++ code. It's a C++17 feature.
Fixes: #4724
* Fix layout validation
Fixes#5010
* Unless scalar block layout is enabled, no member can reside at an
offset between the end of the previous member that is a struct or
array and the next multiple of that alignment
* Remove a dead if that was introduced
Co-authored-by: David Neto <dneto@google.com>
This can cause interface incompatibility and should only be done
if ADCE has been applied to the following shader in the pipeline.
For this reason this capability is not available through the CLI
but rather only non-default through the API. This functionality is
intended as part of a larger cross-shader dead code elimination
sequence.
Constexpr guaranteed no runtime init in addition to const semantics.
Moving all opt/ to constexpr.
Moving all compile-unit statics to anonymous namespaces to uniformize
the method used (anonymous namespace vs static has the same behavior
here AFAIK).
Signed-off-by: Nathan Gauër <brioche@google.com>
Safe version will only optimize vertex shaders. All other shaders will
succeed without change.
Change --eliminate-dead-input-components to use new safe version.
Unsafe version (allowing non-vertex shaders) currently only available
through API. Should only be used in combination with other optimizations
to keep interfaces consistent. See optimizer.hpp for more details.
Add a flags field at the first offset within this buffer.
Define flags to allow buffer OOB checking to be enabled or
disabled at run time. This is to support VK_EXT_pipeline_robustnes.
This pass eliminates components of output variables that are not stored
to. Currently this just eliminates trailing components of arrays and
structs, all of which are dead.
WARNING: This pass is not designed to be a standalone pass as it can
cause interface incompatibiliies with the following shader in the
pipeline. See the comment in optimizer.hpp for best usage. This pass is
currently available only through the API; it is not available in the CLI.
This commit also fixes a bug in CreateDecoration() which is part of the
system of generating SPIR-V from the Type manager.
This adds two passes to accomplish this: one pass to analyze a shader
to determine the input slots that are live. The second pass is run on
the preceding shader to eliminate any stores to output slots that are
not consumed by the following shader.
These passes support vert, tesc, tese, geom, and frag shaders.
These passes are currently only available through the API.
These passes together with dead code elimination, and elimination of
dead input and output components and variables (WIP), will allow users
to do dead code elimination across shader boundaries.
Fixes#4918
* Prevent block merging from producing an invalid case construct by
merging a switch target/default with another construct's merge or
continue block
* This is to satisfy the structural dominance requirement between the
switch header and the case constructs
Fixes#4671
Fixes https://crbug.com/oss-fuzz/43265
* Only validate full layout in a vulkan environment
* Universal validation still checks that the right decorations are
present, but the values are only considered for vulkan
* One exception is that invalid overlaps are only checked for vulkan
* This is a pragmatic choice as SPIR-V doesn't define the size of
types so the amount of universal checking would be quite limited
* Removed redundant check for GLSLShared and GLSLPacked decorations
* Should never have been validated as part of universal validation
* make conditionals independent
Fixes https://crbug.com/oss-fuzz/48553
* Assign a reflexive dominator if no other dominator can be found using
forward traversals
* This prevents a null dereference of a pointer in the sorting of the
output
* Support Narrow Types in BitCast Folding Rule
This change adds support for narrow types in the BitCastScalarOrVector
folding rule. According to Section 2.2.1 of the SPIR-V spec, types that
are narrower than 32 bits are automatically either sign extended, or
zero extended depending on the type. With that guaranteed, we should
be able to use the first 32-bit word of any narrow type for the folding
logic without performing any special conversions.
In order to reduce code duplication, this change moves the
GetU32BitValue and GetU64BitValue functions from IntConstant to
ScalarConstant. Without this move, we would have needed an identical
version of GetU32BitValue on FloatConstant.
* Add Tests for 16-bit BitCast Folding
This change adds several new test cases to the
IntegerInstructionFoldingTest which trigger the 16-bit BitCast logic.
The logic for half types was also added to the integer case since we
can't easily validate half float types in C++ code. It's easier to
validate them as unsigned integers instead. Pllus this also allows us
to verify the SPIR-V constant sign extension logic too.
* Add 8-Bit Folding Test Cases
This change adds a couple more test cases to the integer instruction
folding test suite in order to ensure that the BitCast logic also
works correctly with the Int8 shader capability.
The always-friendly messages make it harder to debug when the
disassembly is later generated without friendly names.
Additionally, the friendly-name-mapper is slow. Disabling it improves
performance of an ANGLE test that creates numerous shaders by ~5%.
Half the messages used to output 'id[%name]' and half id[%name]. With
this change, all messages consistently output 'id[%name]'. Some typos
are also fixed in the process.
This was spotted in the Validation Layers where OpSpecConstantOp %x CompositeExtract %y 0 was being folded to a constant, but anything that was using it wasn't recognizing it as a constant, the simple fix was to add a const_mgr->MapInst(new_const_inst); so the next instruction knew it was a const
* Remove `spvOpcodeTerminatesExecution`
This function is the same as `spvOpcodeIsAbort` except for
OpUnreachable. The names are so close in meaning that it is hard to
distinguish them. I've removed `spvOpcodeTerminatesExecution` since it
is used in only a single place. I've special cased OpUnreachable in
that location.
At the same time, I fixed up some comments related to the use of the
TerminatesExecution and IsAbort functions.
Following up on #4930.
* Fix comments
Removed now unused DebugDeclare visibility logic for generating
DebugValue.
Also eliminated the phi sort introduced in 272e4b3. This should have
been removed in the first commit.
Changed a couple small parts of the algorithm to reduce time to build
the dominator trees. There should be no visible changes.
Add a depth first search algorithm that does not run a function on
backedges. The check if an edge is a back edge is time consuming, and
pointless if the function run on it is a nop.
Add name annotations to the generated instrumentation code to
make it easier to understand. Example spirv-cross output:
vec4 _140;
if (0u < inst_bindless_direct_read_4(0u, 0u, 1u, uint(_19)))
{
_140 = texture(textures[nonuniformEXT(_19)], inUV);
}
else
{
inst_bindless_stream_write_4(50u, 1u, uint(_19), 0u);
_140 = vec4(0.0);
}