The folding rule `BitCastScalarOrVector` was incorrectly handling
bitcasts to unsigned integers smaller than 32 bits. It simply copied the
entire 32-bit word containing the integer. This conflicts with the
requirement in Section 2.2.1 of the SPIR-V spec, which states that
unsigned numeric types with a bit width of less than 32 bits must have the
high-order bits set to 0.
This change includes a refactor of the bit-extension code so that it can be
tested more easily and reused in multiple files.
Fixes https://github.com/microsoft/DirectXShaderCompiler/issues/6319.
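As a rough illustration of the required behaviour (this is not the actual folding-rule code, and the helper name is hypothetical), zeroing the high-order bits of the containing word might look like this:

```cpp
#include <cstdint>

// Hypothetical helper: zero the high-order bits of a 32-bit word so that it
// holds a valid unsigned value of `bit_width` bits (bit_width < 32), as
// required by Section 2.2.1 of the SPIR-V specification.
uint32_t ZeroHighOrderBits(uint32_t word, uint32_t bit_width) {
  const uint32_t mask = (1u << bit_width) - 1u;  // keep only the low bits
  return word & mask;
}

// Example: a bitcast producing a 16-bit unsigned integer must not keep
// whatever happened to be in the upper 16 bits of the containing word:
//   ZeroHighOrderBits(0xFFFF1234u, 16) == 0x00001234u
```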
This reverts commit d18d0d92e5.
It is reverted because it causes a 7x slowdown when legalizing
SPIR-V with NonSemantic.Shader.DebugInfo.100 instructions.
This is due to the creation of very large UseLists for several
heavily used operands of this extension, combined with the fact
that the original commit made UseList operations O(n).
* Optimize DefUseManager allocations
Saves around 30-35% of compilation time.
For inst->use_ids, use a pooled linked list instead of allocating a vector for every instruction. For inst->uses, use a "PooledLinkedList" -- a linked list that shares storage for all of its nodes. Neither re-uses nodes; instead, we do a bulk compaction operation when too much memory is being wasted (the threshold is tunable).
Includes a separate templated PooledLinkedList data structure -- a very special-case construct, but split out to make the code a little easier to understand.
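A minimal sketch of the idea, assuming a simplified singly linked variant with an index-based shared pool (the real PooledLinkedList in the repository is more complete):

```cpp
#include <cstdint>
#include <vector>

// Sketch of a linked list whose nodes live in a storage vector shared by all
// lists. Nodes are never reused individually; unused nodes would be reclaimed
// by a bulk compaction pass (not shown) when too much memory is wasted.
template <typename T>
class PooledLinkedListSketch {
 public:
  struct Node {
    T value;
    int32_t next = -1;  // index into the shared pool, -1 = end of list
  };
  using Pool = std::vector<Node>;

  explicit PooledLinkedListSketch(Pool* pool) : pool_(pool) {}

  void push_back(const T& value) {
    int32_t index = static_cast<int32_t>(pool_->size());
    pool_->push_back({value, -1});
    if (tail_ != -1) (*pool_)[tail_].next = index;
    if (head_ == -1) head_ = index;
    tail_ = index;
  }

  template <typename Fn>
  void for_each(Fn fn) const {
    for (int32_t i = head_; i != -1; i = (*pool_)[i].next) fn((*pool_)[i].value);
  }

 private:
  Pool* pool_;        // storage shared between many lists
  int32_t head_ = -1;
  int32_t tail_ = -1;
};
```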
Incrementally compute the hash instead of collecting words
Avoids allocating temporary space in a std::vector and std::u32string, and avoids making three passes over all the hashed data.
Switch from std::unordered_set to std::vector for tracking already-processed items: this avoids an allocation/deallocation on every call to ComputeHashValue, and ends up faster due to much better cache behaviour and a smaller constant factor when searching the (generally very small) list.
In my test case, this took Type::HashValue from 7.5% of compilation time down to 0.5%.
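For illustration only, an incremental hash combined with a vector-based duplicate check might look like the sketch below; the function names and the mixing constant are hypothetical, not the actual SPIRV-Tools code:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical incremental hash combiner: each word is folded into the seed
// as it is visited, so no temporary buffer of words is ever built.
inline std::size_t HashCombine(std::size_t seed, uint32_t word) {
  return seed * 1099511628211u ^ word;  // FNV-style mix, for illustration only
}

// Duplicate check with a plain vector instead of std::unordered_set. For the
// small sets involved, linear search is faster and allocation-free once the
// vector has grown.
inline bool MarkVisited(std::vector<const void*>* seen, const void* item) {
  if (std::find(seen->begin(), seen->end(), item) != seen->end()) return false;
  seen->push_back(item);
  return true;
}
```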
This CL combines the various opt unit tests into a single executable
instead of one executable per test. This reduces the number of build targets by
~125 when building with ninja.
We replace the std::vector in the Operand class with a new class that does
a small-size optimization (sketched below). This helps improve compile time on Windows.
We tested on three sets of shaders, trying various values for the small-vector
size. The optimal value for the Operand class was 2. However, for
the Instruction class, using an std::vector was optimal. A size of "0"
means that an std::vector was used.
                    Instruction size
                    0     4     8
  Operand size  0   489   544   684
                1   593   487
                2   469   570
                4   473
                8   505
This is a single-threaded run of ~120 shaders. For the multithreaded run,
the results were similar. The baseline time was ~62 sec. The
optimal configuration was a size of 2 for the OperandData and an
std::vector for the OperandList, with a compile time of ~38 sec. Similar
experiments were done with other sets of shaders. The compile time still
improved, but not as much.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1609.
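A minimal sketch of the small-size optimization mentioned above, assuming a simplified interface; the class name is hypothetical, and the real storage class is more careful about element construction:

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <vector>

// Minimal small-vector sketch: the first N elements live inside the object
// itself; a heap-allocated std::vector is created only when the inline buffer
// overflows. N = 2 corresponds to the optimal Operand size found above.
template <typename T, std::size_t N>
class SmallVectorSketch {
 public:
  void push_back(const T& value) {
    if (!overflow_ && size_ < N) {
      inline_storage_[size_++] = value;
      return;
    }
    if (!overflow_) {  // first spill: move inline elements to the heap
      overflow_ = std::make_unique<std::vector<T>>(
          inline_storage_.begin(), inline_storage_.begin() + size_);
    }
    overflow_->push_back(value);
    ++size_;
  }

  T& operator[](std::size_t i) {
    return overflow_ ? (*overflow_)[i] : inline_storage_[i];
  }
  std::size_t size() const { return size_; }

 private:
  std::array<T, N> inline_storage_{};
  std::unique_ptr<std::vector<T>> overflow_;
  std::size_t size_ = 0;
  // A real implementation would use raw storage and placement new to avoid
  // default-constructing the inline elements.
};
```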
Introduce a pass that does a DCE-style analysis on individual vector elements
instead of treating the whole vector as a single element.
It then rewrites instructions whose values are not used, replacing them with
something else. For example, an instruction whose value is not used, even
though it is referenced, is replaced with an OpUndef.
The unordered_set in ADCE that holds all of the live instructions takes
a very long time to be destroyed. In some shaders, destruction takes over 40%
of the compile time.
If we look at the unique ids of the live instructions, I believe they
are dense enough to make a simple bit vector a good choice for holding that
data. When I check the density of the bit vector for larger shaders, we
are usually using less than 4 bytes per element in the vector, and
almost always less than 16.
So, in this commit, I introduce a simple bit vector class, and
use it in ADCE.
This helps improve the compile time for some shaders on Windows by the
40% mentioned above.
Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.
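A minimal sketch of such a bit vector, keyed by unique id (the class in the repository has a richer interface; this name is hypothetical):

```cpp
#include <cstdint>
#include <vector>

// Simple bit vector keyed by instruction id. Membership tests and inserts
// are O(1), and destruction is a single vector deallocation, unlike the
// per-node teardown of an unordered_set.
class BitVectorSketch {
 public:
  // Sets the bit for `id`; returns true if the bit was newly set.
  bool Set(uint32_t id) {
    const uint32_t word = id / 64;
    const uint64_t mask = uint64_t(1) << (id % 64);
    if (word >= bits_.size()) bits_.resize(word + 1, 0);
    const bool was_set = (bits_[word] & mask) != 0;
    bits_[word] |= mask;
    return !was_set;
  }

  bool Get(uint32_t id) const {
    const uint32_t word = id / 64;
    return word < bits_.size() && (bits_[word] & (uint64_t(1) << (id % 64))) != 0;
  }

 private:
  std::vector<uint64_t> bits_;
};
```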
Re-formatted the source tree with the command:
$ /usr/bin/clang-format -style=file -i \
$(find include source tools test utils -name '*.cpp' -or -name '*.h')
This required a fix to source/val/decoration.h: it was not including
spirv.h, which broke the build when clang-format re-ordered the
#include directives.
This change replaces a number of the
std::vector<std::unique_ptr<Instruction>> members of the module with
InstructionList. This is for consistency and to make it easier to
delete instructions that are no longer needed.
This is the first step in replacing the std::vector of Instruction
pointers with an intrusive linked list.
To this end, we created the InstructionList class. It inherits from
the IntrusiveList class, but adds the extra concept of ownership: an
InstructionList owns the instructions that are in it. This is
consistent with the current ownership rules, where the vector owns the
instructions that are in it.
The other larger change is that the inst_ member of the BasicBlock class
was changed to use the InstructionList class.
Added tests for the InsertBefore functions, and for making sure that the
InstructionList destructor deletes the elements that it contains.
I've also added extra comments to explain ownership a little better.
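To illustrate the ownership rule only, here is a sketch using a simplified singly linked list; the real InstructionList is an intrusive doubly linked list, and these names are hypothetical:

```cpp
// Sketch of the ownership rule: the list owns the Instruction objects linked
// into it, so its destructor deletes whatever is still attached, matching the
// behaviour of the std::vector<std::unique_ptr<Instruction>> it replaces.
struct InstructionSketch {
  InstructionSketch* next_node = nullptr;
};

class InstructionListSketch {
 public:
  ~InstructionListSketch() {
    for (InstructionSketch* node = head_; node != nullptr;) {
      InstructionSketch* next = node->next_node;
      delete node;  // the list, not the caller, releases each instruction
      node = next;
    }
  }

  void push_back(InstructionSketch* inst) {  // takes ownership of `inst`
    inst->next_node = nullptr;
    if (tail_) tail_->next_node = inst; else head_ = inst;
    tail_ = inst;
  }

 private:
  InstructionSketch* head_ = nullptr;
  InstructionSketch* tail_ = nullptr;
};
```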
This commit is the initial implementation of the intrusive linked list
class. It includes the implementation in the header files, along with
unit tests.
The iterators are circular: incrementing end() gives begin() and
decrementing begin() gives end(). Also made it valid to
decrement end().
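A sketch of how a sentinel-based circular list gives those iterator semantics (names are hypothetical, and the real class uses proper iterator types rather than raw node pointers):

```cpp
// Sketch of a circular intrusive doubly linked list built around a sentinel
// node. The sentinel acts as end(); because the links form a ring,
// moving forward from end() lands on begin() and moving backward from
// begin() lands on end().
struct NodeSketch {
  NodeSketch* next = nullptr;
  NodeSketch* prev = nullptr;
};

class CircularListSketch {
 public:
  CircularListSketch() { sentinel_.next = sentinel_.prev = &sentinel_; }

  void push_back(NodeSketch* node) {
    node->prev = sentinel_.prev;
    node->next = &sentinel_;
    sentinel_.prev->next = node;
    sentinel_.prev = node;
  }

  NodeSketch* begin() { return sentinel_.next; }
  NodeSketch* end() { return &sentinel_; }

  // Iteration is plain pointer chasing; wrapping past either end is well
  // defined because the ring never breaks.
  static NodeSketch* Next(NodeSketch* n) { return n->next; }
  static NodeSketch* Prev(NodeSketch* n) { return n->prev; }

 private:
  NodeSketch sentinel_;
};

// For an empty list, begin() == end(); Next(end()) == begin() and
// Prev(begin()) == end() hold for any list.
```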
Explicitly define the move constructor and move assignment
- Visual Studio 2013 does not implicitly generate the move constructor or
move assignment, so they need to be explicit; otherwise it will try to
use the copy constructor, which we explicitly deleted.
- Can't use "= default" either. It seems that VS2013 does not support
explicitly defaulting the move constructor and move assignment, so I
wrote them out.
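A small illustration of writing the move members out by hand while keeping the copy operations deleted (the class name and member here are hypothetical):

```cpp
// Hypothetical list-like class: the compiler in question will not generate
// these implicitly and rejects "= default" for them, so both are written out
// by hand. The copy operations stay deleted.
class MovableListSketch {
 public:
  MovableListSketch() = default;
  MovableListSketch(const MovableListSketch&) = delete;
  MovableListSketch& operator=(const MovableListSketch&) = delete;

  // Hand-written move constructor: steal the source's state and leave the
  // source empty.
  MovableListSketch(MovableListSketch&& other) : head_(other.head_) {
    other.head_ = nullptr;
  }

  // Hand-written move assignment.
  MovableListSketch& operator=(MovableListSketch&& other) {
    if (this != &other) {
      head_ = other.head_;
      other.head_ = nullptr;
    }
    return *this;
  }

 private:
  void* head_ = nullptr;  // stand-in for the real list state
};
```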