SPIRV-Tools

mirror of https://github.com/KhronosGroup/SPIRV-Tools synced 2024-11-28 22:21:03 +00:00

Author	SHA1	Message	Date
Arseny Kapoulkine	f765d16bd9	Add external interface for creating a pass token. Currently, it's impossible for external code to register a pass because the only source file that can create pass tokens is optimizer.cpp. This makes it hard to add passes that can't be upstreamed, since you can't run them from the usual pass sequence without reimplementing Optimizer. This change adds a PassToken constructor that takes a unique_ptr to an opt::Pass; if out-of-tree code implements opt::Pass, it can register a custom pass without having to add it to the SPIRV-Tools source code.	2018-05-25 09:19:43 -04:00
dan sinclair	0a14a1f748	Validate that only a single OpMemoryModel is provided. This CL adds validation that only a single OpMemoryModel is provided in the SPIR-V binary. Fixes #1574	2018-05-24 08:43:14 -04:00
dan sinclair	3b87dac56b	Validate presence of OpMemoryModel. According to the SPIR-V Spec, section 2.4 (Logical Layout of a Module), a single OpMemoryModel instruction is required. This CL adds a check to the SPIR-V validator that OpMemoryModel is present. Fixes #1207	2018-05-23 08:17:39 -04:00
Steven Perron	a579e720a8	Remove the limit on struct size in SROA. Removes the limit on scalar replacement for the legalization passes. This is done by adding an option to the pass (and a command-line option) to set the maximum size of the composite that scalar replacement is willing to divide. Fixes #1494.	2018-05-18 10:03:46 -04:00
Steven Perron	f1f7cc870e	Get ADCE to handle OpCopyMemory. ADCE does not treat OpCopyMemory as an instruction that references memory. Because of that, stores that should be kept are removed. This change teaches ADCE that OpCopyMemory and OpCopyMemorySized both load from and store to memory. This will keep other stores live when needed, and will allow ADCE to remove OpCopyMemory instructions as well. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1556.	2018-05-16 13:50:47 -04:00
Lei Zhang	b09e3ce842	Allow ViewportIndex & Layer to be used in VS/DS with extension. SPV_EXT_shader_viewport_index_layer enables using ViewportIndex and Layer in vertex and tessellation shaders. Also, as per the Vulkan spec: > The ViewportIndex decoration must be used only within vertex, > tessellation evaluation, geometry, and fragment shaders. > In a vertex, tessellation evaluation, or geometry shader, any > variable decorated with ViewportIndex must be declared using > the Output storage class. > In a fragment shader, any variable decorated with ViewportIndex > must be declared using the Input storage class. Similarly for Layer.	2018-05-16 13:16:27 -04:00
Steven Perron	9b1a938ea1	SROA: Only create symbols that are loaded. Currently, scalar replacement creates a new variable for every member of the composite being divided. This is often overkill, because not all of those members will be used. This change checks which elements are used and only creates variables for the members that are used. This reduces the compile time for one set of shaders from 248s to 165s. Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.	2018-05-16 10:48:25 -04:00
Steven Perron	0e1b7e5aef	Fix getting operand without checking opcode. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1559. There is a load of an operand of an instruction that was supposed to happen only in the OpCompositeExtract case. However, an error caused it to be loaded for every opcode, even those that do not have an operand in that position. We fix that bug, and a couple of other things noticed at the same time.	2018-05-16 09:34:43 -04:00
Lei Zhang	efcc33e8a9	Support SpvOpExecutionModeId in SPIR-V logical layout	2018-05-16 08:43:50 -04:00
Steven Perron	f46f2d3e5d	Remove redundant stores. The code patterns generated by DXC around function calls can cause many stores to write the same value that was just loaded from the same location: ``` %10 = OpLoad %type %var OpStore %var %10 ``` We want to clean these up very early on because they can cause other transformations to do a lot of work. For the cases I see, they can be removed during local-single-block-elim. For one set of shaders, the compile time goes from 248s to 182s, a 26% improvement. Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.	2018-05-15 10:24:05 -04:00
Steven Perron	af430ec822	Add pass to fold a load feeding an extract. We have already disabled common uniform elimination because it created sequences that load an entire uniform object and then extract just a single element. This caused problems in some drivers, and is generally slow because it loads more memory than needed. However, there are other ways to get into this situation, so I've added a pass that looks specifically for this pattern and removes it when only a portion of the load is used. Fixes #1547.	2018-05-14 15:40:34 -04:00
Steven Perron	804e8884c4	Fold fclamp feeding compare. An FClamp instruction forces a value to be within a certain interval. When the upper or lower bound of the FClamp is a constant and the value being compared with is a constant, then in some cases we can fold the comparison because, for example, the entire range is less than the value. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1549.	2018-05-14 10:27:49 -04:00
Steven Perron	9ec3f81e5c	Remove dead Workgroup variables in ADCE. If a shader has a variable in the Workgroup storage class that is stored to but never loaded, then we know nothing will read those stores, so it should be safe to remove them. This is implemented in ADCE by treating workgroup variables the same way that private variables are treated. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1550.	2018-05-09 16:07:26 -04:00
Steven Perron	0856997df6	Allow ADCE to remove more instructions. At this time, DCE will only remove an instruction if it is a combinator. However, there are certain non-combinator instructions that can be safely removed if their results are not used. The derivative instructions are one example. Some instructions were also missing from the list of combinators; those are added at the same time.	2018-05-05 09:15:28 -04:00
Steven Perron	7d01643132	Allow hoisting code in if-conversion. When doing if-conversion, we do not currently move code out of the side nodes, because doing so can increase the number of instructions that get executed: both side nodes would now have to be executed. In this commit, we add code to move an instruction, and all of the instructions it depends on, out of a side node and into the header of the selection construct. However, to keep the cost down, we only do it when the two values in the OpPhi node compute the same value. This way we have to move only one of the instructions, and the other becomes unused most of the time, so there is no real extra cost. Makes the value number table an analysis in the IR context. Added more opcodes to the list of code-motion-safe opcodes. Fixes #1526.	2018-05-04 12:56:29 -04:00
Stephen McGroarty	1c2cbaf569	Add GetContinueBlock to loop class. Previously, the loop class used the terms latch and continue block interchangeably. This patch splits the two, and corrects and tests some of the old uses of GetLatchBlock.	2018-05-03 14:30:41 -04:00
Steven Perron	70bb3c1cc2	Fold divide and multiply by same value. We want to fold code like (x*y)/x and other permutations of this. Fixes #1531.	2018-05-02 10:18:37 -04:00
Toomas Remmelg	1dc2458060	Add a loop fusion pass. This pass will look for adjacent loops that are compatible and legal to be fused. Loops are compatible if: - they both have one induction variable - they have the same upper and lower bounds - same initial value - same condition - they have the same update step - they are adjacent - there are no break/continue in either of them Fusion is legal if: - fused loops do not have any dependencies with dependence distance greater than 0 that did not exist in the original loops. - there are no function calls in the loops (could have side-effects) - there are no barriers in the loops It will fuse all such loops as long as the number of registers used for the fused loop stays under the threshold defined by max_registers_per_loop.	2018-05-01 15:40:37 -04:00
Stephen McGroarty	9a5dd6fe88	Support loop fission. Adds support for splitting loops whose register pressure exceeds a user-provided level. This pass will split a loop into two or more loops, provided that the loop is a top-level loop and that splitting the loop is legal. Control flow is left intact for dead code elimination to remove. This pass is enabled with the --loop-fission flag to spirv-opt.	2018-05-01 15:15:10 -04:00
Steven Perron	9ba0879ddf	Improve Vector DCE. Track live scalars in VDCE as if they were single-element vectors. Handle the extended instructions for GLSL in VDCE. Handle composite construct instructions in VDCE.	2018-04-30 11:55:50 -04:00
Steven Perron	a00a0a09ae	Revert "Improvements to vector dce." This reverts commit `2813722993`. A regression was found. Undoing the change until it is fixed.	2018-04-27 10:33:19 -04:00
Alan Baker	4246abdc74	Fixes handling of kill and unreachable ops in inlining. Fixes #1527 * Adds handling for copying OpKill and OpUnreachable and forces the generation of a new basic block * Adds tests to check	2018-04-27 09:42:37 -04:00
Steven Perron	e1bcd2b2d8	Fold OpVectorTimesScalar and OpPhi better. If one of the operands to an OpVectorTimesScalar instruction is zero, then the result will be the 0 vector. Currently we do not fold the instruction unless both operands are constants. This change fixes that. We also allow folding of OpPhi instructions where the incoming values are either an OpUndef or the OpPhi instruction itself. As with other cases, this can be simplified to the OpUndef.	2018-04-26 12:41:16 -04:00
Steven Perron	2813722993	Improvements to vector dce. Track live scalars in VDCE as if they were single element vectors. Handle the extended instructions for GLSL in VDCE. Handle composite construct instructions in VDCE. Fixes #1511.	2018-04-26 11:07:48 -04:00
Cort Stratton	72524db2de	Fixes #1521 : PadToWord() should use std::move() in && variant	2018-04-25 22:03:14 -04:00
Greg Fischer	268be6143d	LocalSingleBlockElim: Add store-store elimination. Eliminate an unused store to a variable if it is followed by a store to the same variable in the same block. Most significantly, this cleans up stores made unused by this pass. These useless stores can inhibit subsequent optimizations, specifically LocalSingleStoreElim. Eliminating them makes subsequent optimization more effective. The main effect of this pass is to simplify the work done by the SSA rewriter: it catches many local loads/stores, which helps speed up the work done by the main rewriter.	2018-04-25 10:30:18 -04:00
Steven Perron	ee8cd5c847	Add Dead insert elimination back in.	2018-04-24 10:10:30 -04:00
Steven Perron	2c0ce87210	Vector DCE (#1512). Introduce a pass that does a DCE-type analysis for vector elements instead of treating the whole vector as a single element. It then rewrites instructions whose values are not used with something else. For example, an instruction whose value is not used, even though it is referenced, is replaced with an OpUndef.	2018-04-23 11:13:07 -04:00
Victor Lomuller	efc5061929	Dominator analysis interface clean. Remove the CFG requirement when querying a dominator/post-dominator from an IRContext. Updated all uses of the function and tests.	2018-04-20 15:41:59 -04:00
Jaebaek Seo	48802bad72	Constant folding for OpVectorTimesScalar	2018-04-20 13:43:04 -04:00
Victor Lomuller	0ec08c28c1	Add register liveness analysis. For each function, the analysis determines which SSA registers are live at the beginning of each basic block and which ones are killed at the end of the basic block. It also includes utilities to simulate the register pressure for loop fusion and fission. The implementation is based on the paper "A non-iterative data-flow algorithm for computing liveness sets in strict SSA programs" by Boissinot et al.	2018-04-20 09:45:15 -04:00
Alan Baker	09c206b6fb	Fixes #1480 . Validate group non-uniform scopes. * Adds new pass for validating non-uniform group instructions * Currently only checks execution scope for Vulkan 1.1 and SPIR-V 1.3 * Added test framework	2018-04-20 09:25:00 -04:00
David Neto	e7c2e91ded	Fix for old XCode: std::set has explicit ctor	2018-04-19 16:33:12 -04:00
Greg Fischer	df7f00f60e	DeadInsertElim: Don't revisit select phi nodes during MarkInsertChain Fixes #1487.	2018-04-19 14:40:00 -04:00
Jaebaek Seo	430a29335e	Fix broken pointer of CommonUniformElimPass	2018-04-19 09:36:10 -04:00
Steven Perron	c20a718e00	Rewrite local-single-store-elim to not create large data structures. The local-single-store-elim algorithm is not fundamentally bad. However, when there are a large number of variables, some of the maps that are used can become very large. These large data structures then take a very long time to be destroyed; I've seen cases where this takes around 40% of the time. I've rewritten that algorithm to not use as much memory. This gives a significant improvement when running a large number of shaders through DXC. I've also made a small change to local-single-block-elim to delete the loads that it has replaced. That way local-single-store-elim will not have to look at those. local-single-store-elim now does the same thing. The time for one set goes from 309s down to 126s. For another set, the time goes from 102s down to 88s.	2018-04-18 16:38:18 -04:00
Jaebaek Seo	0fa42996b5	Merge pull request #1461 from jaebaek/fnegate Add constant folding for OpFNegate Contributes to #709	2018-04-18 13:46:10 -04:00
Toomas Remmelg	0f335cf87e	Add support for MIV and Delta test dependence analysis. GCD MIV test as described in Chapter 3 of "Optimizing Compilers for Modern Architectures: A Dependence-Based Approach" by Randy Allen, and Ken Kennedy. Delta test as described in Figure 3 of "Practical Dependence Testing" by Gina Goff, Ken Kennedy, and Chau-Wen Tseng from PLDI '91.	2018-04-17 13:57:02 -04:00
Jaebaek Seo	d8b9306a4f	Add more unit tests	2018-04-17 12:08:45 -04:00
Jaebaek Seo	79491259e0	Add constant folding for FNegate	2018-04-17 12:08:45 -04:00
Alan Baker	38359ba800	Fixes #1483 . Validating Vulkan 1.1 barrier execution scopes * Reworked how execution model limitations are checked * Now OpFunction checks which entry points call it and checks its registered limitations instead of building a call stack in the entry point * New tests * Moving function to entry point mapping into VState	2018-04-17 10:26:38 -04:00
David Neto	152b9a681e	ADCE: Remove OpDecorateStringGOOGLE Also fix a few failures to set "modified" status when removing global values. Add OpDecorateStringGOOGLE to decoration ordering Fixes #1492	2018-04-17 10:24:30 -04:00
Alan Baker	0e80b86dbe	Fixes #1472 . Per-vertex variable validation fixes. Relaxes checks for per-vertex builtin variables. If the builtin decoration is applied to a variable, then those checks now allow a level of arraying on the variable before checking the type consistency. * Allows arrays of variables to be present for the per-vertex variables: * Position * PointSize * ClipDistance * CullDistance * Updated tests	2018-04-16 12:58:35 -04:00
Rex Xu	7fe186476a	Fix validation issues relevant to SPV_AMD_gpu_shader_int16. Frexp/FrexpStruct allows exp to be either a 16-bit or a 32-bit integer if SPV_AMD_gpu_shader_int16 is enabled.	2018-04-16 10:49:01 -04:00
David Neto	e8814be732	Add validator test for OpBranch Add test for case where OpBranch branches to a value (a function value). Previous tests only checked a label value (name of a block.). Update validate_id.cpp to remove the TODO for OpBranch and say that it is already checked in validate_cfg.cpp	2018-04-16 10:27:51 -04:00
Steven Perron	d42f65e7c1	Use a bit vector in ADCE. The unordered_set in ADCE that holds all of the live instructions takes a very long time to be destroyed. In some shaders, it takes over 40% of the time. If we look at the unique ids of the live instructions, I believe they are dense enough to make a simple bit vector a good choice to hold that data. When I check the density of the bit vector for larger shaders, we are usually using less than 4 bytes per element in the vector, and almost always less than 16. So, in this commit, I introduce a simple bit vector class, and use it in ADCE. This helps improve the compile time for some shaders on Windows by the 40% mentioned above. Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1328.	2018-04-13 16:38:02 -04:00
Steven Perron	8190c26270	Change parameter to Mempass::RemovePhiOperands Pass a hashtable by const ref instead of by value. Big impact on compile time.	2018-04-13 09:53:37 -04:00
Alan Baker	e805d1f8d7	Fixes #1469 . Allow subgroup memory scope for Vulkan 1.1 * New error that prevents CrossDevice memory scope for all vulkan * Old error specifically references Vulkan 1.0 * New tests	2018-04-12 13:16:04 -04:00
Alan Baker	c522b697bf	Fixes #1470 . Don't restrict WGS storage class * Removed restriction that workgroup size can only be on Input storage class * added test	2018-04-12 09:22:34 -04:00
Steven Perron	bc648fd76a	Delete unused code in MemPass. Since the SSA rewriter was added, the old phi insertion code is no longer used. It is going stale and should be deleted.	2018-04-11 15:40:33 -04:00
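
The "Use a bit vector in ADCE" commit (d42f65e7c1) replaces an unordered_set of live instruction ids with a bit vector indexed by id. A minimal C++ sketch of that idea follows. The class name `LiveIdSet` and its methods are hypothetical and are not the SPIRV-Tools bit vector class; the sketch assumes ids are small, dense unsigned integers, as the commit message reports for larger shaders.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: a dense bit vector keyed by unique instruction ids,
// standing in for an std::unordered_set<uint32_t> of live ids.
// Not the SPIRV-Tools class; names are illustrative only.
class LiveIdSet {
 public:
  // Marks `id` as live. Returns true if it was already marked.
  bool Set(uint32_t id) {
    const size_t word = id / kBitsPerWord;
    const uint64_t mask = uint64_t{1} << (id % kBitsPerWord);
    if (word >= bits_.size()) bits_.resize(word + 1, 0);  // grow on demand
    const bool was_set = (bits_[word] & mask) != 0;
    bits_[word] |= mask;
    return was_set;
  }

  // Returns true if `id` has been marked live.
  bool Get(uint32_t id) const {
    const size_t word = id / kBitsPerWord;
    if (word >= bits_.size()) return false;  // never grown this far: not live
    return (bits_[word] >> (id % kBitsPerWord)) & 1u;
  }

  // Storage used by the bit buffer, in bytes.
  size_t ByteSize() const { return bits_.size() * sizeof(uint64_t); }

 private:
  static constexpr size_t kBitsPerWord = 64;
  std::vector<uint64_t> bits_;  // one contiguous buffer, freed in one step
};
```

The trade-off is that memory grows with the largest id rather than with the number of live ids, so this only pays off when ids are dense. In exchange, destroying the set frees a single buffer instead of one hash node per element.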
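
The "Remove redundant stores" commit (f46f2d3e5d) targets an OpStore that writes back the value just loaded from the same variable. The sketch below illustrates that idea on a single basic block using a hypothetical, simplified instruction model (`Op`, `Inst`, and `RemoveRedundantStores` are illustrative names, not SPIRV-Tools APIs). It assumes function-scope variables that never alias, and it conservatively forgets everything it knows whenever it meets any other instruction, such as a function call.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical, simplified instruction model for one basic block.
enum class Op { Load, Store, Other };

struct Inst {
  Op op;
  uint32_t result_id;  // id defined by a Load (0 otherwise)
  uint32_t var_id;     // variable loaded from / stored to (0 for Other)
  uint32_t value_id;   // value written by a Store (0 otherwise)
};

// Drops stores that write back a value already known to be in the variable,
// e.g. "%10 = OpLoad %type %var" followed by "OpStore %var %10".
std::vector<Inst> RemoveRedundantStores(const std::vector<Inst>& block) {
  // variable id -> id of a value known to equal the variable's contents
  std::unordered_map<uint32_t, uint32_t> known_value;
  std::vector<Inst> out;
  for (const Inst& inst : block) {
    if (inst.op == Op::Load) {
      known_value[inst.var_id] = inst.result_id;
    } else if (inst.op == Op::Store) {
      assert(inst.var_id != 0 && "a store needs a target variable");
      auto it = known_value.find(inst.var_id);
      if (it != known_value.end() && it->second == inst.value_id) {
        continue;  // the variable already holds this value: skip the store
      }
      known_value[inst.var_id] = inst.value_id;
    } else {
      // Assume an unknown instruction may write any variable.
      known_value.clear();
    }
    out.push_back(inst);
  }
  return out;
}
```

Tracking "the id currently known to be in each variable" lets the same map handle a load followed by a store, as well as a store followed by a store of the same id.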
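
The GCD MIV test named in commit 0f335cf87e can be sketched as follows. For a write to `a[c1 + a_1*i_1 + ... + a_n*i_n]` and a read of `a[c2 + b_1*j_1 + ... + b_m*j_m]`, a dependence requires an integer solution of `a_1*i_1 + ... + a_n*i_n - b_1*j_1 - ... - b_m*j_m = c2 - c1`. Such a solution exists only if the gcd of all coefficients divides `c2 - c1`. The function names and signature are hypothetical. This shows only the classic GCD test, not the Delta test and not the SPIRV-Tools implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Euclid's algorithm on absolute values; Gcd(0, x) == |x|.
int64_t Gcd(int64_t a, int64_t b) {
  a = a < 0 ? -a : a;
  b = b < 0 ? -b : b;
  while (b != 0) {
    const int64_t t = a % b;
    a = b;
    b = t;
  }
  return a;
}

// Hypothetical helper: returns true if the GCD test proves the two affine
// subscripts can never be equal (no dependence). Returns false when a
// dependence is possible; loop bounds are ignored, so "false" only means
// "not disproved".
bool GcdTestProvesIndependence(const std::vector<int64_t>& src_coeffs,
                               int64_t src_const,
                               const std::vector<int64_t>& dst_coeffs,
                               int64_t dst_const) {
  int64_t g = 0;
  for (int64_t c : src_coeffs) g = Gcd(g, c);
  for (int64_t c : dst_coeffs) g = Gcd(g, c);  // sign does not change the gcd
  const int64_t diff = dst_const - src_const;
  if (g == 0) return diff != 0;  // both subscripts are constants
  return diff % g != 0;          // no integer solution if g does not divide diff
}
```

For example, `a[2*i]` and `a[2*j + 1]` touch only even and only odd elements respectively. The gcd 2 does not divide 1, so the test proves they are independent.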

942 Commits