SPIRV-Tools

mirror of https://github.com/KhronosGroup/SPIRV-Tools synced 2024-12-13 04:00:08 +00:00

Author	SHA1	Message	Date
dan sinclair	e6b953361d	Move the ir namespace to opt. (#1680) This CL moves the files in opt/ to consistently be under the opt:: namespace. This frees up the ir:: namespace so it can be used to make a shared ir representation.	2018-07-09 11:32:29 -04:00
Steven Perron	101a9bcbb0	Add private to local to optimization and size passes. Many optimizations will run on function scope symbols only. When symbols are moved from private scope to function scope, then these optimizations can do more. I believe it is a good idea to run this pass with both -O and -Os. To get the most out of it, it should be run ASAP after inlining and something that removes all of the dead functions.	2018-07-04 21:26:09 -04:00
Steven Perron	465f2815cb	Revert change and stop running remove duplicates. Revert "Don't merge types of resources" This reverts commit `f393b0e480`, but leaves the tests that were added. Added new test. These tests are there so that, if someone tries the same change I made, they will see the tests that they need to handle. Don't run remove duplicates in -O and -Os Remove duplicates was run to help reduce compile time when looking for types in the type manager. I've run compile time tests on three sets of shaders, and the compile time does not seem to change. It should be safe to remove it.	2018-06-29 14:09:44 -04:00
Steven Perron	fe2fbee294	Delete the insert-extract-elim pass. Replaces anything that creates an insert-extract-elim pass to create a simplification pass instead. Then delete the implementation of the pass. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.	2018-06-01 10:13:39 -04:00
Steven Perron	745dd00af9	Fold FMix feeding Extract, and use the simplification pass. We add a new rule to the folding rules to fold an FMix feeding an extract when the alpha value for the element being extracted is either 0 or 1. In those cases, we can simply extract from one of the operands to the FMix. With that change the simplification pass completely subsumes the insert-extract elimination pass. So we remove the insert-extract elimination passes and replace them with calls to the simplification pass. In a follow up PR, we should delete the insert-extract elimination pass. Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1570.	2018-05-25 14:42:59 -04:00
Arseny Kapoulkine	f765d16bd9	Add external interface for creating a pass token Currently it's impossible for external code to register a pass because the only source file that can create pass tokens is optimizer.cpp. This makes it hard to add passes that can't be upstreamed since you can't run them from the usual pass sequence without reimplementing Optimizer. This change adds a PassToken constructor that takes unique_ptr to opt::Pass; if out-of-tree code implements opt::Pass it can register a custom pass without having to add it to SPIRV-Tools source code.	2018-05-25 09:19:43 -04:00
Steven Perron	a579e720a8	Remove the limit on struct size in SROA. Removes the limit on scalar replacement for the legalization passes. This is done by adding an option to the pass (and command line option) to set the limit on maximum size of the composite that scalar replacement is willing to divide. Fixes #1494.	2018-05-18 10:03:46 -04:00
Steven Perron	af430ec822	Add pass to fold a load feeding an extract. We have already disabled common uniform elimination because it created sequences that load an entire uniform object and then extract just a single element. This caused problems in some drivers, and is just generally slow because it loads more memory than needed. However, there are other ways to get into this situation, so I've added a pass that looks specifically for this pattern and removes it when only a portion of the load is used. Fixes #1547.	2018-05-14 15:40:34 -04:00
Toomas Remmelg	1dc2458060	Add a loop fusion pass. This pass will look for adjacent loops that are compatible and legal to be fused. Loops are compatible if: - they both have one induction variable - they have the same upper and lower bounds - same initial value - same condition - they have the same update step - they are adjacent - there are no break/continue in either of them Fusion is legal if: - fused loops do not have any dependencies with dependence distance greater than 0 that did not exist in the original loops. - there are no function calls in the loops (could have side-effects) - there are no barriers in the loops It will fuse all such loops as long as the number of registers used for the fused loop stays under the threshold defined by max_registers_per_loop.	2018-05-01 15:40:37 -04:00
Stephen McGroarty	9a5dd6fe88	Support loop fission. Adds support for splitting loops whose register pressure exceeds a user-provided level. This pass will split a loop into two or more loops given that the loop is a top level loop and that splitting the loop is legal. Control flow is left intact for dead code elimination to remove. This pass is enabled with the --loop-fission flag to spirv-opt.	2018-05-01 15:15:10 -04:00
Steven Perron	ee8cd5c847	Add Dead insert elimination back in.	2018-04-24 10:10:30 -04:00
Steven Perron	2c0ce87210	Vector DCE (#1512) Introduce a pass that does a DCE type analysis for vector elements instead of the whole vector as a single element. It will then rewrite instructions that are not used with something else. For example, an instruction whose values are not used, even though it is referenced, is replaced with an OpUndef.	2018-04-23 11:13:07 -04:00
Victor Lomuller	10e5d7cf13	Add a loop peeling pass. For each loop in a function, the pass walks the loops from innermost to outermost and tries to peel loops for which a certain number of iterations can be done before or after the loop. To limit code growth, peeling will not happen if the growth in code size goes above a configurable threshold.	2018-04-11 15:41:29 +01:00
Steven Perron	cbceeceab4	In copy-prop-arrays, identify copies via OpCompositeInsert When the original code copies an entire array or struct one element at a time, this turns into a series of OpCompositeInsert instructions followed by a store of the whole array. We currently miss opportunities in copy propagate arrays because we do not recognize this as a copy. This commit adds code to copy propagate arrays to identify this code pattern. Also updates the performance passes to run array copy propagation.	2018-03-29 09:39:55 -04:00
Steven Perron	c4dc046399	Copy propagate arrays The SPIR-V generated from HLSL code contains many copies of very large arrays. Not only are these time consuming, but they also cause problems for drivers because they require too much space. To work around this, we will implement an array copy propagation. Note that we will not implement a complete array data flow analysis in order to implement this. We will be looking for very simple cases: 1) The source must never be stored to. 2) The target must be stored to exactly once. 3) The store to the target must be a store to the entire array, and be a copy of the entire source. 4) All loads of the target must be dominated by the store. The hard part is keeping all of the types correct. We do not want to have to do too large a search to update everything, which may not be possible, so we give up if we see any instruction that might be hard to update. Also in types.h, the element decorations are now stored in a std::map. This change was done so the hashing algorithm for a Struct is consistent. With the std::unordered_map, the traversal order was non-deterministic leading to the same type getting hashed to different values. See |Struct::GetExtraHashWords|. Contributes to #1416.	2018-03-26 14:44:41 -04:00
Jaebaek Seo	3b594e1630	Add --time-report to spirv-opt This patch adds a new option --time-report to spirv-opt. For each pass executed by spirv-opt, the flag prints resource utilization for the pass (CPU time, wall time, RSS and page faults) This fixes issue #1378	2018-03-20 21:30:06 -04:00
Diego Novillo	735d8a579e	SSA rewrite pass. This pass replaces the load/store elimination passes. It implements the SSA re-writing algorithm proposed in Simple and Efficient Construction of Static Single Assignment Form. Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013) In: Jhala R., De Bosschere K. (eds) Compiler Construction. CC 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin, Heidelberg https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6 In contrast to common eager algorithms based on dominance and dominance frontier information, this algorithm works backwards from load operations. When a target variable is loaded, it queries the variable's reaching definition. If the reaching definition is unknown at the current location, it searches backwards in the CFG, inserting Phi instructions at join points in the CFG along the way until it finds the desired store instruction. The algorithm avoids repeated lookups using memoization. For reducible CFGs, which are a superset of the structured CFGs in SPIRV, this algorithm is proven to produce minimal SSA. That is, it inserts the minimal number of Phi instructions required to ensure the SSA property, but some Phi instructions may be dead (https://en.wikipedia.org/wiki/Static_single_assignment_form).	2018-03-20 20:56:55 -04:00
Steven Perron	b3daa93b46	Change merge return pass to handle structured cfg. We are seeing shaders that have multiple returns in a function. These functions must get inlined for legalization purposes; however, the inliner does not know how to inline functions that have multiple returns. The solution we will go with is to improve the merge return pass to handle structured control flow. Note that the merge return pass will assume the cfg has been cleaned up by dead branch elimination. Fixes #857.	2018-03-19 13:49:04 -04:00
David Neto	844e186cf7	Add --strip-reflect pass Strips reflection info. This is limited to decorations and decoration instructions related to the SPV_GOOGLE_hlsl_functionality1 extension. It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1. It will also remove the OpExtension for SPV_GOOGLE_decorate_string if there are no further remaining uses of OpDecorateStringGOOGLE. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398	2018-03-15 21:20:42 -04:00
Steven Perron	2cb589cc14	Remove uses of DCEInst and call ADCE The algorithm used in DCEInst to remove dead code is very slow. It is fine if you only want to remove a small number of instructions, but, if you need to remove a large number of instructions, then the algorithm in ADCE is much faster. This PR removes the calls to DCEInst in the load-store removal passes and adds a pass of ADCE afterwards. I tried a number of different orderings of the optimizations, and I believe this is the best I could find. The results I have on 3 sets of shaders are: Legalization: Set 1: 5.39 -> 5.01 Set 2: 13.98 -> 8.38 Set 3: 98.00 -> 96.26 Performance passes: Set 1: 6.90 -> 5.23 Set 2: 10.11 -> 6.62 Set 3: 253.69 -> 253.74 Size reduction passes: Set 1: 7.16 -> 7.25 Set 2: 17.17 -> 16.81 Set 3: 112.06 -> 107.71 Note that the third set's compile time is large because of the large number of basic blocks, not so much because of the number of instructions. That is why we don't see much gain there.	2018-02-27 21:06:08 -05:00
Victor Lomuller	3497a94460	Add loop unswitch pass. It moves all conditional branches and switches whose conditions are loop invariant and uniform. Before performing the loop unswitch we check that the loop does not contain any instruction that would prevent it (barriers, group instructions etc.).	2018-02-27 08:52:46 -05:00
Stephen McGroarty	e354984b09	Unroller support for multiple induction variables Support for multiple induction variables within a loop and support for loop condition operands <= and >=.	2018-02-27 11:50:08 +00:00
Steven Perron	94af58a350	Clean up variables before sroa In some shaders there are a lot of very large and deeply nested structures. This creates a lot of work for scalar replacement. Also, since commit `ca4457b` we have been very aggressive at rewriting variables. This has caused a large increase in compile time in creating and then deleting the instructions. To help lower the costs, I want to run a cleanup of some of the easy loads and stores to remove. This reduces the number of symbols sroa has to work on. It also reduces the amount of code the simplifier has to simplify because it was not generated by sroa. To confirm the improvement, I ran numbers on three different sets of shaders: Time to run --legalize-hlsl: Set #1: 55.89s -> 12.0s Set #2: 1m44s -> 1m40.5s Set #3: 6.8s -> 5.7s Time to run -O Set #1: 18.8s -> 10.9s Set #2: 5m44s -> 4m17s Set #3: 7.8s -> 7.8s Contributes to #1328.	2018-02-22 21:40:58 -05:00
Steven Perron	c1b936637e	Add Insert-extract elimination back into legalization passes. Fixes #1326.	2018-02-21 09:46:51 -05:00
Steven Perron	04cd63e5b9	Make better use of simplification pass The simplification pass works better after all of the dead branches are removed. So swapping them around in the legalization passes. Also adding the simplification pass to performance passes right after dead branch elimination. Added CCP to the legalization passes so we can propagate the constants into the branches, and remove as many branches as possible. CCP is designed to still get opportunities even if the branches are dead, so it is a good place for it. Fixes #1118	2018-02-16 20:46:49 -05:00
Stephen McGroarty	dd8400e150	Initial support for loop unrolling. This patch adds initial support for loop unrolling in the form of a series of utility classes which perform the unrolling. The pass can be run with the command spirv-opt --loop-unroll. This will unroll loops within the module which have the unroll hint set. The unroller imposes a number of requirements on the loops it can unroll. These are documented in the comments for the LoopUtils::CanPerformUnroll method in loop_utils.h. Some of the restrictions will be lifted in future patches.	2018-02-14 15:44:38 -05:00
Alexander Johnston	84ccd0b9ae	Loop invariant code motion initial implementation	2018-02-08 22:55:47 -05:00
Steven Perron	06cdb96984	Make use of the instruction folder. Implementation of the simplification pass. - Create pass that calls the instruction folder on each instruction and propagate instructions that fold to a copy. This will do copy propagation as well. - Did not use the propagator engine because I want to modify the instruction as we go along. - Change folding to not allocate new instructions, but make changes in place. This change had a big impact on compile time. - Add simplification pass to the legalization passes in place of insert-extract elimination. - Added test cases for new folding rules. - Added tests for the simplification pass - Added a method to the CFG to apply a function to the basic blocks in reverse post order. Contributes to #1164.	2018-02-07 23:01:47 -05:00
Alan Baker	abe113219e	Reordering performance passes ordering to produce better opts * Moved initial insert/extract passes later to cover more opportunities * Added an extra set of passes to clean up opportunities exposed later in the pipeline	2018-02-01 18:01:10 -05:00
Steven Perron	61d8c0384b	Add pass to replace invalid opcodes Creates a pass that will remove instructions that are invalid for the current shader stage. For the instruction to be considered for replacement 1) The opcode must be valid for a shader module. 2) The opcode must be invalid for the current shader stage. 3) All entry points to the module must be for the same shader stage. 4) The function containing the instruction must be reachable from an entry point. Fixes #1247.	2018-02-01 15:25:09 -05:00
GregF	0aa0ac52f7	Opt: Add ScalarReplacement to RegisterSizePasses	2018-01-31 10:19:17 -05:00
GregF	f28b106173	InsertExtractElim: Split out DeadInsertElim as separate pass	2018-01-30 08:52:14 -05:00
Alan Baker	2e93e806e4	Initial implementation of if conversion * Handles simple cases only * Identifies phis in blocks with two predecessors and attempts to convert the phi to a select * does not perform code motion currently so the converted values must dominate the join point (e.g. can't be defined in the branches) * limited for now to two predecessors, but can be extended to handle more cases * Adding if conversion to -O and -Os	2018-01-25 09:42:00 -08:00
Steven Perron	34d4294c2c	Create a pass to work around a driver bug related to OpUnreachable. We have come across a driver bug where an OpUnreachable inside a loop is causing the shader to go into an infinite loop. This commit will try to avoid this bug by turning OpUnreachable instructions that are contained in a loop into branches to the loop merge block. This is not added to "-O" and "-Os" because it should only be used if the driver being targeted has this problem. Fixes #1209.	2018-01-18 20:31:46 -05:00
Steven Perron	8cb0aec724	Remove redundant passes from legalization passes With work that Alan has done, some passes have become redundant. ADCE now removes unused variables. Dead branch elimination removes unreachable blocks. This means we can remove CFG Cleanup and dead variable elimination.	2018-01-12 17:47:50 -05:00
Alan Baker	3a054e1ddc	Adding additional functionality to ADCE. Modified ADCE to remove dead globals. * Entry point and execution mode instructions are marked as alive * Reachable functions and their parameters are marked as alive * Instruction deletion now deferred until the end of the pass * Eliminated dead insts set, added IsDead to calculate that value instead * Ported applicable dead variable elimination tests * Ported dead constant elim tests Added dead function elimination to ADCE * ported dead function elim tests Added handling of decoration groups in ADCE * Uses a custom sorter to traverse decorations in a specific order * Simplifies necessary checks Updated -O and -Os pass lists.	2018-01-10 08:35:48 -05:00
Diego Novillo	e5560d64de	Fix constant propagation of induction variables. This fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1143. When an instruction transitions from constant to bottom (varying) in the lattice, we were telling the propagator that the instruction was varying, but never updating the actual value in the values table. This led to incorrect value substitutions at the end of propagation. The patch also re-enables CCP in -O and -Os.	2018-01-08 15:34:35 -05:00
David Neto	3fbbd3c772	Remove CCP from size and performance recipes, pending bugfixes Currently CCP is incorrectly optimizing loops. See https://github.com/KhronosGroup/SPIRV-Tools/issues/1143	2018-01-05 14:01:18 -05:00
David Neto	c32e79eeef	Add --print-all optimizer option Adds optimizer API to write disassembly to a given output stream before each pass, and after the last pass. Adds spirv-opt --print-all option to write disassembly to stderr before each pass, and after the last pass.	2018-01-04 18:34:18 -05:00
Steven Perron	7834beea80	Update legalization passes I've added a few passes to the legalization passes. The first is to add the more specialized load-store removal passes to help improve the compile time, as was suggested in #1118. I've also added dead branch elimination while we wait for the behaviour of dead branch elimination to be folded into CFG cleanup. I did not add CCP because it seems like most of the constant propagation that is needed is already being done by the load-store removal passes, which call `ReplaceAllUsesWith`. We can reconsider this if needed.	2018-01-04 11:04:49 -05:00
Diego Novillo	135150a1a8	Do not insert Phi nodes in CCP propagator. In CCP we should not need to insert Phi nodes because CCP never looks at loads/stores. This required adjusting two tests that relied on Phi instructions being inserted. I changed the tests to have the Phi instructions pre-inserted. I also added a new test to make sure that CCP does not try to look through stores and loads. Finally, given that CCP does not handle loads/stores, it's better to run mem2reg before it. I've changed the -O/-Os schedules to run local multi-store elimination before CCP. Although this is just an efficiency fix for CCP, it is also working around a bug in Phi insertion. When Phi instructions are inserted, they are never associated with a basic block. This causes a segfault when the propagator tries to look up CFG edges when analyzing Phi instructions.	2018-01-03 15:12:25 -05:00
Diego Novillo	4ba9dcc8a0	Implement SSA CCP (SSA Conditional Constant Propagation). This implements the conditional constant propagation pass proposed in Constant propagation with conditional branches, Wegman and Zadeck, ACM TOPLAS 13(2):181-210. The main logic resides in CCPPass::VisitInstruction. Instructions that may produce a constant value are evaluated with the constant folder. If they produce a new constant, the instruction is considered interesting. Otherwise, it's considered varying (for unfoldable instructions) or just not interesting (when not enough operands have a constant value). The other main piece of logic is in CCPPass::VisitBranch. This evaluates the selector of the branch. When it's found to be a known value, it computes the destination basic block and sets it. This tells the propagator which branches to follow. The patch required extensions to the constant manager as well. Instead of hashing the Constant pointers, this patch changes the constant pool to hash the contents of the Constant. This allows the lookups to be done using the actual values of the Constant, preventing duplicate definitions.	2017-12-21 14:29:45 -05:00
Steven Perron	7505d24225	Update the legalization passes. Changes the set of optimizations done for legalization. While doing this, I added documentation to explain why we want each optimization. A new option "--legalize-hlsl" is added so the legalization passes can be easily run from the command line. The legalize option implies skip-validation.	2017-12-20 17:56:03 -05:00
Steven Perron	b86eb6842b	Convert private variables to function scope. When a private variable is used in a single function, it can be converted to a function scope variable in that function. This adds a pass that does that. The pass can be enabled using the option `--private-to-local`. This transformation allows other transformations to act on these variables. Also moved `FindPointerToType` from the inline class to the type manager.	2017-12-19 14:21:04 -05:00
Alan Baker	616908503d	Improving the usability of the type manager. The type manager hashes types. This allows the lookup of type declaration ids from arbitrarily constructed types. Users should be cautious when dealing with non-unique types (structs and potentially pointers) to get the exact id if necessary. * Changed the spec composite constant folder to handle ambiguous composites * Added functionality to create necessary instructions for a type * Added ability to remove ids from the type manager	2017-12-18 08:20:56 -05:00
Alan Baker	867451f49e	Add scalar replacement Adds a scalar replacement pass. The pass considers all function scope variables of composite type. If there are accesses to individual elements (and it is legal) the pass replaces the variable with a variable for each composite element and updates all the uses. Added the pass to -O Added NumUses and NumUsers to DefUseManager Added some helper methods for the inst to block mapping in context Added some helper methods for specific constant types No longer generate duplicate pointer types. * Now searches for an existing pointer of the appropriate type instead of failing validation * Fixed spec constant extracts * Addressed changes for review * Changed RunSinglePassAndMatch to be able to run validation * current users do not enable it Added handling of acceptable decorations. * Decorations are also transferred where appropriate Refactored extension checking into FeatureManager * Context now owns a feature manager * consciously NOT an analysis * added some test * fixed some minor issues related to decorates * added some decorate related tests for scalar replacement	2017-12-11 10:51:13 -05:00
Steven Perron	5d602abd66	Add global redundancy elimination Adds a pass that looks for redundant instructions in a function, and removes them. The algorithm is a hash table based value numbering algorithm that traverses the dominator tree. This pass removes completely redundant instructions, not partially redundant ones.	2017-12-07 18:35:38 -05:00
Lei Zhang	aec60b8158	Add RegisterLegalizationPasses() into the interface Add note to mention the use scenario. The original list came from Glslang.	2017-11-23 17:26:44 -05:00
Steven Perron	28c415500d	Create a local value numbering pass Creates a pass that removes redundant instructions within the same basic block. This will be implemented using a hash based value numbering algorithm. Added a number of functions that check for the Vulkan descriptor types. These are used to determine if variables are read-only or not. Implemented a function to check if loads and variables are read-only. Implemented kernel specific and shader specific versions. A big change is that the Combinator analysis in ADCE is factored out into the IRContext as an analysis. This was done because it is being reused in the value number table.	2017-11-23 11:45:09 -05:00
Alan Baker	a771713e42	Adding a unique id to Instruction generated by IRContext Each instruction is given a unique id that can be used for ordering purposes. The ids are generated via the IRContext. Major changes: * Instructions now contain a uint32_t for unique id and a cached context pointer * Most constructors have been modified to take a context as input * unfortunately I cannot remove the default and copy constructors, but developers should avoid these * Added accessors to parents of basic block and function * Removed the copy constructors for BasicBlock and Function and replaced them with Clone functions * Reworked BuildModule to return an IRContext owning the built module * Since all instructions require a context, the context now becomes the basic unit for IR * Added a constructor to context to create an owned module internally * Replaced uses of Instruction's copy constructor with Clone wherever I found them * Reworked the linker functionality to perform clones into a different context instead of moves * Updated many tests to be consistent with the above changes * Still need to add new tests to cover added functionality * Added comparison operators to Instruction * Added an internal option to LinkerOptions to verify merged ids are unique * Added a test for the linker to verify merged ids are unique * Updated MergeReturnPass to supply a context * Updated DecorationManager to supply a context for cloned decorations * Reworked several portions of the def use tests in anticipation of next set of changes	2017-11-20 17:49:10 -05:00
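
The local value numbering commit (28c415500d) describes removing redundant instructions within one basic block using a hash based value numbering algorithm. The idea can be sketched roughly as below. This is an illustrative toy in Python, not the SPIRV-Tools implementation: the instruction tuple layout and the function name `local_value_numbering` are hypothetical, and the sketch assumes every instruction is side-effect free.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction: (result_id, opcode, operand_ids).
# Assumption: every instruction is pure, so equal opcode and operands
# always produce an equal value.
Inst = Tuple[int, str, Tuple[int, ...]]


def local_value_numbering(block: List[Inst]) -> Tuple[List[Inst], Dict[int, int]]:
    """Remove redundant instructions within a single basic block.

    Returns the surviving instructions and a map from each removed
    result id to the id of the earlier instruction that replaces it.
    """
    seen: Dict[Tuple[str, Tuple[int, ...]], int] = {}  # value key -> result id
    replaced: Dict[int, int] = {}
    kept: List[Inst] = []
    for result_id, opcode, operands in block:
        # Rewrite operands that referred to an already-removed instruction.
        operands = tuple(replaced.get(op, op) for op in operands)
        key = (opcode, operands)
        if key in seen:
            # Same computation was already done earlier in this block.
            replaced[result_id] = seen[key]
        else:
            seen[key] = result_id
            kept.append((result_id, opcode, operands))
    return kept, replaced


block = [
    (10, "OpIAdd", (1, 2)),
    (11, "OpIAdd", (1, 2)),   # duplicate of %10
    (12, "OpIMul", (10, 3)),
    (13, "OpIMul", (11, 3)),  # duplicate of %12 once %11 is rewritten to %10
]
kept, replaced = local_value_numbering(block)
```

Rewriting operands before hashing is what lets the second multiply be recognized as redundant. A real pass would also have to skip loads, stores, and other instructions with side effects unless they are known to be read-only, which is the concern the commit message raises.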

80 Commits