SPIRV-Tools

mirror of https://github.com/KhronosGroup/SPIRV-Tools synced 2024-11-23 20:20:06 +00:00

Author	SHA1	Message	Date
Stephen McGroarty	ad7e4b8401	Initial patch for scalar evolution analysis This patch adds support for the analysis of scalars in loops. It works by traversing the def-use chain to build a DAG of scalar operations and then simplifies the DAG by folding constants and grouping like terms. It represents induction variables as recurrent expressions with respect to a given loop and can simplify DAGs containing recurrent expressions by rewriting the entire DAG to be a recurrent expression with respect to the same loop.	2018-03-28 16:34:23 -04:00
Alan Baker	97c8fdccd2	Adding OpPhi validation rules. * Added tests * Fixes SSA check for unreachable phi parents * Fixes invalid cfg cleanup test	2018-03-27 17:26:26 -04:00
Steven Perron	5e07ab1358	Handle more cases in copy propagate arrays. When we change the type of an object that gets stored, we do not want to change the type of the memory location being stored to. In order to still be able to do the rewrite, we will decompose and rebuild the object so it is the type that can be stored. Fixes #1416.	2018-03-27 11:04:49 -04:00
Steven Perron	c4dc046399	Copy propagate arrays The SPIR-V generated from HLSL code contains many copies of very large arrays. Not only are these time-consuming, but they also cause problems for drivers because they require too much space. To work around this, we will implement an array copy propagation. Note that we will not implement a complete array data flow analysis in order to implement this. We will be looking for very simple cases: 1) The source must never be stored to. 2) The target must be stored to exactly once. 3) The store to the target must be a store to the entire array, and be a copy of the entire source. 4) All loads of the target must be dominated by the store. The hard part is keeping all of the types correct. We do not want to have to do too large a search to update everything, which may not be possible, so we give up if we see any instruction that might be hard to update. Also in types.h, the element decorations are now stored in a std::map. This change was done so the hashing algorithm for a Struct is consistent. With the std::unordered_map, the traversal order was non-deterministic, leading to the same type getting hashed to different values. See |Struct::GetExtraHashWords|. Contributes to #1416.	2018-03-26 14:44:41 -04:00
Diego Novillo	735d8a579e	SSA rewrite pass. This pass replaces the load/store elimination passes. It implements the SSA re-writing algorithm proposed in Simple and Efficient Construction of Static Single Assignment Form. Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013) In: Jhala R., De Bosschere K. (eds) Compiler Construction. CC 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin, Heidelberg https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6 In contrast to common eager algorithms based on dominance and dominance frontier information, this algorithm works backwards from load operations. When a target variable is loaded, it queries the variable's reaching definition. If the reaching definition is unknown at the current location, it searches backwards in the CFG, inserting Phi instructions at join points in the CFG along the way until it finds the desired store instruction. The algorithm avoids repeated lookups using memoization. For reducible CFGs, which are a superset of the structured CFGs in SPIRV, this algorithm is proven to produce minimal SSA. That is, it inserts the minimal number of Phi instructions required to ensure the SSA property, but some Phi instructions may be dead (https://en.wikipedia.org/wiki/Static_single_assignment_form).	2018-03-20 20:56:55 -04:00
Victor Lomuller	bdf421cf40	Add loop peeling utility The loop peeler util takes a loop as input and creates a new one before it. The iterator of the duplicated loop is then set to accommodate the number of iterations required for the peeling. The loop peeling pass that decides whether to peel, along with the profitability analysis, is left for a follow-up PR.	2018-03-20 10:21:10 -04:00
Steven Perron	b3daa93b46	Change merge return pass to handle structured cfg. We are seeing shaders that have multiple returns in a function. These functions must get inlined for legalization purposes; however, the inliner does not know how to inline functions that have multiple returns. The solution we will go with is to improve the merge return pass to handle structured control flow. Note that the merge return pass will assume the cfg has been cleaned up by dead branch elimination. Fixes #857.	2018-03-19 13:49:04 -04:00
David Neto	844e186cf7	Add --strip-reflect pass Strips reflection info. This is limited to decorations and decoration instructions related to the SPV_GOOGLE_hlsl_functionality1 extension. It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1. It will also remove the OpExtension for SPV_GOOGLE_decorate_string if there are no further remaining uses of OpDecorateStringGOOGLE. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398	2018-03-15 21:20:42 -04:00
David Neto	884933366b	Teach DecorationManager about OpDecorateStringGOOGLE Also add more decoration manager test coverage for OpDecorateId. Fixes #1396	2018-03-13 22:18:33 -04:00
Alan Baker	7e03e76a5f	Fixes #1402 . Don't merge non-branch terminators into loop header. Added tests	2018-03-13 22:16:17 -04:00
Alan Baker	43d1609183	Fixes #1407 . Removing assertion against void pointer Added test	2018-03-13 19:45:20 -04:00
Alan Baker	4065adf05d	Fixes #1404 . Don't DCE workgroup size Added test.	2018-03-13 19:38:31 -04:00
Pierre Moreau	5bd55f10cd	Reimplement the DecorationManager This reimplementation fixes several issues when removing decorations associated to an ID (partially addresses #1174 and gives tools for fixing #898), as well as making it easier to remove groups; a few additional tests have been added. DecorationManager::RemoveDecoration() will still not delete dead decorations it created, but I do not think it is its job either; given the following input ``` OpCapability Shader OpCapability Linkage OpMemoryModel Logical GLSL450 OpDecorate %2 Restrict %2 = OpDecorationGroup OpGroupDecorate %2 %1 %3 OpDecorate %4 Invariant %4 = OpDecorationGroup OpGroupDecorate %4 %2 %uint = OpTypeInt 32 0 %1 = OpVariable %uint Uniform %3 = OpVariable %uint Uniform ``` which of the following two outputs would you expect RemoveDecoration(2) to produce: ``` OpCapability Shader OpCapability Linkage OpMemoryModel Logical GLSL450 %uint = OpTypeInt 32 0 %1 = OpVariable %uint Uniform %3 = OpVariable %uint Uniform ``` or ``` OpCapability Shader OpCapability Linkage OpMemoryModel Logical GLSL450 OpDecorate %4 Invariant %4 = OpDecorationGroup %uint = OpTypeInt 32 0 %1 = OpVariable %uint Uniform %3 = OpVariable %uint Uniform ``` Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/924 Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1174	2018-03-12 09:56:14 -04:00
Alan Baker	bc9cfee6fa	Fixes #1385 . Grab correct input to calculate indices. * Added tests to catch the bug	2018-03-07 16:07:40 -05:00
Alan Baker	5f50e6209c	Fixes #1376 . Don't handle half folding gracefully. * Added early returns to folding rules to prevent half attempts * Added some tests	2018-03-06 14:00:02 -05:00
Steven Perron	9ba50e34f2	Avoid generating duplicate names when merging types When merging types, we do not remove other information related to the types. We simply leave it duplicated, and hope it is removed later. This is what happens with decorations. They are removed in the next phase of remove duplicates. However, for OpNames that is not the case. We end up with two different names for the same id, which does not make sense. The solution is to remove the names and decorations for the type being removed instead of rewriting them to refer to the other type. Note that it is possible that if the first type does not have a name, then the types will end up with no name. That is fine because the names should not have any semantic significance anyway. This was identified in issue #1372, but this does not fix that issue.	2018-03-05 12:02:50 -05:00
Pierre Moreau	6cd6e5ebef	Define Disassemble only when Effcee is used in fold_test	2018-03-02 16:40:52 -05:00
Alan Baker	52bceb3569	Handles more cases of redundant selects * Handles OpConstantNull and vector types * vector selects (except against a null) are converted to vector shuffles * Added tests	2018-03-02 14:28:08 -05:00
Alan Baker	824625760b	Fixes #1361 . Mark all non-constant global values as varying in CCP * Also mark function parameters as varying * Conservatively mark assignment instructions as varying if any input is varying after attempting to fold * Added a test to catch this case	2018-03-01 15:24:41 -05:00
Alan Baker	ce5941a642	Fixes #1357 . Support null constants better in folding * getFloatConstantKind() now handles OpConstantNull * PerformOperation() now handles OpConstantNull for vectors * Fixed some instances where we would attempt to merge a division by 0 * added tests	2018-02-28 23:12:27 -05:00
GregF	bdaf8d56fb	Opt: Add constant folding for FToI and IToF	2018-02-28 23:08:52 -05:00
Alan Baker	9457cabbce	Fixes #1354 . Do not merge integer division. * Removes merging of div with a div or mul for integers * Updated tests	2018-02-28 13:33:21 -05:00
Steven Perron	588f4fcc95	Add more folding rules for vector shuffle. Adds rule to fold OpVectorShuffle with constant inputs. Adds rules to fold OpCompositeExtract being fed by an OpVectorShuffle.	2018-02-27 21:20:22 -05:00
Steven Perron	2cb589cc14	Remove uses of DCEInst and call ADCE The algorithm used in DCEInst to remove dead code is very slow. It is fine if you only want to remove a small number of instructions, but, if you need to remove a large number of instructions, then the algorithm in ADCE is much faster. This PR removes the calls to DCEInst in the load-store removal passes and adds a pass of ADCE afterwards. I tried a number of different orderings of the optimizations, and I believe this is the best I could find. The results I have on 3 sets of shaders are: Legalization: Set 1: 5.39 -> 5.01 Set 2: 13.98 -> 8.38 Set 3: 98.00 -> 96.26 Performance passes: Set 1: 6.90 -> 5.23 Set 2: 10.11 -> 6.62 Set 3: 253.69 -> 253.74 Size reduction passes: Set 1: 7.16 -> 7.25 Set 2: 17.17 -> 16.81 Set 3: 112.06 -> 107.71 Note that the third set's compile time is large because of the large number of basic blocks, not so much because of the number of instructions. That is why we don't see much gain there.	2018-02-27 21:06:08 -05:00
Alan Baker	802cf053c7	Merge arithmetic with non-trivial constant operands Adding basis of arithmetic merging * Refactored constant collection in ConstantManager * New rules: * consecutive negates * negate of arithmetic op with a constant * consecutive muls * reciprocal of div * Removed IRContext::CanFoldFloatingPoint * replaced by Instruction::IsFloatingPointFoldingAllowed * Fixed some bad tests * added some header comments Added PerformIntegerOperation * minor fixes to constants and tests * fixed IntMultiplyBy1 to work with 64 bit ints * added tests for integer mul merging Adding test for vector integer multiply merging Adding support for merging integer add and sub through negate * Added tests Adding rules to merge mult with preceding divide * Has a couple tests, but needs more * Added more comments Fixed bug in integer division folding * Will no longer merge through integer division if there would be a remainder in the division * Added a bunch more tests Adding rules to merge divide and multiply through divide * Improved comments * Added tests Adding rules to handle mul or div of a negation * Added tests Changes for review * Early exit if no constants are involved in more functions * fixed some comments * removed unused declaration * clarified some logic Adding new rules for add and subtract * Fold adds of adds, subtracts or negates * Fold subtracts of adds, subtracts or negates * Added tests	2018-02-27 13:02:13 -05:00
Stephen McGroarty	20b8cdb7c6	Make IR builder use the type manager for constants This change makes the IR builder use the type manager to generate OpTypeInts when creating OpConstants. This avoids dangling references being stored by the created OpConstants.	2018-02-27 12:59:26 -05:00
Victor Lomuller	3497a94460	Add loop unswitch pass. It moves all conditional branches and switches whose conditions are loop invariant and uniform. Before performing the loop unswitch we check that the loop does not contain any instruction that would prevent it (barriers, group instructions etc.).	2018-02-27 08:52:46 -05:00
Stephen McGroarty	e354984b09	Unroller support for multiple induction variables Support for multiple induction variables within a loop and support for loop condition operands <= and >=.	2018-02-27 11:50:08 +00:00
Steven Perron	3f19c2031a	Preserve analyses in the simplification pass Fixes a bug at the same time. In `UpdateDefUse`, if the definition already exists, we are not supposed to analyse it again. When you do, the entries for the definition are deleted, and we don't want that. The check for this was wrong.	2018-02-22 16:06:30 -05:00
GregF	46a9ec9d23	Opt: Check for side-effects in DCEInst() This function now checks for side-effects before adding operand instructions to the dead instruction work list. Because this fix puts more pressure on IsCombinatorInstruction() to be correct, this commit adds all OpConstant* and OpType* instructions to combinator_ops_ set. Fixes #1341.	2018-02-22 12:24:13 -05:00
Alan Baker	01760d2f0f	Fixes #1338 . Handle OpConstantNull in branch/switch conditions * No longer assume the branch/switch condition must be bool or int constants (respectively) * Added a couple unit tests for each case	2018-02-21 10:22:39 -05:00
Arseny Kapoulkine	309be423cc	Add folding for redundant add/sub/mul/div/mix operations This change implements instruction folding for arithmetic operations that are redundant, specifically: x + 0 = 0 + x = x x - 0 = x 0 - x = -x x * 0 = 0 * x = 0 x * 1 = 1 * x = x 0 / x = 0 x / 1 = x mix(a, b, 0) = a mix(a, b, 1) = b Cache ExtInst import id in feature manager This allows us to avoid string lookups during optimization; for now we just cache GLSL std450 import id but I can imagine caching more sets as they become utilized by the optimizer. Add tests for add/sub/mul/div/mix folding The tests cover scalar float/double cases, and some vector cases. Since most of the code for floating point folding is shared, the tests for vector folding are not as exhaustive as scalar. To test sub->negate folding I had to implement a custom fixture.	2018-02-20 18:29:27 -05:00
Steven Perron	9d95a91a9f	Fix folding insert feeding extract I mixed up two cases when folding an OpCompositeExtract that is fed by an OpCompositeInsert. The specific cases are demonstrated in the new test. I mixed up the conditions for the cases, and treated one like the other. Fixes #1323.	2018-02-20 11:22:51 -05:00
Alan Baker	c3f34d8bf3	Fixes #1300 . Adding checks for bad CCP transitions and unsettled values * Now track propagation status and assert on bad statuses * Added helper methods to access instruction propagation status * Modified the phi meet operator to properly reflect the paper it is based on * Modified SSA edge addition so that all edges are added, but only on state changes * Fixed a bug in instruction simulation where interesting conditional branches would not mark the interesting edge as executed * Added a test to catch this bug * Added an ostream operator for SSAPropagator::PropStatus	2018-02-18 19:41:34 -05:00
Arseny Kapoulkine	1054413600	Add constant folding rules for floating-point comparison This change handles all 6 regular comparison types in two variations, ordered (true if values are ordered and comparison is true) and unordered (true if values are unordered or comparison is true). Ordered comparison matches the default floating-point behavior on host but we use std::isnan to check ordering explicitly anyway. This change also slightly reworks the floating-point folding support code to make it possible to define a folding operation that returns boolean instead of floating point. These tests exhaustively test ordered/unordered comparisons for float/double. Since for NaN inputs the comparison result doesn't depend on the comparison function, we just test == and !=; NaN inputs result in true unordered comparisons and false ordered comparisons.	2018-02-16 20:41:22 -05:00
Steven Perron	50f307f889	Simplify OpPhi instructions referencing unreachable continues In dead branch elimination, we already recognize unreachable continue blocks, and update OpPhi instructions accordingly. This change adds an extra check: if the head block has exactly 1 other incoming edge, then replace the OpPhi with the value from that edge. Fixes #1314.	2018-02-16 18:58:03 -05:00
Steven Perron	3756b387f3	Get CCP to use the constant floating point rules. Fixes #1311	2018-02-16 13:49:47 -05:00
Arseny Kapoulkine	32a8e04c7d	Add folding of redundant OpSelect insns We can fold OpSelect into one of the operands in two cases: - condition is constant - both results are the same Even if the original shader doesn't have either of these, if-conversion pass sometimes ends up generating instructions like %7127 = OpSelect %int %3220 %7058 %7058 And this optimization cleans them up.	2018-02-15 10:03:22 -05:00
Steven Perron	6669d8163d	Fold binary floating point operators. Adds the floating rules for FAdd, FDiv, FMul, and FSub. Contributes to #1164.	2018-02-14 15:48:15 -05:00
Stephen McGroarty	dd8400e150	Initial support for loop unrolling. This patch adds initial support for loop unrolling in the form of a series of utility classes which perform the unrolling. The pass can be run with the command spirv-opt --loop-unroll. This will unroll loops within the module which have the unroll hint set. The unroller imposes a number of requirements on the loops it can unroll. These are documented in the comments for the LoopUtils::CanPerformUnroll method in loop_utils.h. Some of the restrictions will be lifted in future patches.	2018-02-14 15:44:38 -05:00
Alan Baker	229ebc0665	Fixes #1295 . Mark undef values as varying in ccp. * Undef now marked as varying in ccp * this prevents incorrect meet operations since phis were always not interesting * added a test to catch the bug	2018-02-14 10:21:26 -05:00
Steven Perron	1d7b1423f9	Add folding of OpCompositeExtract and OpConstantComposite constant instructions. Create files for constant folding rules. Add the rules for OpConstantComposite and OpCompositeExtract.	2018-02-09 17:52:33 -05:00
Alexander Johnston	84ccd0b9ae	Loop invariant code motion initial implementation	2018-02-08 22:55:47 -05:00
GregF	ca4457b4b6	SROA: Do replacement on structs with no partial references.	2018-02-08 15:20:02 -05:00
Steven Perron	06cdb96984	Make use of the instruction folder. Implementation of the simplification pass. - Create pass that calls the instruction folder on each instruction and propagate instructions that fold to a copy. This will do copy propagation as well. - Did not use the propagator engine because I want to modify the instruction as we go along. - Change folding to not allocate new instructions, but make changes in place. This change had a big impact on compile time. - Add simplification pass to the legalization passes in place of insert-extract elimination. - Added test cases for new folding rules. - Added tests for the simplification pass - Added a method to the CFG to apply a function to the basic blocks in reverse post order. Contributes to #1164.	2018-02-07 23:01:47 -05:00
David Neto	e7fafdaa68	Fix test inclusion when Effcee is absent	2018-02-06 12:10:50 -05:00
Alan Baker	871022772e	Registering a type now rebuilds it out of memory owned by the manager. * Added TypeManager::RebuildType * rebuilds the type and its constituent types in terms of memory owned by the manager. * Used by TypeManager::RegisterType to properly allocate memory * Adding a unit test to expose the issue * Added some tests to provide coverage of RebuildType * Added an accessor to the target pointer for a forward pointer	2018-02-06 10:17:56 -05:00
Steven Perron	bc1ec9418b	Add general folding infrastructure. Create the folding engine that will 1) attempt to fold an instruction. 2) iterate on the folding so small folding rules can be easily combined. 3) insert new instructions when needed. I've added the minimum number of rules needed to test the features above.	2018-02-02 12:24:11 -05:00
Victor Lomuller	50e85c865c	Add LoopUtils class to gather some loop transformation support. This patch adds LoopUtils class to handle some loop related transformations. For now it has 2 transformations that simplify other transformations such as loop unroll or unswitch: - Dedicate exit blocks: this ensures that all exit basic blocks (out-of-loop basic blocks that have a predecessor in the loop) have all their predecessors in the loop; - Loop Closed SSA (LCSSA): this ensures that all definitions in a loop are used inside the loop or in a phi instruction in an exit basic block. It also adds the following capabilities: - Loop::IsLCSSA to test if the loop is in a LCSSA form - Loop::GetOrCreatePreHeaderBlock that can build a loop preheader if required; - New methods to allow on the fly updates of the loop descriptors. - New methods to allow on the fly updates of the CFG analysis. - Instruction::SetOperand to allow expression of the index relative to Instruction::NumOperands (to be compatible with the index returned by DefUseManager::ForEachUse)	2018-02-01 15:35:09 -05:00
Steven Perron	61d8c0384b	Add pass to replace invalid opcodes Creates a pass that will remove instructions that are invalid for the current shader stage. For the instruction to be considered for replacement 1) The opcode must be valid for a shader module. 2) The opcode must be invalid for the current shader stage. 3) All entry points to the module must be for the same shader stage. 4) The function containing the instruction must be reachable from an entry point. Fixes #1247.	2018-02-01 15:25:09 -05:00

236 Commits