SPIRV-Tools

mirror of https://github.com/KhronosGroup/SPIRV-Tools synced 2024-10-19 11:30:15 +00:00

Author	SHA1	Message	Date
Alan Baker	4f866abfd8	Validate static uses of interfaces. Fixes #1120. Checks that all static uses of the Input and Output variables are listed as interfaces in each corresponding entry point declaration. * Changed validation state to track interface lists * Updated many tests * Modified validation state to store entry point names * Combined with interface list and called EntryPointDescription * Updated uses * Changed interface validation error messages to output entry point name in addition to ID	2018-06-13 10:56:14 -04:00
GregF	6fbfe1c016	Fix SSA rewrite for nested loops. From the test case, the slice of the CFG that is interesting for the bug is 25 -> 30 -> 31 -> 34, with a back edge from 34 to 31. 1. In block 25, we have a Phi candidate for %f with arguments %47 = Phi[%float_0, %0]. This merges %float_0 and a yet unknown argument from the external loop backedge. 2. We are now processing block 34: i. The load %35 = OpLoad %f triggers a Phi candidate to be placed in block 31. ii. The Phi candidate %50 = Phi needs two arguments. The one coming from block 30 is %47. But the one coming from block 34 (which we are now processing and have marked sealed) finds %50 itself as the reaching def for %f. 3. This wrongfully marks %50 as a copy-of Phi, which ultimately makes both %47 and %50 copy-of Phis that get eliminated.	2018-04-06 15:17:52 -04:00
Neil Roberts	57a2441791	hex_float: Use max_digits10 for the float precision. CPPreference.com has this description of digits10: “The value of std::numeric_limits<T>::digits10 is the number of base-10 digits that can be represented by the type T without change, that is, any number with this many significant decimal digits can be converted to a value of type T and back to decimal form, without change due to rounding or overflow.” This means that any number with this many digits can be represented accurately in the corresponding type. A change in any digit after that may or may not give the number a different bitwise representation. Therefore this isn’t necessarily enough precision to accurately represent the value in text. Instead we need max_digits10, which has the following description: “The value of std::numeric_limits<T>::max_digits10 is the number of base-10 digits that are necessary to uniquely represent all distinct values of the type T, such as necessary for serialization/deserialization to text.” The patch includes a test case in hex_float_test which tries to do a round-trip conversion of a number that requires more than 6 decimal places to be accurately represented. This would fail without the patch. Sadly this also breaks a bunch of other tests. Some of the tests in hex_float_test use ldexp and then compare the result with a value which is not the same as the one returned by ldexp but instead is that value rounded to 6 decimals. Others use values that are not exactly representable as a binary floating-point fraction but happened to generate the same value when rounded to 6 decimals. Where the actual value didn’t seem to matter, these have been replaced with different values that can be represented as a binary fraction.	2018-04-03 12:53:10 -04:00
Diego Novillo	735d8a579e	SSA rewrite pass. This pass replaces the load/store elimination passes. It implements the SSA re-writing algorithm proposed in Simple and Efficient Construction of Static Single Assignment Form. Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013) In: Jhala R., De Bosschere K. (eds) Compiler Construction. CC 2013. Lecture Notes in Computer Science, vol 7791. Springer, Berlin, Heidelberg https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6 In contrast to common eager algorithms based on dominance and dominance frontier information, this algorithm works backwards from load operations. When a target variable is loaded, it queries the variable's reaching definition. If the reaching definition is unknown at the current location, it searches backwards in the CFG, inserting Phi instructions at join points in the CFG along the way until it finds the desired store instruction. The algorithm avoids repeated lookups using memoization. For reducible CFGs, which are a superset of the structured CFGs in SPIRV, this algorithm is proven to produce minimal SSA. That is, it inserts the minimal number of Phi instructions required to ensure the SSA property, but some Phi instructions may be dead (https://en.wikipedia.org/wiki/Static_single_assignment_form).	2018-03-20 20:56:55 -04:00
Steven Perron	2cb589cc14	Remove uses of DCEInst and call ADCE. The algorithm used in DCEInst to remove dead code is very slow. It is fine if you only want to remove a small number of instructions, but if you need to remove a large number of instructions, the algorithm in ADCE is much faster. This PR removes the calls to DCEInst in the load-store removal passes and adds a pass of ADCE afterwards. I tried a number of different orderings of the optimizations, and I believe this is the best I could find. The results I have on 3 sets of shaders are: Legalization: Set 1: 5.39 -> 5.01; Set 2: 13.98 -> 8.38; Set 3: 98.00 -> 96.26. Performance passes: Set 1: 6.90 -> 5.23; Set 2: 10.11 -> 6.62; Set 3: 253.69 -> 253.74. Size reduction passes: Set 1: 7.16 -> 7.25; Set 2: 17.17 -> 16.81; Set 3: 112.06 -> 107.71. Note that the third set's compile time is large because of the large number of basic blocks, not so much because of the number of instructions. That is why we don't see much gain there.	2018-02-27 21:06:08 -05:00
Alan Baker	eb0c73dad6	Maintain instruction to block mapping in phi insertion * Changed MemPass::InsertPhiInstructions to set basic blocks for new phis * Local SSA elim now maintains instr to block mapping * Added a test and confirmed it fails without the updated phis * IRContext::set_instr_block no longer builds the map if the analysis is invalid * Added instruction to block mapping verification to IRContext::IsConsistent()	2018-01-12 10:16:53 -05:00
Steven Perron	79a00649b4	Allow pointers to pointers in logical addressing mode. A few optimizations are updated to handle code that is supposed to be using the logical addressing mode, but still has variables that contain pointers, as long as the pointers are to opaque objects. This is called "relaxed logical addressing". |Instruction::GetBaseAddress| will check that pointers that are used meet the relaxed logical addressing rules. Optimizations that now handle relaxed logical addressing instead of logical addressing are: - aggressive dead-code elimination - local access chain convert - local store elimination passes.	2017-12-19 14:29:14 -05:00
GregF	78c025abe9	MultiStore: Support OpVariable Initialization Treat an OpVariable with initialization as if it was an OpStore. With PR #1073, this completes work for issue #1017.	2017-12-11 10:37:14 -05:00
Diego Novillo	83228137e1	Re-format source tree - NFC. Re-formatted the source tree with the command: $ /usr/bin/clang-format -style=file -i \ $(find include source tools test utils -name '*.cpp' -or -name '*.h') This required a fix to source/val/decoration.h. It was not including spirv.h, which broke builds when the #include headers were re-ordered by clang-format.	2017-11-27 14:31:49 -05:00
GregF	ac04b2faea	Opt: Fix HasLoads to not report decoration as load.	2017-11-07 17:39:58 -05:00
David Neto	33b879c105	elim-multi-store: only patch loop header phis that we created There can already be OpPhi instructions in a loop header that are unrelated to the optimization. We should not be patching those. Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/826	2017-09-21 10:01:30 -04:00
GregF	cc8bad3a5b	Add LocalMultiStoreElim pass. An SSA local variable load/store elimination pass. For every entry point function, eliminate all loads and stores of function scope variables only referenced with non-access-chain loads and stores. Eliminate the variables as well. The presence of access chain references and function calls can inhibit the above optimization. Only shader modules with logical addressing are currently processed. Currently modules with any extensions enabled are not processed. This is left for future work. This pass is most effective if preceded by Inlining and LocalAccessChainConvert. LocalSingleStoreElim and LocalSingleBlockElim will reduce the work that this pass has to do.	2017-07-07 17:54:21 -04:00

12 Commits
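
The hex_float commit listed above (57a2441791) rests on the difference between digits10 and max_digits10. The following is a minimal sketch of that difference, not code from the SPIRV-Tools tree: the helper names RoundTrip and SurvivesRoundTrip are hypothetical, and standard iostreams stand in for the project's own hex_float formatting.

```cpp
#include <iomanip>
#include <limits>
#include <sstream>

// Hypothetical helper: writes `value` as decimal text using `precision`
// significant digits, then parses that text back into a float.
float RoundTrip(float value, int precision) {
  std::ostringstream out;
  out << std::setprecision(precision) << value;
  std::istringstream in(out.str());
  float parsed = 0.0f;
  in >> parsed;
  return parsed;
}

// Hypothetical helper: true if the text form at `precision` significant
// digits parses back to exactly the same float.
bool SurvivesRoundTrip(float value, int precision) {
  return RoundTrip(value, precision) == value;
}
```

For example, the float just above 1.0, `1.0f + std::numeric_limits<float>::epsilon()`, is printed as "1" at `std::numeric_limits<float>::digits10` (6) significant digits, so it parses back as 1.0 and the original value is lost. At `std::numeric_limits<float>::max_digits10` (9) digits, an IEEE-754 float with a correctly rounding parser should come back unchanged.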