SPIRV-Tools

mirror of https://github.com/KhronosGroup/SPIRV-Tools synced 2024-11-27 05:40:06 +00:00

Author	SHA1	Message	Date
Steven Perron	e065cc208f	Keep decorations when replacing loads in access-chain-convert. (#1829) In local-access-chain-convert, we replace loads by loading the entire variable and then doing the extract. The extract will have the same value as the load. However, if the load has a decoration on it, the decoration is lost because we do not copy any of them to the new id. This is fixed by rewriting the load into the extract and keeping the same result id. This change has the effect that we do not call DCEInst on the loads because the load is not being deleted, but replaced. This could leave OpAccessChain instructions around that are not used. This is not a problem for -O and -Os. They run local_single_*_elim passes and then dead code elimination. The DCE will remove the unused access chains, and the load elimination passes work even if there are unused access chains. I have added tests to them to ensure they will not lose opportunities. Fixes #1787.	2018-08-15 09:14:21 -04:00
dan sinclair	eda2cfbe12	Cleanup includes. (#1795) This CL cleans up the include paths to be relative to the top level directory. Various include-what-you-use fixes have been added.	2018-08-03 15:06:09 -04:00
dan sinclair	2cce2c5b97	Move tests into namespaces (#1689) This CL moves the tests into namespaces based on their directories.	2018-07-11 09:24:49 -04:00
Steven Perron	f46f2d3e5d	Remove redundant stores. The code patterns generated by DXC around function calls can cause many stores to store the same value that was just loaded from the same location: `%10 = OpLoad %type %var` followed by `OpStore %var %10`. We want to clean these up very early on because they can cause other transformations to do a lot of work. For the cases I see, they can be removed during local-single-block-elim. For one set of shaders the compile time goes from 248s to 182s, a 26% improvement. Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1494.	2018-05-15 10:24:05 -04:00
Greg Fischer	268be6143d	LocalSingleBlockElim: Add store-store elimination. Eliminate an unused store to a variable if it is followed by a store to the same variable in the same block. Most significantly, this cleans up stores made unused by this pass. These useless stores can inhibit subsequent optimizations, specifically LocalSingleStoreElim. Eliminating them makes subsequent optimization more effective. The main effect of this pass is to simplify the work done by the SSA rewriter. It catches many local loads/stores, which helps speed up the work done by the main rewriter.	2018-04-25 10:30:18 -04:00
Steven Perron	c20a718e00	Rewrite local-single-store-elim to not create large data structures. The local-single-store-elim algorithm is not fundamentally bad. However, when there are a large number of variables, some of the maps that are used can become very large. These large data structures then take a very long time to be destroyed. I've seen cases around 40% of the time. I've rewritten that algorithm to not use as much memory. This gives a significant improvement when running a large number of shaders through DXC. I've also made a small change to local-single-block-elim to delete the loads that it has replaced. That way local-single-store-elim will not have to look at those. local-single-store-elim now does the same thing. The time for one set goes from 309s down to 126s. For another set, the time goes from 102s down to 88s.	2018-04-18 16:38:18 -04:00
Steven Perron	2cb589cc14	Remove uses of DCEInst and call ADCE. The algorithm used in DCEInst to remove dead code is very slow. It is fine if you only want to remove a small number of instructions, but if you need to remove a large number of instructions, then the algorithm in ADCE is much faster. This PR removes the calls to DCEInst in the load-store removal passes and adds a pass of ADCE afterwards. I tried a number of different orderings of the optimizations, and I believe this is the best I could find. The results I have on 3 sets of shaders are: Legalization: Set 1: 5.39 -> 5.01; Set 2: 13.98 -> 8.38; Set 3: 98.00 -> 96.26. Performance passes: Set 1: 6.90 -> 5.23; Set 2: 10.11 -> 6.62; Set 3: 253.69 -> 253.74. Size reduction passes: Set 1: 7.16 -> 7.25; Set 2: 17.17 -> 16.81; Set 3: 112.06 -> 107.71. Note that the third set's compile time is large because of the large number of basic blocks, not so much because of the number of instructions. That is why we don't see much gain there.	2018-02-27 21:06:08 -05:00
Steven Perron	79a00649b4	Allow pointers to pointers in logical addressing mode. A few optimizations are updated to handle code that is supposed to be using the logical addressing mode, but still has variables that contain pointers, as long as the pointers are to opaque objects. This is called "relaxed logical addressing". |Instruction::GetBaseAddress| will check that pointers that are used meet the relaxed logical addressing rules. Optimizations that now handle relaxed logical addressing instead of logical addressing are: - aggressive dead-code elimination - local access chain convert - local store elimination passes.	2017-12-19 14:29:14 -05:00
Diego Novillo	83228137e1	Re-format source tree - NFC. Re-formatted the source tree with the command: $ /usr/bin/clang-format -style=file -i $(find include source tools test utils -name '*.cpp' -or -name '*.h') This required a fix to source/val/decoration.h. It was not including spirv.h, which broke builds when the #include headers were re-ordered by clang-format.	2017-11-27 14:31:49 -05:00
Steven Perron	e4c7d8e748	Add strength reduction; for now replace multiply by power of 2. Create a new optimization pass, strength reduction, which will replace integer multiplication by a constant power of 2 with an equivalent bit shift. More changes could be added later. - Does not duplicate constants - Adds vector |Concat| utility function to a common test header.	2017-09-18 17:01:36 -04:00
GregF	c8c86a0d36	Opt: Have "size" passes process full entry point call tree. Includes code to deal correctly with OpFunctionParameter. This is needed by opaque propagation which may not exhaustively inline entry point functions. Adds ProcessEntryPointCallTree: a method to do work on the functions in the entry point call trees in a deterministic order.	2017-08-18 10:16:01 -04:00
GregF	1d477b9898	Opt: Add opaque tests	2017-08-15 15:54:41 -06:00
GregF	7954740d54	Opt: Delete names and decorations of dead instructions	2017-07-26 18:36:41 -04:00
Lei Zhang	9f6efc76c8	Opt: HasOnlySupportedRefs should consider OpCopyObject This fixes test failure after merging the previous pull request.	2017-07-25 23:22:09 -04:00
GregF	adb237f3bd	Fix handling of CopyObject in GetPtr and its call sites	2017-07-21 18:08:01 -04:00
GregF	7c8da66bc2	mem2reg: Add pass to eliminate local loads and stores in single block.	2017-06-12 17:03:47 -04:00

16 Commits
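
Several of the commits above describe store cleanups within a single basic block: dropping a store that writes back the value just loaded from the same variable, and dropping a store that is overwritten by a later store to the same variable before any load. The following is a minimal sketch of that idea on a toy instruction list. It is not the SPIRV-Tools implementation: the `Op`, `Inst`, and `EliminateLocalStores` names are hypothetical, and the model assumes only loads and stores to non-aliased local variables, with no function calls or access chains.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Toy instruction (hypothetical, not the SPIRV-Tools IR): just enough to
// model loads and stores of local variables inside one basic block.
enum class Op { Load, Store };

struct Inst {
  Op op;
  std::string var;    // variable being loaded from or stored to
  std::string value;  // Load: result id; Store: id of the stored value
};

// Returns a copy of |block| without redundant or overwritten stores.
// Assumes no aliasing and no instructions other than Load/Store.
std::vector<Inst> EliminateLocalStores(const std::vector<Inst>& block) {
  std::vector<bool> dead(block.size(), false);
  // For each variable, the ids known to equal its current contents.
  std::unordered_map<std::string, std::unordered_set<std::string>> held;
  // For each variable, the index of the latest store not yet read by a load.
  std::unordered_map<std::string, std::size_t> pending;

  for (std::size_t i = 0; i < block.size(); ++i) {
    const Inst& inst = block[i];
    if (inst.op == Op::Load) {
      // The load result now equals the variable's contents, and the
      // latest store (if any) has been observed, so it must be kept.
      held[inst.var].insert(inst.value);
      pending.erase(inst.var);
      continue;
    }
    // Redundant store: the variable already holds this value.
    if (held[inst.var].count(inst.value) != 0) {
      dead[i] = true;
      continue;
    }
    // Store-store: an earlier unread store to the same variable is unused.
    auto prev = pending.find(inst.var);
    if (prev != pending.end()) dead[prev->second] = true;
    held[inst.var] = {inst.value};
    pending[inst.var] = i;
  }

  std::vector<Inst> result;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (!dead[i]) result.push_back(block[i]);
  }
  return result;
}
```

The last store to each variable is always kept, since in this sketch it may still be read in a later block. A real pass also has to handle access chains, decorations, and instructions that may touch memory indirectly, which this toy model ignores.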