Commit Graph

30 Commits

Author SHA1 Message Date
Steven Perron
6b07212659
Use OpReturn* in wrap-opkill (#2886)
* Use OpReturn* in wrap-opkill

The warp-opkill pass is generating incorrect code.  It is placing an
OpUnreachable at the end of a basic block, when the block can be
reached.  We can't reach the end of the block, but we can reach the end.
Instead we will add a return instruction.

Fixes #2875.
2019-09-20 10:32:27 -04:00
Steven Perron
61edde52a0 Revert "Use OpReturn* in wrap-opkill"
This reverts commit 87f0fa432f.
2019-09-19 22:39:56 -04:00
Steven Perron
87f0fa432f Use OpReturn* in wrap-opkill
The warp-opkill pass is generating incorrect code.  It is placing an
OpUnreachable at the end of a basic block, when the block can be
reached.  We can't reach the end of the block, but we can reach the end.
Instead we will add a return instruction.

Fixes #2875.
2019-09-19 22:34:57 -04:00
Steven Perron
a41520eaa4
Replace uses of SPV_AMD_shader_trinary_minmax extension (#2835)
Part of #2814
2019-09-05 09:29:04 -04:00
greg-lunarg
c77045b4a0 Instrument: Be sure Float16 capability on when generating float16 null (#2831) 2019-09-03 15:19:36 -04:00
greg-lunarg
d11725b1d4 Add --relax-float-ops and --convert-relaxed-to-half (#2808)
The first pass applies the RelaxedPrecision decoration to all executable
instructions with float32 based type results. The second pass converts
all executable instructions with RelaxedPrecision result to the equivalent
float16 type, inserting converts where necessary.
2019-09-03 13:22:13 -04:00
Steven Perron
35d98be3bc
Amd ext to khr (#2811)
Add the first steps to removing the AMD extension VK_AMD_shader_ballot.
Splitting up to make the PRs smaller.

Adding utilities to add capabilities and change the version of the
module.

Replaces the instructions:

OpGroupIAddNonUniformAMD = 5000
OpGroupFAddNonUniformAMD = 5001
OpGroupFMinNonUniformAMD = 5002
OpGroupUMinNonUniformAMD = 5003
OpGroupSMinNonUniformAMD = 5004
OpGroupFMaxNonUniformAMD = 5005
OpGroupUMaxNonUniformAMD = 5006
OpGroupSMaxNonUniformAMD = 5007

and extentend instructions

WriteInvocationAMD = 3
MbcntAMD = 4

Part of #2814
2019-08-29 12:48:17 -04:00
Steven Perron
bc62722b80
Handle overflow in wrap-opkill (#2801)
Fixes https://crbug/994203
2019-08-18 19:00:18 -04:00
Steven Perron
60043edfa1
Replace OpKill With function call. (#2790)
We are no able to inline OpKill instructions into a continue construct.
See #2433.  However, we have to be able to inline to correctly do
legalization.  This commit creates a pass that will wrap OpKill
instructions into a function of its own.  That way we are able to inline
the rest of the code.

The follow up to this will be to not inline any function that contains
an OpKill.

Fixes #2726
2019-08-14 09:27:12 -04:00
alan-baker
06c9dc07bd
Upgrade modf and frexp (#2266)
Fixes #2138

* Modf and frexp are upgraded to use the struct version of the
instruction and generate an explicit store whose flags can be upgraded
separately
* Fixed major bug where availability and visibility were reversed for
non-copy memory instructions
* Fixed bug where availability and visibility scope operands were reversed for copy memory
* Upgraded all opt tests to use SPV_ENV_UNIVERSAL_1_3
* Upgrade tests moved into unified tests and removed standalone test
2019-01-07 12:36:38 -05:00
Steven Perron
2e4563d94f
Document in the context what happens with id overflow. (#2159)
Added documentation to the ir context to indicates that TakeNextId()
returns 0 when the max id is reached.  TODOs were added to each call
sight so that we know where we have to start to handle this case.

Handle id overflow in |SplitLoopHeader|.

Handle id overflow in |GetOrCreatePreHeaderBlock|.

Handle failure to create preheader in LICM.

Part of https://github.com/KhronosGroup/SPIRV-Tools/issues/1841.
2018-12-06 09:07:00 -05:00
greg-lunarg
1e9fc1aac1 Add base and core bindless validation instrumentation classes (#2014)
* Add base and core bindless validation instrumentation classes

* Fix formatting.

* Few more formatting fixes

* Fix build failure

* More build fixes

* Need to call non-const functions in order.

Specifically, these are functions which call TakeNextId(). These need to
be called in a specific order to guarantee that tests which do exact
compares will work across all platforms. c++ pretty much does not
guarantee order of evaluation of operands, so any such functions need to
be called separately in individual statements to guarantee order.

* More ordering.

* And more ordering.

* And more formatting.

* Attempt to fix NDK build

* Another attempt to address NDK build problem.

* One more attempt at NDK build failure

* Add instrument.hpp to BUILD.gn

* Some name improvement in instrument.hpp

* Change all types in instrument.hpp to int.

* Improve documentation in instrument.hpp

* Format fixes

* Comment clean up in instrument.hpp

* imageInst -> image_inst

* Fix GetLabel() issue.
2018-11-08 13:54:54 -05:00
Steven Perron
7075c49923
Add dummy loop in merge-return. (#1896)
The current implementation of merge return can create bad, but correct,
code.  When it is not in a loop construct, it will insert a lot of
extra branch around code.  The potentially large number of branches are
bad.  At the same time, it can separate code store to variables from
its uses hiding the fact that the store dominates the load.

This hurts the later analysis because the compiler thinks that multiple
values can reach a load, when there is really only 1.  This poorer
analysis leads to missed optimizations.

The solution is to create a dummy loop around the entire body of the
function, then we can break from that loop with a single branch.  Also
only new merge nodes would be those at the end of loops meaning that
most analysies will not be hurt.

Remove dead code for cases that are no longer possible.

It seems like some drivers expect there the be an OpSelectionMerge
before conditional branches, even if they are not strictly needed.
So we add them.
2018-09-18 08:52:47 -04:00
dan sinclair
eda2cfbe12
Cleanup includes. (#1795)
This Cl cleans up the include paths to be relative to the top level
directory. Various include-what-you-use fixes have been added.
2018-08-03 15:06:09 -04:00
dan sinclair
58a6876cee
Rewrite include guards (#1793)
This CL rewrites the include guards to make PRESUBMIT.py include guard
check happy.
2018-08-03 08:05:33 -04:00
dan sinclair
c7da51a085
Cleanup extraneous namespace qualifies in source/opt. (#1716)
This CL follows up on the opt namespacing CLs by removing the
unnecessary opt:: and opt::analysis:: namespace prefixes.
2018-07-12 15:14:43 -04:00
dan sinclair
e6b953361d
Move the ir namespace to opt. (#1680)
This CL moves the files in opt/ to consistenly be under the opt::
namespace. This frees up the ir:: namespace so it can be used to make a
shared ir represenation.
2018-07-09 11:32:29 -04:00
dan sinclair
3dad1cda11
Change libspirv to spvtools namespace (#1678)
This CL changes all of the libspirv namespace code to spvtools to match
the rest of the code base.
2018-07-07 09:38:00 -04:00
Steven Perron
1f7b1f1bf7 Small vector optimization for operands.
We replace the std::vector in the Operand class by a new class that does
a small size optimization.  This helps improve compile time on Windows.

Tested on three sets of shaders.  Trying various values for the small
vector.  The optimal value for the operand class was 2.  However, for
the Instruction class, using an std::vector was optimal.  Size of "0"
means that an std::vector was used.

                Instruction size
	        0      4      8
Operand Size

0               489    544    684
1               593    487
2               469    570
4               473
8               505

This is a single thread run of ~120 shaders.  For the multithreaded run
the results were the similar.  The basline time was ~62sec.  The
optimal configuration was an 2 for the OperandData and an
std::vector for the OperandList with a compile time of ~38sec.  Similar
expiriments were done with other sets of shaders.  The compile time still
improved, but not as much.

Contributes to https://github.com/KhronosGroup/SPIRV-Tools/issues/1609.
2018-06-12 13:41:08 -04:00
Steven Perron
af430ec822 Add pass to fold a load feeding an extract.
We have already disabled common uniform elimination because it created
sequences of loads an entire uniform object, then we extract just a
single element.  This caused problems in some drivers, and is just
generally slow because it loads more memory than needed.

However, there are other way to get into this situation, so I've added
a pass that looks specifically for this pattern and removes it when only
a portion of the load is used.

Fixes #1547.
2018-05-14 15:40:34 -04:00
Steven Perron
c4dc046399 Copy propagate arrays
The sprir-v generated from HLSL code contain many copyies of very large
arrays.  Not only are these time consumming, but they also cause
problems for drivers because they require too much space.

To work around this, we will implement an array copy propagation.  Note
that we will not implement a complete array data flow analysis in order
to implement this.  We will be looking for very simple cases:

1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.

The hard part is keeping all of the types correct.  We do not want to
have to do too large a search to update everything, which may not be
possible, do we give up if we see any instruction that might be hard to
update.

Also in types.h, the element decorations are not stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent.  With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values.  See |Struct::GetExtraHashWords|.

Contributes to #1416.
2018-03-26 14:44:41 -04:00
Victor Lomuller
bdf421cf40 Add loop peeling utility
The loop peeler util takes a loop as input and create a new one before.
The iterator of the duplicated loop then set to accommodate the number
of iteration required for the peeling.

The loop peeling pass that decided to do the peeling and profitability
analysis is left for a follow-up PR.
2018-03-20 10:21:10 -04:00
Victor Lomuller
3497a94460 Add loop unswitch pass.
It moves all conditional branching and switch whose conditions are loop
invariant and uniform. Before performing the loop unswitch we check that
the loop does not contain any instruction that would prevent it
(barriers, group instructions etc.).
2018-02-27 08:52:46 -05:00
Stephen McGroarty
e354984b09 Unroller support for multiple induction variables
Support for multiple induction variables within a loop and support for
loop condition operands <= and >=.
2018-02-27 11:50:08 +00:00
Stephen McGroarty
dd8400e150 Initial support for loop unrolling.
This patch adds initial support for loop unrolling in the form of a
series of utility classes which perform the unrolling. The pass can
be run with the command spirv-opt --loop-unroll. This will unroll
loops within the module which have the unroll hint set. The unroller
imposes a number of requirements on the loops it can unroll. These are
documented in the comments for the LoopUtils::CanPerformUnroll method in
loop_utils.h. Some of the restrictions will be lifted in future patches.
2018-02-14 15:44:38 -05:00
Steven Perron
bc1ec9418b Add general folding infrastructure.
Create the folding engine that will

1) attempt to fold an instruction.
2) iterates on the folding so small folding rules can be easily combined.
3) insert new instructions when needed.

I've added the minimum number of rules needed to test the features above.
2018-02-02 12:24:11 -05:00
Victor Lomuller
50e85c865c Add LoopUtils class to gather some loop transformation support.
This patch adds LoopUtils class to handle some loop related transformations. For now it has 2 transformations that simplifies other transformations such as loop unroll or unswitch:
 - Dedicate exit blocks: this ensure that all exit basic block
   (out-of-loop basic blocks that have a predecessor in the loop)
   have all their predecessors in the loop;
 - Loop Closed SSA (LCSSA): this ensure that all definitions in a loop are used inside the loop
   or in a phi instruction in an exit basic block.

It also adds the following capabilities:
 - Loop::IsLCSSA to test if the loop is in a LCSSA form
 - Loop::GetOrCreatePreHeaderBlock that can build a loop preheader if required;
 - New methods to allow on the fly updates of the loop descriptors.
 - New methods to allow on the fly updates of the CFG analysis.
 - Instruction::SetOperand to allow expression of the index relative to Instruction::NumOperands (to be compatible with the index returned by DefUseManager::ForEachUse)
2018-02-01 15:35:09 -05:00
Alan Baker
2735e0851e Remove constexpr from Analysis operators
* Had to remove templating from InstructionBuilder as a result
 * now preserved analyses are specified as a constructor argument
* updated tests and uses
* changed static_assert to a runtime assert
 * this should probably get further changes in the future
2018-01-31 14:44:43 -05:00
Alan Baker
2e93e806e4 Initial implementation of if conversion
* Handles simple cases only
* Identifies phis in blocks with two predecessors and attempts to
convert the phi to an select
 * does not perform code motion currently so the converted values must
 dominate the join point (e.g. can't be defined in the branches)
 * limited for now to two predecessors, but can be extended to handle
 more cases
* Adding if conversion to -O and -Os
2018-01-25 09:42:00 -08:00
Victor Lomuller
cf3b2a58c4 Introduce an instruction builder helper class.
The class factorize the instruction building process.
Def-use manager analysis can be updated on the fly to maintain coherency.
To be updated to take into account more analysis.
2018-01-19 10:17:45 -05:00