Commit Graph

1467 Commits

Author SHA1 Message Date
Steven Perron
c4dc046399 Copy propagate arrays
The SPIR-V generated from HLSL code contains many copies of very large
arrays.  Not only are these time consuming, but they also cause
problems for drivers because they require too much space.

To work around this, we will implement an array copy propagation.  Note
that we will not implement a complete array data flow analysis in order
to implement this.  We will be looking for very simple cases:

1) The source must never be stored to.
2) The target must be stored to exactly once.
3) The store to the target must be a store to the entire array, and be a
copy of the entire source.
4) All loads of the target must be dominated by the store.
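
The four conditions above can be sketched as a single predicate. This is an illustration only: the `Store` and `Load` records, the `whole_copy_of` field, and the `dominates` callback are hypothetical stand-ins, not the data structures used by the pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Store:
    pos: int                      # position in a linearized instruction list
    target: str                   # variable being written
    whole_copy_of: Optional[str]  # source variable if the whole array is copied


@dataclass(frozen=True)
class Load:
    pos: int
    source: str


def can_propagate(src: str, dst: str, stores: List[Store], loads: List[Load],
                  dominates: Callable[[int, int], bool]) -> bool:
    # 1) The source must never be stored to.
    if any(s.target == src for s in stores):
        return False
    # 2) The target must be stored to exactly once.
    dst_stores = [s for s in stores if s.target == dst]
    if len(dst_stores) != 1:
        return False
    # 3) That store must copy the entire source array.
    store = dst_stores[0]
    if store.whole_copy_of != src:
        return False
    # 4) The store must dominate every load of the target.
    return all(dominates(store.pos, l.pos) for l in loads if l.source == dst)


def straight_line(a: int, b: int) -> bool:
    # Assumption for the demo: in straight-line code, earlier dominates later.
    return a < b
```

With `stores = [Store(0, "dst", "src")]` and `loads = [Load(1, "dst")]`, the predicate accepts the copy; adding any store to `src` makes it reject.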

The hard part is keeping all of the types correct.  We do not want to
have to do too large a search to update everything, which may not be
possible, so we give up if we see any instruction that might be hard to
update.

Also in types.h, the element decorations are now stored in an std::map.
This change was done so the hashing algorithm for a Struct is
consistent.  With the std::unordered_map, the traversal order was
non-deterministic leading to the same type getting hashed to different
values.  See |Struct::GetExtraHashWords|.
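
The effect of traversal order on the hash can be sketched as follows. The function name `extra_hash_words` only echoes `Struct::GetExtraHashWords`; its body, the SHA-256 digest, and the decoration words are assumptions made for illustration. A Python `dict` iterates in insertion order, which stands in here for a container whose traversal order can differ between two equal maps.

```python
import hashlib


def extra_hash_words(element_decorations):
    """Flatten {member index: [decoration words]} in ascending key order."""
    words = []
    for index in sorted(element_decorations):
        words.append(index)
        words.extend(element_decorations[index])
    return words


def type_hash(element_decorations):
    # Hypothetical digest; any hash over the word list behaves the same way.
    data = ",".join(str(w) for w in extra_hash_words(element_decorations))
    return hashlib.sha256(data.encode()).hexdigest()


# Same content, different insertion (and therefore iteration) order.
a = {0: [35, 0], 1: [35, 16]}
b = {1: [35, 16], 0: [35, 0]}
```

Because the words are emitted in key order, `type_hash(a)` and `type_hash(b)` agree even though iterating the two dictionaries directly visits the members in different orders.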

Contributes to #1416.
2018-03-26 14:44:41 -04:00
Andrew Woloszyn
0a8b6a96e1 Replace an undefined double->float cast with infinity.
This was caught by UBSan. The given double would overflow
the underlying float, which is undefined. Instead test
with an explicit float::infinity.
2018-03-26 13:15:22 -04:00
Andrey Tuganov
9cf87ecbc8 Add Vulkan specific atomic result type restriction
Atomic instructions must declare a scalar 32-bit integer type for the “Result Type”.
2018-03-26 12:06:25 -04:00
Andrey Tuganov
fe9121f721 Add Vulkan validation rules for BuiltIn variables
Added a framework for validation of BuiltIn variables. The framework
allows implementation of flexible abstract rules which are required for
built-ins as the information (decoration, definition, reference) is not
in one place, but is scattered all over the module.

Validation rules are implemented as a map
id -> list<functor(instruction)>

Ids which are dependent on built-in types or objects receive a task
list, such as "this id cannot be referenced from function which is
called from entry point with execution model X; propagate this rule
to your descendants in the global scope".
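
A minimal sketch of the "id -> list of functors" idea, assuming instructions are plain dictionaries. The class `BuiltInRules`, the helper `forbid_execution_model`, and the dictionary keys are hypothetical names; propagation of rules to descendant ids is not shown.

```python
from collections import defaultdict


class BuiltInRules:
    def __init__(self):
        # id -> list of callables taking an instruction, returning an error or None
        self.rules = defaultdict(list)

    def add_rule(self, target_id, rule):
        self.rules[target_id].append(rule)

    def check(self, instruction):
        """Run every rule attached to any id the instruction references."""
        errors = []
        for operand in instruction["operands"]:
            for rule in self.rules.get(operand, []):
                error = rule(instruction)
                if error:
                    errors.append(error)
        return errors


def forbid_execution_model(model):
    def rule(instruction):
        if instruction.get("execution_model") == model:
            return f"id cannot be referenced under execution model {model}"
        return None
    return rule
```

Registering `forbid_execution_model("Vertex")` on id 7 makes `check` report an error only for instructions that reference id 7 under that execution model.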

Also refactored test/val/val_fixtures.

All built-ins covered by tests
2018-03-23 14:02:42 -04:00
Eleni Maria Stea
045cc8f75b Fixes compile errors generated with -Wpedantic
This patch fixes the compile errors generated when the options
SPIRV_WARN_EVERYTHING and SPIRV_WERROR (which force -Wpedantic) are
passed to cmake.
2018-03-22 09:40:11 -04:00
Steven Perron
dbb35c4260 Fixed remaining review comments from #1380 2018-03-21 16:47:01 -04:00
Diego Novillo
2e644e4578 Fix VS2013 build failures. 2018-03-20 21:44:17 -04:00
Jaebaek Seo
3b594e1630 Add --time-report to spirv-opt
This patch adds a new option --time-report to spirv-opt.  For each pass
executed by spirv-opt, the flag prints resource utilization for the pass
(CPU time, wall time, RSS and page faults)
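
As an illustration of what such a per-pass report measures, here is a sketch using Python's standard `time` and `resource` modules (Unix only). The function `run_with_report` and the report keys are hypothetical; this is not how spirv-opt collects its numbers.

```python
import resource
import time


def run_with_report(name, run_pass):
    """Run one pass and return (result, resource usage for that pass)."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    cpu0, wall0 = time.process_time(), time.perf_counter()
    result = run_pass()
    cpu1, wall1 = time.process_time(), time.perf_counter()
    after = resource.getrusage(resource.RUSAGE_SELF)
    report = {
        "pass": name,
        "cpu_s": cpu1 - cpu0,
        "wall_s": wall1 - wall0,
        # Peak resident set size so far; the unit is platform-dependent.
        "max_rss": after.ru_maxrss,
        "page_faults": (after.ru_minflt + after.ru_majflt)
                       - (before.ru_minflt + before.ru_majflt),
    }
    return result, report
```

Calling `run_with_report("demo", lambda: sum(range(1000)))` returns the pass result together with one report dictionary.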

This fixes issue #1378
2018-03-20 21:30:06 -04:00
Diego Novillo
735d8a579e SSA rewrite pass.
This pass replaces the load/store elimination passes.  It implements the
SSA re-writing algorithm proposed in

     Simple and Efficient Construction of Static Single Assignment Form.
     Braun M., Buchwald S., Hack S., Leißa R., Mallon C., Zwinkau A. (2013)
     In: Jhala R., De Bosschere K. (eds)
     Compiler Construction. CC 2013.
     Lecture Notes in Computer Science, vol 7791.
     Springer, Berlin, Heidelberg

     https://link.springer.com/chapter/10.1007/978-3-642-37051-9_6

In contrast to common eager algorithms based on dominance and dominance
frontier information, this algorithm works backwards from load operations.

When a target variable is loaded, it queries the variable's reaching
definition.  If the reaching definition is unknown at the current location,
it searches backwards in the CFG, inserting Phi instructions at join points
in the CFG along the way until it finds the desired store instruction.

The algorithm avoids repeated lookups using memoization.

For reducible CFGs, which are a superset of the structured CFGs in SPIRV,
this algorithm is proven to produce minimal SSA.  That is, it inserts the
minimal number of Phi instructions required to ensure the SSA property, but
some Phi instructions may be dead
(https://en.wikipedia.org/wiki/Static_single_assignment_form).
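
The backwards search with memoization can be sketched as below. This is a simplified illustration, not the pass itself: it assumes the whole CFG is known up front and every block is reachable from the entry, it names values with plain strings, and it omits the removal of trivial Phi instructions, so it does not by itself guarantee minimal SSA. The class `SSABuilder` and its fields are hypothetical.

```python
class SSABuilder:
    def __init__(self, preds):
        self.preds = preds   # block -> list of predecessor blocks
        self.defs = {}       # (var, block) -> reaching value (memoized)
        self.phis = {}       # phi name -> (block, var, operand list)

    def write(self, var, block, value):
        """Record a store of `value` to `var` in `block`."""
        self.defs[(var, block)] = value

    def read(self, var, block):
        """Return the reaching definition of `var` at the end of `block`."""
        if (var, block) in self.defs:          # local or memoized definition
            return self.defs[(var, block)]
        preds = self.preds[block]
        if not preds:
            raise KeyError(f"{var} is read before any store")
        if len(preds) == 1:                    # no join point, no Phi needed
            value = self.read(var, preds[0])
        else:
            # Record the Phi before recursing so that loops terminate.
            value = f"phi_{var}_{block}"
            self.phis[value] = (block, var, [])
            self.write(var, block, value)
            operands = [self.read(var, p) for p in preds]
            self.phis[value] = (block, var, operands)
        self.write(var, block, value)
        return value
```

For a diamond-shaped CFG where `x` is stored in both branches, reading `x` in the merge block yields one Phi whose operands are the two stored values.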
2018-03-20 20:56:55 -04:00
Victor Lomuller
bdf421cf40 Add loop peeling utility
The loop peeler util takes a loop as input and creates a new one before it.
The iterator of the duplicated loop is then set to accommodate the number
of iterations required for the peeling.
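
At source level, peeling before a loop can be pictured as follows. This is a sketch of the transformation's effect under the assumption of a simple counted loop; `run_loop` and `run_peeled` are hypothetical names, and the utility itself works on SPIR-V control flow, not on Python loops.

```python
def run_loop(n, body):
    for i in range(n):
        body(i)


def run_peeled(n, peel_count, body):
    # The duplicated loop never runs past the original bound.
    first = min(peel_count, n)
    for i in range(first):       # duplicated loop, placed before the original
        body(i)
    for i in range(first, n):    # original loop resumes where the copy stopped
        body(i)
```

Both versions call `body` with the same indices in the same order; only the split between the two loops changes.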

The loop peeling pass that decides whether to peel, along with the
profitability analysis, is left for a follow-up PR.
2018-03-20 10:21:10 -04:00
Steven Perron
b3daa93b46 Change merge return pass to handle structured cfg.
We are seeing shaders that have multiple returns in a function.  These
functions must get inlined for legalization purposes; however, the
inliner does not know how to inline functions that have multiple
returns.

The solution we will go with is to improve the merge return pass to
handle structured control flow.

Note that the merge return pass will assume the cfg has been cleaned up
by dead branch elimination.

Fixes #857.
2018-03-19 13:49:04 -04:00
Lei Zhang
1ef6b19260 Migrate to use unified grammar tables
Previously we kept a separate static grammar table for opcodes/
operands per SPIR-V version. This commit changes that to use a
single unified static grammar table for opcodes/operands.

This essentially changes how grammar facts are queried against
a certain target environment. There is only limited filtering
according to the desired target environment; a symbol is
considered as available as long as:

1. The target environment satisfies the minimal requirement of
   the symbol; or
2. There is at least one extension enabling this symbol.

Note that the second rule assumes the extension enabling the
symbol is indeed requested in the SPIR-V code; checking that
should be the validator's work.
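
The two availability rules can be sketched as a small predicate. The `Symbol` record, its field names, and the version tuples are assumptions made for illustration; they do not mirror the layout of the grammar tables.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Symbol:
    name: str
    # Lowest version that provides the symbol, or None if no core version does.
    min_version: Optional[Tuple[int, int]]
    extensions: List[str] = field(default_factory=list)


def is_available(symbol: Symbol, target_version: Tuple[int, int]) -> bool:
    # Rule 1: the target environment meets the symbol's minimal requirement.
    if symbol.min_version is not None and target_version >= symbol.min_version:
        return True
    # Rule 2: at least one extension enables the symbol. Whether the module
    # actually declares that extension is left to the validator.
    return bool(symbol.extensions)
```

A symbol with neither a satisfied version requirement nor an enabling extension is treated as unavailable, which matches the note about reserved symbols.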

Also fixed a few grammar related issues:
* Rounding mode capability requirements are moved to client APIs.
* Reserved symbols not available in any extension are no longer
  recognized by the assembler.
2018-03-17 15:25:26 -04:00
David Neto
844e186cf7 Add --strip-reflect pass
Strips reflection info. This is limited to decorations and
decoration instructions related to the SPV_GOOGLE_hlsl_functionality1
extension.
It will remove the OpExtension for SPV_GOOGLE_hlsl_functionality1.
It will also remove the OpExtension for SPV_GOOGLE_decorate_string
if there are no further remaining uses of OpDecorateStringGOOGLE.
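
The behaviour described above can be sketched over a toy module representation. Everything about the representation is an assumption: instructions are `(opcode, *operands)` tuples of strings, and the set of reflection decorations is limited to the two names mentioned elsewhere in this log.

```python
REFLECT_DECORATIONS = {"HlslSemanticGOOGLE", "HlslCounterBufferGOOGLE"}
DECORATE_OPS = {"OpDecorateId", "OpDecorateStringGOOGLE"}


def strip_reflect(instructions):
    """Return the module without reflection decorations and unused extensions."""
    kept = [
        inst for inst in instructions
        if not (inst[0] in DECORATE_OPS and REFLECT_DECORATIONS & set(inst[1:]))
    ]
    # The hlsl_functionality1 extension is always dropped.
    kept = [i for i in kept
            if i != ("OpExtension", "SPV_GOOGLE_hlsl_functionality1")]
    # decorate_string is dropped only if nothing uses OpDecorateStringGOOGLE.
    if not any(i[0] == "OpDecorateStringGOOGLE" for i in kept):
        kept = [i for i in kept
                if i != ("OpExtension", "SPV_GOOGLE_decorate_string")]
    return kept
```

Ordinary decorations such as `("OpDecorate", "%1", "Location", "0")` pass through unchanged.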

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1398
2018-03-15 21:20:42 -04:00
David Neto
2e3aec23ca Add recent Google extensions to optimizer whitelists
Optimizations should work in the presence of recent
SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1

SPV_GOOGLE_decorate_string:
- Adds operation OpDecorateStringGOOGLE to decorate an object with decorations
  having string operands.

SPV_GOOGLE_hlsl_functionality1:
- Adds HlslSemanticGOOGLE, used to decorate an interface variable with
  an HLSL semantic string.  Optimizations already preserve those variables
  as required because they are interface variables (with uses), independent
  of whether they have HLSL decorations.

- Adds HlslCounterBufferGOOGLE, used to associate a buffer with a
  counter variable.

Fixes #1391
2018-03-15 11:16:20 -04:00
Alan Baker
9f3a1c85cc NFC: Speed up dead insert phi traversal on Windows. 2018-03-14 17:45:47 -04:00
David Neto
884933366b Teach DecorationManager about OpDecorateStringGOOGLE
Also add more decoration manager test coverage for OpDecorateId.

Fixes #1396
2018-03-13 22:18:33 -04:00
Alan Baker
7e03e76a5f Fixes #1402. Don't merge non-branch terminators into loop header.
Added tests
2018-03-13 22:16:17 -04:00
Alan Baker
43d1609183 Fixes #1407. Removing assertion against void pointer
Added test
2018-03-13 19:45:20 -04:00
Alan Baker
4065adf05d Fixes #1404. Don't DCE workgroup size
Added test.
2018-03-13 19:38:31 -04:00
Greg Fischer
077249b67f Fix InsertFeedingExtract rule when extract remains. 2018-03-12 22:06:23 -04:00
Pierre Moreau
5bd55f10cd Reimplement the DecorationManager
This reimplementation fixes several issues when removing decorations associated
with an ID (partially addresses #1174 and gives tools for fixing #898), as well
as making it easier to remove groups; a few additional tests have been added.

DecorationManager::RemoveDecoration() will still not delete dead decorations it
created, but I do not think it is its job either; given the following input

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %2 Restrict
%2      = OpDecorationGroup
OpGroupDecorate %2 %1 %3
OpDecorate %4 Invariant
%4      = OpDecorationGroup
OpGroupDecorate %4 %2
%uint   = OpTypeInt 32 0
%1      = OpVariable %uint Uniform
%3      = OpVariable %uint Uniform
```

which of the following two outputs would you expect RemoveDecoration(2) to produce:

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
%uint = OpTypeInt 32 0
%1 = OpVariable %uint Uniform
%3 = OpVariable %uint Uniform
```

or

```
OpCapability Shader
OpCapability Linkage
OpMemoryModel Logical GLSL450
OpDecorate %4 Invariant
%4      = OpDecorationGroup
%uint   = OpTypeInt 32 0
%1      = OpVariable %uint Uniform
%3      = OpVariable %uint Uniform
```

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/924
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1174
2018-03-12 09:56:14 -04:00
David Neto
340370eddb Remove extension whitelist from some transforms
Remove extension whitelists from transforms that are essentially
combinatorial (and avoiding pointers) or which affect only control flow.
It's very very unlikely an extension will add a new control flow construct.

Remove from:
- dead branch elimination
- dead insertion elimination
- insert extract elimination
- block merge

Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1392
2018-03-08 12:25:49 -05:00
Rex Xu
314cfa29b2 Add missing SPV extension strings 2018-03-08 21:54:00 +08:00
David Neto
ac43466853 Start v2018.3 development 2018-03-07 17:13:18 -05:00
David Neto
8d8a71278b Finalize v2018.2 2018-03-07 17:11:50 -05:00
David Neto
a0da44efbc Update CHANGES 2018-03-07 17:10:13 -05:00
Alan Baker
bc9cfee6fa Fixes #1385. Grab correct input to calculate indices.
* Added tests to catch the bug
2018-03-07 16:07:40 -05:00
Andrey Tuganov
03b8a3fe54 AMD_gpu_shader_half_float enables float16
Fixes https://github.com/KhronosGroup/SPIRV-Tools/issues/1375

Hardcoded float16 feature enabling if extension
SPV_AMD_gpu_shader_half_float is present.
2018-03-07 11:07:58 -05:00
David Neto
01f32ee001 Update README for SPIR-V 1.3 2018-03-06 15:17:31 -05:00
David Neto
00fa39318f Support SPIR-V 1.3 and Vulkan 1.1
The default target is SPIR-V 1.3.

For example, spirv-as will generate a SPIR-V 1.3 binary by default.
Use command line option "--target-env spv1.0" if you want to make a SPIR-V
1.0 binary or validate against SPIR-V 1.0 rules.

Example:
        # Generate a SPIR-V 1.0 binary instead of SPIR-V 1.3
        spirv-as --target-env spv1.0 a.spvasm -o a.spv
        spirv-as --target-env vulkan1.0 a.spvasm -o a.spv

        # Validate as SPIR-V 1.0.
        spirv-val --target-env spv1.0 a.spv
        # Validate as Vulkan 1.0
        spirv-val --target-env vulkan1.0 a.spv
2018-03-06 15:17:31 -05:00
Alan Baker
5f50e6209c Fixes #1376. Don't handle half folding gracefully.
* Added early returns to folding rules to prevent half attempts
* Added some tests
2018-03-06 14:00:02 -05:00
David Neto
5f69f75126 Support SPV_GOOGLE_decorate_string and SPV_GOOGLE_hlsl_functionality1
This commit adds assembling, disassembling, and basic validation for two
Google extensions to better support HLSL translation.
2018-03-05 13:34:13 -05:00
Steven Perron
9ba50e34f2 Avoid generating duplicate names when merging types
When merging types we do not remove other information related to the
types.  We simply leave it duplicated, and hope it is removed later.
This is what happens with decorations.  They are removed in the next
phase of remove duplicates.  However, for OpNames that is not the case.
We end up with two different names for the same id, which does not make
sense.

The solution is to remove the names and decorations for the type being
removed instead of rewriting them to refer to the other type.

Note that it is possible that if the first type does not have a name,
then the types will end up with no name.  That is fine because the names
should not have any semantic significance anyway.

This was identified in issue #1372, but this does not fix that issue.
2018-03-05 12:02:50 -05:00
Pierre Moreau
6cd6e5ebef Define Disassemble only when Effcee is used in fold_test 2018-03-02 16:40:52 -05:00
David Neto
a942ff40f5 Android.mk: Generate enum mappings from unified1 grammar
Some tokens are only showing up in the unified1 grammar.
So enum string mappings have to be generated from that grammar, not
the grammar from the (deprecated) include/spirv/1.2 grammar.

Example: capabilities FragmentFullyCovered, Float16ImageAMD
2018-03-02 15:27:22 -05:00
David Neto
fe21921629 Start v2018.2 development 2018-03-02 14:30:49 -05:00
David Neto
6432a129f8 Finalize v2018.1 2018-03-02 14:30:49 -05:00
Alan Baker
52bceb3569 Handles more cases of redundant selects
* Handles OpConstantNull and vector types
 * vector selects (except against a null) are converted to vector
 shuffles
* Added tests
2018-03-02 14:28:08 -05:00
David Neto
a7cec7843c Update CHANGES 2018-03-02 14:01:28 -05:00
Alan Baker
824625760b Fixes #1361. Mark all non-constant global values as varying in CCP
* Also mark function parameters as varying
* Conservatively mark assignment instructions as varying if any input is
varying after attempting to fold
* Added a test to catch this case
2018-03-01 15:24:41 -05:00
Arseny Kapoulkine
8b27ba834d Vulkan BuiltIn variables can't have Location/Component decorations
As per Vulkan spec, BuiltIn variables can't have Location or Component
decorations. On some drivers, these can lead to driver crashing when
compiling the shader pipeline; for example, NVidia/AMD desktop drivers:
https://github.com/KhronosGroup/glslang/issues/1182.

This change adds validation and tests to catch this.
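
The rule itself is simple to state in code. The sketch below assumes decorations are given as `(target id, decoration name)` pairs; the function name and error text are hypothetical and are not taken from the validator.

```python
def check_builtin_decorations(decorations):
    """Return one error string per forbidden decoration on a BuiltIn target."""
    by_target = {}
    for target, name in decorations:
        by_target.setdefault(target, set()).add(name)
    errors = []
    for target, names in sorted(by_target.items()):
        if "BuiltIn" in names:
            for bad in sorted(names & {"Location", "Component"}):
                errors.append(
                    f"{target}: BuiltIn variable cannot have {bad} decoration")
    return errors
```

Variables that carry `Location` without `BuiltIn` are not reported.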
2018-03-01 15:00:08 -05:00
Alan Baker
ce5941a642 Fixes #1357. Support null constants better in folding
* getFloatConstantKind() now handles OpConstantNull
* PerformOperation() now handles OpConstantNull for vectors
* Fixed some instances where we would attempt to merge a division by 0
* added tests
2018-02-28 23:12:27 -05:00
GregF
bdaf8d56fb Opt: Add constant folding for FToI and IToF 2018-02-28 23:08:52 -05:00
Alan Baker
9457cabbce Fixes #1354. Do not merge integer division.
* Removes merging of div with a div or mul for integers
* Updated tests
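
A small arithmetic example of why the merge is unsafe for integers. The function names are hypothetical, and the values are kept non-negative so that Python's floor division agrees with truncating integer division.

```python
def unmerged(x):
    # Divide first, then multiply, as the original instructions would.
    return (x // 4) * 2


def merged(x):
    # What folding "divide by 4, multiply by 2" into "divide by 2" computes.
    return x // 2
```

For `x = 6` the first form gives 2 and the merged form gives 3, because the inner division discards a remainder. For `x = 8` both give 4, so the difference only appears when the division is inexact.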
2018-02-28 13:33:21 -05:00
Steven Perron
588f4fcc95 Add more folding rules for vector shuffle.
Adds rule to fold OpVectorShuffle with constant inputs.

Adds rules to fold OpCompositeExtract being fed by an OpVectorShuffle.
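
The index arithmetic behind folding a single-index extract through a shuffle can be sketched as follows. The function and its return convention are hypothetical; only the component numbering (components of the first vector, then those of the second, with `0xFFFFFFFF` meaning no source) follows the SPIR-V definition of OpVectorShuffle.

```python
UNDEF_COMPONENT = 0xFFFFFFFF  # literal meaning "no source component"


def fold_extract_of_shuffle(extract_index, shuffle_components, vec1_size):
    """Map an extract from the shuffle result back to one of its inputs."""
    selected = shuffle_components[extract_index]
    if selected == UNDEF_COMPONENT:
        return ("undef", None)
    if selected < vec1_size:
        return ("vector1", selected)
    return ("vector2", selected - vec1_size)
```

With shuffle components `[3, 0, 5, 0xFFFFFFFF]` over two 4-element vectors, extracting element 2 becomes an extract of element 1 from the second input vector.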
2018-02-27 21:20:22 -05:00
Victor Lomuller
90e1637ce4 Remove Function::GetBlocks pushed by accident 2018-02-27 21:07:10 -05:00
Steven Perron
2cb589cc14 Remove uses of DCEInst and call ADCE
The algorithm used in DCEInst to remove dead code is very slow.  It is
fine if you only want to remove a small number of instructions, but, if
you need to remove a large number of instructions, then the algorithm in
ADCE is much faster.

This PR removes the calls to DCEInst in the load-store removal passes
and adds a pass of ADCE afterwards.
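
For readers unfamiliar with the approach, a generic worklist-based liveness sweep looks roughly like this. It is a textbook-style sketch, not the code of the ADCE pass: instructions are modelled as a mapping from result id to operand ids, and `roots` are assumed to be the ids with observable effects.

```python
def live_instructions(instructions, roots):
    """Mark everything transitively needed by the roots."""
    live = set()
    worklist = list(roots)
    while worklist:
        inst = worklist.pop()
        if inst in live:
            continue
        live.add(inst)
        worklist.extend(
            op for op in instructions.get(inst, []) if op not in live)
    return live


def eliminate_dead(instructions, roots):
    """Keep only the instructions marked live; each one is visited once."""
    live = live_instructions(instructions, roots)
    return {i: ops for i, ops in instructions.items() if i in live}
```

Each instruction is marked at most once, so the cost grows with the size of the function and not with the number of instructions removed.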

I tried a number of different orderings of the optimizations, and I
believe this is the best I could find.

The results I have on 3 sets of shaders are:

Legalization:

Set 1: 5.39 -> 5.01
Set 2: 13.98 -> 8.38
Set 3: 98.00 -> 96.26

Performance passes:

Set 1: 6.90 -> 5.23
Set 2: 10.11 -> 6.62
Set 3: 253.69 -> 253.74

Size reduction passes:

Set 1: 7.16 -> 7.25
Set 2: 17.17 -> 16.81
Set 3: 112.06 -> 107.71

Note that the third set's compile time is large because of the large
number of basic blocks, not so much because of the number of
instructions.  That is why we don't see much gain there.
2018-02-27 21:06:08 -05:00
David Neto
0c13467161 Consistently include latest spirv.h header file.
Use indirection through latest_version_spirv.h

Also, when generating enum tables, use the unified1 JSON grammar since
it now has FragmentFullyCoveredEXT but the other JSON grammars don't.
They are starting to fall behind.
2018-02-27 18:47:29 -05:00
Alan Baker
802cf053c7 Merge arithmetic with non-trivial constant operands
Adding basis of arithmetic merging

* Refactored constant collection in ConstantManager
* New rules:
 * consecutive negates
 * negate of arithmetic op with a constant
 * consecutive muls
 * reciprocal of div

* Removed IRContext::CanFoldFloatingPoint
 * replaced by Instruction::IsFloatingPointFoldingAllowed
* Fixed some bad tests
* added some header comments

Added PerformIntegerOperation

* minor fixes to constants and tests
* fixed IntMultiplyBy1 to work with 64 bit ints
* added tests for integer mul merging

Adding test for vector integer multiply merging

Adding support for merging integer add and sub through negate

* Added tests

Adding rules to merge mult with preceding divide

* Has a couple tests, but needs more
* Added more comments

Fixed bug in integer division folding

* Will no longer merge through integer division if there would be a
remainder in the division
* Added a bunch more tests

Adding rules to merge divide and multiply through divide

* Improved comments
* Added tests

Adding rules to handle mul or div of a negation

* Added tests

Changes for review

* Early exit if no constants are involved in more functions
* fixed some comments
* removed unused declaration
* clarified some logic

Adding new rules for add and subtract

* Fold adds of adds, subtracts or negates
* Fold subtracts of adds, subtracts or negates
* Added tests
2018-02-27 13:02:13 -05:00
Stephen McGroarty
20b8cdb7c6 Make IR builder use the type manager for constants
This change makes the IR builder use the type manager to generate
OpTypeInts when creating OpConstants. This avoids dangling references
being stored by the created OpConstants.
2018-02-27 12:59:26 -05:00