# SPIR-V Assembly language syntax

## Overview

The assembly attempts to adhere to the binary form as closely as possible,
using text names from section 3 of the SPIR-V spec.

Here is an example:

```
OpCapability Shader
OpMemoryModel Logical Simple
OpEntryPoint GLCompute %3 "main"
OpExecutionMode %3 LocalSize 64 64 1
OpTypeVoid %1
OpTypeFunction %2 %1
OpFunction %1 %3 None %2
OpLabel %4
OpReturn
OpFunctionEnd
```

A module is a sequence of instructions, separated by whitespace.
An instruction is an opcode name followed by operands, separated by
whitespace.  Typically each instruction is presented on its own line,
but the assembler does not enforce this rule.
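Because instructions are delimited by whitespace rather than by line breaks, instruction boundaries have to be found from the words themselves. The following is a minimal Python sketch of that idea, not the assembler's actual implementation. It assumes (hypothetically) that an opcode is any word matching `Op` followed by an uppercase letter, that an ID followed by `=` starts an instruction, and that string literals contain no whitespace. `split_instructions` is an illustrative name.

```python
import re

# Assumption: opcode names look like "Op" + uppercase letter (so the enum
# name "OpenCL" is not mistaken for an opcode).
OPCODE = re.compile(r"Op[A-Z][A-Za-z0-9]*")


def split_instructions(text):
    """Group whitespace-separated words into a list of instructions."""
    words = text.split()
    instructions = []
    i = 0
    while i < len(words):
        word = words[i]
        # "%id =" in front of an opcode marks the start of an instruction.
        starts_assignment = (
            word.startswith("%") and i + 1 < len(words) and words[i + 1] == "="
        )
        if OPCODE.fullmatch(word) or starts_assignment or not instructions:
            instructions.append([])
        if starts_assignment:
            # Keep "%id = OpName" together in the same instruction.
            instructions[-1].extend(words[i:i + 3])
            i += 3
            continue
        instructions[-1].append(word)
        i += 1
    return instructions
```

Under these assumptions, two instructions written on a single line are still split into two entries, and line breaks play no role.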

The opcode names and expected operands are described in section 3 of
the SPIR-V specification.  An operand is one of:
* a literal integer: A decimal integer, or a hexadecimal integer
  (indicated by a leading `0x`).
* a literal floating point number.
* a literal string, surrounded by double-quotes ("). TODO: describe quoting and
escaping rules.
* a named enumerated value, specific to that operand position.  For example,
`OpMemoryModel` takes a named Addressing Model operand (e.g. `Logical` or
`Physical32`) and a named Memory Model operand (e.g. `Simple` or `OpenCL`).
Named enumerated values are only meaningful in specific positions; using one
anywhere else generates an error.
* a mask expression, consisting of one or more mask enum names separated
  by `|`.  For example, the expression `NotNaN|NotInf|NSZ` denotes the mask
  which is the combination of the `NotNaN`, `NotInf`, and `NSZ` flags.
  (This is supported by the assembler but not yet by the disassembler.
  TODO(dneto): Add disassembler support for emitting mask expressions.)
* an injected immediate integer: `!<integer>`.  See [below](#immediate).
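As an illustration of two of the operand kinds listed above, here is a small Python sketch that evaluates a literal integer and a mask expression. It is not the assembler's code. The function names are hypothetical, and the bit values in the table are those the SPIR-V specification lists for FP Fast Math Mode.

```python
# FP Fast Math Mode bits, as listed in the SPIR-V specification.
FP_FAST_MATH_MODE = {
    "NotNaN": 0x1,
    "NotInf": 0x2,
    "NSZ": 0x4,
    "AllowRecip": 0x8,
    "Fast": 0x10,
}


def parse_literal_integer(word):
    """Parse a decimal integer, or a hexadecimal one with a leading 0x."""
    if word.startswith("0x"):
        return int(word, 16)
    return int(word, 10)


def parse_mask_expression(expr, flags):
    """OR together the mask enum names separated by '|'."""
    value = 0
    for name in expr.split("|"):
        if name not in flags:
            raise ValueError("unknown mask enum name: " + name)
        value |= flags[name]
    return value
```

With this table, `parse_mask_expression("NotNaN|NotInf|NSZ", FP_FAST_MATH_MODE)` combines the three flags into the single value `0x7`.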

## Assignment-oriented Assembly Form
<a name="assignment-form"></a>
The description and examples from above describe the Canonical Assembly
Form for SPIR-V assembly language.

We also define the Assignment-oriented Assembly Form, aimed at improving
the text's readability.  In AAF, the `<result-id>` generated by an
instruction is moved to the beginning of that instruction and followed by
an `=` sign.  This allows us to distinguish between variable definitions
and uses and locate value definitions more easily.  So, the above example
can also be written as:

```
     OpCapability Shader
     OpMemoryModel Logical Simple
     OpEntryPoint GLCompute %3 "main"
     OpExecutionMode %3 LocalSize 64 64 1
%1 = OpTypeVoid
%2 = OpTypeFunction %1
%3 = OpFunction %1 None %2
%4 = OpLabel
     OpReturn
     OpFunctionEnd
```
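The rewrite from the canonical form to AAF can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RESULT_ID_INDEX` is a hypothetical table covering just the opcodes in the example above (a real tool would derive the position of the `<result-id>` from the instruction grammar), and each input line is assumed to hold exactly one instruction with no whitespace inside string literals.

```python
# Hypothetical table: position of the <result-id> among the operands,
# for the opcodes that appear in the example only.
RESULT_ID_INDEX = {
    "OpTypeVoid": 0,
    "OpTypeFunction": 0,
    "OpFunction": 1,  # the result type comes first, then the result id
    "OpLabel": 0,
}


def to_assignment_form(line):
    """Move the <result-id> of one canonical-form instruction in front of '='."""
    words = line.split()
    if not words:
        return ""
    opcode, operands = words[0], words[1:]
    index = RESULT_ID_INDEX.get(opcode)
    if index is None:
        # Instructions without a result id are left as they are.
        return " ".join(words)
    result_id = operands.pop(index)
    return " ".join([result_id, "=", opcode] + operands)
```

For instance, `to_assignment_form("OpFunction %1 %3 None %2")` yields `%3 = OpFunction %1 None %2`, matching the corresponding line of the AAF example.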

## ID Definitions & Usage

An ID definition is the `<result-id>` of an instruction, and an ID use is any
ID supplied as an input to an instruction. All IDs are prefixed with `%`. To
differentiate between defs and uses, we suggest using the Assignment-oriented
Assembly Form shown in the second example above.

## Named IDs

The assembler also supports named IDs, or virtual IDs, which greatly improve
the readability of the assembly. The same ID definition and usage prefixes
apply. Names must begin with a character in the range `[a-zA-Z]`. The
following example produces the same SPIR-V binary as the example above.

```
          OpCapability Shader
          OpMemoryModel Logical Simple
          OpEntryPoint GLCompute %main "main"
          OpExecutionMode %main LocalSize 64 64 1
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
          OpReturn
          OpFunctionEnd
```
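The distinction between numeric and named IDs can be expressed as two patterns. The Python sketch below is illustrative only: the text above fixes just the `%` prefix and the first character of a name, so the set of characters allowed after the first one (letters, digits, and underscore here) is an assumption, and `classify_id` is a hypothetical helper.

```python
import re

NUMERIC_ID = re.compile(r"%[0-9]+")
# Assumption: after the leading letter, letters, digits and '_' are allowed.
NAMED_ID = re.compile(r"%[a-zA-Z][a-zA-Z0-9_]*")


def classify_id(word):
    """Return 'numeric', 'named', or 'invalid' for a candidate ID word."""
    if NUMERIC_ID.fullmatch(word):
        return "numeric"
    if NAMED_ID.fullmatch(word):
        return "named"
    return "invalid"
```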

## Arbitrary Integers
<a name="immediate"></a>

*Warning: Not all of the following has been implemented*

When writing tests it can be useful to emit an invalid 32-bit word into the
binary stream at an arbitrary position within the assembly. To inject an
arbitrary word into the stream, use the prefix `!`, in the form
`!<integer>`. Here is an example:

```
OpCapability !0x0000FF00
```

Any word in a valid assembly program may be replaced by `!<integer>` -- even
words that dictate how the rest of the instruction is parsed.  Consider, for
example, the following assembly program:

```
%4 = OpConstant %1 123 456 789 OpExecutionMode %2 LocalSize 11 22 33
OpExecutionMode %3 InputLines
```

The words `OpConstant`, `LocalSize`, and `InputLines` may be replaced by random
`!<integer>` values, and the assembler will still assemble an output binary with
three instructions.  It will not necessarily be valid SPIR-V, but it will
faithfully reflect the input text.

You may wonder how the assembler recognizes the instruction structure (including
instruction boundaries) in the text with certain crucial words replaced by
arbitrary integers.  If, say, `OpConstant` becomes a `!<integer>` whose value
differs from the binary representation of `OpConstant` (remember that this
feature is intended for fine-grain control in SPIR-V testing), the assembler
generally has no idea what that value stands for.  So how does it know there is
exactly one `<id>` and three number literals following in that instruction,
before the next one begins?  And if `LocalSize` is replaced by an arbitrary
`!<integer>`, how does it know to take the next three words (instead of zero or
one, both of which are possible once the certainty that `LocalSize` provided is
gone)?  The answer is a simple rule governing the parsing of instructions
with `!<integer>` in them:

When a word in the assembly program is a `!<integer>`, that integer value is
emitted into the binary output, and parsing proceeds differently than before:
each subsequent word not recognized as an OpCode is emitted into the binary
output without any checking; when a recognizable OpCode is eventually
encountered, it begins a new instruction and parsing returns to normal.  (If a
subsequent OpCode is never found, then this alternate parsing mode handles all
the remaining words in the program.  If a subsequent OpCode is in an
[assignment form](#assignment-form), the ID preceding it begins a new
instruction.)

The assembler processes the words encountered in alternate parsing mode as
follows:

* If the word is a number literal, it outputs that number as one or more words,
  as defined in the SPIR-V specification for Literal Number.
* If the word is a string literal, it outputs a sequence of words representing
  the string as defined in the SPIR-V specification for Literal String.
* If the word is an ID, it outputs the ID's internal number.  If no such number
  exists yet, a unique new one will be generated.  (Uniqueness is at the
  translation-unit level: no other ID in the same translation unit will have the
  same number.)
* If the word is another `!<integer>`, it outputs that integer.
* Any other word causes the assembler to quit with an error.
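The rules above can be sketched in Python as follows. This is a simplified model, not the assembler's implementation, and it rests on several assumptions: opcodes are recognized by the hypothetical pattern `Op` plus an uppercase letter, number literals are limited to unsigned 32-bit integers (no floating point and no multi-word literals), string contents are taken verbatim with no escape handling, and new ID numbers are allocated as one more than the largest number seen so far. All function names are illustrative.

```python
import re

OPCODE = re.compile(r"Op[A-Z][A-Za-z0-9]*")        # assumed opcode shape
INTEGER = re.compile(r"0x[0-9a-fA-F]+|[0-9]+")
TOKEN = re.compile(r'"[^"]*"|\S+')                  # no escape handling


def parse_int(text):
    return int(text, 16) if text.startswith("0x") else int(text, 10)


def encode_string(text):
    """Literal String: UTF-8, nul-terminated, packed little-endian into words."""
    data = text.encode("utf-8") + b"\x00"
    data += b"\x00" * (-len(data) % 4)
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, len(data), 4)]


def id_number(word, ids):
    """Return the ID's internal number, generating one if none exists yet."""
    name = word[1:]
    if name not in ids:
        if name.isdigit():
            ids[name] = int(name)
        else:
            # Simplification: a later numeric ID could still collide with this.
            ids[name] = max(ids.values(), default=0) + 1
    return ids[name]


def alternate_mode(tokens, ids):
    """Emit words until an opcode (or '%id =') starts a new instruction.

    Returns (emitted words, tokens left for normal parsing).
    """
    words = []
    for pos, tok in enumerate(tokens):
        next_tok = tokens[pos + 1] if pos + 1 < len(tokens) else None
        if OPCODE.fullmatch(tok) or (tok.startswith("%") and next_tok == "="):
            return words, tokens[pos:]
        if tok.startswith("!") and INTEGER.fullmatch(tok[1:]):
            words.append(parse_int(tok[1:]) & 0xFFFFFFFF)
        elif tok.startswith('"') and tok.endswith('"') and len(tok) >= 2:
            words.extend(encode_string(tok[1:-1]))
        elif tok.startswith("%") and len(tok) > 1:
            words.append(id_number(tok, ids))
        elif INTEGER.fullmatch(tok):
            words.append(parse_int(tok) & 0xFFFFFFFF)
        else:
            raise ValueError("cannot assemble word in alternate mode: " + tok)
    return words, []
```

A caller would tokenize the text with `TOKEN.findall(text)` and pass the tokens starting at the first `!<integer>`. An enum name such as `DontInline` reaches the final branch and raises an error, which mirrors the last rule in the list.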

Note that this has some interesting consequences, including:

* When an OpCode is replaced by `!<integer>`, the integer value should encode
  the instruction's word count, as specified in the physical-layout section of
  the SPIR-V specification.

* Consecutive instructions may have their OpCode replaced by `!<integer>` and
  still produce valid SPIR-V.  For example, `!262187 %1 %2 "abc" !327739 %1 %3 6
  %2` will successfully assemble into SPIR-V declaring a constant and a
  PrivateGlobal variable.

* Enums (such as `DontInline` or `SubgroupMemory`, for instance) are not handled
  by the alternate parsing mode.  They must be replaced by `!<integer>` for
  successful assembly.

* The `<result-id>` on the left-hand side of an assignment cannot be
  a `!<integer>`.  But it can be a number prefixed by `%`, which still gives the
  user control over its value.

* The `=` sign cannot be processed by the alternate parsing mode if the OpCode
  following it is a `!<integer>`.

* When replacing a named ID with `!<integer>`, it is possible to generate
  unintentionally valid SPIR-V.  If the integer provided happens to equal a
  number generated for an existing named ID, it will result in a reference to
  that named ID being output.  This may be valid SPIR-V, contrary to the
  presumed intention of the writer.
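To make the first consequence concrete: in the SPIR-V physical layout, the first word of an instruction holds the word count in its high 16 bits and the opcode in its low 16 bits. The Python sketch below (with hypothetical helper names) packs and unpacks such a word, and is consistent with the two integers used in the example above.

```python
def first_word(word_count, opcode):
    """Pack an instruction's word count and opcode into its first word."""
    if not 0 <= word_count <= 0xFFFF or not 0 <= opcode <= 0xFFFF:
        raise ValueError("word count and opcode must each fit in 16 bits")
    return (word_count << 16) | opcode


def split_first_word(word):
    """Return (word_count, opcode) for an instruction's first word."""
    return word >> 16, word & 0xFFFF
```

Here `split_first_word(262187)` gives a word count of 4 with opcode 43, and `split_first_word(327739)` gives a word count of 5 with opcode 59, which matches the number of words each `!<integer>` is followed by in that example (counting the first word itself).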

## Notes

* Some enumerants cannot be used by name, because the target instruction
in which they are meaningful takes an ID reference instead of a literal value.
For example:
   * Named enumerated value `CmdExecTime` from section 3.30 Kernel
     Profiling Info is used in constructing a mask value supplied as
     an ID for `OpCaptureEventProfilingInfo`.  But no other instruction
     has enough context to bring the enumerant names from section 3.30
     into scope.
   * Similarly, the names in section 3.29 Kernel Enqueue Flags are used to
     construct a value supplied as an ID to the Flags argument of
     OpEnqueueKernel.
   * Similarly for the names in section 3.25 Memory Semantics.
   * Similarly for the names in section 3.27 Scope.
* Some enumerants cannot be used by name, because they only name values
returned by an instruction:
   * Enumerants from 3.12 Image Channel Order name possible values returned
     by the `OpImageQueryOrder` instruction.
   * Enumerants from 3.13 Image Channel Data Type name possible values
     returned by the `OpImageQueryFormat` instruction.