SPIRV-Tools/syntax.md
Dejan Mircevski 114206e0bc Clarify !<integer> parsing.
Implement some outstanding feedback from
Ic29c5a4a8178a62a5a1acad13d02f19cc1307097:

 - use "token" instead of "word" when referring to assembly text

 - specify how the numbers are parsed

Add a test for negative numbers.
2015-10-26 12:55:33 -04:00


SPIR-V Assembly language syntax

Overview

The assembly attempts to adhere to the binary form as closely as possible, using text names from section 3 of the SPIR-V spec.

Here is an example:

OpCapability Shader
OpMemoryModel Logical Simple
OpEntryPoint GLCompute %3 "main"
OpExecutionMode %3 LocalSize 64 64 1
OpTypeVoid %1
OpTypeFunction %2 %1
OpFunction %1 %3 None %2
OpLabel %4
OpReturn
OpFunctionEnd

A module is a sequence of instructions, separated by whitespace. An instruction is an opcode name followed by operands, separated by whitespace. Typically each instruction is presented on its own line, but the assembler does not enforce this rule.

The opcode names and expected operands are described in section 3 of the SPIR-V specification. An operand is one of:

  • a literal integer: A decimal integer, or a hexadecimal integer (indicated by a leading 0x).
  • a literal floating point number.
  • a literal string, surrounded by double-quotes ("). TODO: describe quoting and escaping rules.
  • a named enumerated value, specific to that operand position. For example, OpMemoryModel takes a named Addressing Model operand (e.g. Logical or Physical32) and a named Memory Model operand (e.g. Simple or OpenCL). Named enumerated values are only meaningful in specific positions; anywhere else they generate an error.
  • a mask expression, consisting of one or more mask enum names separated by |. For example, the expression NotNaN|NotInf|NSZ denotes the mask which is the combination of the NotNaN, NotInf, and NSZ flags. (This is supported by the assembler but not yet by the disassembler. TODO(dneto): Add disassembler support for emitting mask expressions.)
  • an injected immediate integer: !<integer>. See below.
  • an ID, e.g. %foo. See below.
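As a rough illustration of how a mask expression collapses into a single operand word, here is a minimal Python sketch. The bit values are those of FP Fast Math Mode in the SPIR-V specification; the function name `evaluate_mask_expression` and the table layout are hypothetical and not taken from the assembler's source.

```python
# Bit values for FP Fast Math Mode, as listed in the SPIR-V specification.
FP_FAST_MATH_MODE = {
    "None": 0x0,
    "NotNaN": 0x1,
    "NotInf": 0x2,
    "NSZ": 0x4,
    "AllowRecip": 0x8,
    "Fast": 0x10,
}


def evaluate_mask_expression(expression, enum_table):
    """OR together the bits named in a '|'-separated mask expression.

    Hypothetical helper: the real assembler's internals may differ.
    """
    mask = 0
    for name in expression.split("|"):
        if name not in enum_table:
            raise ValueError(f"unknown mask enum name: {name!r}")
        mask |= enum_table[name]
    return mask


print(hex(evaluate_mask_expression("NotNaN|NotInf|NSZ", FP_FAST_MATH_MODE)))  # 0x7
```

The names are looked up in the table for one specific operand position, which mirrors the rule that enum names are only meaningful where that operand kind is expected.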

Assignment-oriented Assembly Form

The description and examples from above describe the Canonical Assembly Form for SPIR-V assembly language.

We also define the Assignment-oriented Assembly Form, aimed at improving the text's readability. In AAF, the <result-id> generated by an instruction is moved to the beginning of that instruction and followed by an = sign. This allows us to distinguish between variable definitions and uses and locate value definitions more easily. So, the above example can also be written as:

     OpCapability Shader
     OpMemoryModel Logical Simple
     OpEntryPoint GLCompute %3 "main"
     OpExecutionMode %3 LocalSize 64 64 1
%1 = OpTypeVoid
%2 = OpTypeFunction %1
%3 = OpFunction %1 None %2
%4 = OpLabel
     OpReturn
     OpFunctionEnd
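The rewrite from canonical form to AAF can be sketched in a few lines of Python. This is only an illustration: the table `RESULT_ID_INDEX` and the function `to_assignment_form` are hypothetical, cover just the opcodes in the example above, and split on whitespace, so string literals containing spaces are not handled.

```python
# Position of the <result-id> among an instruction's operands, for the
# opcodes used in the example. Hypothetical table: a real tool would derive
# this from the SPIR-V grammar rather than hard-code it.
RESULT_ID_INDEX = {
    "OpTypeVoid": 0,
    "OpTypeFunction": 0,
    "OpFunction": 1,  # operand 0 is the result type
    "OpLabel": 0,
}


def to_assignment_form(line):
    """Move the <result-id>, if any, in front of the opcode followed by '='."""
    tokens = line.split()
    opcode, operands = tokens[0], tokens[1:]
    index = RESULT_ID_INDEX.get(opcode)
    if index is None:
        # No result ID (or an opcode this sketch does not know): keep as is.
        return line.strip()
    result_id = operands.pop(index)
    return " ".join([result_id, "=", opcode] + operands)


print(to_assignment_form("OpFunction %1 %3 None %2"))
```

Instructions without a result ID, such as OpReturn, pass through unchanged, which is why they appear indented without a left-hand side in the AAF listing.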

ID Definitions & Usage

An ID definition pertains to the <result-id> of an OpCode, and ID usage is a use of an ID as an input to an OpCode.

An ID in the assembly language begins with % and must be followed by a name consisting of one or more letters, numbers or underscore characters.

For every ID in the assembly program, the assembler generates a unique number called the ID's internal number. Then each ID reference translates into its internal number in the SPIR-V output. Internal numbers are unique within the compilation unit: no two IDs in the same unit will share internal numbers.
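The naming rule and the name-to-number mapping can be illustrated with a short Python sketch. The class `IdTable` is hypothetical, and numbering IDs sequentially from 1 in order of first appearance is an assumption made for the example; the text above only guarantees that the numbers are unique.

```python
import re

# An ID is '%' followed by one or more letters, digits, or underscores.
ID_PATTERN = re.compile(r"%[A-Za-z0-9_]+")


class IdTable:
    """Hypothetical name-to-number table, not the assembler's actual code."""

    def __init__(self):
        self._numbers = {}

    def number_for(self, token):
        """Return the internal number for an ID token, allocating if new."""
        if not ID_PATTERN.fullmatch(token):
            raise ValueError(f"not a valid ID: {token!r}")
        if token not in self._numbers:
            # Assumption: hand out 1, 2, 3, ... in order of first use.
            self._numbers[token] = len(self._numbers) + 1
        return self._numbers[token]


table = IdTable()
print(table.number_for("%void"), table.number_for("%main"), table.number_for("%void"))
```

Every later reference to the same name yields the same number, so definitions and uses of an ID agree in the binary output.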

The disassembler generates IDs where the name is always a decimal number greater than 0.

          OpCapability Shader
          OpMemoryModel Logical Simple
          OpEntryPoint GLCompute %main "main"
          OpExecutionMode %main LocalSize 64 64 1
  %void = OpTypeVoid
%fnMain = OpTypeFunction %void
  %main = OpFunction %void None %fnMain
%lbMain = OpLabel
          OpReturn
          OpFunctionEnd

Arbitrary Integers

When writing tests it can be useful to emit an invalid 32-bit word into the binary stream at an arbitrary position within the assembly. To inject an arbitrary word into the stream, use the prefix !, in the form !<integer>. Here is an example:

OpCapability !0x0000FF00

Any token in a valid assembly program may be replaced by !<integer> -- even tokens that dictate how the rest of the instruction is parsed. Consider, for example, the following assembly program:

%4 = OpConstant %1 123 456 789 OpExecutionMode %2 LocalSize 11 22 33
OpExecutionMode %3 InputLines

The tokens OpConstant, LocalSize, and InputLines may be replaced by random !<integer> values, and the assembler will still assemble an output binary with three instructions. It will not necessarily be valid SPIR-V, but it will faithfully reflect the input text.

You may wonder how the assembler recognizes the instruction structure (including instruction boundaries) in the text when certain crucial tokens are replaced by arbitrary integers. If, say, OpConstant becomes a !<integer> whose value differs from the binary representation of OpConstant (remember that this feature is intended for fine-grained control in SPIR-V testing), the assembler generally has no idea what that value stands for. So how does it know that exactly one <id> and three number literals follow in that instruction before the next one begins? And if LocalSize is replaced by an arbitrary !<integer>, how does it know to take the next three tokens (rather than zero or one, both of which are possible once the certainty that LocalSize provided is gone)? The answer is a simple rule governing the parsing of instructions with !<integer> in them:

When a token in the assembly program is a !<integer>, that integer value is emitted into the binary output, and parsing proceeds differently than before: each subsequent token not recognized as an OpCode is emitted into the binary output without any checking; when a recognizable OpCode is eventually encountered, it begins a new instruction and parsing returns to normal. (If a subsequent OpCode is never found, then this alternate parsing mode handles all the remaining tokens in the program. If a subsequent OpCode is in an assignment form, the ID preceding it begins a new instruction.)
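To make the boundary rule concrete, here is a simplified Python sketch that groups a token list into instructions. It is not the assembler's algorithm: normal-mode parsing is driven by the SPIR-V grammar, whereas this sketch only models the rule that a recognizable OpCode (or the ID in front of `= OpCode`) starts a new instruction. `KNOWN_OPCODES` and `split_instructions` are hypothetical names, and the opcode set is a tiny subset chosen for the example.

```python
# Hypothetical subset of recognizable opcode names.
KNOWN_OPCODES = {"OpConstant", "OpExecutionMode"}


def split_instructions(tokens):
    """Group tokens into instructions using only the boundary rule."""
    instructions = []
    i = 0
    while i < len(tokens):
        starts_assignment = (
            tokens[i].startswith("%")
            and len(tokens) > i + 2
            and tokens[i + 1] == "="
            and tokens[i + 2] in KNOWN_OPCODES
        )
        if starts_assignment:
            # '%id = OpCode': the ID begins the new instruction.
            instructions.append(tokens[i : i + 3])
            i += 3
        elif tokens[i] in KNOWN_OPCODES or not instructions:
            instructions.append([tokens[i]])
            i += 1
        else:
            # Anything else, including !<integer>, stays in the current one.
            instructions[-1].append(tokens[i])
            i += 1
    return instructions


text = "!999 %1 %4 123 456 789 OpExecutionMode %2 !888 11 22 33 OpExecutionMode %3 !777"
for instruction in split_instructions(text.split()):
    print(instruction)
```

With OpConstant, LocalSize, and InputLines all replaced by made-up integers, the sketch still finds three instructions, because each boundary is marked by the next recognizable OpCode.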

The assembler processes the tokens encountered in alternate parsing mode as follows:

  • If the token is a number literal, it outputs that number as one or more words, as defined in the SPIR-V specification for Literal Number. The number must fit within the unsigned 32-bit range. All formats supported by strtoul() are accepted.
  • If the token is a string literal, it outputs a sequence of words representing the string as defined in the SPIR-V specification for Literal String.
  • If the token is an ID, it outputs the ID's internal number.
  • If the token is another !<integer>, it outputs that integer.
  • Any other token causes the assembler to quit with an error.
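The token handling listed above can be sketched in Python as follows. All function names are hypothetical. The sketch assumes strtoul()-style parsing with base 0 (decimal, 0x hex, leading-0 octal), emits only single-word numbers, ignores string escaping (which this document leaves unspecified), and numbers IDs sequentially from 1 in order of first use. Signed input is deliberately left out.

```python
import re


def parse_number(text):
    """Parse decimal, 0x hex, or leading-0 octal into an unsigned 32-bit value."""
    if re.fullmatch(r"0[xX][0-9a-fA-F]+", text):
        value = int(text, 16)
    elif re.fullmatch(r"0[0-7]*", text):
        value = int(text, 8)
    elif re.fullmatch(r"[1-9][0-9]*", text):
        value = int(text, 10)
    else:
        raise ValueError(f"not a number literal: {text!r}")
    if value > 0xFFFFFFFF:
        raise ValueError(f"does not fit in 32 bits: {text!r}")
    return value


def encode_string(text):
    """Pack UTF-8 bytes plus a nul terminator into little-endian words."""
    data = text.encode("utf-8") + b"\0"
    data += b"\0" * (-len(data) % 4)  # zero-pad the final word
    return [int.from_bytes(data[i : i + 4], "little") for i in range(0, len(data), 4)]


def encode_alternate_tokens(tokens):
    """Emit words for tokens seen in alternate parsing mode (sketch only)."""
    id_numbers = {}
    words = []
    for token in tokens:
        if token.startswith("!"):
            words.append(parse_number(token[1:]))
        elif len(token) >= 2 and token[0] == '"' and token[-1] == '"':
            words.extend(encode_string(token[1:-1]))
        elif token.startswith("%"):
            # Assumption: IDs get 1, 2, 3, ... in order of first use.
            words.append(id_numbers.setdefault(token, len(id_numbers) + 1))
        else:
            # Number literal, or an error for anything else (e.g. enum names).
            words.append(parse_number(token))
    return words


print(encode_alternate_tokens(["!262187", "%1", "%2", '"abc"']))
```

Because enum names fall through to `parse_number` and fail there, the sketch reproduces the behavior that unrecognized tokens stop assembly with an error.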

Note that this has some interesting consequences, including:

  • When an OpCode is replaced by !<integer>, the integer value should encode the instruction's word count, as specified in the physical-layout section of the SPIR-V specification.

  • Consecutive instructions may have their OpCode replaced by !<integer> and still produce valid SPIR-V. For example, !262187 %1 %2 "abc" !327739 %1 %3 6 %2 will successfully assemble into SPIR-V declaring a constant and a PrivateGlobal variable.

  • Enums (such as DontInline or SubgroupMemory, for instance) are not handled by the alternate parsing mode. They must be replaced by !<integer> for successful assembly.

  • The <result-id> on the left-hand side of an assignment cannot be a !<integer>. The <result-id> can still be manually controlled if desired by using the Canonical Assembly Form or by simply expressing the entire instruction as !<integer> tokens for its opcode and operands.

  • The = sign cannot be processed by the alternate parsing mode if the OpCode following it is a !<integer>.

  • When replacing a named ID with !<integer>, it is possible to generate unintentionally valid SPIR-V. If the integer provided happens to equal a number generated for an existing named ID, it will result in a reference to that named ID being output. This may be valid SPIR-V, contrary to the presumed intention of the writer.
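The integers in the `!262187 ... !327739 ...` example follow from the physical layout of an instruction's first word: the high-order 16 bits hold the word count and the low-order 16 bits hold the opcode. A small Python sketch, with hypothetical helper names and opcode numbers taken from the SPIR-V specification (OpConstant is 43, OpVariable is 59):

```python
OP_CONSTANT = 43  # opcode numbers as listed in the SPIR-V specification
OP_VARIABLE = 59


def first_word(opcode, word_count):
    """Combine word count (high 16 bits) and opcode (low 16 bits)."""
    if not (0 <= opcode <= 0xFFFF and 0 <= word_count <= 0xFFFF):
        raise ValueError("opcode and word count must each fit in 16 bits")
    return (word_count << 16) | opcode


def split_first_word(word):
    """Return (opcode, word_count) for an instruction's first word."""
    return word & 0xFFFF, word >> 16


# '!262187 %1 %2 "abc"' is four words: opcode word, two IDs, one string word.
print(first_word(OP_CONSTANT, 4))  # 262187
# '!327739 %1 %3 6 %2' is five words.
print(first_word(OP_VARIABLE, 5))  # 327739
```

The word count includes the first word itself, which is why the four-token OpConstant replacement encodes a count of 4.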

Notes

  • Some enumerants cannot be used by name, because the target instruction in which they are meaningful takes an ID reference instead of a literal value. For example:
    • Named enumerated value CmdExecTime from section 3.30 Kernel Profiling Info is used in constructing a mask value supplied as an ID for OpCaptureEventProfilingInfo. But no other instruction has enough context to bring the enumerant names from section 3.30 into scope.
    • Similarly, the names in section 3.29 Kernel Enqueue Flags are used to construct a value supplied as an ID to the Flags argument of OpEnqueueKernel.
    • Similarly for the names in section 3.25 Memory Semantics.
    • Similarly for the names in section 3.27 Scope.
  • Some enumerants cannot be used by name, because they only name values returned by an instruction:
    • Enumerants from 3.12 Image Channel Order name possible values returned by the OpImageQueryOrder instruction.
    • Enumerants from 3.13 Image Channel Data Type name possible values returned by the OpImageQueryFormat instruction.