SPIR-V Assembly language syntax
Overview
The assembly attempts to adhere to the binary form from Section 3 of the SPIR-V
spec as closely as possible, with one exception aiming at improving the text's
readability. The <result-id> generated by an instruction is moved to the
beginning of that instruction and followed by an = sign. This allows us to
distinguish between variable definitions and uses and locate value definitions
more easily.
Here is an example:
OpCapability Shader
OpMemoryModel Logical Simple
OpEntryPoint GLCompute %3 "main"
OpExecutionMode %3 LocalSize 64 64 1
%1 = OpTypeVoid
%2 = OpTypeFunction %1
%3 = OpFunction %1 None %2
%4 = OpLabel
OpReturn
OpFunctionEnd
A module is a sequence of instructions, separated by whitespace. An instruction is an opcode name followed by operands, separated by whitespace. Typically each instruction is presented on its own line, but the assembler does not enforce this rule.
The opcode names and expected operands are described in Section 3 of the SPIR-V specification. An operand is one of:
- a literal integer: A decimal integer, or a hexadecimal integer.
  A hexadecimal integer is indicated by a leading 0x or 0X. A hex integer
  supplied for a signed integer value will be sign-extended. For example,
  0xffff supplied as the literal for an OpConstant on a signed 16-bit integer
  type will be interpreted as the value -1.
- a literal floating point number, in decimal or hexadecimal form. See below.
- a literal string.
  - A literal string is everything following a double-quote " until the
    following un-escaped double-quote. This includes special characters such
    as newlines.
  - A backslash \ may be used to escape characters in the string. The \ may be
    used to escape a double-quote or a \ but is simply ignored when preceding
    any other character.
- a named enumerated value, specific to that operand position. For example,
  the OpMemoryModel takes a named Addressing Model operand (e.g. Logical or
  Physical32), and a named Memory Model operand (e.g. Simple or OpenCL).
  Named enumerated values are only meaningful in specific positions, and will
  otherwise generate an error.
- a mask expression, consisting of one or more mask enum names separated
  by |. For example, the expression NotNaN|NotInf|NSZ denotes the mask which
  is the combination of the NotNaN, NotInf, and NSZ flags.
- an injected immediate integer: !<integer>. See below.
- an ID, e.g. %foo. See below.
- the name of an extended instruction. For example, sqrt in an extended
  instruction such as %f = OpExtInst %f32 %OpenCLImport sqrt %arg
- the name of an opcode for OpSpecConstantOp, but where the Op prefix is
  removed. For example, the following indicates the use of an integer addition
  in a specialization constant computation:
  %sum = OpSpecConstantOp %i32 IAdd %a %b
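Two of the literal rules above can be illustrated with a short Python sketch: sign extension of a hexadecimal literal supplied for a signed integer type, and the backslash rule for string literals. The function names sign_extend and unescape are hypothetical and exist only for this illustration; this is not the assembler's actual implementation.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1          # keep only the bits of the target width
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def unescape(body: str) -> str:
    """Apply the backslash rule to the text between the double-quotes.

    A backslash is dropped and the character after it is kept literally,
    so \\" yields a double-quote, \\\\ yields one backslash, and a backslash
    before any other character is simply ignored.
    """
    out = []
    chars = iter(body)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, "")      # a trailing lone backslash contributes nothing
        out.append(ch)
    return "".join(out)


print(sign_extend(0xFFFF, 16))        # the 0xffff example for a signed 16-bit type
print(unescape(r'say \"hi\"'))
```

Under these assumptions, 0xffff on a 16-bit signed type comes out as -1, matching the example in the list.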
ID Definitions & Usage
An ID definition pertains to the <result-id> of an instruction, and ID
usage is a use of an ID as an input to an instruction.
An ID in the assembly language begins with % and must be followed by a name
consisting of one or more letters, numbers or underscore characters.
For every ID in the assembly program, the assembler generates a unique number called the ID's internal number. Then each ID reference translates into its internal number in the SPIR-V output. Internal numbers are unique within the compilation unit: no two IDs in the same unit will share internal numbers.
The disassembler generates IDs where the name is always a decimal number greater than 0.
So the example from the Overview can be rewritten using more user-friendly names, as follows:
OpCapability Shader
OpMemoryModel Logical Simple
OpEntryPoint GLCompute %main "main"
OpExecutionMode %main LocalSize 64 64 1
%void = OpTypeVoid
%fnMain = OpTypeFunction %void
%main = OpFunction %void None %fnMain
%lbMain = OpLabel
OpReturn
OpFunctionEnd
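The mapping from ID names to internal numbers described in this section can be sketched in Python as follows. The class name IdTable and the first-seen numbering order are assumptions made for illustration; the text only guarantees that every ID gets a unique number within the compilation unit, not how the numbers are chosen.

```python
class IdTable:
    """Assigns each distinct ID name a unique internal number (hypothetical sketch)."""

    def __init__(self):
        self._numbers = {}

    def number_for(self, name: str) -> int:
        # Assumption: numbers are handed out in first-seen order, starting at 1.
        if name not in self._numbers:
            self._numbers[name] = len(self._numbers) + 1
        return self._numbers[name]


table = IdTable()
for name in ["main", "main", "void", "fnMain", "void"]:
    print(f"%{name} -> {table.number_for(name)}")
```

Every reference to the same name yields the same number, and no two names share a number, which is the property the assembler relies on.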
Floating point literals
The assembler and disassembler support floating point literals in both decimal and hexadecimal form.
The syntax for a floating point literal is the same as floating point constants in the C programming language, except:
- An optional leading minus (-) is part of the literal.
- An optional type specifier suffix is not allowed.

Infinity and NaN values are expressed in hexadecimal float literals by using the maximum representable exponent for the bit width.
For example, in 32-bit floating point, 8 bits are used for the exponent, and the exponent bias is 127. So the maximum representable unbiased exponent is 128. Therefore, we represent the infinities and some NaNs as follows:
%float32 = OpTypeFloat 32
%inf = OpConstant %float32 0x1p+128
%neginf = OpConstant %float32 -0x1p+128
%aNaN = OpConstant %float32 0x1.8p+128
%moreNaN = OpConstant %float32 -0x1.0002p+128
The assembler preserves all the bits of a NaN value. For example, the encoding
of %aNaN in the previous example is the same as the word with bits
0x7fc00000, and %moreNaN is encoded as 0xff800100.
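The encodings quoted above can be checked with a small Python sketch that maps a hexadecimal float literal to a 32-bit pattern, allowing the unbiased exponent 128 so that infinities and NaNs are expressible. The function name encode_f32_hex is hypothetical, and the sketch assumes a non-zero value in the normal range or at exponent 128, truncating any significand bits beyond the 23 stored ones; it is not the assembler's parser.

```python
import math


def encode_f32_hex(literal: str) -> int:
    """Return the 32-bit word for a hex float literal such as -0x1.0002p+128."""
    value = float.fromhex(literal)                 # parsed as a 64-bit double
    sign = 1 if math.copysign(1.0, value) < 0 else 0
    mantissa, exp = math.frexp(abs(value))         # abs(value) == mantissa * 2**exp
    unbiased = exp - 1                             # exponent for a significand in [1, 2)
    biased = unbiased + 127                        # 32-bit exponent bias
    if value == 0 or not 1 <= biased <= 255:
        raise ValueError("sketch handles only normal values and exponent 128")
    # Fraction field: the 23 bits after the leading 1 (extra bits are truncated).
    fraction = int((mantissa * 2 - 1.0) * (1 << 23))
    return (sign << 31) | (biased << 23) | fraction


for text in ["0x1p+128", "-0x1p+128", "0x1.8p+128", "-0x1.0002p+128"]:
    print(text, hex(encode_f32_hex(text)))
```

With exponent 128 the biased exponent field is all ones (255), so a zero fraction gives an infinity and any non-zero fraction gives a NaN whose payload bits are taken directly from the literal.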
The disassembler prints infinite, NaN, and subnormal values in hexadecimal form. Zero and normal values are printed in decimal form with enough digits to preserve all significand bits.
Arbitrary Integers
When writing tests it can be useful to emit an invalid 32 bit word into the
binary stream at arbitrary positions within the assembly. To inject an
arbitrary word into the stream, use the ! prefix; the token takes the form
!<integer>. Here is an example:
OpCapability !0x0000FF00
Any token in a valid assembly program may be replaced by !<integer> -- even
tokens that dictate how the rest of the instruction is parsed. Consider, for
example, the following assembly program:
%4 = OpConstant %1 123 456 789
OpExecutionMode %2 LocalSize 11 22 33
OpExecutionMode %3 InputLines
The tokens OpConstant, LocalSize, and InputLines may be replaced by random
!<integer> values, and the assembler will still assemble an output binary with
three instructions. It will not necessarily be valid SPIR-V, but it will
faithfully reflect the input text.
You may wonder how the assembler recognizes the instruction structure (including
instruction boundaries) in the text with certain crucial tokens replaced by
arbitrary integers. If, say, OpConstant becomes a !<integer> whose value
differs from the binary representation of OpConstant (remember that this
feature is intended for fine-grain control in SPIR-V testing), the assembler
generally has no idea what that value stands for. So how does it know there is
exactly one <id> and three number literals following in that instruction,
before the next one begins? And if LocalSize is replaced by an arbitrary
!<integer>, how does it know to take the next three tokens (instead of zero or
one, both of which are possible in the absence of certainty that LocalSize
provided)? The answer is a simple rule governing the parsing of instructions
with !<integer> in them:
When a token in the assembly program is a !<integer>, that integer value is
emitted into the binary output, and parsing proceeds differently than before:
each subsequent token not recognized as an OpCode or a <result-id> is emitted
into the binary output without any checking; when a recognizable OpCode or a
<result-id> is eventually encountered, it begins a new instruction and parsing
returns to normal. (If a subsequent OpCode is never found, then this alternate
parsing mode handles all the remaining tokens in the program.)
The assembler processes the tokens encountered in alternate parsing mode as follows:
- If the token is a number literal, since context may be lost, the number
  is interpreted as a 32-bit value and output as a single word. In order to
  specify multiple-word literals in alternate-parsing mode, further uses of
  !<integer> tokens may be required. All formats supported by strtoul()
  are accepted.
- If the token is a string literal, it outputs a sequence of words representing the string as defined in the SPIR-V specification for Literal String.
- If the token is an ID, it outputs the ID's internal number.
- If the token is another !<integer>, it outputs that integer.
- Any other token causes the assembler to quit with an error.
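The token handling listed above can be sketched in Python. The names emit_alternate and encode_literal_string are hypothetical, the caller is assumed to have already split the text into tokens and stopped at the next recognizable OpCode, and Python's int(text, 0) stands in for strtoul() (it rejects a bare leading zero such as 010, which strtoul() would read as octal). This is a simplified illustration, not the assembler's code.

```python
import struct


def encode_literal_string(text: str) -> list:
    """Pack a string as SPIR-V Literal String words: UTF-8, nul-terminated,
    zero-padded to a multiple of four bytes, four bytes per little-endian word."""
    data = text.encode("utf-8") + b"\x00"
    data += b"\x00" * (-len(data) % 4)
    return [word for (word,) in struct.iter_unpack("<I", data)]


def emit_alternate(tokens, id_numbers) -> list:
    """Emit words for tokens seen in alternate parsing mode.

    id_numbers maps ID names (without the % sign) to internal numbers.
    """
    words = []
    for tok in tokens:
        if tok.startswith("!"):                    # another injected integer
            words.append(int(tok[1:], 0) & 0xFFFFFFFF)
        elif tok.startswith("%"):                  # ID: its internal number
            words.append(id_numbers[tok[1:]])
        elif tok.startswith('"'):                  # string literal (no escapes here)
            words.extend(encode_literal_string(tok[1:-1]))
        else:
            try:                                   # number literal: one 32-bit word
                words.append(int(tok, 0) & 0xFFFFFFFF)
            except ValueError:
                raise ValueError(f"unexpected token in alternate mode: {tok}")
    return words


ids = {"1": 1, "2": 2, "3": 3}
print(emit_alternate(["!262187", "%1", "%2", '"abc"'], ids))
```

An enumerant name such as DontInline falls through to the final branch and raises an error, mirroring the last rule in the list.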
Note that this has some interesting consequences, including:
- When an OpCode is replaced by !<integer>, the integer value should encode
  the instruction's word count, as specified in the physical-layout section of
  the SPIR-V specification.
- Consecutive instructions may have their OpCode replaced by !<integer> and
  still produce valid SPIR-V. For example,
  !262187 %1 %2 "abc" !327739 %1 %3 6 %2 will successfully assemble into SPIR-V
  declaring a constant and a PrivateGlobal variable.
- Enums (such as DontInline or SubgroupMemory, for instance) are not handled
  by the alternate parsing mode. They must be replaced by !<integer> for
  successful assembly.
- The <result-id> on the left-hand side of an assignment cannot be a
  !<integer>. The <result-id> can still be manually controlled if desired by
  expressing the entire instruction as !<integer> tokens for its opcode and
  operands.
- The = sign cannot be processed by the alternate parsing mode if the OpCode
  following it is a !<integer>.
- When replacing a named ID with !<integer>, it is possible to generate
  unintentionally valid SPIR-V. If the integer provided happens to equal a
  number generated for an existing named ID, it will result in a reference to
  that named ID being output. This may be valid SPIR-V, contrary to the
  presumed intention of the writer.
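The first two consequences depend on how the first word of an instruction is laid out: per the physical layout in the SPIR-V specification, the high 16 bits hold the word count and the low 16 bits hold the opcode. A short Python sketch (the function names are hypothetical) decodes the two injected integers from the example:

```python
def first_word(opcode: int, word_count: int) -> int:
    """Build the first word of an instruction: word count high, opcode low."""
    return (word_count << 16) | opcode


def split_first_word(word: int):
    """Return (word_count, opcode) for an instruction's first word."""
    return word >> 16, word & 0xFFFF


OP_CONSTANT = 43   # OpConstant opcode number in the SPIR-V specification
OP_VARIABLE = 59   # OpVariable opcode number in the SPIR-V specification

print(split_first_word(262187))   # word count and opcode of the first instruction
print(split_first_word(327739))   # word count and opcode of the second instruction
print(first_word(OP_CONSTANT, 4), first_word(OP_VARIABLE, 5))
```

Here 262187 decodes to a 4-word OpConstant (first word, type ID, result ID, and one word for "abc"), and 327739 decodes to a 5-word OpVariable.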
Notes
- Some enumerants cannot be used by name, because the target instruction
  in which they are meaningful takes an ID reference instead of a literal value.
  For example:
  - Named enumerated value CmdExecTime from section 3.30 Kernel Profiling Info
    is used in constructing a mask value supplied as an ID for
    OpCaptureEventProfilingInfo. But no other instruction has enough context
    to bring the enumerant names from section 3.30 into scope.
  - Similarly, the names in section 3.29 Kernel Enqueue Flags are used to
    construct a value supplied as an ID to the Flags argument of
    OpEnqueueKernel.
  - Similarly for the names in section 3.25 Memory Semantics.
  - Similarly for the names in section 3.27 Scope.
- Some enumerants cannot be used by name, because they only name values
  returned by an instruction:
  - Enumerants from 3.12 Image Channel Order name possible values returned
    by the OpImageQueryOrder instruction.
  - Enumerants from 3.13 Image Channel Data Type name possible values
    returned by the OpImageQueryFormat instruction.