Hull shaders have an implicitly arrayed output. This is handled by creating an arrayed form of the
provided output type, and writing to the element of it indexed by InvocationID.
The implicit indirection into that array was causing some troubles when copying to a split
structure. handleAssign was able to handle simple symbol lvalues, but not an lvalue composed
of an indirection into an array.
Changes:
(1) Allow clip/cull builtins as both input and output in the same shader stage. Previously,
not enough data was tracked to handle this.
(2) Handle the extra array dimension in GS inputs. The synthesized external variable can
now be created with the extra array dimension if needed, and the form conversion code is
able to handle it as well.
For example, both of these GS inputs would result in the same synthesized external type:
triangle in float4 clip[3] : SV_ClipDistance
triangle in float2 clip[3][2] : SV_ClipDistance
In the second case, the inner array dimension packs with the 2-vector of floats into an array[4],
which there is an array[3] of due to the triangle geometry.
While adding geometry stage support for clip/cull, it transpired that the
existing clip/cull support was not setting the type of the result of indexing
into the clup/cull variable. That's a defect independent of the geometry
support, so to simplify the geometry PR, this is addressed separately.
It doesn't appear to change the generated SPIR-V, but that's probably down to
something else tolerating a bad input.
HLSL allows a range of types for clip and cull distances. There are
three dimensions, including arrayness, vectorness, and semantic ID.
SPIR-V requires clip and cull distance be a single array of floats in
all cases.
This code provides input side conversion between the SPIR-V form and
the HLSL form. (Output conversion was added in PR #947 and #997).
This PR extends HlslParseContext::assignClipCullDistance to cope with
the input side conversion. Not as much changed as appears: there was
also a lot of renaming to reflect the fact that the code now handles
either direction.
Currently, non-{frag,vert} stages are not handled, and are explicitly
rejected.
Fixes#1026.
Some languages allow a restricted set of user structure types returned from texture sampling
operations. Restrictions include the total vector size of all components may not exceed 4,
and the basic types of all members must be identical.
This adds underpinnings for that ability. Because storing a whole TType or even a simple
TTypeList in the TSampler would be expensive, the structure definition is held in a
table outside the TType. The TSampler contains a small bitfield index, currently 4 bits
to support up to 15 separate texture template structure types, but that can be adjusted
up or down. Vector returns are handled as before.
There are abstraction methods accepting and returning a TType (such as may have been parsed
from a grammar). The new methods will accept a texture template type and set the
sampler to the structure if possible, checking a range of error conditions such as whether
the total structure vector components exceed 4, or whether their basic types differe, or
whether the struct contains non-vector-or-scalar members. Another query returns the
appropriate TType for the sampler.
High level summary of design:
In the TSampler, this holds an index into the texture structure return type table:
unsigned int structReturnIndex : structReturnIndexBits;
These are the methods to set or get the return type from the TSampler. They work for vector or structure returns, and potentially could be expanded to handle other things (small arrays?) if ever needed.
bool setTextureReturnType(TSampler& sampler, const TType& retType, const TSourceLoc& loc);
void getTextureReturnType(const TSampler& sampler, const TType& retType, const TSourceLoc& loc) const;
The ``convertReturn`` lambda in ``HlslParseContext::decomposeSampleMethods`` is greatly expanded to know how to copy a vec4 sample return to whatever the structure type should be. This is a little awkward since it involves introducing a comma expression to return the proper aggregate value after a set of memberwise copies.
This adds support for #pragma pack_matrix() to the HLSL front end.
The pragma sets the default matrix layout for subsequent unqualified matrices
in structs or buffers. Explicit qualification overrides the pragma value. Matrix
layout is not permitted at the structure level in HLSL, so only leaves which are
matrix types can be so qualified.
Note that due to the semantic (not layout) difference in first matrix indirections
between HLSL and SPIR-V, the sense of row and column major are flipped. That's
independent of this PR: just a factor to note. A column_major qualifier appears
as a RowMajor member decoration in SPIR-V modules, and vice versa.
The HLSL FE tracks four versions of a declared type to avoid losing information, since it
is not (at type-decl time) known how the type will be used downstream. If such a type
was used in a cbuffer declaration, the cbuffer type's members should have been using
the uniform form of the original user structure type, but were not.
This would manifest as matrix qualifiers (and other things, such as pack offsets) on user struct
members going missing in the SPIR-V module if the struct type was a member of a cbuffer, like so:
struct MyBuffer
{
row_major float4x4 mat1;
column_major float4x4 mat2;
};
cbuffer Example
{
MyBuffer g_MyBuffer;
};
Fixes: #789
This allows removal of isPerVertexBuiltIn(). It also leads to
removal of addInterstageIoToLinkage(), which is no longer needed.
Includes related name improvements.
The goal is to flatten all I/O, but there are multiple categories and
steps to complete, likely including a final unification of splitting
and flattening.
Most of this was obsoleted by entry-point wrapping.
Some other is just unnecessary.
Also, includes some spelling/name improvements.
This is to help lay ground work for flattening user I/O.
Two non-functional changes:
1. Remove flattenLevel, which is unneeded since at or around d1be7545c6.
2. Fix build warining about unused variable in executeInitializer.
HLSL allows several variables to be declared. There are packing rules involved:
e.g, a float3 and a float1 can be packed into a single array[4], while for a
float3 and another float3, the second one will skip the third array entry to
avoid straddling
This is implements that ability. Because there can be multiple variables involved,
and the final output array will often be a different type altogether (to fuse
the values into a single destination), a new variable is synthesized, unlike the prior
clip/cull support which used the declared variable. The new variable name is
taken from one of the declared ones, so the old tests are unchanged.
Several new tests are added to test various packing scenarios.
Only two semantic IDs are supported: 0, and 1, per HLSL rules. This is
encapsulated in
static const int maxClipCullRegs = 2;
and the algorithm (probably :) ) generalizes to larger values, although there
are a few issues around how HLSL would pack (e.g, would 4 scalars be packed into
a single HLSL float4 out reg? Probably, and this algorithm assumes so).
--resource-set-binding has a mode which allows per-register assignments of
bindings and descriptor sets on the command line, and another accepting a
single descriptor set value to assign to all variables.
The former worked, but the latter would crash when assigning the values.
This fixes it, and makes the former case a bit more robust against premature
termination of the pre-register values, which must come in (regname,set,binding)
triples.
This also allows the form "--resource-set-binding stage setnum", which was
mentioned in the usage message, but did not parse.
The operation of the per-register form of this option is unchanged.
Semantic test left over from other source languages is removed, since this is permitted by HLSL.
Also, to support the functionality, a targeted test is performed for this case and it is
turned into a EvqGlobal qualifier to create an AST initialization segment when needed.
Constness is now propagated up aggregate chains during initializer construction. This
handles hierarchical cases such as the distinction between:
static const float2 a[2] = { { 1, 2 }, { 3, 4} };
vs
static const float2 a[2] = { { 1, 2 }, { cbuffer_member, 4} };
The first of which can use a first class constant initalization, and the second cannot.
HLSL allows float/etc types for any/all intrinsics, while the
SPIR-V opcode requires bool. This adds a simple decomposition
to type convert the argument. It could get a little more clever
in some of the type cases if it ever had to.
Lays the groundwork for fixing issue #954.
Partial flattenings were previously tracked through a stack of active subsets
in the parse context, but full functionality needs AST nodes to represent
this across time, removing the need for parsecontext tracking.
In HLSL, there are three (TODO: ??) dimensions of clip and cull
distance values:
* The semantic's value N, ala SV_ClipDistanceN.
* The array demension, if the value is an array.
* The vector element, if the value is a vector or array of vectors.
In SPIR-V, clip and cull distance are arrays of scalar floats, always.
This PR currently ignores the semantic N axis, and handles the other
two axes by sequentially copying each vector element of each array member
into sequential floats in the output array.
Fixes: #946
This fixes:
1. A compilation error when assigning scalars to matricies
2. A semantic error in matrix construction from scalars. This was
initializing the diagonal, where HLSL semantics require the scalar be
replicated to every matrix element.
3. Functions accepting mats can be called with scalars, which will
be shape-converted to the matrix type. This was previously failing
to match the function signature.
NOTE: this does not yet handle complex scalars (a function call,
say) used to construct matricies. That'll be added when the
node replicator service is available. For now, there's an assert.
There's one new test (hlsl.scalar2matrix.frag). An existing test
lsl.type.half.frag changes, because of (2) above, and a negative
test error message changes due to (3) above.
Fixes#923.
For "s.m = t", a sampler member assigned a sampler, make t an alias
for s.m, and when s.m is flattened, it will flatten to the alias t.
Normally, assignments to samplers are disallowed.
This modifies function parameter passing to pass the counter
buffer associated with a struct buffer to a function as a
hidden parameter. Similarly function declarations will have
hidden parameters added to accept the associated counter buffers.
There is a limitation: if a SB type may or may not have an associated
counter, passing it as a function parameter will assume that it does, and
the counter will appear in the linkage whether or not there is a counter
method used on the object.
Marking as WIP since it might deserve discussion or at least explicit consideration.
During type sanitization, the TQualifier's TBuiltInVariable type is lost. However,
sometimes it's needed downstream. There were already two methods in use to track
it through to places it was needed: one in the TParameter, and one in a map in the
HlslParseContext used for structured buffers.
The latter was going to be insufficient when SB types with counters are passed to
user functions, and it's proving awkward to track the data to where it's needed.
This PR holds a proposal: track the original declared builtin type in the TType,
so it's trivially available where needed.
This lets the other two mechanisms be removed (and they are in this PR). There's a
side benefit of not losing certain types of information before the reflection interface.
This PR is only that proposal, so it changes no test results. If it's acceptable,
I'll use it for the last piece of SB counter functionality.
This implements mytex.mips[mip][coord] for texture types. There is
some error testing, but not comprehensive. The constructs can be
nested, e.g in this case the inner .mips is parsed before the completion
of the outer [][] operator.
tx.mips[tx.mips[a][b].x][c]
Using GS methods such as Append() in non-GS stages should be ignored, but was
creating errors due to the lack of a stream output symbol for the non-GS stage.
This adds infrastructure suitable for any front end to create SPIR-V loop
control flags. The only current front end doing so is HLSL.
[unroll] turns into spv::LoopControlUnrollMask
[loop] turns into spv::LoopControlDontUnrollMask
no specification means spv::LoopControlMaskNone
Multisample textures support a GetSamplePosition() method intended to query
positions given a sample index. This cannot be truly implemented in SPIR-V,
but #753 requested returning standard positions for the 1..16 cases, which
this PR adds. Anything besides that returns (0,0). If the standard positions
are not used, this will be wrong.
This should be revisited when there is a real query available.
glslang/MachineIndependent/iomapper.cpp:207:9: error: field 'resolver' will be initialized after field 'stage' [-Werror,-Wreorder]
: resolver(r)
^
glslang/MachineIndependent/iomapper.cpp:263:9: error: field 'resolver' will be initialized after field 'stage' [-Werror,-Wreorder]
: resolver(r)
^
hlsl/hlslParseHelper.cpp:70:5: error: field 'gsStreamOutput' will be initialized after field 'inputPatch' [-Werror,-Wreorder]
gsStreamOutput(nullptr),
^
Byte address buffers were failing to detect that they were byte address
buffers when used as fn parameters.
Note: this detection is a little awkward, and could be simplified if
it was easy to obtain the declared builtin type for an object.
Some texture and SB operations can take non-integer indexes, which should be
cast to integers before use if they are not already. This adds makeIntegerIndex()
for the purpose. Int types are left alone.
(This was done before for operator[], but needs to apply to some other things
too, hence its extraction into common function now)
This adds TProgram::getUniformBlockCounterIndex(int index), which returns the
index the block of the counter buffer associated with the block of the passed in
index, if any, or -1 if none.
This is WIP, heavy on the IP part. There's not yet enough to use in real workloads.
Currently present:
* Creation of separate counter buffers for structured buffer types needing them.
* IncrementCounter / DecrementCounter methods
* Postprocess to remove unused counter buffers from linkage
* Associated counter buffers are given @count suffix (invalid as a user identifier)
Not yet present:
* reflection queries to obtain bindings for counter buffers
* Append/Consume buffers
* Ability to use SB references passed as fn parameters
The prior decomposition of isfinite was not setting the return type on the
sequence node. (Sequence was used because there's an internal temporary
to avoid the complex rvalue problem).
HLSL requires vec2 tessellation coordinate declarations in some cases
(e.g, isoline topology), where SPIR-V requires the TessCoord qualified
builtin to be a vec3 in all cases. This alters the IO form of the
variable to be a vec3, which will be copied to the shader's declared
type if needed. This is not a validation; the shader type must be correct.
Improves foundation for adding scalar casts.
Makes handle/make names more sane, better commented, uses more
precise subclass typing, and removes mutual recursion between
converting initializer lists and making constructors.
Previously, patch constant functions only accepted OutputPatch. This
adds InputPatch support, via a pseudo-builtin variable type, so that
the patch can be tracked clear through from the qualifier.
The prior implementation of GS did not work with the new EP wrapping architecture.
This fixes it: the Append() method now looks up the actual output rather
than the internal sanitized temporary type, and writes to that.
In the hull shader, the PCF output does not participate in an argument list,
so has no defined ordering. It is always put at the end of the linkage. That
means the DS input reading PCF data must be be at the end of the DS linkage
as well, no matter where it may appear in the argument list. This change
makes sure that happens.
The detection is by looking for arguments that contain tessellation factor
builtins, even as a struct member. The whole struct is taken as the PCF output
if any members are so qualified.
The SPIR-V generator had assumed tessellation modes such as
primitive type and vertex order would only appear in tess eval
(domain) shaders. SPIR-V allows either, and HLSL allows and
possibly requires them to be in the hull shader.
This change:
1. Passes them through for either tessellation stage, and,
2. Does not set up defaults in the domain stage for HLSl compilation,
to avoid conflicting definitions.
HLSL HS outputs a per ctrl point value, and the DS reads an array
of that type. (It also has a per patch frequency). The per-ctrl-pt
frequency is arrayed on just one side, as opposed to SPIR-V which
is arrayed on both. To match semantics, the compiler creates an
array behind the scenes and indexes it by invocation ID, assigning
the HS return value to it.
SPIR-V requires that tessellation factor arrays be size 4 (outer) or 2 (inner).
HLSL allows other sizes such as 3, or even scalars. This commit converts
between them by forcing the IO types to be the SPIR-V size, and allowing
copies between the internal and IO types to handle these cases.
This PR emulates per control point inputs to patch constant functions.
Without either an extension to look across SIMD lanes or a dedicated
stage, the emulation must use separate invocations of the wrapped
entry point to obtain the per control point values. This is provided
since shaders are wanting this functionality now, but such an extension
is not yet available.
Entry point arguments qualified as an invocation ID are replaced by the
current control point number when calling the wrapped entry point. There
is no particular optimization for the case of the entry point not having
such an input but the PCF still accepting ctrl pt frequency data. It'll
work, but anyway makes no so much sense.
The wrapped entry point must return the per control point data by value.
At this time it is not supported as an output parameter.
It would have been possible for globally scoped user functions to collide
with builtin method names. This adds a prefix to avoid polluting the
namespace.
Ideally this would be an invalid character to use in user identifiers, but
as that requires changing the scanner, for the moment it's an unlikely yet
valid prefix.
Also use this to move deferred member-function-body parsing to a better
place.
This should also be well poised for implementing the 'namespace' keyword.
This is slightly cleaner today for entry-point wrapping, which sometimes made
two subtrees for a function definition instead of just one subtree. It will be
critical though for recognizing a struct with multiple member functions.
The non-LOD form of image size query is prohibited in certain cases:
see the OpImageQuerySize and OpImageQuerySizeLod sections of the SPIR-V
spec for details. Sometimes we were generating the non-LOD form when
we should have been using the LOD form. Sometimes the LOD form is required
even if the underlying HLSL query did not supply a MIP level itself,
in which case level 0 is now queried.
This change propagates the storage qualifier from the buffer object to its contained
array type so that isStructBufferType() realizes it is one. That propagation was
happening before only for global variable declarations, so compilation defects would
result if the use of a function parameter happened before a global declaration.
This fixes that case, whether or not there ever is a global declaration, and
regardless of the relative order.
This changes the hlsl.structbuffer.fn.frag test to exercise the alternate order.
There are no differences to generated SPIR-V for the cases which successfully compiled before.
Use an explicit cast from size_t to int to avoid errors like the following:
glslang\glslang\MachineIndependent\preprocessor\Pp.cpp(1053) : error C2220: warning treated as error - no 'object' file generated
glslang\glslang\MachineIndependent\preprocessor\Pp.cpp(1053) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
affects Pp.cpp, hlslParseHelper.cpp.
Initialize local variable to get rid of warningsa about potentially
uninitialized variables:
glslang\hlsl\hlslparsehelper.cpp(3667) : error C2220: warning treated as error - no 'object' file generated
glslang\hlsl\hlslparsehelper.cpp(3667) : warning C4701: potentially uninitialized local variable 'builtIn' used
affects hlslParseHelper.cpp
The f16tof32 opcode was indexing a vector with a float 0, rather
than an int 0. It may have made no functional difference due to the
identical bit pattern, but code looking at the type could be
confused.
This PR adds the ability to pass structuredbuffer types by reference
as function parameters.
It also changes the representation of structuredbuffers from anonymous
blocks with named members, to named blocks with pseudonymous members.
That should not be an externally visible change.
This is a partial implemention of structurebuffers supporting:
* structured buffer types of:
* StructuredBuffer
* RWStructuredBuffer
* ByteAddressBuffer
* RWByteAddressBuffer
* Atomic operations on RWByteAddressBuffer
* Load/Load[234], Store/Store[234], GetDimensions methods (where allowed by type)
* globallycoherent flag
But NOT yet supporting:
* AppendStructuredBuffer / ConsumeStructuredBuffer types
* IncrementCounter/DecrementCounter methods
Please note: the stride returned by GetDimensions is as calculated by glslang for std430,
and may not match other environments in all cases.
This obsoletes WIP PR #704, which was built on the pre entry point wrapping master. New version
here uses entry point wrapping.
This is a limited implementation of tessellation shaders. In particular, the following are not functional,
and will be added as separate stages to reduce the size of each PR.
* patchconstantfunctions accepting per-control-point input values, such as
const OutputPatch <hs_out_t, 3> cpv are not implemented.
* patchconstantfunctions whose signature requires an aggregate input type such as
a structure containing builtin variables. Code to synthesize such calls is not
yet present.
These restrictions will be relaxed as soon as possible. Simple cases can compile now: see for example
Test/hulsl.hull.1.tesc - e.g, writing to inner and outer tessellation factors.
PCF invocation is synthesized as an entry point epilogue protected behind a barrier and a test on
invocation ID == 0. If there is an existing invocation ID variable it will be used, otherwise one is
added to the linkage. The PCF and the shader EP interfaces are unioned and builtins appearing in
the PCF but not the EP are also added to the linkage and synthesized as shader inputs.
Parameter matching to (eventually arbitrary) PCF signatures is by builtin variable type. Any user
variables in the PCF signature will result in an error. Overloaded PCF functions will also result in
an error.
[domain()], [partitioning()], [outputtopology()], [outputcontrolpoints()], and [patchconstantfunction()]
attributes to the shader entry point are in place, with the exception of the Pow2 partitioning mode.
This removes pervertex output blocks, in favor of using only
loose variables. The pervertex blocks are not required and were
only partly implemented, and were adding some complication.
This change goes with wrap-entry-point.
Structs are split to remove builtin members to create valid SPIR-V. In this
process, an outer structure array dimension may be propegated onto the
now-removed builtin variables. For example, a mystruct[3].position ->
position[3]. The copy between the split and unsplit forms would handle
this in some cases, but not if the array dimension was at different levels
of aggregate.
It now does this, but may not handle arbitrary composite types. Unclear if
that has any semantic meaning for builtins though.
This introduces parallel types for IO-type containing aggregates used as
non-entry point function parameters or return types, or declared as variables.
Further uses of the same original type will share the same sanitized deep
structure.
This is intended to be used with the wrap-entry-point branch.
This needs some render testing, but is destined to be part of master.
This also leads to a variety of other simplifications.
- IO are global symbols, so only need one list of linkage nodes (deferred)
- no longer need parse-context-wide 'inEntryPoint' state, entry-point is localized
- several parts of splitting/flattening are now localized