SPIR-V requires that tessellation factor arrays be size 4 (outer) or 2 (inner).
HLSL allows other sizes such as 3, or even scalars. This commit converts
between them by forcing the IO types to be the SPIR-V size, and allowing
copies between the internal and IO types to handle these cases.
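For example, the tri domain produces the mismatched HLSL sizes; a sketch (struct and member names hypothetical):

    struct pcf_out_t {
        float tessOuter[3] : SV_TessFactor;        // HLSL size 3; SPIR-V IO array must be size 4
        float tessInner    : SV_InsideTessFactor;  // HLSL scalar; SPIR-V IO array must be size 2
    };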
This PR emulates per control point inputs to patch constant functions.
Without either an extension to look across SIMD lanes or a dedicated
stage, the emulation must use separate invocations of the wrapped
entry point to obtain the per control point values. This emulation is
provided because shaders need the functionality now, even though no such
extension is yet available.
Entry point arguments qualified as an invocation ID are replaced by the
current control point number when calling the wrapped entry point. There
is no particular optimization for the case where the entry point lacks
such an input but the PCF still accepts control-point-frequency data;
it will work, but makes little sense.
The wrapped entry point must return the per control point data by value.
At this time it is not supported as an output parameter.
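A sketch of the emulated parameter shape (all names hypothetical; hs_out_t is the wrapped entry point's by-value return type):

    struct hs_out_t { float4 pos : SV_Position; };
    struct pcf_t    { float tf[3] : SV_TessFactor; float itf : SV_InsideTessFactor; };

    pcf_t pcf(const OutputPatch<hs_out_t, 3> cpv)    // per control point input, emulated
    {
        pcf_t o;
        o.tf[0] = o.tf[1] = o.tf[2] = cpv[0].pos.w;  // reads per control point values
        o.itf = 1.0;
        return o;
    }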
It would have been possible for globally scoped user functions to collide
with builtin method names. This adds a prefix to avoid polluting the
namespace.
Ideally this would be an invalid character to use in user identifiers, but
as that requires changing the scanner, for the moment it's an unlikely yet
valid prefix.
Also use this to move deferred member-function-body parsing to a better
place.
This should also be well poised for implementing the 'namespace' keyword.
This is slightly cleaner today for entry-point wrapping, which sometimes made
two subtrees for a function definition instead of just one subtree. It will be
critical though for recognizing a struct with multiple member functions.
The non-LOD form of image size query is prohibited in certain cases:
see the OpImageQuerySize and OpImageQuerySizeLod sections of the SPIR-V
spec for details. Sometimes we were generating the non-LOD form when
we should have been using the LOD form. Sometimes the LOD form is required
even if the underlying HLSL query did not supply a MIP level itself,
in which case level 0 is now queried.
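For instance, a query supplying no MIP level of its own (texture and function names hypothetical) may now use the LOD form:

    Texture2D g_tex;

    void getSize(out uint w, out uint h)
    {
        g_tex.GetDimensions(w, h);  // may generate OpImageQuerySizeLod with Lod 0
    }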
This change propagates the storage qualifier from the buffer object to its contained
array type so that isStructBufferType() realizes it is one. That propagation was
happening before only for global variable declarations, so compilation defects would
result if the use of a function parameter happened before a global declaration.
This fixes that case, whether or not there ever is a global declaration, and
regardless of the relative order.
This changes the hlsl.structbuffer.fn.frag test to exercise the alternate order.
There are no differences to generated SPIR-V for the cases which successfully compiled before.
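A sketch of the order that previously failed (names hypothetical):

    float f(StructuredBuffer<float4> sb) { return sb[0].x; }  // parameter use comes first

    StructuredBuffer<float4> g_sb;                            // global declared afterward

    float4 main() : SV_Target0 { return f(g_sb).xxxx; }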
Use an explicit cast from size_t to int to avoid errors like the following:
glslang\glslang\MachineIndependent\preprocessor\Pp.cpp(1053) : error C2220: warning treated as error - no 'object' file generated
glslang\glslang\MachineIndependent\preprocessor\Pp.cpp(1053) : warning C4267: '=' : conversion from 'size_t' to 'int', possible loss of data
affects Pp.cpp, hlslParseHelper.cpp.
Initialize local variable to get rid of warnings about potentially
uninitialized variables:
glslang\hlsl\hlslparsehelper.cpp(3667) : error C2220: warning treated as error - no 'object' file generated
glslang\hlsl\hlslparsehelper.cpp(3667) : warning C4701: potentially uninitialized local variable 'builtIn' used
affects hlslParseHelper.cpp
The f16tof32 opcode was indexing a vector with a float 0, rather
than an int 0. It may have made no functional difference due to the
identical bit pattern, but code looking at the type could be
confused.
This PR adds the ability to pass structuredbuffer types by reference
as function parameters.
It also changes the representation of structuredbuffers from anonymous
blocks with named members, to named blocks with pseudonymous members.
That should not be an externally visible change.
This is a partial implementation of structured buffers, supporting:
* structured buffer types of:
  * StructuredBuffer
  * RWStructuredBuffer
  * ByteAddressBuffer
  * RWByteAddressBuffer
* Atomic operations on RWByteAddressBuffer
* Load/Load[234], Store/Store[234], GetDimensions methods (where allowed by type)
* globallycoherent flag
But NOT yet supporting:
* AppendStructuredBuffer / ConsumeStructuredBuffer types
* IncrementCounter/DecrementCounter methods
Please note: the stride returned by GetDimensions is as calculated by glslang for std430,
and may not match other environments in all cases.
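An illustrative sketch of the supported surface (names hypothetical):

    StructuredBuffer<float4> g_sb;
    RWByteAddressBuffer      g_bab;

    float4 fn()
    {
        uint numStructs, stride;
        g_sb.GetDimensions(numStructs, stride);  // stride computed per std430
        g_bab.Store2(32, uint2(1, 2));           // byte-addressed store
        uint prev;
        g_bab.InterlockedAdd(16, 1, prev);       // atomic on RWByteAddressBuffer
        return g_sb[0] + asfloat(g_bab.Load4(0));
    }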
This obsoletes WIP PR #704, which was built on the pre entry point wrapping master. New version
here uses entry point wrapping.
This is a limited implementation of tessellation shaders. In particular, the following are not functional,
and will be added as separate stages to reduce the size of each PR.
* patchconstantfunctions accepting per-control-point input values, such as
const OutputPatch <hs_out_t, 3> cpv are not implemented.
* patchconstantfunctions whose signature requires an aggregate input type such as
a structure containing builtin variables. Code to synthesize such calls is not
yet present.
These restrictions will be relaxed as soon as possible. Simple cases can compile now: see for example
Test/hlsl.hull.1.tesc, e.g., writing to inner and outer tessellation factors.
PCF invocation is synthesized as an entry point epilogue protected behind a barrier and a test on
invocation ID == 0. If there is an existing invocation ID variable it will be used, otherwise one is
added to the linkage. The PCF and the shader EP interfaces are unioned and builtins appearing in
the PCF but not the EP are also added to the linkage and synthesized as shader inputs.
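Conceptually, the synthesized epilogue looks like the following pseudo-code (all names hypothetical):

    output[cpid] = innerEntryPoint(args);  // wrapped entry point, per control point
    Barrier();                             // wait until all control points are written
    if (cpid == 0)                         // invoke the PCF once per patch
        pcfOutput = patchConstantFunction(pcfArgs);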
Parameter matching to (eventually arbitrary) PCF signatures is by builtin variable type. Any user
variables in the PCF signature will result in an error. Overloaded PCF functions will also result in
an error.
[domain()], [partitioning()], [outputtopology()], [outputcontrolpoints()], and [patchconstantfunction()]
attributes on the shader entry point are in place, with the exception of the Pow2 partitioning mode.
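For illustration, a minimal hull shader using these attributes (all names hypothetical):

    struct hs_out_t { float4 pos : SV_Position; };
    struct pcf_t    { float tf[3] : SV_TessFactor; float itf : SV_InsideTessFactor; };

    pcf_t pcf()
    {
        pcf_t o;
        o.tf[0] = o.tf[1] = o.tf[2] = 1.0;
        o.itf = 1.0;
        return o;
    }

    [domain("tri")]
    [partitioning("fractional_odd")]
    [outputtopology("triangle_cw")]
    [outputcontrolpoints(3)]
    [patchconstantfunction("pcf")]
    hs_out_t main(InputPatch<hs_out_t, 3> ip, uint cpid : SV_OutputControlPointID)
    {
        return ip[cpid];
    }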
This removes pervertex output blocks, in favor of using only
loose variables. The pervertex blocks are not required and were
only partly implemented, and were adding some complication.
This change goes with wrap-entry-point.
Structs are split to remove builtin members to create valid SPIR-V. In this
process, an outer structure array dimension may be propagated onto the
now-removed builtin variables. For example, mystruct[3].position becomes
position[3]. The copy between the split and unsplit forms would handle
this in some cases, but not if the array dimension was at different levels
of aggregate.
It now handles this case, though it may not handle arbitrary composite types.
It is unclear whether those have any semantic meaning for builtins, though.
This introduces parallel types for IO-type-containing aggregates used as
non-entry-point function parameters or return types, or declared as variables.
Further uses of the same original type will share the same sanitized deep
structure.
This is intended to be used with the wrap-entry-point branch.
This needs some render testing, but is destined to be part of master.
This also leads to a variety of other simplifications.
- IO are global symbols, so only need one list of linkage nodes (deferred)
- no longer need parse-context-wide 'inEntryPoint' state, entry-point is localized
- several parts of splitting/flattening are now localized
When copying split types with mixtures of user variables and builtins,
where the builtins are extracted, there is a parallel-structures traversal.
The traversal was not obtaining the dereferenced types in the array case.
Since EOpMatrixSwizzle is a new op, existing back-ends only work when the
front end first decomposes it to other operations. So far, this is only
being done for simple assignment into matrix swizzles.
This partially addresses issue #670, for when the matrix swizzle
degenerates to a component or column: m[c], m[c][r] (where HLSL
swaps rows and columns for the user's view).
An error message is given for the arbitrary cases not covered.
The covered cases will work for arbitrary use of l-values.
Future work will handle more arbitrary swizzles, which might
not work as arbitrary l-values.
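A sketch of the now-supported forms (names hypothetical):

    void fn()
    {
        float4x4 m = (float4x4)0;
        m._m00_m11 = float2(1.0, 2.0);  // simple assignment into a matrix swizzle
        m[2] = float4(0, 0, 0, 1);      // degenerate: a single m[c]
        m[2][3] = 1.0;                  // degenerate: a single m[c][r]
    }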
(Still adding tests: do not commit)
This fixes PR #632 so that:
(a) The 4 PerVertex builtins are added to an interface block for all stages except fragment.
(b) Other builtin qualified variables are added as "loose" linkage members.
(c) Arrayness from the PerVertex builtins is moved to the PerVertex block.
(d) Sometimes, two PerVertex blocks are created, one for in, one for out (e.g, for some GS that
both reads and writes a Position)
- fixed ParseHelper.cpp newlines (crlf -> lf)
- removed trailing white space in most source files
- fix some spelling issues
- extra blank lines
- tabs to spaces
- replace #include comment about no location
Read and write syntax on UAV objects is turned into EOpImageLoad/Store
operations. This translation did not support destination swizzles,
for example, "mybuffer[tc].zyx = 3;", so such statements would fail to
compile. Now they work.
Partial updates are explicitly prohibited.
New test: hlsl.rw.swizzle.frag
This PR adds support for default function parameters in the following cases:
1. Simple constants, such as void fn(int x, float myparam = 3)
2. Expressions that can be const folded, such as ... myparam = sin(some_const)
3. Initializer lists that can be const folded, such as ... float2 myparam = {1,2}
New tests are added: hlsl.params.default.frag and hlsl.params.default.err.frag
(for testing error situations, such as ambiguity or non-const-foldable).
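A combined sketch of the three cases (names hypothetical):

    static const float k = 0.5;

    float4 fn(int x, float a = 3, float b = sin(k), float2 c = { 1, 2 })
    {
        return float4(x * a, b, c);
    }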
In order to avoid sampler method ambiguity, the hlsl better() lambda now
considers sampler matches. Previously, all sampler types looked identical
since only the basic type of EbtSampler was considered.
This commit adds support for copying nested hierarchical types of split
types. E.g, a struct of a struct containing both user and builtin interstage
IO variables.
When copying split types, if any subtree does NOT contain builtin interstage
IO, we can copy the whole subtree with one assignment, which saves a bunch
of AST verbosity for memberwise copies of that subtree.
This adds structure splitting, which among other things will enable GS support where input structs
are passed, and thus become input arrays of structs in the GS inputs. That is a common GS case.
The salient points of this PR are:
* Structure splitting has been changed from "always between stages" to "only into the VS and out of
the PS". It had previously happened between stages because it's not legal to pass a struct
containing a builtin IO variable.
* Structs passed between stages are now split into a struct containing ONLY user types, and a
collection of loose builtin IO variables, if any. The user-part is passed as a normal struct
between stages, which is valid SPIR-V now that the builtin IO is removed.
* Internal to the shader, a sanitized struct (with IO qualifiers removed) is used, so that e.g,
functions can work unmodified.
* If a builtin IO such as Position occurs in an arrayed struct, for example as an input to a GS,
the array reference is moved to the split-off loose variable, which is given the array dimension
itself, as sketched below.
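A sketch of that split, with hypothetical names; the comments show the conceptual result:

    struct gs_in_t {
        float4 pos : SV_Position;  // builtin IO
        float2 uv  : TEXCOORD0;    // user IO
    };
    // main(triangle gs_in_t v[3], ...) is conceptually split into:
    //   float4 pos[3];               // loose builtin; the array dimension moves here
    //   struct { float2 uv; } v[3];  // user-only struct, passed between stages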
When passing things around inside the shader, such as over a function call, the original type
is used in a sanitized form that removes the builtIn qualifications and makes them temporaries.
This means internal function calls do not have to change. However, the type when returned from
the shader will be member-wise copied from the internal sanitized one to the external type.
The sanitized type is used in variable declarations.
When copying between the split and unsplit types, if a sub-struct contains only user variables,
it is copied as a single entity to avoid more AST verbosity.
Above strategy arrived at with talks with @johnkslang.
This is a big complex change. I'm inclined to leave it as a WIP until it can get some exposure to
real world cases.
This PR implements recursive type flattening. For example, an array of structs of other structs
can be flattened to individual member variables at the shader interface.
This is sufficient for many purposes, e.g, uniforms containing opaque types, but is not sufficient
for geometry shader arrayed inputs. That will be handled separately with structure splitting,
which is not implemented by this PR. In the meantime, that case is detected and triggers an error.
The recursive flattening extends the following three aspects of single-level flattening:
- Flattening of structures to individual members with names such as "foo[0].samp[1]";
- Turning constant references to the nested composite type into a reference to a particular
flattened member;
- Shadow copies between arrays of flattened members and the nested composite type.
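For example (names hypothetical):

    struct S { Texture2D samp[2]; };
    S foo[2];  // flattened at the interface into foo[0].samp[0], foo[0].samp[1], ...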
Previous single-level flattening only flattened at the shader interface, and that is unchanged by this PR.
Internally, shadow copies are made, such as when the type is passed to a function.
Also, the reasons for flattening are unchanged. Uniforms containing opaque types, and interface struct
types are flattened. (The latter will change with structure splitting).
One existing test changes: hlsl.structin.vert, which did in fact contain a nested composite type to be
flattened.
Two new tests are added: hlsl.structarray.flatten.frag, and hlsl.structarray.flatten.geom (currently
issues an error until type splitting is online).
The process of arriving at the individual member from chained postfix expressions is more complex than
it was with one level. See large-ish comment above HlslParseContext::flatten() for details.
PR #577 addresses most but not all of the intrinsic promotion problems.
This PR resolves all known cases in the remainder.
Interlocked ops need special promotion rules because at the time
of function selection, the first argument has not been converted
to a buffer object. It's just an int or uint, but you don't want
to convert THAT argument, because that implies converting the
buffer object itself. Rather, you can convert other arguments,
but want to stay in the same "family" of functions. E.g., if
the first interlocked arg is a uint, use only the uint family,
never the int family; the other args can be converted as needed.
This PR allows making such opcode and arg specific choices by
passing the op and arg to the convertible lambda. The code in
the new test "hlsl.promote.atomic.frag" would not compile without
this change, but it must compile.
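The shape of the case in question (names hypothetical):

    RWBuffer<uint> g_uav;

    void fn(int val)
    {
        InterlockedAdd(g_uav[0], val);  // stays in the uint family; val is converted instead
    }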
Also, it provides better handling of downconversions (to "worse"
types), which are permitted in HLSL. The existing method of
selecting upconversions is unchanged, but if that doesn't find
any valid ones, then it will allow downconversions. In effect
this always uses an upconversion if there is one.
Use "--source-entrypoint name" on the command line, or the
TShader::setSourceEntryPoint(char*) API.
When the name given to the above interfaces is detected in the
shader source, it will be renamed to the entry point name supplied
to the -e option or the TShader::setEntryPoint() method.
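For example (file and function names hypothetical):

    glslangValidator -V -e main --source-entrypoint PixelShaderFunction shader.frag

Here PixelShaderFunction in the source is compiled as the entry point and renamed to main.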
This PR handles implicit promotions for intrinsics when there is no exact match,
such as for example clamp(int, bool, float). In this case the int and bool will
be promoted to a float, and the clamp(float, float, float) form used.
These promotions can be mixed with shape conversions, e.g, clamp(int, bool2, float2).
Output conversions are handled either via the existing addOutputArgumentConversion
function, which this PR generalizes to handle either aggregates or unaries, or by
intrinsic decomposition. If there are methods or intrinsics to be decomposed,
then decomposition is responsible for any output conversions, which turns out to
happen automatically in all current cases. This can be revisited once inout
conversions are in place.
Some cases of actual ambiguity were fixed in several tests, e.g, spv.register.autoassign.*
Some intrinsics with only uint versions were expanded to signed ints natively, where the
underlying AST and SPIR-V supports that. E.g, countbits. This avoids extraneous
conversion nodes.
A new function promoteAggregate is added, and used by findFunction. This is essentially
a generalization of the "promote 1st or 2nd arg" algorithm in promoteBinary.
The actual selection proceeds in three steps, as described in the comments in
hlslParseContext::findFunction:
1. Attempt an exact match. If found, use it.
2. If not, obtain the operator from step 1, and promote arguments.
3. Re-select the intrinsic overload from the results of step 2.
Previously, an error was thrown when assigning a float1 to a scalar float,
or similar for other basic types. This allows that.
Also, this allows calling functions accepting scalars with float1 params,
so for example sin(float1) will work. This is a minor change in
HlslParseContext::findFunction().
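A sketch (names hypothetical):

    void fn()
    {
        float1 f1 = float1(0.5);
        float  s  = f1;       // float1 -> scalar float assignment, now allowed
        float  r  = sin(f1);  // scalar function called with a float1
    }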
Rationalizes the entire tracking of the linker object nodes, affecting
GLSL, HLSL, and SPIR-V, to allow tracked objects to be fully edited before
their type snapshot for linker objects.
Should only affect things when the rest of the AST contained no reference to
the symbol, because normal AST nodes were not stale. Also will only affect such
objects when their types were edited.
This PR adds:
1. The "u" register class for RW* objects.
2. --shift-image-bindings (== --sib), analogous to --shift-texture-bindings etc.
3. Case insensitive reg classes.
4. Tests for above.
This PR adds handling of the numthreads attribute for compute shaders, as well as a general
infrastructure for returning attribute values from acceptAttributes, which may be needed in other
cases, e.g, unroll(x), or merely to know if some attribute without params was given.
A map of enum values from TAttributeType to TIntermAggregate nodes is built and returned. It
can be queried with operator[] on the map. In the future there may be a need to also handle
strings (e.g, for patchconstantfunc), and those can be easily added into the class if needed.
New test is in hlsl.numthreads.comp.
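For example, along the lines of the new test:

    [numthreads(8, 8, 1)]
    void main(uint3 tid : SV_DispatchThreadID)
    {
    }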
HLSL holds the compare value in a separate intrinsic arg, but the AST wants
a vector including the cmp val, except in the 4-dim coord case, where it
doesn't fit and is in fact a separate AST parameter. This is awkward but
necessary, given AST semantics. In the process, a new vector is constructed
for the combined result, but this vector was not being given the correct
TType, so was causing some downstream troubles.
Now it is. A similar defect existed in OpTextureBias, and has also been
fixed.
This PR sets the TQualifier layoutFormat according to the HLSL image type.
For instance:
RWTexture1D <float2> g_tTex1df2;
becomes ElfRg32f. Similarly for Buffers, e.g., Buffer<float4> mybuffer;
The return type for image and buffer loads is now taken from the storage format.
Also, the qualifier for the return type is now (properly) a temp, not a global.
- hlsl.struct.frag variable changed to static, assignment replaced.
- Created new low level functions addBinaryNode and addUnaryNode. These are
used by higher level functions such as addAssignment, and do not do any
argument promotion or conversion of any sort.
- The two functions above are now used in RWTexture lvalue conversions. Also,
other direct creations of unary or binary nodes now use them, e.g., addIndex.
This cleans up some existing code.
- removed handling of EOpVectorTimesScalar from promote()
- removed comment from ParseHelper.cpp
This commit splits lValueErrorCheck into machine dependent and independent
parts. The GLSL form in TParseContext inherits from and invokes the
machine dependent part in TParseContextBase. The base form checks language
independent things. This split does not change the set of errors tested
for: the test results are identical.
The new base class interface is now used from the HLSL FE to test lvalues.
There was one test diff due to this, where the test was writing to a uniform.
It still does the same indirections, but does not attempt a uniform write.
This commit adds l-value support for RW texture and buffer objects.
Supported are:
- pre and post inc/decrement
- function out parameters
- op-assignments, such as *=, +=, etc.
- result values from op-assignments. e.g, val=(MyRwTex[loc] *= 2);
Not supported are:
- Function inout parameters
- multiple post-inc/decrement operators. E.g, MyRWTex[loc]++++;
This commit adds r-value support for RW textures and buffers.
Supported are:
- Function in parameter conversions
- conversion of rvalue use to imageLoad
There's a lot to do for RWTexture and RWBuffer, so it will be broken up into
several PRs. This is #1.
This adds RWTexture and RWBuffer support, with the following limitations:
* Only 4 component formats supported
* No operator[] yet
Those will be added in other PRs.
This PR supports declarations and the Load & GetDimensions methods. New tests are
added.
If a member-wise assignment from a non-flattened struct to a flattened struct sees a complex R-value
(not a symbol), it now creates a temporary to hold that value, to avoid repeating the R-value.
This avoids, e.g, duplicating a whole function call. Also, it avoids re-using the AST node, making a
new one for each member inside the member loop.
The latter (re-use of AST node) was also an issue in the GetDimensions intrinsic decomposition,
so this PR fixes that one too.
- Add new queries: TProgram::getUniformTType and getUniformBlockTType,
which return a const TType*, or nullptr on a bad index. These are valid for
any source language.
- Interface name for HLSL cbuffers is taken from the (only) available declaration name,
whereas before it was always an empty string, which caused some troubles with reflection
mapping them all to the same index slot. This also makes it appear in the SPIR-V binary
instead of an empty string.
- Print the binding as part of the reflection textual dump.
- TType::clone becomes const. Needed to call it from a const method, and anyway it doesn't
change the object it's called on.
- The TObjectReflection constructor is called with a TType *reference* (not a pointer),
so it is guaranteed to receive a type, while the "badReflection" value needs a nullptr
there; a dedicated static method now supplies the bad value. It uses a private
constructor, so external users can't create one with a nullptr type.
Previously the uniform array flattening feature would trigger on loose
uniform arrays of any basic type (e.g, floats). This PR restricts it
to sampler and texture arrays. Other arrays would end up in their own
uniform block (anonymous or otherwise). (Atomic counter arrays might be an
exception, but those are not currently flattened).
Fix for two defects as follows:
- The IO mapping traverser was not setting inVisit, and would skip some AST nodes.
Depending on the order of nodes, this could have prevented the binding from
showing up in the generated SPIR-V.
- If a uniform array was flattened, each of the flattened scalars from the array
is still a (now-scalar) uniform. It was being converted to a temporary.
This checkin adds a --flatten-uniform-arrays option which can break
uniform arrays of samplers, textures, or UBOs up into individual
scalars named (e.g) myarray[0], myarray[1], etc. These appear as
individual linkage objects.
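A sketch of the effect (names hypothetical):

    Texture2D myarray[3];
    // with --flatten-uniform-arrays, myarray becomes the linkage objects
    // myarray[0], myarray[1], myarray[2]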
Code notes:
- shouldFlatten internally calls shouldFlattenIO, and shouldFlattenUniform,
but is the only flattening query directly called.
- flattenVariable will handle structs or arrays (but not yet arrayed structs;
this is tested and an error is generated).
- There's some error checking around unhandled situations. E.g, flattening
uniform arrays with initializer lists is not implemented.
- This piggybacks on as much of the existing mechanism for struct flattening
as it can. E.g, it uses the same flattenMap, and the same
flattenAccess() method.
- handleAssign() has been generalized to cope with either structs or arrays.
- Extended test infrastructure to test flattening ability.
Code using atEndOfFile was dead; instead, do something useful with
the scanner's atEndOfInput(). This allows a better error message
on early termination from cascading errors.
This also enables vecN -> vec1 shape conversions for all places doing shape
conversions.
For signature selection, shape changes are ranked worse than any other
conversion when deciding which conversions are better than others.
From the ES spec + Bugzilla 15931 and GL_KHR_vulkan_glsl:
- Update precision qualifiers for all built-in function prototypes.
- Implement the new algorithm used to distinguish built-in function
operation precisions from result precisions.
Also add tracking of separate result and operation precisions, and
use that in generating SPIR-V.
(SPIR-V cares about precision of operation, while the front-end
cares about precision of result, for propagation.)