Commit Graph

91 Commits

Author SHA1 Message Date
Chip Davis
39dce88d3b MSL: Add support for sampler Y'CbCr conversion.
This change introduces functions and in one case, a class, to support
the `VK_KHR_sampler_ycbcr_conversion` extension. Except in the case of
GBGR8 and BGRG8 formats, for which Metal natively supports implicit
chroma reconstruction, we're on our own here. We have to do everything
ourselves. Much of the complexity comes from the need to support
multiple planes, which must now be passed to functions that use the
corresponding combined image-samplers. The rest is from the actual
Y'CbCr conversion itself, which requires additional post-processing of
the sample retrieved from the image.

Passing sampled images to a function was a particular problem. To
support this, I've added a new class which is emitted to MSL shaders
that pass sampled images with Y'CbCr conversions attached around. It
can handle sampled images with or without Y'CbCr conversion. This is an
awful abomination that should not exist, but I'm worried that there's
some shader out there which does this. This support requires Metal 2.0
to work properly, because it uses default-constructed texture objects,
which were only added in MSL 2. I'm not even going to get into arrays of
combined image-samplers--that's a whole other can of worms.  They are
deliberately unsupported in this change.

I've taken the liberty of refactoring the support for texture swizzling
while I'm at it. It's now treated as a post-processing step similar to
Y'CbCr conversion. I'd like to think this is cleaner than having
everything in `to_function_name()`/`to_function_args()`. It still looks
really hairy, though. I did, however, get rid of the explicit type
arguments to `spvGatherSwizzle()`/`spvGatherCompareSwizzle()`.

Update the C API. In addition to supporting this new functionality, add
some compiler options that I added in previous changes, but for which I
neglected to update the C API.
2019-09-01 18:35:53 -05:00
Hans-Kristian Arntzen
9b845a4788
Merge pull request #1141 from troughton/inline-everything
MSL: Inline all non-entry-point functions
2019-08-30 11:05:04 +02:00
Thomas Roughton
91b2f34a3d Update tests to account for all non-entry-point functions being inlined 2019-08-30 09:39:06 +12:00
Hans-Kristian Arntzen
07c76f66b5 MSL: Add {Base,}{Vertex,Instance}Index to bitcast_from_builtin_load.
Totally missed these, so float(index) would not work correctly for
negative numbers.
2019-08-29 13:56:37 +02:00
Hans-Kristian Arntzen
563e994486
Merge pull request #1135 from KhronosGroup/fix-1119
MSL: Deal with array copies from and to threadgroup.
2019-08-27 15:48:08 +02:00
Hans-Kristian Arntzen
9436cd3036 MSL: Deal with array copies from and to threadgroup. 2019-08-27 13:18:01 +02:00
Hans-Kristian Arntzen
7ff2db4570 Do not allow base expressions for non-native row-major matrices. 2019-08-27 11:41:54 +02:00
Hans-Kristian Arntzen
b3305799a8 Deal correctly with sign on bitfield operations.
Need a lot of special purpose implementation functions for these.
2019-08-26 11:36:36 +02:00
Hans-Kristian Arntzen
abb345d0b3 MSL: Deal with Modf/Frexp where output is access chain to scalar.
This is not allowed as we cannot take mutable reference to a
vec.{x,y,z,w}. We only care about scalar since entire vectors are fine.
2019-07-26 11:02:38 +02:00
Hans-Kristian Arntzen
e06efb7259 Missed case where DoWhile continue block deals with Phi. 2019-07-25 12:30:50 +02:00
Hans-Kristian Arntzen
461f1506e7 Do not eagerly invalidate all active variables on a branch.
This is not necessary, as we must emit an invalidating store before we
potentially consume an invalid expression. In fact, we're a bit
conservative here in this case for example:

int tmp = variable;
if (...)
{
    variable = 10;
}
else
{
    // Consuming tmp here is fine, but it was
    // invalidated while emitting other branch.
    // Technically, we need to study if there is an invalidating store
    // in the CFG between the loading block and this block, and the other
    // branch will not be a part of that analysis.
    int tmp2 = tmp * tmp;
}

Fixing this case means complex CFG traversal *everywhere*, and it feels like overkill.

Fixing this exposed a bug with access chains, so fix a bug where expression dependencies were not
inherited properly in access chains. Access chains are now considered forwarded if there
is at least one dependency which is also forwarded.
2019-07-24 11:17:30 +02:00
Hans-Kristian Arntzen
18bcc9b790 Do not disable temporary forwarding when we suppress usage tracking.
This subtle bug removed any expression validation for trivially swizzled
variables. Make usage suppression a more explicit concept rather than
just hacking off forwarded_temporaries.

There is some fallout here with loop generation since our expression
invalidation is currently a bit too naive to handle loops properly.
The forwarding bug masked this problem until now.

If part of the loop condition is also used in the body, we end up
reading an invalid expression, which in turn forces a temporary to be
generated in the condition block, not good. We'll need to be smarter
here ...
2019-07-23 19:18:44 +02:00
Hans-Kristian Arntzen
8ba0507a6d Add another test for unpacking without load forwarding. 2019-07-23 17:14:59 +02:00
Hans-Kristian Arntzen
1ece67a050 Look at pointee type when unpacking expressions.
We might be unpacking in OpLoad, so don't want any pointer types from
access chains creeping in.
2019-07-23 17:07:15 +02:00
Hans-Kristian Arntzen
ebe109d91d Deal correctly with non-forwarded packed loads.
Need to unpack the expression if we're not forwarding.
2019-07-23 16:25:19 +02:00
Hans-Kristian Arntzen
79f533b662 Test CompositeInsert/Extract/VectorShuffle on packed vectors. 2019-07-23 15:44:35 +02:00
Hans-Kristian Arntzen
5582145549 Add test for array of scalar struct. 2019-07-23 15:30:03 +02:00
Hans-Kristian Arntzen
5c1cb7accf Recursively pack struct types when we find scalar packed structs. 2019-07-23 15:24:53 +02:00
Hans-Kristian Arntzen
ef1fa71bba Unpack vector expression in Matrix-Vector multiplies. 2019-07-23 12:22:40 +02:00
Hans-Kristian Arntzen
0f10601f27 Test matrix multiplies in more complex scenarios. 2019-07-23 12:12:24 +02:00
Hans-Kristian Arntzen
978253c804 Test implicit packing of struct members. 2019-07-23 12:04:15 +02:00
Hans-Kristian Arntzen
fc741596d4 Add tests for struct padding and self-alignment. 2019-07-23 11:46:34 +02:00
Hans-Kristian Arntzen
7277c7ac46 Use to_unpacked_row_major_expression to unify row-major in MSL/GLSL. 2019-07-23 11:36:54 +02:00
Hans-Kristian Arntzen
47a18b9f1b Simplify row-major matrix/vector multiplies. 2019-07-23 10:56:57 +02:00
Hans-Kristian Arntzen
d584d833fa Test array of std140 vectors. 2019-07-23 10:38:32 +02:00
Hans-Kristian Arntzen
6224199c76 Add struct size padding tests. 2019-07-23 10:30:37 +02:00
Hans-Kristian Arntzen
82c819ee6c Add test for CompositeExtract from row-major loaded vector. 2019-07-22 16:32:22 +02:00
Hans-Kristian Arntzen
d7a5303cf2 Add test for split access chain into row-major matrix. 2019-07-22 16:28:05 +02:00
Hans-Kristian Arntzen
609d087f8f Only transpose unpacked expressions. 2019-07-22 16:06:09 +02:00
Hans-Kristian Arntzen
6057ffcbb1 Deal correctly with complete stores to row_major matrices. 2019-07-22 15:49:17 +02:00
Hans-Kristian Arntzen
19f5cd3e90 Declare correct matrix type when unpacking. 2019-07-22 13:25:45 +02:00
Hans-Kristian Arntzen
745a2f7b0e Deal with swizzled stores to std140 matrices. 2019-07-22 13:05:23 +02:00
Hans-Kristian Arntzen
180a6b38c5 Fix some row-major column store cases. 2019-07-22 12:56:14 +02:00
Hans-Kristian Arntzen
4ab2829cf6 Fix more stray parens. 2019-07-22 12:13:07 +02:00
Hans-Kristian Arntzen
d6004bfc97 Fixup stray parent in output. 2019-07-22 12:08:56 +02:00
Hans-Kristian Arntzen
14afb968dd Correctly unpack row-major matrices when storing to LHS. 2019-07-22 12:03:12 +02:00
Hans-Kristian Arntzen
172185016f MSL: Add std140 and scalar matrix layouts. 2019-07-22 11:30:03 +02:00
Hans-Kristian Arntzen
6471236652 MSL: Add std430 matrix access test. 2019-07-22 11:23:06 +02:00
Hans-Kristian Arntzen
c7eda1bce9 Test glsl.std450 more exhaustively.
Make sure to test everything with scalar as well to catch any weird edge
cases.

Not all opcodes are covered here, just the arithmetic ones. FP64 packing
is also ignored.
2019-07-17 11:53:05 +02:00
Hans-Kristian Arntzen
932ee0e328 Deal correctly with return sign of bitscan operations. 2019-07-12 10:57:56 +02:00
Chip Davis
058f1a0933 MSL: Handle coherent, volatile, and restrict.
This maps them to their MSL equivalents. I've mapped `Coherent` to
`volatile` since MSL doesn't have anything weaker than `volatile` but
stronger than nothing.

As part of this, I had to remove the implicit `volatile` added for
atomic operation casts. If the buffer is already `coherent` or
`volatile`, then we would add a second `volatile`, which would be
redundant. I think this is OK even when the buffer *doesn't* have
`coherent`: `T *` is implicitly convertible to `volatile T *`, but not
vice-versa. It seems to compile OK at any rate. (Note that the
non-`volatile` overloads of the atomic functions documented in the spec
aren't present in the MSL 2.2 stdlib headers.)

`restrict` is tricky, because in MSL, as in C++, it needs to go *after*
the asterisk or ampersand for the pointer type it's modifying.

Another issue is that, in the `Simple`, `GLSL450`, and `Vulkan` memory
models, `Restrict` is the default (i.e. does not need to be specified);
but MSL likely follows the `OpenCL` model where `Aliased` is the
default. We probably need to implicitly set either `Restrict` or
`Aliased` depending on the module's declared memory model.
2019-07-11 10:22:30 -05:00
Hans-Kristian Arntzen
2b11b331d6
Merge pull request #1036 from KhronosGroup/msl-auto-binding
MSL: Rewrite how resources are automatically assigned bindings.
2019-06-21 15:58:50 +02:00
Hans-Kristian Arntzen
c365cc1b43 Deal with OpPhi and case fallthrough.
This is quite complex since we cannot flush Phi inside the case labels,
we have to do it outside by emitting a lot of manual branches ourselves.

This should be extremely rare, but we need to handle this case.
2019-06-21 13:38:23 +02:00
Hans-Kristian Arntzen
e2c95bdcbc MSL: Rewrite how resource indices are fallback-assigned.
We used to use the Binding decoration for this, but this method is
hopelessly broken. If no explicit MSL resource remapping exists, we
remap automatically in a manner which should always "just work".
2019-06-21 12:54:08 +02:00
Hans-Kristian Arntzen
03d93abc1a Deal with case where a variable is dominated by inner part of a loop.
There is a risk that we try to preserve a loop variable through multiple
iterations, even though the dominating block is inside a loop.

Fix this by analyzing if a block starts off by writing to a variable. In
that case, there cannot be any preservation going on. If we don't, pretend the
loop header is reading the variable, which moves the variable to an
appropriate scope.
2019-06-06 11:11:44 +02:00
Hans-Kristian Arntzen
314efdcc42 MSL: Fix declaration of unused input variables.
In multiple-entry-point modules, we declared builtin inputs which were
not supposed to be used for that entry point.

Fix this, by being more strict when checking which builtins to emit.
2019-05-31 13:23:34 +02:00
Hans-Kristian Arntzen
7b9e0fb428 MSL: Implement OpArrayLength.
This gets rather complicated because MSL does not support OpArrayLength
natively. We need to pass down a buffer which contains buffer sizes, and
we compute the array length on-demand.

Support both discrete descriptors as well as argument buffers.
2019-05-27 16:13:09 +02:00
Hans-Kristian Arntzen
eaf7afed97 MSL: Support argument buffers and image swizzling.
Change aux buffer to swizzle buffer.
There is no good reason to expand the aux buffer, so name it
appropriately.

Make the code cleaner by emitting a straight pointer to uint rather than
a dummy struct which only contains a single unsized array member anyways.

This will also end up being very similar to how we implement swizzle
buffers for argument buffers.

Do not use implied binding if it overflows int32_t.
2019-05-18 10:30:06 +02:00
Hans-Kristian Arntzen
8b236f24f1 Fix infinite loop when OpAtomic* temporaries are used in other blocks.
We made the mistake of registering a dependency on the atomic variable
even if the atomic result was forced to a temporary. There is no need to
register reads from atomic variables like this as we always force atomic
results to a temporary and argument read/writes do not need to be
tracked.
2019-04-24 09:33:39 +02:00
Hans-Kristian Arntzen
e23c9ea700 Force complex loop in certain rare access chain scenarios.
If we generate an access chain in a loop body, and it is consumed in the
loop continue block, we have a problem because we cannot emit a
temporary here holding the access chain reference. Force a complex loop
body to workaround this exceptionally rare case.
2019-04-10 16:02:03 +02:00