Commit Graph

472 Commits

Author SHA1 Message Date
Hans-Kristian Arntzen
a489ba7fd1 Reduce pressure on global allocation.
- Replace ostringstream with custom implementation.
  ~30% performance uplift on vector-shuffle-oom test.
  Allocations are measurably reduced in Valgrind.

- Replace std::vector with SmallVector.
  Classic malloc optimization, small vectors are backed by inline data.
  ~ 7-8% gain on vector-shuffle-oom on GCC 8 on Linux.

- Use an object pool for IVariant type.
  We generally allocate a lot of SPIR* objects. We can amortize these
  allocations neatly by pooling them.

- ~15% overall uplift on ./test_shaders.py --iterations 10000 shaders/.
2019-04-09 15:09:44 +02:00
Hans-Kristian Arntzen
23db744e35 Deal with case where we need to emit SpvImplArrayCopy late.
We cannot deduce if OpLoad needs ArrayCopy templates early since it's
heavily context dependent, and we might only know on 3rd iteration of
the compile loop.
2019-04-09 12:28:46 +02:00
Bill Hollings
efbe7ca16f MSL: Fix infinite CAS loop on atomic_compare_exchange_weak_explicit(). 2019-04-05 21:28:57 -04:00
Hans-Kristian Arntzen
317144a59c Detect invalid DoWhileLoop early.
We had a bug where error conditions in DoWhileLoop emit path would not
detect that statements were being emitted due to the masking behavior
which happens when force_recompile is true. Fix this.

Also, refactor force_recompile into member functions so we can properly
break on any situation where this is set, without having to rely on
watchpoints in debuggers.
2019-04-05 12:19:32 +02:00
Hans-Kristian Arntzen
9b92e68d71 Add an option to override the namespace used for spirv_cross.
This is a pragmatic trick to avoid symbol collision where a project
links against SPIRV-Cross statically, while linking to other projects
which also use SPIRV-Cross statically. We can end up with very awkward
symbol collisions which can resolve themselves silently because
SPIRV-Cross is pulled in as necessary. To fix this, we must use
different symbols and embed two copies of SPIRV-Cross in this scenario,
now with different namespaces, which in turn leads to different symbols.
2019-03-29 10:29:44 +01:00
Bill Hollings
c48702d8c2 Fix crash when backend.int16_t_literal_suffix set to null.
The design of backend.int16_t_literal_suffix and backend.uint16_t_literal_suffix
allows them to be set to null, but that was not always tested for.
I have removed the expectation that they can be null and set
backend.int16_t_literal_suffix to "" when no suffix is needed.
That has the same effect, and seemed to be a more usable and defensive approach.
2019-03-28 14:23:32 -04:00
Hans-Kristian Arntzen
18d4f67a87
Merge pull request #919 from KhronosGroup/fix-915
MSL: Declare gl_WorkGroupSize constant with [[maybe_unused]].
2019-03-28 14:00:49 +01:00
Hans-Kristian Arntzen
0909975655 MSL: Declare gl_WorkGroupSize constant with [[maybe_unused]].
Avoids ugly warnings on nearly every compute shader.
We could do analysis to detect whether we need to emit this constant,
but it's a bit tedious to figure out if an OpConstantComponent is
actually used by opcodes, so just make it simple.
2019-03-28 10:54:18 +01:00
Hans-Kristian Arntzen
c37f88fea6 MSL: Fix crash where variable storage buffer pointers are passed down.
Only deal with readonly decoration for actual block types.
2019-03-28 10:16:46 +01:00
Hans-Kristian Arntzen
eeb3f24991 Properly deal with sign-dependent GLSL opcodes.
The GLSLstd450 spec is very lax about input signs, so we need to do the
bitcasting dance to implement it correctly.
2019-03-27 12:20:53 +01:00
Hans-Kristian Arntzen
e2aadf8995 Rename "push descriptor set" to "discrete descriptor set".
Check for case where iOS doesn't support writable argument buffer
textures.
2019-03-15 21:53:21 +01:00
Hans-Kristian Arntzen
b3380ec9dd MSL: Support VK_KHR_push_descriptor.
If we have argument buffers, we also need to support using plain
descriptor sets for certain cases where API wants it.
2019-03-15 14:08:47 +01:00
Hans-Kristian Arntzen
c310b40fd3 MSL: Make sure get_buffer_block_flags is only used in right context. 2019-03-15 12:27:54 +01:00
Hans-Kristian Arntzen
bc21ccb7ce MSL: Emit correct SSBO constness for argument buffers. 2019-03-15 12:05:35 +01:00
Hans-Kristian Arntzen
969566aff5 MSL: Fixup buffer array case issue on MSL 1.0. 2019-03-15 11:37:34 +01:00
Hans-Kristian Arntzen
af8a9ccdcb MSL: Need to emit two layers of address space.
When passing down arrays of buffer pointers, the array itself needs an
address space.
2019-03-15 11:29:17 +01:00
Hans-Kristian Arntzen
e47a77d596 MSL: Implement Metal 2.0 indirect argument buffers. 2019-03-15 11:01:27 +01:00
Hans-Kristian Arntzen
e74c21a39b Review fixups. 2019-03-04 10:08:31 +01:00
Hans-Kristian Arntzen
9bbdccddb7 Add a stable C API for SPIRV-Cross.
This adds a new C API for SPIRV-Cross which is intended to be stable,
both API and ABI wise.

The C++ API has been refactored a bit to make the C wrapper easier and
cleaner to write. Especially the vertex attribute / resource interfaces
for MSL has been rewritten to avoid taking mutable pointers into the
interface. This would be very annoying to wrap and it didn't fit well
with the rest of the C++ API to begin with. While doing this, I went
ahead and removed all the old deprecated interfaces.

The CMake build system has also seen an overhaul.
It is now possible to build static/shared/CLI separately with -D
options.
The shared library only exposes the C API, as it is the only ABI-stable
API. pkg-configs as well as CMake modules are exported and installed for
the shared library configuration.
2019-03-01 11:53:51 +01:00
Hans-Kristian Arntzen
825ff4af7e Replace locale handling.
We were using std::locale::global() to force a C locale which is not
safe when SPIRV-Cross is used in a multi-threaded environment.

To fix this, we could tap into various per-platform specific locale
handling to get safe thread-local locales, but since locales only affect
the decimal point in floats, we simply query the locale instead and do
the necessary radix replacement ourselves, without touching the locale.

This should be much safer and cleaner than the alternative.
2019-02-28 11:28:31 +01:00
Hans-Kristian Arntzen
ee395afa83 MSL: Emit proper name for optimized UBO/SSBO arrays. 2019-02-25 11:09:00 +01:00
Hans-Kristian Arntzen
ad6134262e
Merge pull request #877 from cdavis5e/msl-tesc-early-return
MSL: Return early from helper tesc invocations.
2019-02-25 09:13:06 +01:00
Hans-Kristian Arntzen
7874f7fc49
Merge pull request #876 from cdavis5e/msl-tese-fixup-2
MSL: Make sure we fix up the output position.
2019-02-25 09:12:47 +01:00
Chip Davis
a43dcd7b99 MSL: Return early from helper tesc invocations.
Return after loading the input control point array if there are more
input points than output points, and this was one of the helper
invocations spun off to load the input points. I was hesitant to do this
initially, since the MSL spec has this to say about barriers:

> The `threadgroup_barrier` (or `simdgroup_barrier`) function must be
> encountered by all threads in a threadgroup (or SIMD-group) executing
> the kernel.

That is, if any thread executes the barrier, then all threads must
execute it, or the barrier'd invocations will hang. But, the key words
here seem to be "executing the kernel;" inactive invocations, those that
have already returned, need not encounter the barrier to prevent hangs.
Indeed, I've encountered no problems from doing this, at least on my
hardware. This also fixes a few CTS tests that were failing due to
execution ordering; apparently, my assumption that the later, invalid
data written by the helpers would get overwritten was wrong.
2019-02-24 12:17:47 -06:00
Chip Davis
f3267db1d8 MSL: Make sure we fix up the output position.
If a stage takes the position as both an input and an output (i.e. a
tessellation shader or a geometry shader), then we could wind up fixing
up the input position by mistake. Ensure that doesn't happen, by only
setting the `qual_pos_var_name` variable from the output position.
2019-02-22 15:28:28 -06:00
Chip Davis
f3c0942d10 MSL: Use vectors for the tessellation level builtins in tese shaders.
The tessellation levels in Metal are stored as a densely-packed array of
half-precision floating point values. But, stage-in attributes in Metal
have to have offsets and strides aligned to a multiple of four, so we
can't add them individually. Luckily for us, the arrays have lengths
less than 4. So, let's use vectors for them!

Triangles get a single attribute with a `float4`, where the outer levels
are in `.xyz` and the inner levels are in `.w`. The arrays are unpacked
as though we had added the elements individually. Quads get two: a
`float4` with the outer levels and a `float2` with the inner levels.
Further, since vectors can be indexed as arrays, there's no need to
unpack them in this case.

This also saves on precious vertex attributes. Before, we were using up
to 6 of them. Now we need two at most.
2019-02-22 12:18:51 -06:00
Hans-Kristian Arntzen
a4ac27546a MSL: Fix textures which are sampled and compared against.
depth2d in MSL only returns float, not float4, even for normal sampling.
We need to conditionally remap-swizzle back to float4.
2019-02-22 12:27:40 +01:00
Chip Davis
dae4a88b06 MSL: Don't do the fixup at all when capturing output. 2019-02-21 17:05:37 -06:00
Chip Davis
b34fd63c2d MSL: Do position fixup for tessellation evaluation shaders, too. 2019-02-21 16:57:56 -06:00
Chip Davis
7042cb9bec Quiesce truncation warnings. 2019-02-21 15:11:45 -06:00
Chip Davis
c756a91c3c MSL: Fix a case I missed initializing vtx_attrs_by_builtin. 2019-02-21 13:14:03 -06:00
Chip Davis
9d8a5be725 MSL: Ignore duplicate builtin vertex attributes.
These are often arrayed builtins, which MSL maps to more than one
attribute. SPIRV-Cross automatically assigns succeeding addresses to
arrayed attributes, so we really only need the first one. This of course
assumes that the inputs are sorted by location.
2019-02-21 13:14:03 -06:00
Chip Davis
5069ec72bb MSL: Set location of builtins based on client input.
Builtin attributes in SPIR-V aren't linked by location, but by their
built-in-ness. This poses a problem for MSL, since builtin inputs in
the vertex pipeline are just regular attributes. We must then assign
them locations so that they can be matched up to the attributes in the
stage input descriptor--and also to avoid duplicate attribute numbers in
tessellation evaluation shaders, where there are two different
stage-in structs, so the member index therein is no longer unique!
2019-02-20 22:16:51 -06:00
Chip Davis
7a7e210515 MSL: Force unnamed array builtin attributes to have a name.
That way, when we refer to them, they'll have the name that we're
expecting.
2019-02-20 22:16:51 -06:00
Hans-Kristian Arntzen
ed7292fec4
Merge pull request #867 from cdavis5e/tese-shader-origin-2
MSL: Don't bother fixing up triangle tess coords.
2019-02-20 22:36:21 +01:00
Chip Davis
285ca4c2b1 MSL: Don't bother fixing up triangle tess coords.
Instead, I'm going to have MoltenVK reverse the winding order in the
lower-left case. This seems to be what the test suite expects to happen
anyhow.
2019-02-20 14:30:44 -06:00
Hans-Kristian Arntzen
c1a93b8a71 Run format_all.sh.
Missed some nits in earlier reviews.
2019-02-20 17:29:57 +01:00
Chip Davis
ba8593b112 Fix formatting. 2019-02-20 09:19:25 -06:00
Chip Davis
8095434dc4 MSL: Drop stores to nonexistent tess levels.
In SPIR-V, there are always two inner levels and four outer levels, even
if the input patch isn't a quad patch. But in MSL, due to requirements
imposed by Metal, only one inner level and three outer levels exist when
the input patch is a triangle patch. We must explicitly ignore any write
to the nonexistent second inner and fourth outer levels in this case.
2019-02-20 09:11:24 -06:00
Chip Davis
c8ee9fbe76 MSL: Expand quad gl_TessCoord to a float3.
This is the actual SPIR-V type of the builtin. We forced to a `float2`
in the declaration because that's what Metal wants.
2019-02-20 09:11:24 -06:00
Hans-Kristian Arntzen
58f264c99d
Merge pull request #865 from KhronosGroup/fix-863
Always value-cast FP16 constants instead of using literals.
2019-02-20 14:58:44 +01:00
Hans-Kristian Arntzen
4ef51331b2 Always value-cast FP16 constants instead of using literals.
GL_NV_gpu_shader5 doesn't support "hf", so to avoid lots of complicated
workarounds, just value-cast the half literals.
2019-02-20 12:30:01 +01:00
Hans-Kristian Arntzen
056a0ba27e Fix case where a struct is loaded which contains a row-major matrix. 2019-02-20 12:19:00 +01:00
Chip Davis
41d9424233 MSL: Add an option to set the tessellation domain origin.
This is intended to be used to support `VK_KHR_maintenance2`'s
tessellation domain origin feature. If `tess_domain_origin_lower_left`
is `true`, the `v` coordinate will be inverted with respect to the
domain. Additionally, in `Triangles` mode, the `v` and `w` coordinates
will be swapped. This is because the winding order is interpreted
differently in lower-left mode.
2019-02-18 14:25:42 -06:00
Chip Davis
08863c1e28 Don't set any aliases or do any flattening for arrayed per-vertex I/O.
We already handle all that specially.
2019-02-15 17:24:16 -06:00
Chip Davis
6b7988046d Handle blocks of patch I/O.
In this case, each member of the block will be decorated with
`DecorationPatch`, rather than the block variable having the decoration.
2019-02-15 17:21:38 -06:00
Chip Davis
e75add42c9 MSL: Add support for tessellation evaluation shaders.
These are mapped to Metal's post-tessellation vertex functions. The
semantic difference is much less here, so this change should be simpler
than the previous one. There are still some hairy parts, though.

In MSL, the array of control point data is represented by a special
type, `patch_control_point<T>`, where `T` is a valid stage-input type.
This object must be embedded inside the patch-level stage input. For
this reason, I've added a new type to the type system to represent this.

On Mac, the number of input control points to the function must be
specified in the `patch()` attribute. This is optional on iOS.
SPIRV-Cross takes this from the `OutputVertices` execution mode; the
intent is that if it's not set in the shader itself, MoltenVK will set
it from the tessellation control shader. If you're translating these
offline, you'll have to update the control point count manually, since
this number must match the number that is passed to the
`drawPatches:...` family of methods.

Fixes #120.
2019-02-14 10:00:08 -06:00
Hans-Kristian Arntzen
cbd76e7c3b Run format_all.sh. 2019-02-14 09:28:46 +01:00
Hans-Kristian Arntzen
878c502f96 MSL: Hoist out complicated tesc workaround code. 2019-02-14 09:28:17 +01:00
Chip Davis
13df78bebf Unflatten inputs when copying to outputs.
This should fix a whole host of issues related to structs in the `Input`
class in a tessellation control shader.

Also, use pointer arithmetic instead of dereferencing the `ops` array.
This is critical in case we wind up stepping beyond the bounds of the
array.
2019-02-13 12:37:24 -06:00