MoltenVK tessellation needs to be able to identify when a shader declares
an output built-in but does not populate it, in order to keep expectations
about how intermediate buffers are populated aligned between tessellation
stages.
Some fallout here where internal functions now use stronger types. It
would be overkill to move everything over to strong types right now, but
we can migrate slowly over time.
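As a rough illustration of the direction (a minimal sketch; the names are
hypothetical, not the actual SPIRV-Cross types):

    #include <cstdint>

    // Distinct wrapper types per ID category prevent accidentally passing,
    // say, a type ID where a variable ID is expected.
    enum class IDCategory { Type, Variable, Constant };

    template <IDCategory C>
    struct StrongID
    {
        StrongID() = default;
        explicit StrongID(uint32_t id_) : id(id_) {}
        explicit operator uint32_t() const { return id; }
        uint32_t id = 0;
    };

    using TypeID = StrongID<IDCategory::Type>;
    using VariableID = StrongID<IDCategory::Variable>;
    // void handle_variable(VariableID id); // will not accept a TypeID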
This was straightforward to implement in GLSL. The
`ShadingRateInterlockOrderedEXT` and `ShadingRateInterlockUnorderedEXT`
modes aren't implemented yet, because we don't support
`SPV_NV_shading_rate` or `SPV_EXT_fragment_invocation_density` yet.
HLSL and MSL were more interesting. They don't support this directly,
but they do support marking resources as "rasterizer ordered," which
does roughly the same thing. So this implementation scans all accesses
inside the critical section and marks all storage resources found
therein as rasterizer ordered. They also don't support the fine-grained
controls on pixel- vs. sample-level interlock and disabling ordering
guarantees that GLSL and SPIR-V do, but that's OK. "Unordered" here
merely means the order is undefined; that it just so happens to be the
same as rasterizer order is immaterial. As for pixel- vs. sample-level
interlock, Vulkan explicitly states:
> With sample shading enabled, [the `PixelInterlockOrderedEXT` and
> `PixelInterlockUnorderedEXT`] execution modes are treated like
> `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`
> respectively.
and:
> If [the `SampleInterlockOrderedEXT` or `SampleInterlockUnorderedEXT`]
> execution modes are used in single-sample mode they are treated like
> `PixelInterlockOrderedEXT` or `PixelInterlockUnorderedEXT`
> respectively.
So this will do the right thing for MoltenVK and gfx-rs, at least.
MSL additionally supports multiple raster order groups; resources that
are not accessed together can be placed in different ROGs to allow them
to be synchronized separately. A more sophisticated analysis might be
able to place resources optimally, but that's outside the scope of this
change. For now, we assign all resources to group 0, which should do for
our purposes.
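For illustration, resources marked this way come out in MSL roughly like
this (hand-written sketch, not actual generated output):

    #include <metal_stdlib>
    using namespace metal;

    fragment float4 frag_main(
        // Every storage resource accessed inside the critical section is
        // tagged with raster_order_group(0).
        device atomic_uint *counter [[buffer(0), raster_order_group(0)]])
    {
        // Overlapping fragments now access 'counter' in rasterization order.
        atomic_fetch_add_explicit(counter, 1u, memory_order_relaxed);
        return float4(0.0);
    }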
`glslang` doesn't support the `RasterizerOrdered` UAVs this
implementation produces for HLSL, so the test case needs `fxc.exe`.
It also insists on GLSL 4.50 for `GL_ARB_fragment_shader_interlock`,
even though the spec says it needs either 4.20 or
`GL_ARB_shader_image_load_store`; and it doesn't support the
`GL_NV_fragment_shader_interlock` extension at all. So I haven't been
able to test those code paths.
Fixes #1002.
When merging combined image samplers, we only looked at the sampler, but
DXC emits RelaxedPrecision only on the texture. It does not hurt to check
more things.
We were going down a tree of expressions multiple times and this caused
an exponential explosion in time, which was not caught until recently.
Fix this by blocking any traversal from visiting an ID more than once.
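A minimal sketch of the guard (types and names hypothetical):

    #include <cstdint>
    #include <unordered_set>

    struct TraversalGuard
    {
        std::unordered_set<uint32_t> visited;

        // Returns true the first time an ID is seen. Expression trees are
        // really DAGs, so revisiting an ID only repeats work already done
        // and can blow up exponentially.
        bool try_visit(uint32_t id)
        {
            return visited.insert(id).second;
        }
    };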
Overall, this fix improves performance by almost an order of magnitude on
a particular test shader, instead of slowing it down by ~75x.
This subtle bug removed any expression validation for trivially swizzled
variables. Make usage suppression a more explicit concept rather than
just hacking off forwarded_temporaries.
There is some fallout here with loop generation since our expression
invalidation is currently a bit too naive to handle loops properly.
The forwarding bug masked this problem until now.
If part of the loop condition is also used in the body, we end up reading
an invalid expression, which in turn forces a temporary to be generated in
the condition block. Not good. We'll need to be smarter here ...
This is quite complex since we cannot flush Phi inside the case labels;
we have to do it outside by emitting a lot of manual branches ourselves.
This should be extremely rare, but we need to handle this case.
We used to use the Binding decoration for this, but this method is
hopelessly broken. If no explicit MSL resource remapping exists, we
remap automatically in a manner which should always "just work".
This is rather shaky, but we don't have many choices here short of adding
a lot of awkward and unintuitive options. Try to deduce this from OpSource
and fall back to a heuristic.
There is a risk that we try to preserve a loop variable through multiple
iterations, even though the dominating block is inside a loop.
Fix this by analyzing whether a block starts off by writing to the
variable. In that case, there cannot be any preservation going on. If it
does not, pretend the loop header is reading the variable, which moves the
variable to an appropriate scope.
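A minimal sketch of the analysis (types are invented for illustration):

    #include <cstdint>
    #include <vector>

    enum class AccessKind { Read, Write };
    struct Access { AccessKind kind; uint32_t var; };
    struct Block { std::vector<Access> ops; };

    // If the block's first access to 'var' is a write, the old value cannot
    // survive into this iteration, so no preservation is needed. Otherwise,
    // treating the loop header as a reader hoists 'var' to a scope that
    // dominates the whole loop.
    bool starts_with_write(const Block &block, uint32_t var)
    {
        for (const auto &op : block.ops)
        {
            if (op.var != var)
                continue;
            return op.kind == AccessKind::Write;
        }
        return false;
    }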
MSL generally emits the aliases, which means we cannot always place the
master type first, unlike GLSL and HLSL. The fix is simply to reorder
after we have tagged types with packing information, rather than doing it
in the parser fixup.
Buffer objects can contain arbitrary pointers to blocks.
We can also implement ConvertPtrToU and ConvertUToPtr.
The latter can cast a uint64_t to any pointer type it pleases, so we need
to generate fake buffer reference blocks to be able to cast the type.
If we generate an access chain in a loop body and it is consumed in the
loop continue block, we have a problem, because we cannot emit a temporary
here holding the access chain reference. Force a complex loop body to work
around this exceptionally rare case.
- Replace ostringstream with a custom implementation.
  ~30% performance uplift on the vector-shuffle-oom test.
  Allocations are measurably reduced in Valgrind.
- Replace std::vector with SmallVector (see the sketch after this list).
  Classic malloc optimization: small vectors are backed by inline data.
  ~7-8% gain on vector-shuffle-oom on GCC 8 on Linux.
- Use an object pool for the IVariant type.
  We generally allocate a lot of SPIR* objects. We can amortize these
  allocations neatly by pooling them.
- ~15% overall uplift on ./test_shaders.py --iterations 10000 shaders/.
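A minimal sketch of the inline-storage idea behind SmallVector, assuming
trivially copyable elements (the real implementation is more general and
handles errors, moves, and so on):

    #include <cstddef>
    #include <cstdlib>
    #include <cstring>

    // The first N elements live in inline storage, so short-lived small
    // vectors never touch malloc.
    template <typename T, size_t N = 8>
    class SmallVectorSketch
    {
    public:
        SmallVectorSketch() = default;
        SmallVectorSketch(const SmallVectorSketch &) = delete;
        SmallVectorSketch &operator=(const SmallVectorSketch &) = delete;

        ~SmallVectorSketch()
        {
            if (ptr != stack)
                free(ptr);
        }

        void push_back(const T &t)
        {
            if (count == capacity)
                grow();
            ptr[count++] = t;
        }

        T *begin() { return ptr; }
        T *end() { return ptr + count; }

    private:
        void grow()
        {
            // Error handling omitted for brevity.
            size_t new_capacity = capacity * 2;
            T *heap = static_cast<T *>(malloc(new_capacity * sizeof(T)));
            memcpy(heap, ptr, count * sizeof(T));
            if (ptr != stack)
                free(ptr);
            ptr = heap;
            capacity = new_capacity;
        }

        T stack[N];
        T *ptr = stack;
        size_t count = 0;
        size_t capacity = N;
    };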
We had a bug where error conditions in the DoWhileLoop emit path would not
detect that statements were being emitted, due to the masking behavior
which happens when force_recompile is true. Fix this.
Also, refactor force_recompile into member functions so we can properly
break on any situation where this is set, without having to rely on
watchpoints in debuggers.
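Roughly the shape of the refactor (a sketch; method names approximate):

    // Wrapping the flag in member functions gives a single place to set a
    // breakpoint whenever recompilation is requested.
    class CompilerSketch
    {
    public:
        void force_recompile()
        {
            // A breakpoint here catches every trigger.
            is_force_recompile = true;
        }

        bool is_forcing_recompilation() const
        {
            return is_force_recompile;
        }

    private:
        bool is_force_recompile = false;
    };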
This is a pragmatic trick to avoid symbol collision where a project
links against SPIRV-Cross statically, while linking to other projects
which also use SPIRV-Cross statically. We can end up with very awkward
symbol collisions which can resolve themselves silently because
SPIRV-Cross is pulled in as necessary. To fix this, we must use
different symbols and embed two copies of SPIRV-Cross in this scenario,
now with different namespaces, which in turn leads to different symbols.
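Schematically, the mechanism looks like this (modeled on the actual
override option, but treat the details as a sketch):

    // All symbols live in a configurable namespace, so two embedded copies
    // built with different overrides cannot collide at link time.
    #ifdef SPIRV_CROSS_NAMESPACE_OVERRIDE
    #define SPIRV_CROSS_NAMESPACE SPIRV_CROSS_NAMESPACE_OVERRIDE
    #else
    #define SPIRV_CROSS_NAMESPACE spirv_cross
    #endif

    namespace SPIRV_CROSS_NAMESPACE
    {
    class Compiler { /* ... */ };
    }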
This adds a new C API for SPIRV-Cross which is intended to be stable, both
API- and ABI-wise.
The C++ API has been refactored a bit to make the C wrapper easier and
cleaner to write. In particular, the vertex attribute / resource
interfaces for MSL have been rewritten to avoid taking mutable pointers
into the interface. That would have been very annoying to wrap, and it
didn't fit well with the rest of the C++ API to begin with. While doing
this, I went ahead and removed all the old deprecated interfaces.
The CMake build system has also seen an overhaul.
It is now possible to build static/shared/CLI separately with -D
options.
The shared library only exposes the C API, as it is the only ABI-stable
API. pkg-config files as well as CMake modules are exported and installed
for the shared library configuration.
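Usage of the C API looks roughly like this (error checking omitted;
consult spirv_cross_c.h for the exact, stable signatures):

    #include <spirv_cross_c.h>
    #include <cstddef>
    #include <cstdio>

    void dump_glsl(const SpvId *words, size_t word_count)
    {
        spvc_context context = nullptr;
        spvc_parsed_ir ir = nullptr;
        spvc_compiler compiler = nullptr;
        const char *source = nullptr;

        spvc_context_create(&context);
        spvc_context_parse_spirv(context, words, word_count, &ir);
        spvc_context_create_compiler(context, SPVC_BACKEND_GLSL, ir,
                                     SPVC_CAPTURE_MODE_TAKE_OWNERSHIP,
                                     &compiler);
        spvc_compiler_compile(compiler, &source);
        printf("%s\n", source);

        // The context owns every allocation made through it, including
        // 'source', so a single destroy call frees everything.
        spvc_context_destroy(context);
    }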
In the bizarre case where the ID of a loaded opaque type aliased with a
literal which was used as part of another texturing instruction, we
could end up with a case where domination analysis assumed the loaded
opaque type needed to be moved to a different scope.
Fix the issue by never doing dominance analysis for opaque temporaries,
and be more robust when analyzing texturing instructions.
Also make sure reflection output is deterministic. This patch slightly
altered output for some unknown reason, but the output came from an
unordered_map, so it's fine.
These are mapped to Metal's post-tessellation vertex functions. The
semantic difference is much smaller here, so this change should be simpler
than the previous one. There are still some hairy parts, though.
In MSL, the array of control point data is represented by a special
type, `patch_control_point<T>`, where `T` is a valid stage-input type.
This object must be embedded inside the patch-level stage input. For
this reason, I've added a new type to the type system to represent this.
On Mac, the number of input control points to the function must be
specified in the `patch()` attribute. This is optional on iOS.
SPIRV-Cross takes this from the `OutputVertices` execution mode; the
intent is that if it's not set in the shader itself, MoltenVK will set
it from the tessellation control shader. If you're translating these
offline, you'll have to update the control point count manually, since
this number must match the number that is passed to the
`drawPatches:...` family of methods.
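For reference, a hand-written MSL post-tessellation vertex function has
roughly this shape (hypothetical shader, not generated output):

    #include <metal_stdlib>
    using namespace metal;

    struct ControlPoint
    {
        float4 position [[attribute(0)]];
    };

    struct PatchIn
    {
        // Per-control-point inputs live inside patch_control_point<T>,
        // embedded in the patch-level stage input.
        patch_control_point<ControlPoint> cps;
    };

    struct VertexOut
    {
        float4 position [[position]];
    };

    // On Mac the control point count (here 3) is mandatory in [[patch()]];
    // on iOS it is optional.
    [[patch(triangle, 3)]]
    vertex VertexOut tese_main(PatchIn in [[stage_in]],
                               float3 coord [[position_in_patch]])
    {
        VertexOut out;
        out.position = in.cps[0].position * coord.x +
                       in.cps[1].position * coord.y +
                       in.cps[2].position * coord.z;
        return out;
    }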
Fixes#120.
These are transpiled to kernel functions that write the output of the
shader to three buffers: one for per-vertex varyings, one for per-patch
varyings, and one for the tessellation levels. This structure is
mandated by the way Metal works, where the tessellation factors are
supplied to the draw method in their own buffer, while the per-patch and
per-vertex varyings are supplied as though they were vertex attributes;
since they have different step rates, they must be in separate buffers.
The kernel is expected to be run in a workgroup whose size is the
greater of the number of input or output control points. It uses Metal's
support for vertex-style stage input to a compute shader to get the
input values; therefore, at least one instance must run per input point.
Meanwhile, Vulkan mandates that it run at least once per output point.
Overrunning the output array is a concern, but any values written should
either be discarded or overwritten by subsequent patches. I'm probably
going to put some slop space in the buffer when I integrate this into
MoltenVK to be on the safe side.
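Schematically, the generated kernel writes to the three buffers like this
(names and buffer indices are illustrative only):

    #include <metal_stdlib>
    using namespace metal;

    struct PerVertex { float4 varying0; };
    struct PerPatch  { float4 varying1; };

    kernel void tesc_main(
        uint thread_in_group [[thread_index_in_threadgroup]],
        uint patch_id [[threadgroup_position_in_grid]],
        // One buffer per step rate: per-vertex varyings, per-patch
        // varyings, and the factors consumed by drawPatches:.
        device PerVertex *out_vertex [[buffer(0)]],
        device PerPatch *out_patch [[buffer(1)]],
        device MTLTriangleTessellationFactorsHalf *out_factors [[buffer(2)]])
    {
        // ... the shader body writes control-point outputs at
        // out_vertex[patch_id * OUT_POINTS + thread_in_group], etc. ...
        out_factors[patch_id].insideTessellationFactor = 1.0h;
    }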
A flat array was consuming way too much memory and was far too slow to
initialize with a very large ID bound (8 million IDs; this showed up as
the #1 hotspot in perf).
The Meta struct does not have to be stored in-order, as we never iterate
over it in a meaningful way, so using a hashmap here is reasonable. Very
few IDs should need decorations or metadata, so this should also be quite
a decent memory saving.
For the pathological case, a 6x uplift was observed.
This is a fairly fundamental change on how IDs are handled.
It serves many purposes:
- Improve performance. We only need to iterate over IDs which are
  relevant at any one time.
- Make sure we iterate through IDs in SPIR-V module declaration order
  rather than ID space. IDs don't have to be monotonically increasing,
  which was an assumption SPIRV-Cross used to have. It has apparently
  never been a problem until now.
- Support LUTs of structs. We do this by interleaving declaration of
  constants and struct types in SPIR-V module order.
To support this, the ParsedIR interface needed to change slightly.
Before setting any ID with variant_set<T>, we let ParsedIR know that an ID
with a specific type has been added. The surface area of this change
should be minimal.
ParsedIR will maintain a per-type list of IDs which the cross-compiler
will need to consider for later.
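A rough sketch of the write-side bookkeeping (simplified; the real names
may differ):

    #include <cstdint>
    #include <vector>

    enum class Types : uint32_t { TypeNone, TypeType, TypeVariable, TypeConstant, Count };

    struct ParsedIRSketch
    {
        // One declaration-ordered ID list per object type.
        std::vector<uint32_t> ids_for_type[static_cast<uint32_t>(Types::Count)];

        // Called before any variant_set<T>; records the ID in the bucket
        // for its type, preserving SPIR-V module declaration order.
        void add_typed_id(Types type, uint32_t id)
        {
            ids_for_type[static_cast<uint32_t>(type)].push_back(id);
        }
    };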
Instead of looping over ir.ids[] (which can be extremely large), we loop
over types now, using:

    ir.for_each_typed_id<SPIRVariable>([&](uint32_t id, SPIRVariable &var) {
        handle_variable(var);
    });

Now we make sure that we're never looking at irrelevant types.