To extract a column from row-major matrix, we need to do a strided load one
component at a time. In this case flattened_access_chain_offset still returns
the offset to the first element, but the stride is equal to matrix stride
instead of vector stride.
For this to work, we need to pass matrix stride (and transpose flag) through,
similar to how matrix flattening works.
Additionally slightly clean up recursive flattened_access_chain structure -
specifically, instead of deciding mid-traversal that we need matrix stride
information, we can just pass the matrix stride through - for access chains
that end in matrix/vector this gets us what we need, and for access chains
that end in structs the flattened_access_chain_struct code will recompute
correct stride/transposition data to pass through further.
Do not emit function prototypes. If secondary functions are used,
suppress compiler -Wmissing-prototypes warnings.
Refactor Compiler::emit_function_prototype() functions to simplify.
Rename Compiler_msl::CustomFunctionHandler to OpCodePreprocessor, and
Compiler_msl::register_custom_functions() to preprocess_op_codes()
to perform more generic preprocessing.
Add space between dynamic header lines and fixed header lines.
We currently only support access chains that end in a matrix by propagating
"needs transpose" flag upstream which flips the matrix multiplication order.
It's possible to support indexed extraction as well, however it would have to
generate code like this:
vec4 row = vec4(UBO[0].y, UBO[1].y, UBO[2].y, UBO[3].y);
for a column equivalent of:
vec4 row = UBO[1];
It is definitely possible to do so but it requires signaling the vector output
that it needs to switch to per-component extraction which is a bit more trouble
than this is worth for now.
Instead of filling a std::string buffer passed by reference return a new
string. This may be slightly slower in certain cases but they are pretty
rare and this matches the code style better.
Also streamline error handling in different branches and extract function
to generate vector swizzle.
Legacy GLSL targets do not support uniform buffers, and as such require
some sort of emulation. There are two alternatives - one is to represent
a uniform buffer as a uniform struct, and another one is to flatten it
into an array of primitive vector types (vec4).
Uniform struct have two disadvantages that make using them prohibitive
in some applications:
- The location assignment for struct members is arbitrary which means
the application has to set each struct member one by one
- Some Android drivers fail to link shader programs if both vertex and
fragment shader use the same uniform struct
Because of this, we need to support flattening uniform buffers into an
array. This is not just important for legacy GLSL but also is sometimes
useful for ESSL 3.0 where some Android drivers do not have stable UBO
support.
The way flattening works is the entire buffer is represented as a vec4
array; each access chain is rewritten into a combination of array
accesses, swizzles and data type constructors. Specifically:
- Extracting a vector or a scalar requires indexing into the array with
an optional swizzle, for example CB0[13].yz for reading vec2
- Extracting a matrix or a struct requires extracting each individual
vector or struct member and then combining them into the resulting
object
- Extracting arrays is not supported, mostly because the resulting
construct is very inefficient and ESSL 1.0 does not support array
constructors.
Additionally, while we try to constant-fold each individual indexing
operation, there are cases where we have to use dynamic index
computation (specifically for indexing arrays with non-constants); so
the general form of the primitive array extraction expression is:
buffer[stride0*index0+...+strideN*indexN+offset]
Where stride/offset are integer literals and index represents variables.
When a member of a struct is a struct, get_declared_struct_member_size
instead returned the size of the entire outer struct because it added
the offset of the last field to the size of the last field.
Restructure the function so that it handles all arrays in the same way
(by using array stride) and for the rest reuses get_declared_struct_size
if possible - this simplifies the function and fixes the issue.
- By default, emit uniform structs for UBOs, like push constant.
- Forward transpose information,
and optimize transpose(matrix) * vector to vector * matrix.