To extract a column from row-major matrix, we need to do a strided load one
component at a time. In this case flattened_access_chain_offset still returns
the offset to the first element, but the stride is equal to matrix stride
instead of vector stride.
For this to work, we need to pass matrix stride (and transpose flag) through,
similar to how matrix flattening works.
Additionally slightly clean up recursive flattened_access_chain structure -
specifically, instead of deciding mid-traversal that we need matrix stride
information, we can just pass the matrix stride through - for access chains
that end in matrix/vector this gets us what we need, and for access chains
that end in structs the flattened_access_chain_struct code will recompute
correct stride/transposition data to pass through further.
We currently only support access chains that end in a matrix by propagating
"needs transpose" flag upstream which flips the matrix multiplication order.
It's possible to support indexed extraction as well, however it would have to
generate code like this:
vec4 row = vec4(UBO[0].y, UBO[1].y, UBO[2].y, UBO[3].y);
for a column equivalent of:
vec4 row = UBO[1];
It is definitely possible to do so but it requires signaling the vector output
that it needs to switch to per-component extraction which is a bit more trouble
than this is worth for now.