To extract a column from row-major matrix, we need to do a strided load one
component at a time. In this case flattened_access_chain_offset still returns
the offset to the first element, but the stride is equal to matrix stride
instead of vector stride.
For this to work, we need to pass matrix stride (and transpose flag) through,
similar to how matrix flattening works.
Additionally slightly clean up recursive flattened_access_chain structure -
specifically, instead of deciding mid-traversal that we need matrix stride
information, we can just pass the matrix stride through - for access chains
that end in matrix/vector this gets us what we need, and for access chains
that end in structs the flattened_access_chain_struct code will recompute
correct stride/transposition data to pass through further.