Instead of doing multiple copy commands with a tiny buffer
for each glyph, we can just batch them all together. This
also avoids the issue of creating multiple barriers for the
same image.
By tracking the last transition we can build the appropriate barriers.
Also use the most appropriate initial layout/access at creation :
for linear image : predefined (we prepare the content ourself through memcpy)
for everything else : undefined (we don't care about the content, will most likely be erase)
This way we can pass the command pool around.
And that allows us to allocate and submitcustom buffers.
And that is necessary to make staging images work.
Instead of using a staging iamge, we require the final image to be
linearly allocated and have host-visible memory.
This improves performance quite a bit.
The old code is still there and can be enabled with a simple change
to a #define in gskvulkanimage.h