The relevant question here is about details, because we have to choose
if we declare alpha-only formats as having their (nonexistant) color
channels premultiplied or not, so that the code paths using them can do
the right thing.
Because we are premultiplied by default, it makes sense to treat alpha
like that, because then the alpha-only code doesn't need to do
workarounds for straight alpha.
Where this is relevant of course is when expanding the alpha channel
into color channels, where we want to end up with white.
So make sure we do color = alpha there instead of color = 1 like we did
before.
We need them for mask-only textures.
For tiffs, we convert the formats to RGBA (the idea that tiff can save
everything needs to be buried I guess) as tiffs can't do alpha-only.
Basically, memcpy() asap if possible.
This happens a lot in Vulkan, where we gdk_memory_conert() image
data from memory textures straight into the VulkanBuffer.
And usually we support the format.
Replace gdk_memory_format_prefers_high_depth with the more generic
gdk_memory_format_get_depth() that returns the depth of the individual
channels.
Also make the GL renderer use that to pick the generic F16 format
instead of immediately going for F32 when uploading textures.
Make the callers of this function check for
straight alpha themselves, and only do the
version compatibility check here. This makes
the function usable in contexts where straight
alpha is acceptable.
This one can be used for both premultiplied and non-premultiplied alpha
formats, since alpha is always 255. It is useful for opaque PNG upload
on both cairo and GL renderers.
That way, all permutations are possible. Previously it was only useful
in the cairo renderer, which required rgba8 → premultiplied bgra8, while
the GL renderer required rgba8 → premultiplied rgba8. Now both are
available.
This was only useful when building for AArch32 without -mfpu=neon, on
AArch64 or with -mfpu=neon gcc is smart enough to do the auto-
vectorisation, leading to code almost as good as what I wrote in
1fdf5b7cf8.
This more than halves the total runtime of this function since the
previous commit, from 8.36% to 4.02%, and is most likely memory
bandwidth limited on this specific board now.
I tried to do a SSE2 version as well, but couldn’t find any equivalent
of the LD4/ST4 ARM instruction.
On x86 on a Kaby Lake CPU, this makes it go from 6.63% of the total
execution time (loading some PNGs using the cairo backend) down to
3.20%.
On ARM on a Cortex-A7, on the same workload, this makes it go from 57%
to 8.36%.
Make it use gdk_memory_texture_from_texture().
Also make gdk_memory_format_alpha() privately available so that we can
detect if an image contains an alpha channel.
Also, now make gdk_memory_convert() the only conversion functions
and allow conversions between any 2 formats by going via a float[4].
This could be optimized via fast-paths, but so far it isn't.