In GLES, BGRA is still provided by GL_EXT_texture_format_BGRA8888,
an extension that is older than GLES 2.0.
Back then, internal formats had to be specified unsized, and when
that changed with GLES 3, nobody updated the extension.
On desktop OpenGL, however, this extension doesn't exist and internal
formats need to be sized.
So let's use different internal formats depending on the GL version.
Fixes #6333
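A minimal sketch of the idea, assuming a GdkGLContext *context and the
usual glTexImage2D upload; the variable names and the placement are
illustrative, not the actual gdkgl code:

    /* Sketch: pick the internal format for BGRA uploads depending on
     * the GL flavor. GLES uses the unsized format required by
     * GL_EXT_texture_format_BGRA8888; desktop GL wants a sized one. */
    GLint gl_internal_format;
    GLenum gl_format;

    if (gdk_gl_context_get_use_es (context))
      {
        gl_internal_format = GL_BGRA_EXT; /* unsized, per the extension */
        gl_format = GL_BGRA_EXT;
      }
    else
      {
        gl_internal_format = GL_RGBA8;    /* sized, as core GL requires */
        gl_format = GL_BGRA;
      }

    glTexImage2D (GL_TEXTURE_2D, 0, gl_internal_format,
                  width, height, 0,
                  gl_format, GL_UNSIGNED_BYTE, data);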
Track the fallback formats to use directly in the memory format code
instead of handling them in the GL uploading code.
First of all, this allows sharing the code and ensures all our
renderers use the same fallback mechanism.
But it also allows tracking fallbacks per format, which is useful
because the fallback formats aren't really a tree: we want FLOAT16 to
fall back to FLOAT32 when it isn't available, but we also want FLOAT32
to fall back to FLOAT16.
Tracking the fallbacks per format makes that possible.
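A minimal sketch of what per-format tracking could look like; the table
layout is an assumption, the format names are real GdkMemoryFormat
values:

    /* Sketch: each format description carries its own fallback list,
     * terminated by -1, so FLOAT16 can list FLOAT32 as a fallback and
     * vice versa without forming a tree. */
    static const GdkMemoryFormat float16_fallbacks[] = {
      GDK_MEMORY_R32G32B32A32_FLOAT_PREMULTIPLIED,
      GDK_MEMORY_R8G8B8A8_PREMULTIPLIED,
      -1, /* terminator */
    };

    static const GdkMemoryFormat float32_fallbacks[] = {
      GDK_MEMORY_R16G16B16A16_FLOAT_PREMULTIPLIED,
      GDK_MEMORY_R8G8B8A8_PREMULTIPLIED,
      -1, /* terminator */
    };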
Add gdk_memory_format_get_premultiplied() and
gdk_memory_format_get_straight() which return the matching
premultiplied/straight format.
Use this to pick the premultiplied format when uploading GL textures.
And remove the duplication in the dmabuf code, where we can now use
these functions instead of tracking both the premultiplied and straight
alpha versions.
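A usage sketch; gdk_texture_get_format() is the existing public getter,
the rest follows the commit:

    /* Sketch: always upload the premultiplied variant of whatever
     * format the texture reports. */
    GdkMemoryFormat format = gdk_texture_get_format (texture);
    format = gdk_memory_format_get_premultiplied (format);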
Add an "RGBA" format that just maps to the swizzled version of the
default format.
This way, BGR gets mapped to RGB + swizzling first before trying to map
it to the default format for the depth.
The benefit here is that this format has the same memory width, so
uploading/downloading code can treat it as equivalent to the original
format and no conversion is necessary later.
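For the BGR case, the swizzling could look roughly like this; the
swizzle enums and glTexParameteri are standard GL, the placement in our
code is illustrative:

    /* Sketch: upload BGR bytes as if they were RGB and let a texture
     * swizzle restore the channel order; same bytes per pixel, no CPU
     * conversion needed. */
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_R, GL_BLUE);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_G, GL_GREEN);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_B, GL_RED);
    glTexParameteri (GL_TEXTURE_2D, GL_TEXTURE_SWIZZLE_A, GL_ONE);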
Now that gdk_gl_context_get_memory_flags() exists and code can use
that function, make the code do so.
Remove the support checks from gdk_memory_format_gl_format().
This is an initial naive port that doesn't try to make use of the finer-grained
flags yet.
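A sketch of the shape of the check; the argument list and the flag
names are assumptions, only the getter name comes from above:

    /* Sketch: consult the per-format capability flags of the context
     * instead of re-checking support in gdk_memory_format_gl_format(). */
    GdkGLMemoryFlags flags = gdk_gl_context_get_memory_flags (context, format);

    if (!(flags & GDK_GL_FORMAT_USABLE))
      {
        /* pick a fallback format and convert on the CPU before upload */
      }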
We had 4 ordering variations of straight ARGB, but only 3
premultiplied ones. Add the missing one.
Update all the places where we switch over memory formats.
Make gdk_memory_format_gl_format() take the GdkGLContext instead of
just a gles boolean. This will let us check for extensions that may be
needed for certain formats.
Update all callers.
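A sketch of the new call shape; the out parameters here are assumptions
about what the GL upload path needs, not the exact prototype:

    /* Sketch: the context is now passed in so the function can check
     * for the extensions a format needs. */
    GLint gl_internal_format;
    GLenum gl_format, gl_type;

    gdk_memory_format_gl_format (format, context,
                                 &gl_internal_format,
                                 &gl_format,
                                 &gl_type);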
The relevant question here is about the details, because we have to
choose whether we declare alpha-only formats as having their
(nonexistent) color channels premultiplied or not, so that the code
paths using them can do the right thing.
Because we are premultiplied by default, it makes sense to treat alpha
the same way, because then the alpha-only code doesn't need to work
around straight alpha.
Where this matters, of course, is when expanding the alpha channel
into color channels, where we want to end up with white.
So make sure we do color = alpha there instead of color = 1 like we
did before.
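For an A8 pixel, that amounts to something like this; a sketch, not the
exact conversion code:

    /* Sketch: expanding an alpha-only pixel into premultiplied RGBA;
     * with premultiplied semantics, "white" means color = alpha. */
    dest[0] = alpha; /* R */
    dest[1] = alpha; /* G */
    dest[2] = alpha; /* B */
    dest[3] = alpha; /* A */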
We need them for mask-only textures.
For TIFFs, we convert these formats to RGBA (the idea that TIFF can
save everything needs to be buried, I guess), as TIFF can't do
alpha-only.
Basically, memcpy() as soon as possible when we can.
This happens a lot in Vulkan, where we gdk_memory_convert() image
data from memory textures straight into the VulkanBuffer, and we
usually support the format.
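The fast path amounts to something like this; a sketch, with
gdk_memory_format_bytes_per_pixel() assumed to be the existing
per-format size helper and the variable names illustrative:

    /* Sketch: if the formats already match, copy rows directly instead
     * of converting pixel by pixel. */
    if (src_format == dest_format)
      {
        gsize bpp = gdk_memory_format_bytes_per_pixel (src_format);

        for (gsize y = 0; y < height; y++)
          memcpy (dest_data + y * dest_stride,
                  src_data + y * src_stride,
                  width * bpp);
        return;
      }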
Replace gdk_memory_format_prefers_high_depth() with the more generic
gdk_memory_format_get_depth() that returns the depth of the individual
channels.
Also make the GL renderer use that to pick the generic F16 format
instead of immediately going for F32 when uploading textures.
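A sketch of how the renderer can map depth to an upload format; the
format names are real GdkMemoryFormat values, the depth values and the
switch itself are illustrative:

    /* Sketch: map the channel depth to a generic upload format instead
     * of always going to float32. */
    switch (gdk_memory_format_get_depth (format))
      {
      case GDK_MEMORY_U8:
        return GDK_MEMORY_R8G8B8A8_PREMULTIPLIED;
      case GDK_MEMORY_U16:
        return GDK_MEMORY_R16G16B16A16_PREMULTIPLIED;
      case GDK_MEMORY_FLOAT16:
        return GDK_MEMORY_R16G16B16A16_FLOAT_PREMULTIPLIED;
      case GDK_MEMORY_FLOAT32:
        return GDK_MEMORY_R32G32B32A32_FLOAT_PREMULTIPLIED;
      default:
        g_return_val_if_reached (GDK_MEMORY_R8G8B8A8_PREMULTIPLIED);
      }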
Make the callers of this function check for straight alpha
themselves, and only do the version compatibility check here. This
makes the function usable in contexts where straight alpha is
acceptable.
This one can be used for both premultiplied and non-premultiplied
alpha formats, since the alpha is always 255. It is useful for opaque
PNG uploads in both the cairo and GL renderers.
That way, all permutations are possible. Previously it was only useful
in the cairo renderer, which required rgba8 → premultiplied bgra8, while
the GL renderer required rgba8 → premultiplied rgba8. Now both are
available.
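Per pixel, the cairo-renderer conversion is roughly this; a sketch, and
the rounding used by the real code may differ:

    /* Sketch: straight rgba8 to premultiplied bgra8 for one pixel;
     * the rgba8 -> premultiplied rgba8 variant only differs in channel
     * order. */
    guint8 a = src[3];

    dest[0] = (src[2] * a + 127) / 255; /* B */
    dest[1] = (src[1] * a + 127) / 255; /* G */
    dest[2] = (src[0] * a + 127) / 255; /* R */
    dest[3] = a;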
This was only useful when building for AArch32 without -mfpu=neon; on
AArch64 or with -mfpu=neon, gcc is smart enough to do the
auto-vectorisation, leading to code almost as good as what I wrote in
1fdf5b7cf8.
This more than halves the total runtime of this function since the
previous commit, from 8.36% to 4.02%, and is most likely memory
bandwidth limited on this specific board now.
I tried to do an SSE2 version as well, but couldn't find any
equivalent of the ARM LD4/ST4 instructions.
On x86 on a Kaby Lake CPU, this makes it go from 6.63% of the total
execution time (loading some PNGs using the cairo backend) down to
3.20%.
On ARM on a Cortex-A7, on the same workload, this makes it go from 57%
to 8.36%.
Make it use gdk_memory_texture_from_texture().
Also make gdk_memory_format_alpha() privately available so that we can
detect if an image contains an alpha channel.
Also, now make gdk_memory_convert() the only conversion function and
allow conversions between any two formats by going via a float[4].
This could be optimized via fast-paths, but so far it isn't.
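The generic path is roughly this per pixel; a sketch, where
unpack_to_float() and pack_from_float() are hypothetical names for the
per-format accessors:

    /* Sketch: convert any format to any other by going through a
     * float[4] pixel; fast paths can short-circuit this later. */
    float pixel[4];

    for (gsize x = 0; x < width; x++)
      {
        unpack_to_float (src_format, src + x * src_bpp, pixel);
        /* premultiply or unpremultiply here if src and dest disagree */
        pack_from_float (dest_format, dest + x * dest_bpp, pixel);
      }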