This omission was noticed by Benjamin Otte. Add a premultiply
uniform to the external shader, and add a separate premultiply
shader for the non-external case.
When the GL renderer cannot upload a given format, print a FALLBACK
debug message with the failed format and the alternative that was
picked, for example:
Unsupported format b8g8r8a8, converting on CPU to b8g8r8a8-premultiplied
Makes it easier to figure out what's happening, especially when using
old GLES versions that don't support all formats.
Track fallback formats to use in the memoryformat directly instead of
using in the GL uploading code.
First of all, this allows sharing the code and ensuring all our
renderers use the same fallback mechanism.
But also, this allows tracking fallbacks per-format which is useful
because the fallback formats aren't really a tree. We want to make
FLOAT16 fall back to FLOAT32 when not available, but we also want
FLOAT32 fall back to FLOAT16.
By tracking the fallbacks per-format, we can achieve that.
Add gdk_memory_format_get_premultiplied() and
gdk_memory_format_get_straight() which return the matching
premultiplied/straight format.
Use this to pick the premultiplied format when uploading GL textures.
And remove the duplication in the dmabuf code, where we can now use
these functions instead of tracking both the premultiplied and straight
alpha versions.
Add an "RGBA" format that just maps to the swizzled version of the
default format.
This way, BGR gets mapped to RGB + swizzling first before trying to map
it to the default format for the depth.
The benefit here is that this format has the same memory width, so
uploading/downloading code can treat it equivalent to the original
format and there's no conversion neccessary later.
Now that we have gdk_gl_context_get_memory_flags() and code can use that
function, make the code do that.
Remove support checks from gdk_memory_format_gl_format().
This is an initial naive port that doesn't try to make use of the finer-grained
flags yet.
We need to make sure that all our textures have the same memory
format, or we'll run into trouble in the upload code, at least
on GLES, which isn't as forgiving about format mismatches.
Related: #6238
This flag must be set when creating the class or offloading
will be disabled for this renderer.
Set that flag for the GL renderer.
Fixes the Cairo and Vulkan renderer not showing Video.
During rendering, restack offloaded subsurfaces below the main
surface, and clear the area so they peek through. After rendering,
raise the last subsurface if we haven't drawn over it.
Add a blend mode to the draw command, so it can draw transparent
black. This will be used to erase the area on top of a subsurface
when we do passthrough.
Make sure all our dmabuf debug messages are display-scoped so the
inspector doesn't trigger them, use the same formatting throughout,
and improve consistency of wording here and there.
It started out as busywork, but it does many separate things. If I could
start over, I'd take them apart into multiple commits:
1. Remove G_ENABLE_DEBUG around GDK_DEBUG_*() calls
This is not needed at all, the calls themselves take care of it.
2. Remove G_ENABLE_DEBUG around profiling code
This now enables profiling support in release builds.
3. Stop poking _gdk_debug_flags and use GDK_DEBUG_CHECK()
This was old code that was never updated.
4. Make !G_ENABLE_DEBUG turn off GDK_DEBUG_CHECK()
The code used to
#define GDK_DEBUG_CHECK(...) false
#define GDK_DEBUG(...)
which would compile away all the code inside those macros. This
means a lot of variable definitions and debug utility functions
would suddenly no longer be used and cause compiler errors.
Drawing a texture-scale node like a texture node when the filter is set
to "linear" doesn't work, because the texture node switches to
trilinear when mipmaps are available.
There is no reason not check the alpha swizzle for being different
from its default value. I am thinking about implementing RGBx
upload with a swizzle of rgb1, and that would break here.
We just poking at display members here, there is no guarantee that
dmabuf formats have been initialized. So do it explicitly.
This prevents a crash in the inspector when viewing a recorded frame
containing a dmabuf texture, since the inspector uses a separate
display connection.
When we are running under GLES, we can use GL_TEXTURE_EXTERNAL_OES
to support YUV formats.
Since we don't want to deal with the combinatorial explosion of
compiling all our shaders with all combinations of sampler2D vs
samplerExternalOES for all their textures, we copy the external
textures to a regular texture before using them.
This shader uses samplerExternalOES to sample an external texture
and blit it into a 'normal' texture. It only works in GLES, but
we won't use it outside of GLES.
Allow our shaders to use samplerExternalOES, by declaring
that we use the relevant extension. Unfortunately, this
only works for gles, and requires different extensions for
gles2 and gles3. Yay
Add a GSK_GL_DEFINE_PROGRAM_NO_CLIP, which is like
GSK_GL_DEFINE_PROGRAM but compiles the shader just once,
with NO_CLIP defined.
This will be used in the future for shaders that do
texture conversion.
Prepare the plumbing in the GL renderer for textures that use
target GL_TEXTURE_EXTERNAL_OES. These need to use a special sampler,
so make sure our sampler machinery does not run over it.
Restore the bigendian support that was lost in b0e26873f6,
by just not using GL_BGRA with GLES on bigendian. Should be a
very rare combination, but still.
The glyph and icon libaries were also checking for GLES to
decide if data needs to be transformed from BGRA to RGBA.
Use the new has_bgra getter instead.
This will probably break on bigendian, because the
GL_BGRA + GL_UNSIGNED_BYTE combination is not equivalent
to the cairo format on bigendian, but this was already
broken for the gl format information that we get from
gdk_memory_format_gl_format.
Make gdk_memory_format_gl_format take the GdkGLContext,
instead of just a gles boolean. This will let us
check for extensions that may be needed for certain
formats.
Update all callers.
This is useful for colorizing in the same fashion we do for the glyph
texture atlas. In fact, for small GdkTexture, you will end up in something
like the icon texture atlas.
The primary motivator for this optimization is to draw various glyph-like
features from VTE such as many forms of boxes, lines, arrows, etc.
We don't need to be calling type node conformity checking from the tight
loop of the renderjob. Hoist that into the private header and use that
intead through via the Class pointer.
Like the previous change, this uses GdkArrayImpl instead of GArray for
tracking modelview changes. This is less important than clip tracking
simple due to being used less, but it keeps the implementation synchronous
with the Clip tracking code.
We can end up spending a lot of time in g_array_maybe_expand() through the
use of g_array_set_size() for clip tracking. That is somewhat due to the
simple nature of GArray being size-dynamic. Instead, we can use
GdkArrayImpl and let the compiler do what it does best to elide some
work and hoist other work into the calling function.
This also fixes a potential UAF in gsk_gl_render_job_push_contained_clip().
The code now follows gsk_rounded_rect_shrink() and with it the behavior
of the Cairo renderer and Webkit.
The old code did what the GL renderer and Cairo do, but I consider that
wrong.
I did not test Chrome.
Test attached
The source uniform may or may not point
to a glyph atlas. The optimization we do
for color nodes is only possible if it does,
so check this.
Fixes: #6094
We require folks to include gskglrenderer.h in order
to create a GL renderer. So we be careful to only
include header in gskglrenderer.h that won't trigger
ugly warnings.
See !6363
Take a rendernode as source and a GskPath and GskStroke,
and fill the area that is covered when stroking the path
with the given stroke parameters, like cairo_stroke() would.
Add a bunch of inline functions for graphene_rectangle_t.
We use those quite extensively in tight loops so making them as fast as
possible via inlining has massive benefits.
The current render-heavy benchmark I am playing (th paris-30k in node-editor)
went from 49fps to 85fps on my AMD.
In particular, fix the combination of luminance and alpha. We want to do
mask = luminance * alpha
and for inverted
mask = (1.0 - luminance) * alpha
so add a test that makes sure we do that and then fix the code and
existing tests to conform to it.
The IFUNC resolvers that we are using here get
run early, before asan had a chance to set up its
plumbing, and therefore things go badly if they
are compiled with asan. Turning it off makes things
work again.
The gcc bug tracking this problem:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110442
Thanks to Jakub Jelinek and Florian Weimer for
analyzing this and recommending the workaround.
g_hash_table_insert() frees the given key if it already exists
in the hashtable. But since we use the same pointer in the
following line, it will result in use-after-free.
So instead, insert the key only if it doesn't exist.
Replace gdk_memory_format_prefers_high_depth with the more generic
gdk_memory_format_get_depth() that returns the depth of the individual
channels.
Also make the GL renderer use that to pick the generic F16 format
instead of immediately going for F32 when uploading textures.
This is not the optimal way of doing it: we're
reuploading the texture with client-side conversion.
But it fits nicely into our current handling of mipmaps.
We can do better once we use shaders for colorspace
conversions.
Make the callers of this function check for
straight alpha themselves, and only do the
version compatibility check here. This makes
the function usable in contexts where straight
alpha is acceptable.
When the command queue is out of batches, there is
no point in doing further work like allocating uniforms.
This helps us avoid assertions in the uniform code
that we would hit when we run out of uniform space
too.
When we start ignoring batches, we must do it everywhere,
or we may run into assertions. This was triggered by an
enormous text node tree produced by tests/rendernode-create.
This was the intention, but the object data by itself
does not achieve that: We do run dispose on the display
when it is closed, but object data is only cleared in
finalize. So listen to the ::closed signal and remove
the driver ourselves.
Fix up the drivers dispose implementation enough for
that to actually work.
When slicing the texture, the GL renderer was
forgetting to apply the viewport origin. This
shows up when rendering things with negative
scales, leading to negative origins.