This attempts to improve the accuracy for the "presentation_time" of an
individual GdkFrameTimings. That information is currently filled in as soon
as we get a frame callback. However, if presentation-time wayland protocol
is available, that will be used to supliment a more accurate time which
may improve future presentation-time predictions within GdkFrameClockIdle.
The protocol states that all related and sub surfaces will receive the
same information so it is safe that this could be registered for more
than just the toplevel. The information becomes idempotent.
When no action is selected, use the default cursor, and only
switch to one of the action-indicating cursors when we are over
a drop target.
Fixes: #6337Fixes: #6511
In a very particular situation, it could happen that our renderpass
reordering did not work out.
Consider this nesting of renderpasses (indentation indicates subpasses):
pass A
subpass of A
pass B
subpass of B
Out reordering code would reorder this as:
subpass of B
subpass of A
pass A
pass B
Which doesn't sound too bad, the subpasses happen before the passes
after all.
However, a subpass might be a pass that converts the image for a texture
stored in the texture cache and then updates the cached image.
If "subpass of A" is such a pass *and* if "subpass of B" then renders
with exactly this texture, then "subpass of B" will use the result of
"subpass of A" as a source.
The fix is to ensure that subpasses stay ordered, too.
The new order moves subpasses right before their parent pass, so the
order of the example now looks like:
subpass of A
pass A
subpass of B
pass B
The place where this would happen most common was when drawing thumbnail
images in Nautilus, the GTK filechooser or Fractal.
Those images are usually PNG files, which are straight alpha. They are then
drawn with a drop shadow, which requires an offscreen for drawing as
well as those images as premultipled sources, so lots of subpasses happen.
If there is then a redraw with a somewhat tricky subregion, then the
slicing of the region code could end up generating 2 passes that each draw
half of the thumbnail image - the first pass drawing the top half and the
second pass drawing the bottom half.
And due to the bug the bottom half would then be drawn from the
offscreen before the actual contents of the offscreen would be drawn,
leading to a corrupt bottom part of the image.
Test included.
Fixes: #6318
We write the buffers in small chunks, and we even sometimes read it. So
prefer it when it's cached.
Speeds up the text benchmarks by a factor of 3x on my dedicated GPU.
If glBufferStorage() is available, we can replace our usage of
glBufferSubData() with persistently mapped storage via
glMappedBufferRange().
This has 1 disadvantage:
1. It's not supported everywhere, it requires GL 4.4 or
GL_EXT_buffer_storage. But every GPU of the last 10 years should
implement it. So we check for it and keep the old code.
The old code can also be forced via GDK_GL_DISABLE=buffer-storage.
But it has 2 advantages:
1. It is what Vulkan does, so it unifies the two renderers' buffer
handling.
2. It is a significant performance boost in use cases with large vertex
buffers. Those are pretty rare, but do happen with lots of text at a
small font size. An example would be a small font in a maximized VTE
terminal or the overview in gnome-text-editor.
A custom benchmark tailored for this problem can be created with:
tests/rendernode-create-tests 1000000 text.node
This creates a node file called "text.node" that draws 1 million text
nodes.
(Creating that test takes a minute or so. A smaller number may be useful
on less powerful hardware than my Intel Tigerlake laptop.)
The difference can then be compared via:
tools/gtk4-rendernode-tool benchmark --runs=20 text.node
and
GDK_GL_DISABLE=buffer-storage tools/gtk4-rendernode-tool benchmark --runs=20 text.node
For my laptop, the difference is:
before: 1.1s
after: 0.8s
Related: !7021