In a very particular situation, it could happen that our renderpass
reordering did not work out.
Consider this nesting of renderpasses (indentation indicates subpasses):
pass A
subpass of A
pass B
subpass of B
Out reordering code would reorder this as:
subpass of B
subpass of A
pass A
pass B
Which doesn't sound too bad, the subpasses happen before the passes
after all.
However, a subpass might be a pass that converts the image for a texture
stored in the texture cache and then updates the cached image.
If "subpass of A" is such a pass *and* if "subpass of B" then renders
with exactly this texture, then "subpass of B" will use the result of
"subpass of A" as a source.
The fix is to ensure that subpasses stay ordered, too.
The new order moves subpasses right before their parent pass, so the
order of the example now looks like:
subpass of A
pass A
subpass of B
pass B
The place where this would happen most common was when drawing thumbnail
images in Nautilus, the GTK filechooser or Fractal.
Those images are usually PNG files, which are straight alpha. They are then
drawn with a drop shadow, which requires an offscreen for drawing as
well as those images as premultipled sources, so lots of subpasses happen.
If there is then a redraw with a somewhat tricky subregion, then the
slicing of the region code could end up generating 2 passes that each draw
half of the thumbnail image - the first pass drawing the top half and the
second pass drawing the bottom half.
And due to the bug the bottom half would then be drawn from the
offscreen before the actual contents of the offscreen would be drawn,
leading to a corrupt bottom part of the image.
Test included.
Fixes: #6318
When ops get allocated that use the same stats as the last op, put them
into the same ShaderOp. This reduces the number of ShaderOps we need to
record, which has 3 benefits:
1. It's less work when iterating over all the ops.
This isn't a big win, but it makes submit() and print() run a bit
faster.
2. We don't need to manage data per-op.
This is a large win because we don't need to ref/unref descriptors
as much anymore, and refcounting is visible on profiles.
3. We save memory.
This is a pretty big win because we iterate over ops a lot, and when
the array is large enough (I've managed to write testcases that makes
it grow to over 4GB) it kills all the caches and that's bad.
The main benefit of all this are glyphs, which used to emit 1 ShaderOp
per glyph and can now end up with 1 ShaderOp for multiple text nodes,
even if those text nodes use different fonts or colors - because they
can all share the same ColorizeOp.
With potentially multiple ops per ShaderOp, we may encounter situations
where 1 ShaderOp contains more ops than we want to merge. (With
GSK_GPU_SKIP=merge, we don't want to merge at all.)
So we still merge the ShaderOps (now unconditionally), but we then run
a loop that potentially splits the merged ops again - exactly at the
point we want to.
This way we can merge ops inside of ShaderOps and merge ShaderOps, but
still have the draw calls contain the exact number of ops we want.
This just introduces the variable and sets it to 1 everywhere.
The ultimate goal is to allow one ShaderOp to collect multiple ops into
one, thereby saving memory in the ops array and leading to faster
performance.
Instead of having renderer API to wait for any number of frames, just
have gsk_gpu_frame_wait() to wait for a single frame.
This unifies behavior on Vulkan and GL, because unlike Vulkan, GL does
not allow waiting for multiple fences.
To make up for it, we replace waiting for multiple frames with finding
the frame with the earliest timestamp and waiting for that one.
Also implement wait() for GL.
This copies the Vulkan idea of using a fence at the end of command
submission and waiting until it gets signaled before reusing the frame.
This frees up the GL driver from doing the work of making buffers etc
reusable and instead allocates new ones when they're still in use and is
a pretty massive performance win.
Print backend can be disposed together with all its printers
as a reaction to user stopping enumeration of printers.
Adding a weak pointer help us to detect that the backend
was disposed and hence the backend and its printers should not
be used anymore.
Fixes#6265