When ops get allocated that use the same stats as the last op, put them
into the same ShaderOp. This reduces the number of ShaderOps we need to
record, which has 3 benefits:
1. It's less work when iterating over all the ops.
This isn't a big win, but it makes submit() and print() run a bit
faster.
2. We don't need to manage data per-op.
This is a large win because we don't need to ref/unref descriptors
as much anymore, and refcounting is visible on profiles.
3. We save memory.
This is a pretty big win because we iterate over ops a lot, and when
the array is large enough (I've managed to write testcases that makes
it grow to over 4GB) it kills all the caches and that's bad.
The main benefit of all this are glyphs, which used to emit 1 ShaderOp
per glyph and can now end up with 1 ShaderOp for multiple text nodes,
even if those text nodes use different fonts or colors - because they
can all share the same ColorizeOp.
With potentially multiple ops per ShaderOp, we may encounter situations
where 1 ShaderOp contains more ops than we want to merge. (With
GSK_GPU_SKIP=merge, we don't want to merge at all.)
So we still merge the ShaderOps (now unconditionally), but we then run
a loop that potentially splits the merged ops again - exactly at the
point we want to.
This way we can merge ops inside of ShaderOps and merge ShaderOps, but
still have the draw calls contain the exact number of ops we want.
This just introduces the variable and sets it to 1 everywhere.
The ultimate goal is to allow one ShaderOp to collect multiple ops into
one, thereby saving memory in the ops array and leading to faster
performance.
Instead of having renderer API to wait for any number of frames, just
have gsk_gpu_frame_wait() to wait for a single frame.
This unifies behavior on Vulkan and GL, because unlike Vulkan, GL does
not allow waiting for multiple fences.
To make up for it, we replace waiting for multiple frames with finding
the frame with the earliest timestamp and waiting for that one.
Also implement wait() for GL.
This copies the Vulkan idea of using a fence at the end of command
submission and waiting until it gets signaled before reusing the frame.
This frees up the GL driver from doing the work of making buffers etc
reusable and instead allocates new ones when they're still in use and is
a pretty massive performance win.
Print backend can be disposed together with all its printers
as a reaction to user stopping enumeration of printers.
Adding a weak pointer help us to detect that the backend
was disposed and hence the backend and its printers should not
be used anymore.
Fixes#6265
Most of the time, the image we get for the glyphs will be the
same (the atlas), so avoid adding it to the descriptor set over
and over, and check first if have to. This matches what the
pattern variant of this function already does.
Just initialize the rect directly. This matches better what the
pattern variant of this method does, and it also has the nice
side-effect of eliminating the handling of negative scales in
gsk_rect_scale, which we don't need here, since our scales are
always positive.