It didn't bring any noticeable benefits and it isn't compatible with the
way we intend to do colorstate support.
And nobody seems to want to spend time on it, so let's get rid of it.
We can bring it back later if someone wants to work on it.
I was watching the log in my terminal and nothing happened.
And I wasn't sure if that was because nothing was printed or because the
same thing was printed every few seconds.
Fix that by printing a timestamp, so that every few seconds something
different gets printed.
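A minimal sketch of the idea (the function and message format here are
made up for illustration, not the actual code):

  #include <glib.h>

  static void
  print_cache_status (gsize dead_pixels)
  {
    /* the timestamp makes successive identical messages distinguishable */
    g_print ("%" G_GINT64_FORMAT ": %" G_GSIZE_FORMAT " dead pixels\n",
             g_get_monotonic_time (), dead_pixels);
  }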
Previously we tracked the dead pixels, but that meant we didn't know the
alive pixels (because there are also unused pixels that are never
accounted for).
And due to that, we would free the current atlas more or less randomly.
Now we track if any pixels are alive, and if so, we never gc the current
atlas.
After 60s, we gc the atlas, too. This ensures that after that time, we
free all cache resources, so if an application gets moved to the
background, it will no longer use GPU resources. (Well, at least the
cache won't.)
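A sketch of that logic, with illustrative names (the real atlas struct
looks different):

  #include <glib.h>

  #define ATLAS_TIMEOUT (60 * G_USEC_PER_SEC)

  typedef struct {
    gsize alive_pixels;    /* pixels still referenced by cache items */
    gint64 last_use_time;  /* monotonic time of last use */
  } Atlas;

  /* GC the atlas only once nothing in it is alive anymore, or once it
   * has been unused for 60s, so backgrounded apps release the cache. */
  static gboolean
  atlas_should_gc (const Atlas *atlas, gint64 now)
  {
    return atlas->alive_pixels == 0 ||
           now - atlas->last_use_time >= ATLAS_TIMEOUT;
  }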
Only if a non-stale item is in the cache do we consider the cache not
empty.
Once the cache is empty, the device frees it and stops running the
periodic GC.
This is for 3 reasons:
1. Separation of concerns
The device is meant to manage the Vulkan/GL device and check stuff
like image sizes.
Caching is not part of that.
2. Refcounting
Images etc want to reference the device, but the cache wants to
reference images. If the cache is the device, that's a refcycle.
3. Flexibility
It's now easier to implement >1 cache, say one per depth or one per
color state.
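The emptiness check and GC shutdown described above can be sketched
like this (the Cache type and cache_* helpers are hypothetical):

  #include <glib.h>

  typedef struct _Cache Cache;

  extern void     cache_gc       (Cache *cache);
  extern gboolean cache_is_empty (Cache *cache); /* non-stale items only */
  extern void     cache_free     (Cache *cache);

  static gboolean
  gc_timeout_cb (gpointer user_data)
  {
    Cache *cache = user_data;

    cache_gc (cache);

    if (cache_is_empty (cache))
      {
        cache_free (cache);
        return G_SOURCE_REMOVE;   /* stop the periodic GC */
      }

    return G_SOURCE_CONTINUE;
  }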
Keeping the GdkRGBA requires converting it later, which isn't
necessary if we just keep the already converted float[4].
It also prepares for future color states, where the color will need to
be converted using the colorstate.
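Roughly this, as a sketch (the function name is illustrative):

  #include <gdk/gdk.h>

  /* convert once up front instead of keeping the GdkRGBA around */
  static void
  rgba_to_float (const GdkRGBA *rgba, float values[4])
  {
    values[0] = rgba->red;
    values[1] = rgba->green;
    values[2] = rgba->blue;
    values[3] = rgba->alpha;
  }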
This way, we can simply duplicate the keys as separate pointers to store
the corresponding Vulkan handles so that we can safely hash them, as
Vulkan handles may or may not be pointers depending on the target
platform.
This will fix builds on 32-bit Windows at least.
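As a sketch of the approach (names are illustrative): non-dispatchable
handles are pointers on 64-bit targets but uint64_t on 32-bit ones, so
we copy the handle into an allocated guint64 and hash that:

  #include <glib.h>
  #include <vulkan/vulkan.h>

  static GHashTable *
  handle_cache_new (void)
  {
    return g_hash_table_new_full (g_int64_hash, g_int64_equal,
                                  g_free, NULL);
  }

  static void
  handle_cache_insert (GHashTable *cache, VkSampler sampler, gpointer value)
  {
    guint64 *key = g_new (guint64, 1);

    *key = (guint64) sampler;  /* works whether handles are pointers or not */
    g_hash_table_insert (cache, key, value);
  }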
This function does not use the standard __cdecl calling convention on
Windows, meaning that using g_clear_pointer() on it directly will cause
crashes on 32-bit Windows. Just call the function directly if the
GLsync exists.
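Assuming the function in question is glDeleteSync() (the GLsync context
suggests so), the safe pattern looks like:

  #include <epoxy/gl.h>

  static void
  clear_gl_sync (GLsync *sync)
  {
    if (*sync != NULL)
      {
        glDeleteSync (*sync);   /* direct call: correct calling convention */
        *sync = NULL;
      }
  }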
On Windows, gsize is an unsigned long long. The compiler complains about
that.
Use G_GSIZE_FORMAT, which translates to %llu on Windows, %lu on most
platforms, and just %u in rare cases.
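For example:

  #include <glib.h>

  static void
  print_buffer_size (gsize size)
  {
    /* expands to the right conversion for the platform's gsize */
    g_print ("buffer size: %" G_GSIZE_FORMAT " bytes\n", size);
  }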
Due to rounding errors, it is possible after intersecting a lot of
rectangles to end up with a tiny size for an offscreen. And because we
allow an epsilon before ceil()ing to an integer (see commit afc7b46264
for details) it is now possible that we end up with a size of 0.
Avoid that by always enforcing a minimum size of 1px.
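A sketch of the clamp (the epsilon value here is illustrative; see the
referenced commit for the real one):

  #include <math.h>
  #include <glib.h>

  static int
  offscreen_size (float size)
  {
    /* allow an epsilon before ceil()ing, but never go below 1px */
    return MAX (1, (int) ceilf (size - 0.001f));
  }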
Test included.
The test uses a different codepath to arrive at the same problem - it
specifies the small size directly instead of triggering it via rounding
errors and clipping like the original bug (which is most likely the more
common way to encounter this problem).
Fixes #6656
The VK_IMAGE_LAYOUT_UNDEFINED layout means that the data held by the
texture can be discarded, and we don't want to discard it. Because the
Vulkan spec is unclear (see [1] for a discussion), err on the side of
caution and use VK_IMAGE_LAYOUT_GENERAL.
Fixes import failures with WebKit.
[1] https://github.com/ValveSoftware/gamescope/issues/356
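The shape of the change, sketched as the barrier for an imported
texture (all fields besides the layouts are elided):

  #include <vulkan/vulkan.h>

  static const VkImageMemoryBarrier import_barrier = {
    .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
    /* GENERAL instead of UNDEFINED: UNDEFINED allows the driver to
     * discard the texture's existing contents */
    .oldLayout = VK_IMAGE_LAYOUT_GENERAL,
    .newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
    /* image, access masks and subresource range omitted */
  };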
The goal is to generate an offscreen at 1x scale.
When not ceil()ing the numbers, the offscreen code would do it *and*
adjust the scale accordingly, so we'd end up with something like a
1.01x scale.
And that would cause the code to reenter this codepath with the goal to
generate an offscreen at 1x scale.
And indeed, this would lead to infinite recursion.
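The fix, sketched with illustrative names:

  #include <math.h>

  static void
  offscreen_size_at_1x (float width, float height, int *out_w, int *out_h)
  {
    /* ceil() here so the offscreen code doesn't round for us and
     * adjust the scale to 1.01x, re-triggering this codepath */
    *out_w = (int) ceilf (width);
    *out_h = (int) ceilf (height);
  }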
Tests included.
Fixes#6553
It turns out that we mispositioned glyphs with some cff fonts
when metrics hinting is off and hinting is on. Since we don't
fully understand the interactions of these settings at this point,
let's preserve metrics hinting as it was on the font we got.
This at least gives folks a workaround for when they experience
clipped rendering with cff fonts: Turn on hint-metrics.
We forced hint metrics off here because it made Pango do some
creative wfh for hex boxes at small sizes, but I've dropped that
on the Pango side.
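A sketch of preserving the setting with cairo's font options API (the
surrounding code is hypothetical):

  #include <cairo.h>

  static void
  preserve_hint_metrics (cairo_font_options_t       *options,
                         const cairo_font_options_t *font_options)
  {
    /* copy hint-metrics from the font we got instead of forcing it off */
    cairo_font_options_set_hint_metrics (options,
        cairo_font_options_get_hint_metrics (font_options));
  }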
In a very particular situation, it could happen that our renderpass
reordering did not work out.
Consider this nesting of renderpasses (indentation indicates subpasses):
pass A
  subpass of A
pass B
  subpass of B
Our reordering code would reorder this as:
subpass of B
subpass of A
pass A
pass B
Which doesn't sound too bad - the subpasses happen before the passes,
after all.
However, a subpass might be a pass that converts the image for a texture
stored in the texture cache and then updates the cached image.
If "subpass of A" is such a pass *and* if "subpass of B" then renders
with exactly this texture, then "subpass of B" will use the result of
"subpass of A" as a source.
The fix is to ensure that subpasses stay ordered, too.
The new order moves subpasses right before their parent pass, so the
order of the example now looks like:
subpass of A
pass A
subpass of B
pass B
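Conceptually, the fixed ordering is a depth-first emit where each
pass's subpasses are recorded immediately before it. A sketch, with
illustrative types and helpers:

  #include <glib.h>

  typedef struct _Pass Pass;
  struct _Pass {
    GList *subpasses;   /* list of Pass* */
  };

  extern void record_pass (Pass *pass);   /* hypothetical */

  static void
  emit_pass (Pass *pass)
  {
    for (GList *l = pass->subpasses; l; l = l->next)
      emit_pass (l->data);   /* subpasses first, recursively... */

    record_pass (pass);      /* ...then the pass itself */
  }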
The place where this would happen most commonly was when drawing thumbnail
images in Nautilus, the GTK filechooser or Fractal.
Those images are usually PNG files, which are straight alpha. They are then
drawn with a drop shadow, which requires an offscreen for drawing as
well as those images as premultiplied sources, so lots of subpasses happen.
If there is then a redraw with a somewhat tricky subregion, the
region slicing code could end up generating 2 passes that each draw
half of the thumbnail image - the first pass drawing the top half and the
second pass drawing the bottom half.
And due to the bug the bottom half would then be drawn from the
offscreen before the actual contents of the offscreen would be drawn,
leading to a corrupt bottom part of the image.
Test included.
Fixes: #6318
We write the buffers in small chunks, and we sometimes even read them. So
prefer cached memory when it's available.
Speeds up the text benchmarks by a factor of 3 on my dedicated GPU.
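A sketch of such a memory-type choice on the Vulkan side (the function
name and fallback handling are illustrative):

  #include <vulkan/vulkan.h>

  static uint32_t
  find_host_cached_memory (const VkPhysicalDeviceMemoryProperties *props,
                           uint32_t                                allowed_types)
  {
    const VkMemoryPropertyFlags wanted = VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT |
                                         VK_MEMORY_PROPERTY_HOST_CACHED_BIT;

    for (uint32_t i = 0; i < props->memoryTypeCount; i++)
      {
        if ((allowed_types & (1u << i)) &&
            (props->memoryTypes[i].propertyFlags & wanted) == wanted)
          return i;
      }

    return UINT32_MAX;   /* fall back to plain HOST_VISIBLE memory */
  }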
If glBufferStorage() is available, we can replace our usage of
glBufferSubData() with persistently mapped storage via
glMapBufferRange().
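A sketch of the persistent-mapping setup (buffer lifetime management
and the fallback path are omitted):

  #include <epoxy/gl.h>

  static void *
  map_buffer_persistently (GLsizeiptr size)
  {
    const GLbitfield flags = GL_MAP_WRITE_BIT |
                             GL_MAP_PERSISTENT_BIT |
                             GL_MAP_COHERENT_BIT;
    GLuint buffer;

    glGenBuffers (1, &buffer);
    glBindBuffer (GL_ARRAY_BUFFER, buffer);

    /* immutable storage, allocated once... */
    glBufferStorage (GL_ARRAY_BUFFER, size, NULL, flags);

    /* ...and mapped for the buffer's whole lifetime */
    return glMapBufferRange (GL_ARRAY_BUFFER, 0, size, flags);
  }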
This has 1 disadvantage:
1. It's not supported everywhere: it requires GL 4.4 or
GL_EXT_buffer_storage. But every GPU of the last 10 years should
implement it. So we check for it and keep the old code.
The old code can also be forced via GDK_GL_DISABLE=buffer-storage.
But it has 2 advantages:
1. It is what Vulkan does, so it unifies the two renderers' buffer
handling.
2. It is a significant performance boost in use cases with large vertex
buffers. Those are pretty rare, but do happen with lots of text at a
small font size. An example would be a small font in a maximized VTE
terminal or the overview in gnome-text-editor.
A custom benchmark tailored for this problem can be created with:
tests/rendernode-create-tests 1000000 text.node
This creates a node file called "text.node" that draws 1 million text
nodes.
(Creating that test takes a minute or so. A smaller number may be useful
on less powerful hardware than my Intel Tigerlake laptop.)
The difference can then be compared via:
tools/gtk4-rendernode-tool benchmark --runs=20 text.node
and
GDK_GL_DISABLE=buffer-storage tools/gtk4-rendernode-tool benchmark --runs=20 text.node
For my laptop, the difference is:
before: 1.1s
after: 0.8s
Related: !7021
It's not just unused, it's also wrong.
We read from the buffer when reallocating the vertex buffer and
memcpy()ing the old contents into the new buffer.
When ops get allocated that use the same state as the last op, put them
into the same ShaderOp. This reduces the number of ShaderOps we need to
record, which has 3 benefits:
1. It's less work when iterating over all the ops.
This isn't a big win, but it makes submit() and print() run a bit
faster.
2. We don't need to manage data per-op.
This is a large win because we don't need to ref/unref descriptors
as much anymore, and refcounting is visible on profiles.
3. We save memory.
This is a pretty big win because we iterate over ops a lot, and when
the array is large enough (I've managed to write testcases that make
it grow to over 4GB) it kills all the caches and that's bad.
The main beneficiaries of all this are glyphs, which used to emit 1 ShaderOp
per glyph and can now end up with 1 ShaderOp for multiple text nodes,
even if those text nodes use different fonts or colors - because they
can all share the same ColorizeOp.
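A sketch of the merge logic (the ShaderOp struct here is drastically
simplified):

  #include <glib.h>

  typedef struct {
    guint shader;   /* stands in for the full shader state */
    gsize n_ops;    /* ops collected into this ShaderOp */
  } ShaderOp;

  static void
  record_op (GArray *shader_ops, guint shader)
  {
    if (shader_ops->len > 0)
      {
        ShaderOp *last = &g_array_index (shader_ops, ShaderOp,
                                         shader_ops->len - 1);

        if (last->shader == shader)
          {
            last->n_ops++;   /* same state: merge into the last ShaderOp */
            return;
          }
      }

    ShaderOp op = { shader, 1 };
    g_array_append_val (shader_ops, op);
  }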
With potentially multiple ops per ShaderOp, we may encounter situations
where 1 ShaderOp contains more ops than we want to merge. (With
GSK_GPU_SKIP=merge, we don't want to merge at all.)
So we still merge the ShaderOps (now unconditionally), but we then run
a loop that potentially splits the merged ops again - exactly at the
point we want to.
This way we can merge ops inside of ShaderOps and merge ShaderOps, but
still have the draw calls contain the exact number of ops we want.
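Reusing the simplified ShaderOp from the sketch above, the split step
looks roughly like:

  static void
  split_merged_ops (GArray *shader_ops, gsize max_merge)
  {
    for (guint i = 0; i < shader_ops->len; i++)
      {
        ShaderOp *op = &g_array_index (shader_ops, ShaderOp, i);

        if (op->n_ops > max_merge)
          {
            ShaderOp rest = { op->shader, op->n_ops - max_merge };

            op->n_ops = max_merge;
            /* the remainder is visited (and split further) next iteration */
            g_array_insert_val (shader_ops, i + 1, rest);
          }
      }
  }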
This just introduces the variable and sets it to 1 everywhere.
The ultimate goal is to allow one ShaderOp to collect multiple ops into
one, thereby saving memory in the ops array and leading to faster
performance.