We write the buffers in small chunks, and we sometimes even read them
back. So prefer cached memory when it is available.
Speeds up the text benchmarks by a factor of 3x on my dedicated GPU.
If glBufferStorage() is available, we can replace our usage of
glBufferSubData() with persistently mapped storage via
glMapBufferRange().
This has 1 disadvantage:
1. It's not supported everywhere: it requires GL 4.4 or
GL_EXT_buffer_storage. But every GPU of the last 10 years should
implement it. So we check for it and keep the old code.
The old code can also be forced via GDK_GL_DISABLE=buffer-storage.
But it has 2 advantages:
1. It is what Vulkan does, so it unifies the two renderers' buffer
handling.
2. It is a significant performance boost in use cases with large vertex
buffers. Those are pretty rare, but do happen with lots of text at a
small font size. An example would be a small font in a maximized VTE
terminal or the overview in gnome-text-editor.
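As a rough sketch of the availability check described above (not the actual GDK code): the helper names are hypothetical, GDK_GL_DISABLE is assumed to be comma-separated, the extension string is assumed to be space-separated, and the GL vs. GLES distinction is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `name` appears as a whole token in `list`, where tokens
 * are separated by `sep`. A NULL list contains nothing. */
static bool
list_contains (const char *list, const char *name, char sep)
{
  size_t len;

  assert (name != NULL && name[0] != '\0');
  if (list == NULL)
    return false;

  len = strlen (name);
  for (const char *p = list; (p = strstr (p, name)) != NULL; p += len)
    {
      bool starts = (p == list || p[-1] == sep);
      bool ends = (p[len] == '\0' || p[len] == sep);

      if (starts && ends)
        return true;
    }

  return false;
}

/* Hypothetical helper: choose between persistently mapped storage and the
 * old glBufferSubData() path. */
static bool
can_use_buffer_storage (int         major,
                        int         minor,
                        const char *extensions,
                        const char *gdk_gl_disable)
{
  /* The user can always force the old code path. */
  if (list_contains (gdk_gl_disable, "buffer-storage", ','))
    return false;

  /* Core since GL 4.4. */
  if (major > 4 || (major == 4 && minor >= 4))
    return true;

  return list_contains (extensions, "GL_EXT_buffer_storage", ' ');
}
```

The caller would keep the glBufferSubData() code around and use it whenever this returns false.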
A custom benchmark tailored for this problem can be created with:
tests/rendernode-create-tests 1000000 text.node
This creates a node file called "text.node" that draws 1 million text
nodes.
(Creating that test takes a minute or so. A smaller number may be useful
on less powerful hardware than my Intel Tigerlake laptop.)
The difference can then be compared via:
tools/gtk4-rendernode-tool benchmark --runs=20 text.node
and
GDK_GL_DISABLE=buffer-storage tools/gtk4-rendernode-tool benchmark --runs=20 text.node
For my laptop, the difference is:
before: 1.1s
after: 0.8s
Related: !7021
It's not just unused, it's also wrong.
We are reading from the buffer when reallocating the vertex buffer
and memcpy()ing the old into the new buffer - at that point we read from
it.
When ops get allocated that use the same state as the last op, put them
into the same ShaderOp. This reduces the number of ShaderOps we need to
record, which has 3 benefits:
1. It's less work when iterating over all the ops.
This isn't a big win, but it makes submit() and print() run a bit
faster.
2. We don't need to manage data per-op.
This is a large win because we don't need to ref/unref descriptors
as much anymore, and refcounting is visible on profiles.
3. We save memory.
This is a pretty big win because we iterate over ops a lot, and when
the array is large enough (I've managed to write testcases that make
it grow to over 4GB) it kills all the caches and that's bad.
The main beneficiaries of all this are glyphs, which used to emit 1 ShaderOp
per glyph and can now end up with 1 ShaderOp for multiple text nodes,
even if those text nodes use different fonts or colors - because they
can all share the same ColorizeOp.
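A minimal sketch of the idea, not the real GskGpuShaderOp code: the structs, the fixed-size array and the two fields that stand in for "what has to match" are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHADER_OPS 64

/* Hypothetical stand-in for whatever must match for ops to share a ShaderOp. */
typedef struct {
  int shader_id;
  int descriptor_id;
} OpState;

typedef struct {
  OpState state;
  size_t  n_ops;   /* how many ops this ShaderOp covers */
} ShaderOp;

typedef struct {
  ShaderOp ops[MAX_SHADER_OPS];
  size_t   n_shader_ops;
} OpList;

/* Record one op: extend the last ShaderOp if it matches, otherwise start
 * a new one. */
static void
op_list_add (OpList *list, OpState state)
{
  assert (list != NULL);

  if (list->n_shader_ops > 0)
    {
      ShaderOp *last = &list->ops[list->n_shader_ops - 1];

      if (last->state.shader_id == state.shader_id &&
          last->state.descriptor_id == state.descriptor_id)
        {
          last->n_ops++;
          return;
        }
    }

  assert (list->n_shader_ops < MAX_SHADER_OPS);
  list->ops[list->n_shader_ops].state = state;
  list->ops[list->n_shader_ops].n_ops = 1;
  list->n_shader_ops++;
}
```

Only the most recent ShaderOp is considered, so ops that match an earlier ShaderOp but not the last one still start a new entry.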
With potentially multiple ops per ShaderOp, we may encounter situations
where 1 ShaderOp contains more ops than we want to merge. (With
GSK_GPU_SKIP=merge, we don't want to merge at all.)
So we still merge the ShaderOps (now unconditionally), but we then run
a loop that potentially splits the merged ops again - exactly at the
point we want to.
This way we can merge ops inside of ShaderOps and merge ShaderOps, but
still have the draw calls contain the exact number of ops we want.
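The splitting loop can be pictured roughly like this. It is a simplified sketch with a hypothetical function name; the real code operates on ShaderOps rather than plain counts.

```c
#include <assert.h>
#include <stddef.h>

/* Split a run of `n_ops` merged ops into draw calls of at most
 * `max_per_draw` ops each. Stores each draw call's op count in `out`
 * and returns the number of draw calls. */
static size_t
split_into_draws (size_t  n_ops,
                  size_t  max_per_draw,
                  size_t *out,
                  size_t  out_len)
{
  size_t n_draws = 0;

  assert (max_per_draw > 0);

  while (n_ops > 0)
    {
      size_t n = n_ops < max_per_draw ? n_ops : max_per_draw;

      assert (n_draws < out_len);
      out[n_draws++] = n;
      n_ops -= n;
    }

  return n_draws;
}
```

With a limit of 1 (the GSK_GPU_SKIP=merge case) every op ends up in its own draw call again.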
This just introduces the variable and sets it to 1 everywhere.
The ultimate goal is to allow one ShaderOp to collect multiple ops into
one, thereby saving memory in the ops array and leading to faster
performance.
Instead of having renderer API to wait for any number of frames, just
have gsk_gpu_frame_wait() to wait for a single frame.
This unifies behavior on Vulkan and GL, because unlike Vulkan, GL does
not allow waiting for multiple fences.
To make up for it, we replace waiting for multiple frames with finding
the frame with the earliest timestamp and waiting for that one.
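Picking the frame to wait for could look roughly like this. The Frame struct and its fields are hypothetical stand-ins, not the GskGpuFrame API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  int64_t timestamp;  /* when the frame was submitted */
  int     busy;       /* nonzero while the GPU may still be using it */
} Frame;

/* Returns the index of the busy frame with the earliest timestamp,
 * or -1 if no frame is busy. */
static int
find_frame_to_wait_for (const Frame *frames, size_t n_frames)
{
  int best = -1;

  assert (frames != NULL || n_frames == 0);

  for (size_t i = 0; i < n_frames; i++)
    {
      if (!frames[i].busy)
        continue;

      if (best < 0 || frames[i].timestamp < frames[best].timestamp)
        best = (int) i;
    }

  return best;
}
```

The caller would then wait on just that one frame, which is the frame most likely to finish first.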
Also implement wait() for GL.
This copies the Vulkan idea of using a fence at the end of command
submission and waiting until it gets signaled before reusing the frame.
This frees the GL driver from the work of making buffers etc. reusable;
instead, we allocate new ones while the old ones are still in use. It is
a pretty massive performance win.
Most of the time, the image we get for the glyphs will be the
same (the atlas), so avoid adding it to the descriptor set over
and over, and check first if we have to. This matches what the
pattern variant of this function already does.
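The check amounts to remembering the last image. A sketch with hypothetical names, where bumping a counter stands in for actually adding the image to the descriptor set:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  const void *last_image;       /* image seen on the previous call */
  unsigned    last_descriptor;  /* descriptor handed out for it */
  unsigned    n_added;          /* how often we really touched the set */
} DescriptorCache;

/* Returns the descriptor for `image`, only adding it to the (simulated)
 * descriptor set when it differs from the previous image. */
static unsigned
get_descriptor (DescriptorCache *cache, const void *image)
{
  assert (cache != NULL && image != NULL);

  if (cache->last_image != image)
    {
      cache->last_image = image;
      cache->last_descriptor = cache->n_added++;
    }

  return cache->last_descriptor;
}
```

Since glyphs almost always come from the same atlas image, nearly every call takes the cheap path.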
Just initialize the rect directly. This matches better what the
pattern variant of this method does, and it also has the nice
side-effect of eliminating the handling of negative scales in
gsk_rect_scale, which we don't need here, since our scales are
always positive.
Make a single gsk_reload_font helper that can tweak both
scale and font options, so we can ensure that our scaled
font has hint-metrics turned off (pango pays attention to
hint metrics when sizing and rendering hex boxes, and that
hurts us).
This is a tricky topic, because it can make the clip bounds grow, so
previously we were trying to be careful.
However, this can cause perfectly trivial intersections with redraw diff
regions to fail.
And in the worst case, that means we offscreen in places where we
absolutely do not want to offscreen - in subtrees with subsurface nodes.
Fixes: #6499
CLIP_TYPE_NONE is valid if the clip is implemented by the scissor rect.
We always have a scissor rect and there's no way to draw outside of it.
In theory that means we can reset the clip to NONE at any point we
wish if we know nodes are contained inside a certain pixel-aligned
rectangle we can clip.
In practice that's probably quite hard...
We were turning off hinting and subpixel positioning if the
transform isn't 2D affine. The idea behind this was that transforms
likely indicate animations, and for animations, this may reduce
jitter. But the heuristic of transform==animation is not very
reliable, and we pay for this with a jump from hinted to unhinted
at the beginning and end of it. Also, the heuristic does not even
work for the most relevant 'animation' we have today: scrolling.
So, let's drop this for now. We can revisit it later.
When getting the hinted version of fonts, the requests often come in
sequentially. This helps reduce overhead for many sequential
gsk_text_node_new() calls with fractional scaling, as seen in GtkSourceView.
Some maps are only read from and do not require uploading contents
back to the GPU afterwards. In other cases, we can often upload less than
the fully allocated buffer size.
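In other words, the upload size depends on how the map was used. A sketch with a hypothetical bookkeeping struct, not the real buffer type:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
  size_t allocated;  /* full size of the buffer */
  size_t used;       /* bytes actually filled in while mapped */
  bool   written;    /* false if the map was only read from */
} MappedBuffer;

/* How many bytes need to go back to the GPU on unmap. */
static size_t
bytes_to_upload (const MappedBuffer *buffer)
{
  assert (buffer != NULL && buffer->used <= buffer->allocated);

  if (!buffer->written)
    return 0;            /* read-only map: nothing to upload */

  return buffer->used;   /* often much less than `allocated` */
}
```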
When transforming an empty clip, it stays empty.
Previously, we were setting it to CONTAINED, but that's wrong, because
the bounds are not contained in the clip, the clip is contained in the bounds.
This reverts part of commit a51c6aed47.
Related: !6692
When scaling a font or changing font options, we need to be
careful to preserve the dpi as well, otherwise the rendering
might leak out of the node bounds, leading to spectacular
glitches.
Fixes: #6508
Enforce the following rules:
- No hinting or subpixel positioning in transformed context
- glyph-align determines if we use integral or fractional
device pixel positions
- For hinting, always use an integral y position (the hinter
assumes integral positions, and only operates vertically).
When we get an unhinted font for text node extents, don't change
the antialiasing setting. It doesn't affect the extents we get
here, but if we later need an unhinted font for rendering, the
one we create this way will be the right one, so it will already
exist.
The goal is to fix all the context that influences the rendering
of text nodes in the node file. This will help with better font
testing.
The newly accepted properties are
hint-style: none/slight/full
antialias: none/gray
We are omitting font options and values that aren't supported
in GSK or have no influence on the rendering.
Note that these settings will get incorporated in the PangoFont
that gets set on the resulting text node.
Parser tests included.
We need precise bounds. And while hinting might shift the rendering
around from these bounds by a fraction of a pixel, we account for
this in the places where it matters: when determining diff regions,
when sizing offscreens, and when determining the size of atlas
regions for glyphs.
Add a function to change the cairo font options of a font to
the given values while keeping everything else the same.
We use pango api for this if available.
Note that this is not a fully general api, but tailored to the
needs of GSK. We don't allow setting hint-metrics (because it
only influences layout, not rendering) or subpixel-mode (since
we don't have component alpha available).
This changes the approach we take to rendering glyphs in the
presence of a scale transform: Instead of scaling the extents
and rendering to an image surface with device scale, simply
create a scaled font and use it for extents and rendering.
This avoids clipping problems with scaling of extents in
the presence of hinting.
The pango code that draws hex boxes, invisible glyphs, etc.
depends on the width being set in the PangoGlyphInfo. Once
we set that, everything falls into place.
Testcase included.
It is a bit annoying that one has to specify the glyph width
when specifying glyphs numerically for a text node, since this
information really is part of the font.
Make the parser more flexible, and allow specifying just the glyph
ids, without an explicit width. In this case, the width will be
determined from the font.
With this, glyphs can now be specified in any of the following
ways:
glyphs: "ABC"; (ASCII)
glyphs: 23, 45, 1001; (Glyph IDs)
glyphs: 23 10, 100 11.1; (Glyph IDs and advance widths)
glyphs: 23 10 1 2 color; (with offsets and flags)
Tests have been updated to cover these variants.
We were just assuming they were if the format matches.
Fixes crashes in Webkit where the external texture is actually a dmabuf
imported as an EGL image.
Avoids getting the scale wrong when due to a rounding error our
pixel-aligned rectangle is 5.000000003px big and we ceil() to 6px
and produce blurry output.
Fixes: #6439
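One common way to guard against this kind of rounding error is a ceil() with a small tolerance. This is only an illustration of the problem and not necessarily what this change does; the function name and the epsilon are assumptions.

```c
#include <assert.h>

/* Round a non-negative size up to whole pixels, but treat values that
 * exceed an integer by at most `epsilon` as that integer. */
static int
ceil_with_tolerance (double size, double epsilon)
{
  int whole;

  assert (size >= 0.0 && epsilon >= 0.0);

  whole = (int) size;   /* truncation equals floor for non-negative values */
  if (size - whole <= epsilon)
    return whole;

  return whole + 1;
}
```

With a tolerance of 1e-6, a 5.000000003px rectangle stays at 5px while 5.25px still rounds up to 6px.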
The code was written under the assumption that the corners of
the rounded rect are disjoint. If they aren't, there are a few
more cases to consider.
Fixes: #6440
We lost this when a bunch of rect code was inlined in
commit 36314f28e2, and as it turns out, that broke some
applications. So, bring it back.
Fixes: #6435
Fixes blurriness in shadows.
Not sure how to do a proper test for this feature. Usually proper pixel
alignment is tested by drawing a crisp line and checking that it is
indeed crisp. But we are testing the blur operation here...
Fixes: #6380
This isn't really a useful thing in itself, because none of the callers
handle the NULL return.
But the resulting crash is easier to debug when it's a NULL image than
when add_node() is called on an uninitialized NodeProcessor.
In GSK the following pattern is used four times:
```
switch (self->filter)
  {
    default:
      g_assert_not_reached ();
      G_GNUC_FALLTHROUGH;
    case GSK_GPU_BLIT_LINEAR:
      filter = GL_LINEAR;
      break;
    case GSK_GPU_BLIT_NEAREST:
      filter = GL_NEAREST;
      break;
  }
```
The G_GNUC_FALLTHROUGH macro is not required. When G_DISABLE_ASSERT
is defined, the body of the `default` case is empty, so there is
nothing to annotate. When G_DISABLE_ASSERT is not defined, the body of
the `default` case contains g_assert_not_reached(), so it won't fall
through.
This resolves the following:
```
[221/1379] Compiling C object gsk/libgsk.a.p/gpu_gskgpublitop.c.o
[...]
error: fallthrough annotation in unreachable code [-Werror,-Wimplicit-fallthrough]
1 error generated.
```
This can be helpful to see that there is an enormous scale blowing
things up. We omit the matrix, since it is 16 floats that are hard
to interpret at a glance.
Unless the renderer has been explicitly selected via the
GSK_RENDERER environment variable, don't use it with llvmpipe.
It is important that we allow explicit setting to override
this, so we can continue to use ngl in CI, where we don't
have hardware and want to test with llvmpipe.
This should address many of the "performance is terrible in
GNOME OS" complaints that are coming from people running in
VMs, etc.
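The decision boils down to something like the following sketch. The function name is hypothetical, and matching on the substring "llvmpipe" in the GL renderer string is an assumption about how the software rasterizer is detected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* `explicitly_selected` is true if GSK_RENDERER asked for this renderer;
 * `gl_renderer` is the renderer string reported by the driver. */
static bool
should_use_renderer (bool explicitly_selected, const char *gl_renderer)
{
  assert (gl_renderer != NULL);

  /* An explicit request always wins, e.g. ngl on llvmpipe in CI. */
  if (explicitly_selected)
    return true;

  /* Otherwise avoid software rasterization. */
  return strstr (gl_renderer, "llvmpipe") == NULL;
}
```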
Look for nodes like subsurface { clip { texture {} } }, and use
the clip to provide a source rectangle for subsetting the texture.
Update affected tests, and add a new one.
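Deriving the source rectangle from the clip is plain rectangle arithmetic. A sketch under the assumption that the texture is stretched over the texture node's bounds; the Rect type and function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
  float x, y, width, height;
} Rect;

static float min_f (float a, float b) { return a < b ? a : b; }
static float max_f (float a, float b) { return a > b ? a : b; }

/* Intersect the texture node's `bounds` with `clip`. On success, `dest`
 * is the visible part in node coordinates and `source` is the matching
 * part of the texture in texel coordinates. Returns false if nothing
 * is visible. */
static bool
clip_to_source_rect (const Rect *bounds, const Rect *clip,
                     float tex_width, float tex_height,
                     Rect *source, Rect *dest)
{
  float x0, y0, x1, y1, sx, sy;

  assert (bounds->width > 0 && bounds->height > 0);

  x0 = max_f (bounds->x, clip->x);
  y0 = max_f (bounds->y, clip->y);
  x1 = min_f (bounds->x + bounds->width, clip->x + clip->width);
  y1 = min_f (bounds->y + bounds->height, clip->y + clip->height);
  if (x1 <= x0 || y1 <= y0)
    return false;

  dest->x = x0;
  dest->y = y0;
  dest->width = x1 - x0;
  dest->height = y1 - y0;

  /* Texels per node unit in each direction. */
  sx = tex_width / bounds->width;
  sy = tex_height / bounds->height;

  source->x = (x0 - bounds->x) * sx;
  source->y = (y0 - bounds->y) * sy;
  source->width = dest->width * sx;
  source->height = dest->height * sy;
  return true;
}
```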
This will let us use a subset of the full texture, which can
be necessary in the case that converters put padding around
content in dmabufs. The naming follows the Wayland viewporter
spec.
For now, make all callers pass the full texture rect.
We are going to introduce another rect, so better to be clear in
naming. We are following the naming of the Wayland viewporter spec
and call the rectangle that we are drawing into the dest(ination).
We were collecting diffs based on the can_offload/can_raise
information, but attaching the texture to the subsurface can
fail (e.g. if it's not a dmabuf texture), in which case can_offload
turned out to be wrong. So move the diff collection to the end
and do it based on whether we actually succeeded in attaching
the texture.
We can just check if the subsurfaces contain content - and if they do,
they will be offloading and we can ignore the diff.
This essentially reverts 48740de71a
Instead of relying on diffing subsurface nodes, we track damage
generated by offloaded contents inside GskOffload.
There are 3 stages a subsurface node can be in:
1. not offloaded
Drawing is done by the renderer
2. offloaded above
The renderer draws nothing
3. offloaded below
The renderer needs to punch a hole.
Whenever the stage changes, we need to repaint.
And that can happen without the subsurface's contents changing, like
when a widget is put above the subsurface and it needs to go from
offloaded above to below.
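The repaint rule can be sketched as a small state check. The enum and function names are hypothetical, and treating content changes as relevant only in the non-offloaded stage is an assumption drawn from the descriptions above.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
  SUBSURFACE_NOT_OFFLOADED,    /* drawing is done by the renderer */
  SUBSURFACE_OFFLOADED_ABOVE,  /* the renderer draws nothing */
  SUBSURFACE_OFFLOADED_BELOW   /* the renderer punches a hole */
} SubsurfaceStage;

static bool
subsurface_needs_repaint (SubsurfaceStage old_stage,
                          SubsurfaceStage new_stage,
                          bool            contents_changed)
{
  assert (old_stage <= SUBSURFACE_OFFLOADED_BELOW &&
          new_stage <= SUBSURFACE_OFFLOADED_BELOW);

  /* Any stage change needs a repaint, even with identical contents. */
  if (old_stage != new_stage)
    return true;

  /* Without a stage change, only renderer-drawn contents matter. */
  return new_stage == SUBSURFACE_NOT_OFFLOADED && contents_changed;
}
```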
So we now recruit GskOffload for tracking these changes, instead of
relying on the subsurface diffing.
But we still need the subsurface diffing code to work for the
non-offloaded case, because then the offloading code is not used.
So we keep using it whenever that happens.
Note that when a subsurface transitions between being offloaded and not
being offloaded, we may diff it twice - once in the offload code and
once in the node diffing - but that shouldn't matter.
When a subsurface goes from not offloaded to offloaded (or vice versa),
we need to add the whole node to the diff region, because we switch from
whatever contents were drawn to a punched hole.