It's a struct collecting all relevant info for a texture passed to a
shader.
The ultimate goal is to get rid of the descriptors and let ops
manage them on thir own.
If GskGpuCache has an idea of what time it is, cached items can use that
time to update their last-use time instead of having to carry it around
throught function calls everywhere.
Port an optimization of the GL renderer where it fast-paths crossfades
with progress <= 0 and >=1 - which should really never happen because
nobody should emit them in the first place, but oh well.
We no longer hardcode the few different classes we have, but generically
walk over all classes.
As a side effect we now get new classes added to stats automatically.
The content itself did not change.
Commit 1580490670 included a reordering of
acquiring the frame before making the context current.
Sometimes (like at startup) new frames need to be created.
Setting up a new frame assumed the GL context was current.
Change it so that we delay the one GL setup we do in frames until later.
Vulkan requires us waiting on the image acquired from
vkAcquireNextImageKHR() before we start rendering to it, as that
function is allowed to return images that are still in use by the
compositor.
Because of that requirement, vkAcquireNextImageKHR() requires a
semaphore or fence to be passed that it can signal once it's done.
We now use a side channel to begin_frame() - calling
set_draw_semaphore() - to pass that semaphore so that the
vkAcquireNextImageKHR() call inside begin_frame() can use it, and then
we can wait on it later when we submit.
And yes, this is insanely convoluted, the Vulkan developers should
totally have thought about GTK's internal designs before coming up
with that idea.
These are just factoring out gdk_draw_context_begin/end_frame() so I can
add one tiny thing there later.
And I did both even though I only need one, because it felt wrong to
just do one.
Make the function look like that:
1. handle special case
2. maybe GC
3. draw
4. queue next gc
5. cleanup
This seems like the sanest approach to avoid gc() collecting things
necessary for drawing in the future.
And I need to refactor stuff, so having it out of the way is a good
idea.
... and plumb the color state through the downloading machinery, where
no matter what path it takes it ends up in
gdk_memory_convert_color_state() or gdk_memory_convert().
The 2nd of those has been expanded to optionally do colorstate
conversion when the 2 colorstates are different.
When a cache item is invalid, don't move it into the hash table.
Instead, just delete it.
Something like this could happen:
1. A texture is cached
In the case of #6867 this would be a webpage in epiphany.
2. The texture cache item is garbage-collected
For example, epiphany might switch to a new tab, and the previous page's
texture will remain. After 15s or so, we collect our item for that
texture.
3. The texture is cached again, but in the target colorspace
We now decide we need the texture again, but not in any colorspace, we
need it in the target colorspace. This might be because we run an
effect on it (like a crossfade) or because we want mipmaps (like in the
overview map, where its zoomed out).
4. The old invalid item is transitioned into the hash table
We now have an invalid item in the hash table. This is extra bad,
because it had only one reference (from the texture), but we treat it
like it has 2 (from us in the hash table and from the texture).
So depending on if the texture is freed before we reuse it, we get
different results: If it was free, we get invalid memory accesses, if it
was not freed, we treat it like a valid cache item and think the image
inside is still valid.
Fixes#6867
gsk_gpu_device_gc() may release the last ref on the GskGpuDevice,
leading to memory corruption when setting priv->cache_gc_source = 0.
Includes a bit of refactoring, so the ref/unref wraps nicely around the
actual code.
Fixes crashes seen after using the inspector and closing the window,
thereby closing all windows of a display and releasing all references to
the device.
Fixes#6861
Change the glsl convert_color function to proceed in stages:
- first unpremultiply
- then linearize
- then transform linearly
- then delinearize
- then premultiply
All the steps are only taken if needed.
We need to make sure our clear values are in the right colorstate, not
in sRGB.
The occluision culling managed to sneak through the big transition for
that.
This is actually the node Loupe is using, so having tiling work with it
is important.
Because of the previous commit, different filters are supported fine.
Fixes: #6324
This allows mipmapping if downscaled a lot, like we do for non-tiled
images.
A side effect is that due to the simpler caching for tiles, we can only
cache the mipmapped images in one colorstate. But we need to pick a
potentially non-default one, because we want to mipmap in a linear
colorstate.
So this is somewhat suboptimal. Patches with improvements accepted.
Use the new cache feature to split oversized textures into tiles the
size given by the new device API.
Then number those tiles from left to right and top to bottom and use
that number as the tile id.
When we draw large images, we absolutely do not want to keep memory that
we do not need. So do a GC run after every tile. That otentially slows
down things, but it also improves the chances of not running out of
memory.
Here's the node for the image I managed to create after I applied this
patch:
repeat {
bounds: 0 0 50000 50000;
child: text {
font: "Noto Color Emoji 10000px";
glyphs: 661 0 0 0 color;
offset: 0 10000;
hint-style: none;
}
}
Functions should behave as I expect, and I just spent an hour debugging
a refcount issue because I assumed our image creation functions return
refrences. Which is a very sane assumption.
The texture and texture-scale node code is creating image copies
for mipmaps and to adapt to the compositing colorstate.
Those texture should be cached.
We want to cache textures in the compositing color state, not in their
original color state. However, the compositing color state may change
(think multimonitor setups).
So we additionally keep a cache per colorstate.
That means texture lookup is now a 3-step process:
1. Look up in the compositing colorstate's cache
2. Look up in the general cache
3. Upload
GL_SRGB is doing postmultiplied alpha, so if the texture is
premultiplied, we can't use this optimization.
The optimization still works for unpremultiplied and opaque images,
because those don't do that step.
No colorstate conversions allowed here, though technically we could use
the alternate color state for the source most of the time, as the mask's
colorstate is only relevant for luminance.
This is a function that's meant to be used whenever both color states
of the shader are equal. In that case no colorspace conversion code
needs to be created and shaders can be shared.
That way, we can use it in one other place where we want to use mipmaps.
I don't really like it because it adds yet another argument,
but then the one new caller was selecting suboptimal shaders, and that's
worse.
The colormatrix shade does a whole matrix multiplication, which is
absolutely not necessary.
The convert shader has builtin opacity handling and when the colorstates
match will do no conversion.
The alternative color state is used as the interpolation color state.
Colors are transformed into that space on the CPU.
For now we set the interpolation color state to SRGB, because ultimately
we want to let callers specify it, so having something that's easy to
map to that behavior is desirable.
Otherwise we might have chosen to interpolate in the compositing
colorstate.
It also means that we need to premultiply colors on the CPU now because
of the limitations of the shader colorstates APIs.
This makes use of the GskGpuColorStates by setting the ccs as output
colorstate and the color's colorstate as alternative color state.
The shader adaption is very straightforward because of that.
This is the first op to obey the compositing color state. This means
from now on until all ops obey the ccs rendering is broken when ccs is
not set to linear.
I'll keep individual ops in seperate commits for easier review, because
they all need different adaptations.
Render to an offscreen and add a final conversion if the target
colorstate is not a rendering colorstate.
This now allows the GPU renderer to render to any colorstate.
Makes the verbose output (a lot) more verbose, but it makes the
colorstates used in the shaders very visible.
And it will be relevant once people start using different colorstates
everywhere (like oklab for gradients/colors and so on).
This adds the following functions:
output_color_from_alt()
alt_color_from_output()
Converts between the two colors
output_color_alpha()
alt_color_alpha()
Multiplies a color with an alpha value
This adds a GdkColorStates that encodes 2 of the default GdkColorStates
and wether their values are premultiplied or not.
Neither do the shaders do anything with this information yet, nor do the
shaders do anything with it yet, this is just the plumbing.
If desired, try creating GL_SRGB images. Pass a try_srgb boolean down to
the image creation functions and have them attempt to create images like
that.
When it is not possible to create srgb images in the given format, just
fall back to regular images. The calling code is meant to check the
GSK_GPU_IMAGE_SRGB flags to determine the actual format of the resulting
image.
Make the node processor and the pattern writer track the current
compositing color state. Color state nodes change it. We pass
the surface color state down via the frame apis.
The name of the variable is "ccs" for "compositing color space". It's an
unused variable name and it's common enough to deserve a short and sweet
name.
This shader converts between two color states, by using the
same functions that we use on the cpu. The conversion to perform
is passed as part of the variation.
As premultiplication is part of color states on the shader, we also
encode the premultiplication in the shader.
And because opacity is a useful optimization, we also allow setting
opacity.
For now, the only possible color states are srgb and srgb-linear.
This is an experiment for now, but it seems that encoding srgb inside
the depth makes sense, as we not just use depth to decide on the
GL fbconfigs/Vulkan formats to pick, depth also encodes how the [0...1]
color values are quantized when stored.
Let's see where this goes.
This commit just adds the flag, but I wanted to make it an individual
commit to explain the purpose:
The SRGB flag is meant to be used for images that have an SRGB format.
In Vulkan terms, that means VK_FORMAT_*_SRGB.
In GL, it means GL_SRGB or GL_SRGB_ALPHA.
As these formats have been madatory since GL 3.0, we can (ab)use them
uncoditionally. Images in these formats are renderable, too, so it's
not just usable for uploading.
What these images allow is treating the data as sRGB while shaders
access them as linear, thereby getting sRGB<=>linear conversions for
free.
It is also possible to switch off the linearization of these images and
treat them as sRGB, which allows all sorts of shenanigans, though one
has to be careful if that turning off applies to the relevant GL/Vulkan
code in question.
If the GL texture is exportable to a dmabuf, we can just use our dmabuf
importing code to get that texture into Vulkan.
There is no need to go via host memory in that case.
And if it doesn't work, we just fall back, like before.
Unimplemented nodes are a failure now.
We make this a soft failure with a g_warning() so that during
development when adding new nodes, the renderer doesn't instantly crash,
but instead prnts a warning.
But we do consider unimplemented nodes a bug now.
Because of that, add_fallback_node() is now renamed to add_cairo_node().
By moving negative affines to be treated like dihedrals, because they
also need support of the modelview, we can free up the affine branch for
doing work without it.
Not a big win I guess, but it makes scaling more efficient.
This allows handling them without ever needing to offscreen for losing
the clip, because the clip can always be transformed.
Also, all the optimizations keep working, like occlusion culling,
clears, and so on.
The main benefit of this work is the ability for offloading to now
handle dihedral transforms of the video buffer.
The other big advantage is that we can now start our rendering with a
dihedral transform from the compositor.
... instead of init_draw(); add_node(); finish_node();
We hook into the infrastructure one step earlier and close to where the
default renderer_render() and renderer_render_texture() arrive in the
nodeprocessor.
Why is this relevant?
Because process() does occlusion culling.
TL;DR: offscreens do culling now
We import them as general, so they should be exported like that.
This was a longstanding issue that I never got around to fixing and I'm
touching this code anyway atm.
See commit 3aa6c27c26 for more details.
NULL disables clearing. We only implement this for GL as in Vulkan we'd
need to create different renderpasses with different attachment
descriptions and that would require more plumbing.
We need to check that the clip is inside the opaque region, not that the
opaque region is inside the clip.
Test included, using the only not that hits the fallback path with an
opaque region smaller than its bounds.
Sometimes container nodes contain lots of overlapping opaque items. In
that case we can use the container node itself as the first node even
though none of the children cover the whole paint area.
The use case for this is a grid of cells like in a terminal where all
the cells are opaque and we want to avoid drawing the background behind
them.
Clip nodes often appear in the widget tree.
And the implementation can be trivial because of the sanity checks
already performed before calling the vfunc.
This is required because transform nodes appear everywhere.
We just exit for all transforms that can't transform the clip rect
losslessly. Both because they are rare and because we'd make the
coverage possibilities much lower.
Containers can walk the list of children back to front, trying to find
the topmost node that fully covers the viewport.
And then they can skip drawing all the nodes before that one.
Asks a node to add itself if it is fully covering the clip rectangle.
In that case, it is the first node that needs to be added.
If the node is not fully covering the clip, it should not draw itself,
because there might be stuff needing to be drawn below.
If a node adds itself, it should call gsk_gpu_render_pass_begin_op().
We wanted premultiplied images in all cases anyway, and moving that
requirement means we can also move the caching code for re-caching
textures into the texture specific code.
This uses offscreens for every call to get_node_as_image().
This is useful both for benchmarking benefits of those implementations
as well as checking that the node-specific paths produce identical
results.
gsk_gpu_node_processor_ensure_image() was a weird amalgamation of stuff
withe weird required and disallowed flags.
Refactor it to make the two operations we actually do there more
explicit: Removing straight alpha and generating mipmaps.
This untangling is also desirable in the future when we also want to
handle colorstates here.
Always return premultiplied images.
2 fallback cases for clip and transform nodes did not require that. If
those cases turn out to be important, they can call
gsk_gpu_get_node_as_image() directly as that's the more flexible option.
Pass through to the child instead of offscreening.
I mainly implemented it for the assertion, because this might be a
sneaky way to introduce bugs without exhaustive checking that we don't
offload stuff that is offscreened.
No actual bugs that I'm aware of, so no tests.
Strictly defensive coding.
When getting a texture as image, we were always returning the texture
unconditionally.
However, we want to mipmap textures when the scale factor is too large,
and this code path did not do that.
The same codepath on the GL renderer doesn't do that either, so the test
is disabled for it.
The switch statement was ugly.
Plus, the code should be close to the add_node() vfunc implementation,
so they can be modified together.
See future commits for an example where this matters.
It didn't bring any noticable benefits and it isn't compatible with the
way we intend to do colorstate support.
And nobody seems to want to spend time on it, so let's get rid of it.
We can bring it back later if someone wants to work on it.
I was watching the log in my terminal and nothing happened.
And I wasn't sure if that was because nothing was printed or because the
same thing was printed every few seconds.
Fix that by printing a timestamp, so that in a few seconds something
else will be printed.
Previously we tracked the dead pixels, but that meant we didn't know the
alive pixels (because there's also unused pixels never accounted for).
And we would free the current atlas randomly due to that.
Now we track if any pixels are alive, and if so, we never gc the current
atlas.
After 60s, we gc the atlas, too. This ensures that after that time, we
free all cache resources, so if an application gets moved to the
background, it will no longer use GPU resources. (Well, at least the
cache won't.)
Only if a non-stale item is in the cache do we consider the cache not
empty.
Once the cache is empty, the device frees it and stops running the
periodic GC.
This is for 3 reasons:
1. Separation of concerns
The device is meant to manage the Vulkan/GL device and check stuff
like image sizes.
Caching is not part of that.
2. Refcounting
Images etc want to reference the device, but the cache wants to
reference images. If the cache is the device, that's a refcycle.
3. Flexibility
It's now easier to implement >1 cache, say one per depth or one per
color state.
Keeping the GdkRGBA requires doing later conversions, which isn't
necessary if we just keep the already converted float[4].
It also prepares for future color states, where the color will need to
be converted using the colorstate.
This way, we can simply duplicate the keys as separate pointers to store
the corresponding Vulkan handles so that we can safely hash them, as
Vulkan handles may or may not be pointers depending on the target
platform.
This will fix builds on 32-bit Windows at least.
This function does not use the standard __cdecl calling convention on
Windows, meaning using g_clear_pointer() on it directly will cause
crashes on 32-bit Windows. Just call it directly if the GLsync it uses
exists.
On Windows, gsize is a long long unsigned. The compiler complains about
that.
Use G_GSIZE_FORMAT which translates to %llu on Windows, %lu on most
platforms, and sometimes just %u on rare cases.
Due to rounding errors, it is possible after intersecting a lot of
rectangles to end up with a tiny size for an offscreen. And because we
allow an epsilon before ceil()ing to an integer (see commit afc7b46264
for details) it is now possible that we end up with a size of 0.
Avoid that by always enforcing a minimum size of 1px.
Test included
The test uses a different codepath to arrive at the same problem - it
specifies the small size instead of triggering it via rounding errors
and clipping like the original bug (and most likely the more common case
to encounter this problem.
Fixes#6656
The VK_IMAGE_LAYOUT_UNDEFINED layout means that the data hold by the
texture can be discarded, and we don't want to discard it. Because the
Vulkan spec is unclear (see [1] for a discussion), err on the side of
caution and use VK_IMAGE_LAYOUT_GENERAL.
Fixes import failures with WebKit.
[1] https://github.com/ValveSoftware/gamescope/issues/356
The goal is to generate an offscreen at 1x scale.
When not ceil()ing the numbers the offscreen code would do it *and*
adjust the scale accordingly, so we'd end up with something like a
1.01x scale.
And that would cause the code to reenter this codepath with the goal to
generate an offscreen at 1x scale.
And indeed, this would lead to infinite recursion.
Tests included.
Fixes#6553
It turns out that we mispositioned glyphs with some cff fonts
when metrics hinting is off, and hinting is on. Since we don't
fully understand the interactions of these settings at this point,
lets preserve metrics hinting as it was on the font we got.
This at least gives folks a workaround for when they experience
clipped rendering with cff fonts: Turn on hint-metrics.
We forced hint metrics off here because it made Pango do some
creative wfh for hex boxes at small sizes, but I've dropped that
on the Pango side.
In a very particular situation, it could happen that our renderpass
reordering did not work out.
Consider this nesting of renderpasses (indentation indicates subpasses):
pass A
subpass of A
pass B
subpass of B
Out reordering code would reorder this as:
subpass of B
subpass of A
pass A
pass B
Which doesn't sound too bad, the subpasses happen before the passes
after all.
However, a subpass might be a pass that converts the image for a texture
stored in the texture cache and then updates the cached image.
If "subpass of A" is such a pass *and* if "subpass of B" then renders
with exactly this texture, then "subpass of B" will use the result of
"subpass of A" as a source.
The fix is to ensure that subpasses stay ordered, too.
The new order moves subpasses right before their parent pass, so the
order of the example now looks like:
subpass of A
pass A
subpass of B
pass B
The place where this would happen most common was when drawing thumbnail
images in Nautilus, the GTK filechooser or Fractal.
Those images are usually PNG files, which are straight alpha. They are then
drawn with a drop shadow, which requires an offscreen for drawing as
well as those images as premultipled sources, so lots of subpasses happen.
If there is then a redraw with a somewhat tricky subregion, then the
slicing of the region code could end up generating 2 passes that each draw
half of the thumbnail image - the first pass drawing the top half and the
second pass drawing the bottom half.
And due to the bug the bottom half would then be drawn from the
offscreen before the actual contents of the offscreen would be drawn,
leading to a corrupt bottom part of the image.
Test included.
Fixes: #6318
We write the buffers in small chunks, and we even sometimes read it. So
prefer it when it's cached.
Speeds up the text benchmarks by a factor of 3x on my dedicated GPU.
If glBufferStorage() is available, we can replace our usage of
glBufferSubData() with persistently mapped storage via
glMappedBufferRange().
This has 1 disadvantage:
1. It's not supported everywhere, it requires GL 4.4 or
GL_EXT_buffer_storage. But every GPU of the last 10 years should
implement it. So we check for it and keep the old code.
The old code can also be forced via GDK_GL_DISABLE=buffer-storage.
But it has 2 advantages:
1. It is what Vulkan does, so it unifies the two renderers' buffer
handling.
2. It is a significant performance boost in use cases with large vertex
buffers. Those are pretty rare, but do happen with lots of text at a
small font size. An example would be a small font in a maximized VTE
terminal or the overview in gnome-text-editor.
A custom benchmark tailored for this problem can be created with:
tests/rendernode-create-tests 1000000 text.node
This creates a node file called "text.node" that draws 1 million text
nodes.
(Creating that test takes a minute or so. A smaller number may be useful
on less powerful hardware than my Intel Tigerlake laptop.)
The difference can then be compared via:
tools/gtk4-rendernode-tool benchmark --runs=20 text.node
and
GDK_GL_DISABLE=buffer-storage tools/gtk4-rendernode-tool benchmark --runs=20 text.node
For my laptop, the difference is:
before: 1.1s
after: 0.8s
Related: !7021
It's not just unused, it's also wrong.
We are reading from the buffer when reallocating the vertex buffer
and memcpy()ing the old into the new buffer - at that point we read from
it.
When ops get allocated that use the same stats as the last op, put them
into the same ShaderOp. This reduces the number of ShaderOps we need to
record, which has 3 benefits:
1. It's less work when iterating over all the ops.
This isn't a big win, but it makes submit() and print() run a bit
faster.
2. We don't need to manage data per-op.
This is a large win because we don't need to ref/unref descriptors
as much anymore, and refcounting is visible on profiles.
3. We save memory.
This is a pretty big win because we iterate over ops a lot, and when
the array is large enough (I've managed to write testcases that makes
it grow to over 4GB) it kills all the caches and that's bad.
The main benefit of all this are glyphs, which used to emit 1 ShaderOp
per glyph and can now end up with 1 ShaderOp for multiple text nodes,
even if those text nodes use different fonts or colors - because they
can all share the same ColorizeOp.
With potentially multiple ops per ShaderOp, we may encounter situations
where 1 ShaderOp contains more ops than we want to merge. (With
GSK_GPU_SKIP=merge, we don't want to merge at all.)
So we still merge the ShaderOps (now unconditionally), but we then run
a loop that potentially splits the merged ops again - exactly at the
point we want to.
This way we can merge ops inside of ShaderOps and merge ShaderOps, but
still have the draw calls contain the exact number of ops we want.
This just introduces the variable and sets it to 1 everywhere.
The ultimate goal is to allow one ShaderOp to collect multiple ops into
one, thereby saving memory in the ops array and leading to faster
performance.
Instead of having renderer API to wait for any number of frames, just
have gsk_gpu_frame_wait() to wait for a single frame.
This unifies behavior on Vulkan and GL, because unlike Vulkan, GL does
not allow waiting for multiple fences.
To make up for it, we replace waiting for multiple frames with finding
the frame with the earliest timestamp and waiting for that one.
Also implement wait() for GL.
This copies the Vulkan idea of using a fence at the end of command
submission and waiting until it gets signaled before reusing the frame.
This frees up the GL driver from doing the work of making buffers etc
reusable and instead allocates new ones when they're still in use and is
a pretty massive performance win.
Most of the time, the image we get for the glyphs will be the
same (the atlas), so avoid adding it to the descriptor set over
and over, and check first if have to. This matches what the
pattern variant of this function already does.
Just initialize the rect directly. This matches better what the
pattern variant of this method does, and it also has the nice
side-effect of eliminating the handling of negative scales in
gsk_rect_scale, which we don't need here, since our scales are
always positive.
Make a single gsk_reload_font helper that can tweak both
scale and font options, so we can ensure that our scaled
font has hint-metrics turned off (pango pays attention to
hint metrics when sizing and rendering hex boxes, and that
hurts us.
This is a tricky topic, because it can make the clip bounds grow, so
previously we were trying to be careful.
However, this can cause perfectly trivial intersections to fail that are
caused by redraw diff regions.
And in the worst case, that means we offscreen in places where we
absolutely do not want to offscreen - in subtrees with subsurface nodes.
Fixes#6499
CLIP_TYPE_NONE is valid if the clip is implemented by the scissor rect.
We always have a scissor rect and there's no way to draw outside of it.
In theory that means we can reset the clip to NONE at any point we
wish if we know nodes are contained inside a certain pixel-aligned
rectangle we can clip.
In practice that's probably quite hard...
We were turning off hinting and subpixel positioning if the
transform isn't 2D affine. The idea behind this was that transforms
likely indicate animations, and for animations, this may reduce
jitter. But the heuristic of transform==animation is not very
reliable, and we pay for this with a jump from hinted to unhinted
at the beginning and end of it. Also, the heuristic does not even
work for the most relevant 'animation' we have today: scrolling.
So, lets drop this for now. We can revisit it later.
Some maps are used for read only and do not require uploading contents
back to the GPU afterwards. In other cases, we can often upload less than
the fully allocated buffer size.
When transforming an empty clip, it stays empty.
Previously, we were setting it to CONTAINED, but that's wrong, because
the bounds are not contained in the clip, the clip is contained in the bounds.
This reverts part of commit a51c6aed47.
Related: !6692
Enforce the following rules:
- No hinting or subpixel positioning in transformed context
- glyph-align determines if we use integral or fractional
device pixel positions
- For hinting, always use an integral y position (the hinter
assumes integral positions, and only operates vertically).
This changes the approach we take to rendering glyphs in the
presence of a scale transform: Instead of scaling the extents
and rendering to an image surface with device scale, simply
create a scaled font and use it for extents and rendering.
This avoids clipping problems with scaling of extents in
the presence of hinting.
The pango code that is drawing hex boxes, invisible glyphs, etc,
is depending on the width being set in the PangoGlyphInfo. Once
we set that, everything falls into place.
Testcase included.
We were just assuming they were if the format matches.
Fixes crashes in Webkit where the external texture is actually a dmabuf
imported as an EGL image.
Avoids getting the scale wrong when due to a rounding error our
pixel-aligned rectangle is 5.000000003px big and we ceil() to 6px
and produce blurry output.
Fixes#6439
Fixes blurriness in shadows.
Not sure to do a proper test for this feature. Usually proper pixel
alignment is tested by drawing a crips line and checking that it is
indeed crisp. But we are testing the blur operation here...
Fixes#6380
This isn't really a useful thing in itself, because none of the callers
handle the NULL return.
But the resulting crash is easier to debug when it's a NULL image than
when add_node() is called on an uninitializes NodeProcessor.
In GSK the following pattern is used four times:
```
switch (self->filter)
{
default:
g_assert_not_reached ();
G_GNUC_FALLTHROUGH;
case GSK_GPU_BLIT_LINEAR:
filter = GL_LINEAR;
break;
case GSK_GPU_BLIT_NEAREST:
filter = GL_NEAREST;
break;
}
```
The G_GNUC_FALLTHROUGH macro is not required. When G_DISABLE_ASSERT
is defined the body of the `default` case is empty, thus there is
no need. When G_DISABLE_ASSERT is not defined the body of the `default`
case contains g_assert_not_reached() thus it won't fallthrough.
This resolves the following:
```
[221/1379] Compiling C object gsk/libgsk.a.p/gpu_gskgpublitop.c.o
[...]
error: fallthrough annotation in unreachable code [-Werror,-Wimplicit-fallthrough]
1 error generated.
```
This can be helpful to see that there is an enormous scale blowing
things up. We omit the matrix, since it is 16 floats that are hard
to interpret at a glance.
We can just check if the subsurfaces contain content - and if they do,
they will be offloading and we can ignore the diff.
This essentially reverts 48740de71a
The 2 callers of gsk_gpu_get_node_as_image() were already computing the
minimum clip region and in particular aligning it to the pixel grid, so
intersecting with node bounds again was causing that alignment to be
busted.
When using a window size and scale that don't multiply to an integer, we
were using the wrong method to adjust it.
The Wayland fractional scaling spec just says:
> For toplevel surfaces, the size is rounded halfway away from zero.
This is meant to be interpreted as "create a large enough buffer to hold
partial pixels) and the compositor will blend it mapping to the pixel
grid" even if that means the buffer slightly overhangs.
Example:
A 11 units wide window at 150% will need a 11 * 1.5 = 16.5 pixel wide
buffer. This should be rounded to 17 pixels but rendered as if only 16.5
pixels are occupied by the window, not as if all 17 pixels are occupied.
This commit is wrong.
It does achieve what it sets out to do, but the method doesn't work.
It confused multiple things in one commit, the commit message only
describes the symptoms it tries to fix and not why the fix is correct,
it includes no tests and it wsn't properly reviewed.
Related: !6871
We want to use a viewport that gives us the right scale back.
This fixes problems where glyph lookups were inefficient because
the scale part of the key would fluctuate ever so slightly.
The previous code was ignoring non-scissor clips, which would make it
overeager at punching holes.
It also was not working with fractional coordinates.
Fixes#6375
We can reach the code that removes the item from the hash table
before or after the weak unref has triggered. Just leave the
weakref in place and let it do its thing, if it hasn't gone
off yet. That matches what we do in free.
Fixes: #6377
We do gc in a timeout, when an arbitrary GL context might be
current, so we need to make sure its ours and we don't free
random textures in another context.
Fixes: #6366
Count dead pixels in textures (ie the number of pixels in GPU
textures that are no longer backed by an alive GdkTexture object),
and when the there's too many, do a gc before rendering the next
frame.
Count the uses of cached texture - from the device (via the linked
list) and from the texture (via render data / weak ref), and only
free the item once the use count reaches zero.
Instead of forever running a timeout to do gc, ensure the timeout
is scheduled whenever we render a frame (this is done by calling
gsk_gpu_device_maybe_gc () before gsk_gpu_frame_render (), and
gsk_gpu_device_queue_gc () after).
Read the GSK_CACHE_TIMEOUT environment variable to override the
default 15s timeout for cache gc. This is mainly meant for debugging.
Since we don't really need two knobs, reuse the gc timeout value
for the max age of items too.