This is a function that's meant to be used whenever both color states
of the shader are equal. In that case no colorspace conversion code
needs to be created and shaders can be shared.
The GL renderer is using FLOAT32 instead of GL_SRGB, which is screwing
up the node-editor by making it turn on high bit depth unconditionally.
So until someone fixes the GL renderer properly, do this quickfix.
That way, we can use it in one other place where we want to use mipmaps.
I don't really like it because it adds yet another argument,
but then the one new caller was selecting suboptimal shaders, and that's
worse.
The colormatrix shade does a whole matrix multiplication, which is
absolutely not necessary.
The convert shader has builtin opacity handling and when the colorstates
match will do no conversion.
The alternative color state is used as the interpolation color state.
Colors are transformed into that space on the CPU.
For now we set the interpolation color state to SRGB, because ultimately
we want to let callers specify it, so having something that's easy to
map to that behavior is desirable.
Otherwise we might have chosen to interpolate in the compositing
colorstate.
It also means that we need to premultiply colors on the CPU now because
of the limitations of the shader colorstates APIs.
This makes use of the GskGpuColorStates by setting the ccs as output
colorstate and the color's colorstate as alternative color state.
The shader adaption is very straightforward because of that.
This is the first op to obey the compositing color state. This means
from now on until all ops obey the ccs rendering is broken when ccs is
not set to linear.
I'll keep individual ops in seperate commits for easier review, because
they all need different adaptations.
Render to an offscreen and add a final conversion if the target
colorstate is not a rendering colorstate.
This now allows the GPU renderer to render to any colorstate.
Makes the verbose output (a lot) more verbose, but it makes the
colorstates used in the shaders very visible.
And it will be relevant once people start using different colorstates
everywhere (like oklab for gradients/colors and so on).
This adds the following functions:
output_color_from_alt()
alt_color_from_output()
Converts between the two colors
output_color_alpha()
alt_color_alpha()
Multiplies a color with an alpha value
This adds a GdkColorStates that encodes 2 of the default GdkColorStates
and wether their values are premultiplied or not.
Neither do the shaders do anything with this information yet, nor do the
shaders do anything with it yet, this is just the plumbing.
If desired, try creating GL_SRGB images. Pass a try_srgb boolean down to
the image creation functions and have them attempt to create images like
that.
When it is not possible to create srgb images in the given format, just
fall back to regular images. The calling code is meant to check the
GSK_GPU_IMAGE_SRGB flags to determine the actual format of the resulting
image.
Make the node processor and the pattern writer track the current
compositing color state. Color state nodes change it. We pass
the surface color state down via the frame apis.
The name of the variable is "ccs" for "compositing color space". It's an
unused variable name and it's common enough to deserve a short and sweet
name.
This shader converts between two color states, by using the
same functions that we use on the cpu. The conversion to perform
is passed as part of the variation.
As premultiplication is part of color states on the shader, we also
encode the premultiplication in the shader.
And because opacity is a useful optimization, we also allow setting
opacity.
For now, the only possible color states are srgb and srgb-linear.
This adds the following:
- ccs argument to GskRenderNode::draw
This is the compositing color state to use when drawing.
- make implementations use the CCS argument
FIXME: Some implementations are missing
- gsk_render_node_draw_with_color_state()
Draws a node with any color state, by switching to its compositing
color state, drawing in that color state and then converting to the
desired color state.
This does draw the result OVER the previous contents in the passed in
color state, so this function should be called with the target being
empty.
- gsk_render_node_draw_ccs()
This needs to be passed a css and then draws with that ccs.
The main use for this is chaining up in rendernode draw()
implementations.
- split out shared Cairo functions into gdkcairoprivate.h
gskrendernode.c and gskrendernodeimpl.c need the same functions.
Plus, there's various code in GDK that wants to use it, so put it in
gdk/ not in gsk/
gsk_render_node_draw() now calls gsk_render_node_draw_with_color_state()
with GDK_COLOR_STATE_SRGB.
That's basically the "undefined" value. We need that when drawing
nothing, which so far only happens with empty container nodes.
But empty container nodes can be children of other nodes, and that makes
things propagate. So instead of catching them, force the whole rest of
the code to deal with an undefined depth.
We also can't just set a random depth, because that will cause merging
to fail.
This is an experiment for now, but it seems that encoding srgb inside
the depth makes sense, as we not just use depth to decide on the
GL fbconfigs/Vulkan formats to pick, depth also encodes how the [0...1]
color values are quantized when stored.
Let's see where this goes.
This commit just adds the flag, but I wanted to make it an individual
commit to explain the purpose:
The SRGB flag is meant to be used for images that have an SRGB format.
In Vulkan terms, that means VK_FORMAT_*_SRGB.
In GL, it means GL_SRGB or GL_SRGB_ALPHA.
As these formats have been madatory since GL 3.0, we can (ab)use them
uncoditionally. Images in these formats are renderable, too, so it's
not just usable for uploading.
What these images allow is treating the data as sRGB while shaders
access them as linear, thereby getting sRGB<=>linear conversions for
free.
It is also possible to switch off the linearization of these images and
treat them as sRGB, which allows all sorts of shenanigans, though one
has to be careful if that turning off applies to the relevant GL/Vulkan
code in question.
If the GL texture is exportable to a dmabuf, we can just use our dmabuf
importing code to get that texture into Vulkan.
There is no need to go via host memory in that case.
And if it doesn't work, we just fall back, like before.
Unimplemented nodes are a failure now.
We make this a soft failure with a g_warning() so that during
development when adding new nodes, the renderer doesn't instantly crash,
but instead prnts a warning.
But we do consider unimplemented nodes a bug now.
Because of that, add_fallback_node() is now renamed to add_cairo_node().
When determining which way is up for the offloaded texture, we
must take all transforms into account - the ones outside the
subsurface node, and the ones inside.
By moving negative affines to be treated like dihedrals, because they
also need support of the modelview, we can free up the affine branch for
doing work without it.
Not a big win I guess, but it makes scaling more efficient.
This allows handling them without ever needing to offscreen for losing
the clip, because the clip can always be transformed.
Also, all the optimizations keep working, like occlusion culling,
clears, and so on.
The main benefit of this work is the ability for offloading to now
handle dihedral transforms of the video buffer.
The other big advantage is that we can now start our rendering with a
dihedral transform from the compositor.
This category does a finer-grained categorization than
GskTransformCategory, but it is deliberatedly made to allow
easy backwards compatibility.
The reason for the categories is that they fit our renderers more
fine.
In particular, it allows implementing wl_output_transform support more
efficiently, thereby allowing rendering buffers the right way for
rotated phone screens or monitors.
The rectangles need to touch/overlap in both directions, otherwise
there's no coverage that covers both rectangles.
Test included.
Fixes rendering glitches in various apps when redrawing.
Fixes: #6849
... instead of init_draw(); add_node(); finish_node();
We hook into the infrastructure one step earlier and close to where the
default renderer_render() and renderer_render_texture() arrive in the
nodeprocessor.
Why is this relevant?
Because process() does occlusion culling.
TL;DR: offscreens do culling now
We import them as general, so they should be exported like that.
This was a longstanding issue that I never got around to fixing and I'm
touching this code anyway atm.
See commit 3aa6c27c26 for more details.
NULL disables clearing. We only implement this for GL as in Vulkan we'd
need to create different renderpasses with different attachment
descriptions and that would require more plumbing.
We need to check that the clip is inside the opaque region, not that the
opaque region is inside the clip.
Test included, using the only not that hits the fallback path with an
opaque region smaller than its bounds.
Sometimes container nodes contain lots of overlapping opaque items. In
that case we can use the container node itself as the first node even
though none of the children cover the whole paint area.
The use case for this is a grid of cells like in a terminal where all
the cells are opaque and we want to avoid drawing the background behind
them.
Due to the way the intermediate offscreen gets drawn, we might end up
with seams at the edges.
And I don't think it's worth spending more time on than saying "not
opaque".
Fixes the compare-render testsuite
New testcase included.
We want to operate with opacities, so it makes sense to have this radily
available.
And we're doing a walk over all children on creation anyway, so why not
just capture the rect there.
We want to be able to express opaque grids. This means that the app
provides either a row of columns of opaque nodes or a column of rows,
and then the containers will magically figure it out.
The main use case for this is terminals, which are uilt using cells. And
when there's a transparent background configured but the contents are
opaque, it'd be nice if we could figure that out.
Also remove the 80% requirement. It is rather arbitrary and while it
helps for some cases, the aforementioned grid would suffer.
Clip nodes often appear in the widget tree.
And the implementation can be trivial because of the sanity checks
already performed before calling the vfunc.
This is required because transform nodes appear everywhere.
We just exit for all transforms that can't transform the clip rect
losslessly. Both because they are rare and because we'd make the
coverage possibilities much lower.
Containers can walk the list of children back to front, trying to find
the topmost node that fully covers the viewport.
And then they can skip drawing all the nodes before that one.
Asks a node to add itself if it is fully covering the clip rectangle.
In that case, it is the first node that needs to be added.
If the node is not fully covering the clip, it should not draw itself,
because there might be stuff needing to be drawn below.
If a node adds itself, it should call gsk_gpu_render_pass_begin_op().
We find the first child that covers >80% of the container and return
that.
This is a nice speedup for the common case of a GtkWindow being covered
by a large opaque background.
It will fall apart for fancy themes that play with transparency or for
small windows because the shadow region gets too large.
But then we just scan the whole node tree.
We could think about adapting the 80% number, because that wasn't chosen
with any real scientific data behind it.
This takes both the vertical and horizontal rectangle that isn't
covering the rounded corners and intersects both with the child's opaque
rect.
And then it returns the larger of the two.
This means the small slices of a window near the left/right (or
top/bottom) will never be covered, but if we wanted that, we'd need to
use something else than a rectangle - either a region or actually a
rounded rect.
But that is a lot more expensive to implement.
We can in fact meet complex transforms here. Asserting that they
are simple doesn't make it so. Instead, simply bail out if a
transform is too complex; in this case we can't offload anyway,
so no need to walk the tree further.
Test included.
Fixes: #6824
We wanted premultiplied images in all cases anyway, and moving that
requirement means we can also move the caching code for re-caching
textures into the texture specific code.
This uses offscreens for every call to get_node_as_image().
This is useful both for benchmarking benefits of those implementations
as well as checking that the node-specific paths produce identical
results.
gsk_gpu_node_processor_ensure_image() was a weird amalgamation of stuff
withe weird required and disallowed flags.
Refactor it to make the two operations we actually do there more
explicit: Removing straight alpha and generating mipmaps.
This untangling is also desirable in the future when we also want to
handle colorstates here.
Always return premultiplied images.
2 fallback cases for clip and transform nodes did not require that. If
those cases turn out to be important, they can call
gsk_gpu_get_node_as_image() directly as that's the more flexible option.
Pass through to the child instead of offscreening.
I mainly implemented it for the assertion, because this might be a
sneaky way to introduce bugs without exhaustive checking that we don't
offload stuff that is offscreened.
No actual bugs that I'm aware of, so no tests.
Strictly defensive coding.
When getting a texture as image, we were always returning the texture
unconditionally.
However, we want to mipmap textures when the scale factor is too large,
and this code path did not do that.
The same codepath on the GL renderer doesn't do that either, so the test
is disabled for it.
The switch statement was ugly.
Plus, the code should be close to the add_node() vfunc implementation,
so they can be modified together.
See future commits for an example where this matters.
It didn't bring any noticable benefits and it isn't compatible with the
way we intend to do colorstate support.
And nobody seems to want to spend time on it, so let's get rid of it.
We can bring it back later if someone wants to work on it.
The new renderers don't support them due to the required complexity of
integrating them with Vulkan and the assumptions those nodes make about
the renderer (the GL renderer exports its internal APIs into the
GLShader).
There haven't been any complaints that I'm aware of since 4.14 was
released where the default renderer does not support the nodes, so usage
in public seems to be close to nonexistant.
The 2 uses I know of were workarounds about missing features in GTK that
have stopped since GTK now supports them:
1. GStreamer used in to do premultiplication when the old GL renderer
did not do so in hardware but on the CPU.
2. Adwaita used it for masking before the mask node wa added in 4.10.
I was watching the log in my terminal and nothing happened.
And I wasn't sure if that was because nothing was printed or because the
same thing was printed every few seconds.
Fix that by printing a timestamp, so that in a few seconds something
else will be printed.
Previously we tracked the dead pixels, but that meant we didn't know the
alive pixels (because there's also unused pixels never accounted for).
And we would free the current atlas randomly due to that.
Now we track if any pixels are alive, and if so, we never gc the current
atlas.
After 60s, we gc the atlas, too. This ensures that after that time, we
free all cache resources, so if an application gets moved to the
background, it will no longer use GPU resources. (Well, at least the
cache won't.)
Only if a non-stale item is in the cache do we consider the cache not
empty.
Once the cache is empty, the device frees it and stops running the
periodic GC.
This is for 3 reasons:
1. Separation of concerns
The device is meant to manage the Vulkan/GL device and check stuff
like image sizes.
Caching is not part of that.
2. Refcounting
Images etc want to reference the device, but the cache wants to
reference images. If the cache is the device, that's a refcycle.
3. Flexibility
It's now easier to implement >1 cache, say one per depth or one per
color state.
Keeping the GdkRGBA requires doing later conversions, which isn't
necessary if we just keep the already converted float[4].
It also prepares for future color states, where the color will need to
be converted using the colorstate.
Fill and stroke nodes were not reporting proper offscreen-for-opacity
and preferred depth.
This was unlikely to have been noticed as their child is usually a solid
color.
Despite my best effort, it seems impossible to make ci and local
builds agree on what font subsetter and fonts to use, so make this
opt-in for now: If you want to produce a node file with embedded
fonts, set GSK_SUBSET_FONTS=1.
The rendernode parser creates its own fontmap for the fonts that
we deserialize from blobs. But we were using the system fontconfig
configuration for it, leading to system fonts still being found.
This is bad, and causes test failures in ci. Try with an empty
fontconfig configuration instead.
Since GskTransform is immutable, a lot of the documented "methods" are
more like "functions", in the sense that they don't keep the instance
alive but rather consume it.
This is annotated with `(transfer full)`, but since these functions are
listed as methods, their first argument is not shown.
Instead, let's add a line to the docs of each consuming function that
clarifies this behavior.
Even if we disable font fallback, after adding Cantarell Regular
to the custom fontmap, fontconfig will helpfully synthesize
Cantarell Bold for us. So, just don't check for the font at all.
If there is a url, add it to the fontmap and leave it up to the
serializing code to ensure that we don't end up with duplicate
fonts.
The hb face is is a wrapper around the font file, which is what
we need to track here, since we want to subset and serialize each
used font file exactly once.
When serializing nodes, collect the glyphs that are used from
each font, subset the font to that set of glyphs, and embed it
into the node file. We are careful to preserve the glyph IDs,
so our text nodes transparently work with the subsettted fonts.
This way, we can simply duplicate the keys as separate pointers to store
the corresponding Vulkan handles so that we can safely hash them, as
Vulkan handles may or may not be pointers depending on the target
platform.
This will fix builds on 32-bit Windows at least.
This function does not use the standard __cdecl calling convention on
Windows, meaning using g_clear_pointer() on it directly will cause
crashes on 32-bit Windows. Just call it directly if the GLsync it uses
exists.
Switch symbolc icon drawing from color-matrix to mask nodes
make the performance of the iconscroll demo crater (from 60fps
to 10fps).
Apply the same optimization we already have for color-matrix
nodes when drawing mask nodes. This gets us back to 60fps.
Fixes: #6700
On Windows, gsize is a long long unsigned. The compiler complains about
that.
Use G_GSIZE_FORMAT which translates to %llu on Windows, %lu on most
platforms, and sometimes just %u on rare cases.
Due to rounding errors, it is possible after intersecting a lot of
rectangles to end up with a tiny size for an offscreen. And because we
allow an epsilon before ceil()ing to an integer (see commit afc7b46264
for details) it is now possible that we end up with a size of 0.
Avoid that by always enforcing a minimum size of 1px.
Test included
The test uses a different codepath to arrive at the same problem - it
specifies the small size instead of triggering it via rounding errors
and clipping like the original bug (and most likely the more common case
to encounter this problem.
Fixes#6656
Use a format of
[XXX] SYMBOL DETAILS
where SYMBOL indicates the offloading status:
🗙 - no offload
▲ - offload above, with background
△ - offload above, no background
▼ - offload below, with background
▽ - offload below, no background
The VK_IMAGE_LAYOUT_UNDEFINED layout means that the data hold by the
texture can be discarded, and we don't want to discard it. Because the
Vulkan spec is unclear (see [1] for a discussion), err on the side of
caution and use VK_IMAGE_LAYOUT_GENERAL.
Fixes import failures with WebKit.
[1] https://github.com/ValveSoftware/gamescope/issues/356
Spew a bit less per-frame. Unfortunately, we still spew for
every frame, and fixing that would require more extensive
refactoring to centralize all logging in gskoffload.c
The intent of this change to get wider testing and verify that the
Vulkan drivers we get to use in the wild are good enough for our
needs. If significant problems show up, we will revert this change
for 4.16.
The new preference order is vulkan > ngl > gl > cairo.
The gl renderer is still there because we need it to support gles2.
If you need to override the default renderer choice, you can
still use the GSK_RENDERER environment variable.
Fixes: #6537
Rename things so they make more sense. The dest/source naming got
a bit unclear when we added background into the mix. Now we're going
for:
source_rect - the texture region to display
texture_rect - dimensions of the subsurface showing the texture
background_rect - dimensions of the background subsurface
bounds - union of texture_rect and background_rect
Also use this opportunity to add some api docs.
Detect a black color node below the texture node and pass that
information to the subsurface, to take advange of the single-pixel
buffer optimization.
To make this work, we need to stop using the bounds of the subsurface
node for sizing the offload, and instead use either the clip or
the texture node for that.
Make it possible for subsurfaces to have a black background on a
secondary subsurface below the actual subsurface. Using a single-pixel
buffer for that background increases the changes that the compositor
will use direct scanout for the actual subsurface.
This changes the private subsurface API. All callers have been
updated to pass an empty background rect.
The compiler (gcc 13.2) thinks that `t` could be used uninitialised.
That’s obviously not the case, because there’s always going to be at
least one loop iteration due to the initial values of `t1` and `t2`.
Change the loop to a `do…while` to make that a bit clearer to the
compiler without making any functional changes to the code.
Signed-off-by: Philip Withnall <pwithnall@gnome.org>
The goal is to generate an offscreen at 1x scale.
When not ceil()ing the numbers the offscreen code would do it *and*
adjust the scale accordingly, so we'd end up with something like a
1.01x scale.
And that would cause the code to reenter this codepath with the goal to
generate an offscreen at 1x scale.
And indeed, this would lead to infinite recursion.
Tests included.
Fixes#6553
When we look for the texture to attach to the subsurface, keep
track of transforms we see along the way, and look at their scale
component to determine if the texture needs to be flipped.
We currently don't allow rotations here.
This fixes glarea rendering being upside-down when offloaded.