Use the new cache feature to split oversized textures into tiles the
size given by the new device API.
Then number those tiles from left to right and top to bottom and use
that number as the tile id.
When we draw large images, we absolutely do not want to keep memory that
we do not need. So do a GC run after every tile. That otentially slows
down things, but it also improves the chances of not running out of
memory.
Here's the node for the image I managed to create after I applied this
patch:
repeat {
bounds: 0 0 50000 50000;
child: text {
font: "Noto Color Emoji 10000px";
glyphs: 661 0 0 0 color;
offset: 0 10000;
hint-style: none;
}
}
Functions should behave as I expect, and I just spent an hour debugging
a refcount issue because I assumed our image creation functions return
refrences. Which is a very sane assumption.
The texture and texture-scale node code is creating image copies
for mipmaps and to adapt to the compositing colorstate.
Those texture should be cached.
We want to cache textures in the compositing color state, not in their
original color state. However, the compositing color state may change
(think multimonitor setups).
So we additionally keep a cache per colorstate.
That means texture lookup is now a 3-step process:
1. Look up in the compositing colorstate's cache
2. Look up in the general cache
3. Upload
GL_SRGB is doing postmultiplied alpha, so if the texture is
premultiplied, we can't use this optimization.
The optimization still works for unpremultiplied and opaque images,
because those don't do that step.
No colorstate conversions allowed here, though technically we could use
the alternate color state for the source most of the time, as the mask's
colorstate is only relevant for luminance.
This is a function that's meant to be used whenever both color states
of the shader are equal. In that case no colorspace conversion code
needs to be created and shaders can be shared.
The GL renderer is using FLOAT32 instead of GL_SRGB, which is screwing
up the node-editor by making it turn on high bit depth unconditionally.
So until someone fixes the GL renderer properly, do this quickfix.
That way, we can use it in one other place where we want to use mipmaps.
I don't really like it because it adds yet another argument,
but then the one new caller was selecting suboptimal shaders, and that's
worse.
The colormatrix shade does a whole matrix multiplication, which is
absolutely not necessary.
The convert shader has builtin opacity handling and when the colorstates
match will do no conversion.
The alternative color state is used as the interpolation color state.
Colors are transformed into that space on the CPU.
For now we set the interpolation color state to SRGB, because ultimately
we want to let callers specify it, so having something that's easy to
map to that behavior is desirable.
Otherwise we might have chosen to interpolate in the compositing
colorstate.
It also means that we need to premultiply colors on the CPU now because
of the limitations of the shader colorstates APIs.
This makes use of the GskGpuColorStates by setting the ccs as output
colorstate and the color's colorstate as alternative color state.
The shader adaption is very straightforward because of that.
This is the first op to obey the compositing color state. This means
from now on until all ops obey the ccs rendering is broken when ccs is
not set to linear.
I'll keep individual ops in seperate commits for easier review, because
they all need different adaptations.
Render to an offscreen and add a final conversion if the target
colorstate is not a rendering colorstate.
This now allows the GPU renderer to render to any colorstate.
Makes the verbose output (a lot) more verbose, but it makes the
colorstates used in the shaders very visible.
And it will be relevant once people start using different colorstates
everywhere (like oklab for gradients/colors and so on).
This adds the following functions:
output_color_from_alt()
alt_color_from_output()
Converts between the two colors
output_color_alpha()
alt_color_alpha()
Multiplies a color with an alpha value
This adds a GdkColorStates that encodes 2 of the default GdkColorStates
and wether their values are premultiplied or not.
Neither do the shaders do anything with this information yet, nor do the
shaders do anything with it yet, this is just the plumbing.
If desired, try creating GL_SRGB images. Pass a try_srgb boolean down to
the image creation functions and have them attempt to create images like
that.
When it is not possible to create srgb images in the given format, just
fall back to regular images. The calling code is meant to check the
GSK_GPU_IMAGE_SRGB flags to determine the actual format of the resulting
image.
Make the node processor and the pattern writer track the current
compositing color state. Color state nodes change it. We pass
the surface color state down via the frame apis.
The name of the variable is "ccs" for "compositing color space". It's an
unused variable name and it's common enough to deserve a short and sweet
name.
This shader converts between two color states, by using the
same functions that we use on the cpu. The conversion to perform
is passed as part of the variation.
As premultiplication is part of color states on the shader, we also
encode the premultiplication in the shader.
And because opacity is a useful optimization, we also allow setting
opacity.
For now, the only possible color states are srgb and srgb-linear.
This adds the following:
- ccs argument to GskRenderNode::draw
This is the compositing color state to use when drawing.
- make implementations use the CCS argument
FIXME: Some implementations are missing
- gsk_render_node_draw_with_color_state()
Draws a node with any color state, by switching to its compositing
color state, drawing in that color state and then converting to the
desired color state.
This does draw the result OVER the previous contents in the passed in
color state, so this function should be called with the target being
empty.
- gsk_render_node_draw_ccs()
This needs to be passed a css and then draws with that ccs.
The main use for this is chaining up in rendernode draw()
implementations.
- split out shared Cairo functions into gdkcairoprivate.h
gskrendernode.c and gskrendernodeimpl.c need the same functions.
Plus, there's various code in GDK that wants to use it, so put it in
gdk/ not in gsk/
gsk_render_node_draw() now calls gsk_render_node_draw_with_color_state()
with GDK_COLOR_STATE_SRGB.
That's basically the "undefined" value. We need that when drawing
nothing, which so far only happens with empty container nodes.
But empty container nodes can be children of other nodes, and that makes
things propagate. So instead of catching them, force the whole rest of
the code to deal with an undefined depth.
We also can't just set a random depth, because that will cause merging
to fail.
This is an experiment for now, but it seems that encoding srgb inside
the depth makes sense, as we not just use depth to decide on the
GL fbconfigs/Vulkan formats to pick, depth also encodes how the [0...1]
color values are quantized when stored.
Let's see where this goes.
This commit just adds the flag, but I wanted to make it an individual
commit to explain the purpose:
The SRGB flag is meant to be used for images that have an SRGB format.
In Vulkan terms, that means VK_FORMAT_*_SRGB.
In GL, it means GL_SRGB or GL_SRGB_ALPHA.
As these formats have been madatory since GL 3.0, we can (ab)use them
uncoditionally. Images in these formats are renderable, too, so it's
not just usable for uploading.
What these images allow is treating the data as sRGB while shaders
access them as linear, thereby getting sRGB<=>linear conversions for
free.
It is also possible to switch off the linearization of these images and
treat them as sRGB, which allows all sorts of shenanigans, though one
has to be careful if that turning off applies to the relevant GL/Vulkan
code in question.
If the GL texture is exportable to a dmabuf, we can just use our dmabuf
importing code to get that texture into Vulkan.
There is no need to go via host memory in that case.
And if it doesn't work, we just fall back, like before.
Unimplemented nodes are a failure now.
We make this a soft failure with a g_warning() so that during
development when adding new nodes, the renderer doesn't instantly crash,
but instead prnts a warning.
But we do consider unimplemented nodes a bug now.
Because of that, add_fallback_node() is now renamed to add_cairo_node().
When determining which way is up for the offloaded texture, we
must take all transforms into account - the ones outside the
subsurface node, and the ones inside.
By moving negative affines to be treated like dihedrals, because they
also need support of the modelview, we can free up the affine branch for
doing work without it.
Not a big win I guess, but it makes scaling more efficient.
This allows handling them without ever needing to offscreen for losing
the clip, because the clip can always be transformed.
Also, all the optimizations keep working, like occlusion culling,
clears, and so on.
The main benefit of this work is the ability for offloading to now
handle dihedral transforms of the video buffer.
The other big advantage is that we can now start our rendering with a
dihedral transform from the compositor.
This category does a finer-grained categorization than
GskTransformCategory, but it is deliberatedly made to allow
easy backwards compatibility.
The reason for the categories is that they fit our renderers more
fine.
In particular, it allows implementing wl_output_transform support more
efficiently, thereby allowing rendering buffers the right way for
rotated phone screens or monitors.
The rectangles need to touch/overlap in both directions, otherwise
there's no coverage that covers both rectangles.
Test included.
Fixes rendering glitches in various apps when redrawing.
Fixes: #6849
... instead of init_draw(); add_node(); finish_node();
We hook into the infrastructure one step earlier and close to where the
default renderer_render() and renderer_render_texture() arrive in the
nodeprocessor.
Why is this relevant?
Because process() does occlusion culling.
TL;DR: offscreens do culling now
We import them as general, so they should be exported like that.
This was a longstanding issue that I never got around to fixing and I'm
touching this code anyway atm.
See commit 3aa6c27c26 for more details.
NULL disables clearing. We only implement this for GL as in Vulkan we'd
need to create different renderpasses with different attachment
descriptions and that would require more plumbing.
We need to check that the clip is inside the opaque region, not that the
opaque region is inside the clip.
Test included, using the only not that hits the fallback path with an
opaque region smaller than its bounds.
Sometimes container nodes contain lots of overlapping opaque items. In
that case we can use the container node itself as the first node even
though none of the children cover the whole paint area.
The use case for this is a grid of cells like in a terminal where all
the cells are opaque and we want to avoid drawing the background behind
them.
Due to the way the intermediate offscreen gets drawn, we might end up
with seams at the edges.
And I don't think it's worth spending more time on than saying "not
opaque".
Fixes the compare-render testsuite
New testcase included.
We want to operate with opacities, so it makes sense to have this radily
available.
And we're doing a walk over all children on creation anyway, so why not
just capture the rect there.
We want to be able to express opaque grids. This means that the app
provides either a row of columns of opaque nodes or a column of rows,
and then the containers will magically figure it out.
The main use case for this is terminals, which are uilt using cells. And
when there's a transparent background configured but the contents are
opaque, it'd be nice if we could figure that out.
Also remove the 80% requirement. It is rather arbitrary and while it
helps for some cases, the aforementioned grid would suffer.
Clip nodes often appear in the widget tree.
And the implementation can be trivial because of the sanity checks
already performed before calling the vfunc.
This is required because transform nodes appear everywhere.
We just exit for all transforms that can't transform the clip rect
losslessly. Both because they are rare and because we'd make the
coverage possibilities much lower.
Containers can walk the list of children back to front, trying to find
the topmost node that fully covers the viewport.
And then they can skip drawing all the nodes before that one.
Asks a node to add itself if it is fully covering the clip rectangle.
In that case, it is the first node that needs to be added.
If the node is not fully covering the clip, it should not draw itself,
because there might be stuff needing to be drawn below.
If a node adds itself, it should call gsk_gpu_render_pass_begin_op().
We find the first child that covers >80% of the container and return
that.
This is a nice speedup for the common case of a GtkWindow being covered
by a large opaque background.
It will fall apart for fancy themes that play with transparency or for
small windows because the shadow region gets too large.
But then we just scan the whole node tree.
We could think about adapting the 80% number, because that wasn't chosen
with any real scientific data behind it.
This takes both the vertical and horizontal rectangle that isn't
covering the rounded corners and intersects both with the child's opaque
rect.
And then it returns the larger of the two.
This means the small slices of a window near the left/right (or
top/bottom) will never be covered, but if we wanted that, we'd need to
use something else than a rectangle - either a region or actually a
rounded rect.
But that is a lot more expensive to implement.
We can in fact meet complex transforms here. Asserting that they
are simple doesn't make it so. Instead, simply bail out if a
transform is too complex; in this case we can't offload anyway,
so no need to walk the tree further.
Test included.
Fixes: #6824
We wanted premultiplied images in all cases anyway, and moving that
requirement means we can also move the caching code for re-caching
textures into the texture specific code.
This uses offscreens for every call to get_node_as_image().
This is useful both for benchmarking benefits of those implementations
as well as checking that the node-specific paths produce identical
results.
gsk_gpu_node_processor_ensure_image() was a weird amalgamation of stuff
withe weird required and disallowed flags.
Refactor it to make the two operations we actually do there more
explicit: Removing straight alpha and generating mipmaps.
This untangling is also desirable in the future when we also want to
handle colorstates here.
Always return premultiplied images.
2 fallback cases for clip and transform nodes did not require that. If
those cases turn out to be important, they can call
gsk_gpu_get_node_as_image() directly as that's the more flexible option.
Pass through to the child instead of offscreening.
I mainly implemented it for the assertion, because this might be a
sneaky way to introduce bugs without exhaustive checking that we don't
offload stuff that is offscreened.
No actual bugs that I'm aware of, so no tests.
Strictly defensive coding.
When getting a texture as image, we were always returning the texture
unconditionally.
However, we want to mipmap textures when the scale factor is too large,
and this code path did not do that.
The same codepath on the GL renderer doesn't do that either, so the test
is disabled for it.
The switch statement was ugly.
Plus, the code should be close to the add_node() vfunc implementation,
so they can be modified together.
See future commits for an example where this matters.
It didn't bring any noticable benefits and it isn't compatible with the
way we intend to do colorstate support.
And nobody seems to want to spend time on it, so let's get rid of it.
We can bring it back later if someone wants to work on it.
The new renderers don't support them due to the required complexity of
integrating them with Vulkan and the assumptions those nodes make about
the renderer (the GL renderer exports its internal APIs into the
GLShader).
There haven't been any complaints that I'm aware of since 4.14 was
released where the default renderer does not support the nodes, so usage
in public seems to be close to nonexistant.
The 2 uses I know of were workarounds about missing features in GTK that
have stopped since GTK now supports them:
1. GStreamer used in to do premultiplication when the old GL renderer
did not do so in hardware but on the CPU.
2. Adwaita used it for masking before the mask node wa added in 4.10.
I was watching the log in my terminal and nothing happened.
And I wasn't sure if that was because nothing was printed or because the
same thing was printed every few seconds.
Fix that by printing a timestamp, so that in a few seconds something
else will be printed.
Previously we tracked the dead pixels, but that meant we didn't know the
alive pixels (because there's also unused pixels never accounted for).
And we would free the current atlas randomly due to that.
Now we track if any pixels are alive, and if so, we never gc the current
atlas.
After 60s, we gc the atlas, too. This ensures that after that time, we
free all cache resources, so if an application gets moved to the
background, it will no longer use GPU resources. (Well, at least the
cache won't.)
Only if a non-stale item is in the cache do we consider the cache not
empty.
Once the cache is empty, the device frees it and stops running the
periodic GC.
This is for 3 reasons:
1. Separation of concerns
The device is meant to manage the Vulkan/GL device and check stuff
like image sizes.
Caching is not part of that.
2. Refcounting
Images etc want to reference the device, but the cache wants to
reference images. If the cache is the device, that's a refcycle.
3. Flexibility
It's now easier to implement >1 cache, say one per depth or one per
color state.
Keeping the GdkRGBA requires doing later conversions, which isn't
necessary if we just keep the already converted float[4].
It also prepares for future color states, where the color will need to
be converted using the colorstate.
Fill and stroke nodes were not reporting proper offscreen-for-opacity
and preferred depth.
This was unlikely to have been noticed as their child is usually a solid
color.
Despite my best effort, it seems impossible to make ci and local
builds agree on what font subsetter and fonts to use, so make this
opt-in for now: If you want to produce a node file with embedded
fonts, set GSK_SUBSET_FONTS=1.
The rendernode parser creates its own fontmap for the fonts that
we deserialize from blobs. But we were using the system fontconfig
configuration for it, leading to system fonts still being found.
This is bad, and causes test failures in ci. Try with an empty
fontconfig configuration instead.