In some cases, we were creating gigantic intermediate textures
only to clip out a small section afterwards (e.g. in the listbox
example in gtk4-demo). This is wasteful if we apply effects on
the texture, such as blur or color-matrix. So, clip the dimensions
of the intermediate texture with the current clip. To make this
feasible, we move the texture coordinate computation out of the
pipeline setup functions into the node_as_texture function where
this clipping happens.
One extra complication we encounter is that the node might get
clipped away completely. Since Vulkan does not allow to create
empty images, we bail out in this case and not draw anything.
With these changes, the listbox example in gtk4-demo goes from
32M pixels of intermediate texture to 320000.
Instead of having a function with lots of arguments in
GskVulkanRender that we call from GskVulkanRenderPass which
then just calls back into GskVulkanRenderPass, just create
the new render pass object locally, and an api to add it
to the list that GskVulkanRender keeps. This makes it
a lot easier to preserve all the relevant parameters from
the parent render pass.
Move away from the idea of intra-frame sampling, since we only
push samples once per frame, anyway. Instead, make the profiler
keep a rolling average of the last n frames.
Whenever we need a node as a texture, we now start a new render
pass that renders the node into a new intermediate texture, and
set up a semaphore to make the current render pass wait for it.
As part of this reorganization, much of the setup and drawing
code moved from gskvulkanrender.c to gskvulkanrenderpass.c.
Allow to pass in semaphores to wait for before executing
and to signal after executing the command buffer. This
just exposes the capabilities of the underlying Vulkan
api. Update all callers to pass no semaphores, for now.
We will use this in the future.
I've finally figured out the right combination of src and dest
stage and access flags to make all validation warnings go away.
This commit only fixes the direct upload code.
This is another example for a 2-texture shader.
So far, only separable blend modes are implemented.
The implementation is not optimized, with an
if-else cascade in the shader.
We were looking at uninitialized memory here, instead
of the type of the source clip, as we should.
This showed up as mispositioned clip in the first frame
of a crossfade stack transition, and also as overdraw in
sliding stack transitions.
We already move the descriptor set layout out of it,
so we can just as well keep the pipeline layouts in
the render object as well, and get rid of this extra
object. Update all callers.
Instead of doing multiple copy commands with a tiny buffer
for each glyph, we can just batch them all together. This
also avoids the issue of creating multiple barriers for the
same image.
By tracking the last transition we can build the appropriate barriers.
Also use the most appropriate initial layout/access at creation :
for linear image : predefined (we prepare the content ourself through memcpy)
for everything else : undefined (we don't care about the content, will most likely be erase)
Move the glyph caching api to something that can support using
multiple textures. We now split the text render ops into multiple
ops for different textures, and make each op render just a substring
of the text node's glyph string.
This is just a proof of concept - we use a single 1024x1024 surface,
and just give up when we run out of space. The cache is populated
incrementally, and items are never removed.
This commit takes several steps towards rendering text
like we want to.
The creation of the cairo surface and texture is moved
to the backend (in GskVulkanRenderer). We add a mask
shader that is used in the next text pipeline to use
the texture as a mask, like cairo_mask_surface does.
There is a separate color text pipeline that uses the
already existing blend shaders to use the texture as
a source, like cairo_paint does.
The text node api is simplified to have just a single
offset, which determines the left end of the text baseline,
like all our other text drawing APIs.
This fixes the proper dependencies getting set up for generating
the shaders and only the necessary things getting rebuilt on
resources changing in gsk.
Currently, this information is not used since cairo_show_glyphs
deals with color glyphs for us. But when we get to uploading
glyphs to a texture atlas, we will need it to do the right thing.
We don't look at individual glyphs here, but just whether the
font has the has-color flag set. In practice, all glyphs in
such a font will be color glyphs, and we can avoid loading all
the glyphs this way.
The memory alignment requirements are different from the image layout.
We want the rowPitch to know where to upload the lines.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
https://bugzilla.gnome.org/show_bug.cgi?id=786485
Spooky action at a distance is not really allowed in Meson, so the rules
to generate the SPV files should go in their own directory.
Tested by: Rico Tzschichholz <ricotz@ubuntu.com>
If glslc is found, rebuild the shaders from GLSL to SPIR-V; otherwise,
we're just going to use the built files we have committed in the source
repository.
We have to work around some ordering problems here. We still
manage to keep most of the guts in modules/input/meson.build,
so it's not too ugly overall.
(The autotools build solves this with a 'make -C ../../input/modules'
inside gtk/Makefile, but that's not something we can or want to do.)
Add back dependencies on libgdk_dep and libsk_dep which are declared
dependencies. We removed this before because these declarations had
link_with: lines that dragged in the static libgdk.a and libgsk.a libs
which are linked into libgtk-4.so anyway and thus shouldn't be used
when linking internal exes/tools against libgtk-4. Remove the static
libs from the declared dependencies and have libgtk link those in
explicitly, so that the declared deps now just provide all the built
dependencies and include dirs and such for declared libgtk_dep users
such as the internal exes/tools, which want all the generated gsk/gdk/gtk
headers to exist before attempting to compile anything against the
gtk+ headers.
gtk_shader_builder_add_define should check both define_name and
define_value for not-NULL and not-empty, but the second precondition
check checks define_name again for not-empty-ness.
If you set GTK_INSPECTOR_RENDERER to the same type of
values that GSK_RENDERER takes this can change the renderer
used for the inspector. This is useful if you're debugging
one renderer and don't want to affect the inspector.
Instead of having 3 different shaders for the different clipping
versions, just have one shader and use a preprocessor define to use
different clip functions.
That preprocessor define is set in the Makefile.
Also use foo.frag and foo.vert as the file extensions instead of using
foo.frag.glsl and foo.vert.glsl, as that's what glslc suggests as
extension.
That way we don't need to move the clip rounded rect manually through
the vertex shader into the fragment shader but can just look at the push
constants.
Simplifies shaders a lot.
This way, we ensure that files that are built during make always get
properly listed. And we ensure that creating the resources actually
depends on them.
This was showing up quite high on the profiles, and there is
no real reason for copy to normalize, as the source is a
GskRoundedRect which should be normalized already unless
you did something very strange (and then you should have normalized
manually).
It was suggested that the project files to be moved to win32/, so that we can
have one less layer of directories we need to go down into to reach the project files.
Instead of relying on --generate-dependencies and the resource file,
actually list the resources in Make variables.
Fixes make not building new shaders because they're not inside the
resource file.
This node essentially implements the feColorMatrix SVG filter. I got the
idea yesterday after looking at the opacity implementation.
It can be used for opacity (not sure if we want to) and to implement a
bunch of the CSS filters.
...but disable them for now. Configs will be added for the projects to
support Vulkan-enabled builds which will then enable the builds of these
sources. Extra commands and items will be needed for the GSK resources
along with ensuring GSK_RENDERER_GSK being defined for the build of GDK,
GDK-Win32 and GSK so that the builds of Vulkan-enabled builds can be done
properly.
Filter out the Vulkan sources from the 'dist hook' rules in
gsk/Makefile.am as we don't want to in turn include them twice in the
projects when the 'make dist' is performed on a system with Vulkan
builds enabled.
One cannot use #if...#endif within macro calls in Visual Studio and
possibly other compilers, and there are more uses of VLAs that need to be
replaced with g_newa().
There were also checks for the clip type in gskvulkanrenderpass.c which
were possibly not done right (using the address of the type value to check
for a type value), which triggered errors as one is attempting to compare
a pointer type to an enum/int type.
https://bugzilla.gnome.org/show_bug.cgi?id=773299
Use g_newa() instead of VLAs, as VLAs may never be supported by some
compilers as it became optional in C11 and there are concerns about their
implementations in compilers that do support it.
https://bugzilla.gnome.org/show_bug.cgi?id=773299
Forces a full redraw every frame.
This is done generically, so it's supported on every renderer.
For widget-factory first page (with the spinner spinning and progressbar
pulsing), I get these numbers per frame:
action clipped full redraw
snapshot 0ms 7-10ms
cairo rendering 0ms 10-15ms
Vulkan rendering 3-5ms 18-20ms
Vulkan expected * 0ms 1-2ms
GL rendering unsupported 55-62ms
* expected means disabling rendering of unsupported render nodes,
instead of doing fallback drawing. So it overestimates the performance,
because borders and box-shadows are disabled.
It's faster to render once for every rectangle in the clip region than
rendering the outline of the clip region.
Especially because this reduces the time necessary to build up the frame
data.
In widget-factory (where we have 3 rectangles), this leads to a 5x
speedup in the rendering time rendering alone.
Snapshotting time goes from 10ms to ~1ms, which is another huge
improvement.
Note: We interpolate premultiplied colors as per the CSS spec. This i
different from Cairo, which interpolates unpremultiplied.
So in testcases with translucent gradients, it's actually Cairo that is
wrong.
This is now tracking the clips added by the clip nodes.
If any particular node can't deal with a clip, it falls back to Cairo
rendering. But if it can, it will render it directly.
... and implement it for the Cairo renderer.
It's an API that instructs a renderer to render to a texture.
So far this is mostly meant to be used for testing, but I could imagine
it being useful for rendering DND icons.
That code doesn't do anything.
And what the code should be doing (clearing the abckground) isn't
necessary as cairo drawing is guaranteed to clear the surface.
This does a conversion to/from GBytes and is intended for writing tests.
It's really crude but it works.
And that probably means Alex will (ab)use it for broadway.
I had originally thought I'd use GskShadow for box-shadow, but didn't in
the end.
So now it's only used for text-shadow and icon-shadow, and those don't
have a spread.
Instead of a separate allocation for any arrays in the render node
we allocate these as part of the render node itself, using C99
flexible arrays.
This leads to less allocations, which is nice, but the major reason
for this is that it allows us to change the allocation scheme further
in the future. For instance, we want to do stack-like allocation so
that all the render-nodes for an entire frame are allocated in one
(or a few) chunks.
Instead of constantly recalculating this (especially recursively for
parents!) we do it only on construction, because everything is
immutable anyway. Also, most nodes had a bounds already and can
use the new parent member instead.
We also do direct access to the node bounds rather than calling
gsk_render_node_get_bounds in various places, which means
we do less copying.
... and make the icon rendering code use it.
This requires moving even more shadow renering code into GSK, but so be
it. At least the "shadows not implemented" warning is now gone!
The node draws a solid CSS border, which can be used to cover everything
but dashed and dotted borders (double, groove, inset, ...).
For different border styles, we overlay multiple nodes and set their
colors to transparent for sides with non-matching styles.
This way we can pass the command pool around.
And that allows us to allocate and submitcustom buffers.
And that is necessary to make staging images work.
This code makes renderers fall back to Cairo rendering if they don't
know how to handle a render node's type.
This allows adding new render nodes with impunity.
Instead of appending a container node and adding the nodes to it as they
come in, we now collect the nodes until gtk_snapshot_pop() is called and
then hand them out in a container node.
The caller of gtk_snapshot_push() is then responsible for doing whatever
he wants with the created node.
Another addigion is the keep_coordinates flag to gtk_snapshot_push()
which allows callers to keep the current offset and clip region or
discard it. Discarding is useful when doing transforms, keeping it is
useful when inserting effect nodes (like the ones I'm about to add).
Instead of having a setter for the transform, have a GskTransformNode.
Most of the oprations that GTK does do not require a transform, so it
doesn't make sense to have it as a primary attribute.
Also, changing the transform requires updating the uniforms of the GL
renderer, so we're happy if we can avoid that.
gsk_render_node_get_bounds() still exists and is computed via vfunc
call:
- containers dynamically compute the bounds from their children
- surface and texture nodes get bounds passed on construction
In the brave new world of refactored render nodes, this function doesn't
really make any sense anymore. We could turn it into a vfunc, but I
don't think it's useful.
Especially because even in the brave old world, this function was
causing a vastl overallocation of nodes when the GL renderer needed render
targets.
If we ever feel, we need this function again, we can readd it later.
But nobody is using it other than for overriding opactiy. And you can
just override opacity directly if you care.
Creating render nodes is fire-and-forget, so all one should do is create
a container, append, append, append and then send it off to the
renderer. So there's no need to replace, insert between or anything
else.
- Recognize "gl" as well as "opengl" for the GL renderer
- GSK_RENDERER=help now works
- g_warning() for an unrecognized renderer (typo detection!)
- g_print() the actual renderer that is used (and error messages when
selecting) when a GSK_RENDERER is given, so you'll notice if your
renderer isn't taken.
By creating unlimited render objects, we would never wait on the GPU.
This would mean that if the GPU was the bottleneck, we would fill its
queue with render commands faster than it could process them.
And because the nvidia binary driver and my code work surprisingly well
and bugfree, this lead to exhaustion of RAM. I had 50GB of swap
configured and my hard disk was quicker as swap storage than my GPU was
at processing the commands, so stuff still filled up.
At that point my computer became rather unresponsive and I decided to
reboot it, so I that could write this patch.
Add SURFACE and TEXTURE operations. This way, we actually render more
than one node every frame because not everything is a fallback node
anymore that gets composited with its children into a cairo surface.
Instead of pushing the root matrix, push the world matrix for the
current node. That way, the bounds we emit as vertices are actually
properly transformed.
First, we collect all the info about descriptor sets into a hash table,
then we use its size to determine the amount of sets and allocate those
before we finally go ahead and use the hash table's contents to
initialize the descriptor sets.
And then we're ready to render.
We can let the GPU do its stuff without waiting. The GPU knows what it's
doing.
Which means we now get a lot of time to spend on doing CPU things (read:
we're way better in benchmarks).
The old behavior is safer, so we want to keep it around for debugging.
It can be reenabled with GSK_RENDERING_MODE=sync.
And move the actual rendering code there.
A RenderPass is a collection of operations on the same target that
get executed one after another. It roughly targets VkRenderPass or
rather the subpasses of a VkRenderPass.
For now, only the infrastructure is there. No real stuff is happening.
This is refactoring work.
GskVulkanRender is supposed to be the global object for a render
operation, ie GskVulkanRenderer.render() will create this object for
what it does.
The object will be split into stages that perform the operations
necessary to create a drawing.
Instead of using a staging iamge, we require the final image to be
linearly allocated and have host-visible memory.
This improves performance quite a bit.
The old code is still there and can be enabled with a simple change
to a #define in gskvulkanimage.h
We can now upload vertices.
And we use this to draw a yellow background. Which is clearly superior
to not drawing anything.
Also, we have shaders now. If you modify them, you need glslc installed
so they can be recompiled into Spir-V bytecode.
This is a way to query the damaged area of the backbuffer.
The GL renderer uses this to compute the extents of that damage region
(computed via buffer age) and use them to minimize the area to redraw.
This changes the semantics of GL rendering to "When calling
gdk_window_begin_frame() with a GL context, the area by
gdk_gl_context_get_damage() needs to be redrawn and every other pixel of
the backbuffer is guaranteed to be correct.
After gdk_window_end_frame() on a GL-drawn window, the whole backbuffer
must be correct.
We can always glXBufferSwap() now because of this.
... instead of a gl context.
This requires some refactoring in the way we mark the shared context as
drawing: We now call begin_frame/end_frame() on it and ignore the call
on the main context.
Unfortunately we need to do this check in all vfuncs, which sucks. But I
haven't found a better way.
Reenable GL drawing, but do it without Cairo.
Now, the context passed to gdk_window_begin_draw_frame() decides how
drawing is going to happen. If it is NULL, Cairo is used like before.
If a context is passed, Cairo may not be used for drawing and
gdk_drawing_context_get_cairo_context() is going to return NULL.
Instead, the GL renderer must draw to the GL backbuffer and
end_draw_frame() is then swapping that to the front.
The GskGLRenderer has lost the texture it used to render to and adapted
to render directly to the backbuffer instead.
The only thing missing is for GtkGLArea to gain back a performant way to
render. But it didn't have one since the introduction of GSK, this
patchset doesn't change anything about it.
The new rendering avoids two indirections (the GSK renderer's texture
and the GDK double buffering surface).
It improves icon count in the fishbowl demo by 30%.
This way, we don't spam criticals when GL is not available. Instead, we
print a useful debug message to stderr and continue with the Cairo renderer.
Signed-off-by: Emmanuele Bassi <ebassi@gnome.org>
and remove gsk_renderer_get_for_display().
This new function returns a realized renderer. Because of that, GSK can
catch failures to realize, destroy the renderer and try another one.
Or in short: I can finally use GTK on Weston with the nvidia binary
drivers again.
Signed-off-by: Emmanuele Bassi <ebassi@gnome.org>