We align the data to a multiple of vertex stride, that way we use more
memory, but we could compute an offset into the vertex buffer without
changing the offset.
We can set the vertex offset while counting the data, this gets rid of
the need of passing all the counting machinery into the actual data
collection code.
When attempting a complex transform, check if the clip can be ignored
and do that if possible.
That way we don't cause fallbacks when transforming the clip is too
complex.
The idea is that for a rectangle intersection, each corner of the
result is either entirely part of one original rectangle or it is
an intersection point.
By detecting those 2 cases and treating them differently, we can
simplify the code to compare rounded rectangles.
Instead of emitting the render commands once per rectangle of the clip
region, just emit them once with the region's extents.
This is generally faster because it emits fewer commands to the GPU,
even though it may touch significantly more pixels.
For a proper method, we'd need to record the commands per clip rectangle
instead of emitting all of them all the time.
The border and color shaders - the ones that do AA - now multiply their
coordinates by the scale factor, which gives them better rounding
capabilities.
This in particular improves the case where they are used in fractional
scaling situations, where the scale is defined at the root element.
Previously, we just used the defaultscale factor, but now that we're
having it available in push constants, we can read it back for creating
offscreens and rendering fallbacks.
So do that.
It's a 1:1 replacement for GskVulkanPushConstants, just without the
indirection through a different file.
GskVulkanPushConstants as a struct is gone now.
The file still exists to handle the push_constants operation.
1. Use a graphene_vec2_t
2. Ensure it's always positive
3. Don't break with fallback
The scale value is nothing more than an indication of how many pixels to
assume per unit of a node.
We don't want to render the offscreen trnsformed, we want to render it
as-is.
We lose the correct scale factor, but that requires some separate work,
so for now it gets a bit blurry on hidpi.
This introduces the rect object and adds a rect_distance() and
rect_coverage() function.
_distance() returns the signed distance tp the rectangle.
_coverage() returns the coverage of a pixel centered at that position.
Note that the pixel size is computed using dFdx/dFdy.
When the node bounds were a non-integer size, the texture would get
ceil()ed pixels, but various viewport or scissor computations might
floor() instead, leaving the right/bottom row of pixels untouched.
Make sure those functions ceil(), too.
This is not the optimal way of doing it: we're
reuploading the texture with client-side conversion.
But it fits nicely into our current handling of mipmaps.
We can do better once we use shaders for colorspace
conversions.
Make the callers of this function check for
straight alpha themselves, and only do the
version compatibility check here. This makes
the function usable in contexts where straight
alpha is acceptable.
When the command queue is out of batches, there is
no point in doing further work like allocating uniforms.
This helps us avoid assertions in the uniform code
that we would hit when we run out of uniform space
too.
When we start ignoring batches, we must do it everywhere,
or we may run into assertions. This was triggered by an
enormous text node tree produced by tests/rendernode-create.
Vulkan has a different initial coordinate system to GL.
GL:
(-1, 1, -1) +------+.
|`. | `.
| `·--|---·
| : | :
+------+. :
`. : `.:
`·------· (1, -1, 1)
Vulkan:
(-1, -1, 0) +------+.
|`. | `.
| `·--|---·
| : | :
+------+. :
`. : `.:
`·------· (1, 1, 1)
so adjust the near and far plane we pass to
graphene_matrix_init_ortho() to make it end up with the same
projection as the GL renderer.
This was the intention, but the object data by itself
does not achieve that: We do run dispose on the display
when it is closed, but object data is only cleared in
finalize. So listen to the ::closed signal and remove
the driver ourselves.
Fix up the drivers dispose implementation enough for
that to actually work.
Most of the time we want to compute them based on the child node we
render to the offscreen, but not always.
For blend and cross-fade nodes, they need to be computed based on the
node's bounds.
Fixes widget-factory page fade animation weirdly resizing the fading
pages.
We weren't looking in the build dir for generated files.
Actually make sure that we look in the build dir *first*, otherwise
glib-compile-resources will still use the wrong files.
... and use it in rendernodes.
Setting up textures for diffing is done via gdk_texture_set_diff() which
should only be used during texture construction.
Note that the pointers to next/previous are allowed to dangle if one of
the textures is finalized, but that's fine because we always check both
textures' links to each other before we consider the pointer valid.
When slicing the texture, the GL renderer was
forgetting to apply the viewport origin. This
shows up when rendering things with negative
scales, leading to negative origins.
Our coverage computation only works for well-behaved
rects and rounded rects. But our modelview transform
might flip x or y around, causing things to fail.
Add functions to normalize rects and rounded rects,
and use it whenever we transform a rounded rect in GLSL.
Pass the GLsync object from texture into our
command queue, and when executing the queue,
wait on the sync object the first time we
use its associated texture.
When adding mask nodes, I overlooked that
we have two separate functions for determining
what transforms a node supports without offlines.
Since we claim that mask nodes support general
transform, they must certainly support 2d transforms
as well.
They're not needed and GLES doesn't technically support them, even
though GTK had been using them via epoxy sneakily using the
GL_OES_vertex_array_object extension behind our back.
gsk_vulkan_render_download_target() currently resets the uploader
objects before downloading the image that it produces. This is
problematic because there might be unreleased buffers and images
in the command queue.
In particular, this can make validation layers complain about the
glyph atlas - of all things! - upload buffer being released while
still being used by the command queue.
Fix that by resetting the uploader after downloading the image.
The current implementation of the glyph cache deals with atlases by
padding them with 1 pixel at the beginning, at the end, and between
each glyph.
That's cool and all, however, there's a very subtle problem with
this approach: the contents of the atlas are garbage, so this padding
is filled with garbage memory!
Rework the Vulkan glyph cache to draw each and every glyph in a
surface that has 1 pixel border of padding around it. Ensure the
surface is completely black by drawing a rectangle before handing
it to Pango to draw the glyph. Update tx and ty to pick the texture
position adjusted to the 1 pixel padding. The atlas now starts at
position (0, 0), since each glyph individually contains its own padding.
To improve legibility, add a PADDING define and use it everywhere.
Vulkan renders text using VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA and
VK_BLEND_FACTOR_SRC_ALPHA, but that implies per-channel alpha
blending, which currently produces the wrong results when blending
glyphs with the images beneath them.
Use the default pipeline constructors, which implies using the
ONE and ONE_MINUS_SRC_ALPHA.
Basically what GL does, but without any debug or feature flag
to gatekeep it, since the Vulkan backend itself is experimental
already.
Ceil surface sizes, and floor coordinates, to the fractional scale
value.
The rects passed to the clip region are in buffer coordinates, and
must not be scaled. Consider the following scenario: Wayland, with
a 1024x768@2 window. That gives us a 2048x1536 raw image. To setup
the Vulkan render pass code, we'd scale 2048x1536 *again*, to an
unreasonable 4196x3072, which is (1) incorrect and (2) really
incorrect and (3) can lead to crashes at best, full GPU resets
at worst - and a GPU reset is incredibly not fun!
Now that we pass the right clip regions at the right coordinates
at all times, remove the extra scaling from the render pass.
This part of the Vulkan renderer is almost exactly equal to the GL
renderer, and the GL renderer already does that since at least
2a38cecd33. Copy that into the Vulkan renderer.
A nice side effect from this commit is that resizing a window now
actually works again.
Sneak in a trivial cleanup by using a variable to hold the draw
index.
This was a tricky one to figure out, but it's pretty simple to
understand (I hope!).
So, this AMD card I'm using requires buffer memory sizes to be
aligned to 16 bytes. Intel is aligned to 4 bytes I think, but
AMD - or at least this AMD model in particular - uses 16 bytes
for alignment.
When creating a a particular texture (I did not determin which one
specifically!) a buffer of size 1276 bytes is requested.
1276 / 16 = 79.75, which is clearly not aligned to the required
16 bytes.
We request Vulkan to create a buffer of 1276 bytes for us, it
figures out that it's not aligned, and creates a buffer of 1280
bytes, which is aligned. The extra 4 bytes are wasted, but that's
okay. We immediately query this buffer for this exact information,
using vkGetBufferMemoryRequirements(), and proceed to create actual
memory to back this buffer up.
The buffer tells us we must use 1280 bytes, so we pass 1280 bytes
and everyone is happy, right? Of course not. We pass 1276 bytes,
and Vulkan is subtly unhappy at us.
Fix that by passing the value that Vulkan asks us to use, i.e.,
the size returned by vkGetBufferMemoryRequirements().
This is what GL does, and for a reason: it can lead to width or
height for very small glyphs. Also, switch to dividing by a float
(1024.0) instead of an integer (1024).
This doesn't make any difference now, but will allow us to copy
subregions more easily. This is not obvious, but here's a quick
explanation:
Leaving 'bufferRowLength' and 'bufferImageHeight' implies that
Vulkan will assume the size passed in the 'imageExtent' field.
Right now, this assumption is correct - the only user of this
function is the glyph cache, and it only copies and uploads
exact rects. Next commits will change that assumption, so we
must pass 'buffer*' fields, and tell Vulkan, "this part of the
buffer represents an image of width x height, and I want the
subregion (x, y, smallerWidth, smallerHeight) of this image".
When creating an image using gsk_vulkan_image_new_for_framebuffer(),
it passes VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
However, this is a mistake. The spec demands that the initial
layout must be either VK_IMAGE_LAYOUT_UNDEFINED or
VK_IMAGE_LAYOUT_PREINITIALIZED.
Apparently this was an oversight from commit b97fb75146, since the
commit message even documents that, and all other calls pass either
VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED.
Create framebuffer images using VK_IMAGE_LAYOUT_UNDEFINED, which is
what was originally expected.
Fractional scaling with the GL renderer is
experimental for now, so we disable it unless
GDK_DEBUG=gl-fractional is set.
This will give us time to work out the kinks.
This commit combines changes in the Wayland backend,
the GL context frontend, and the GL renderer to switch
them all to use the fractional scale.
In the Wayland backend, we now use the fractional scale
to size the EGL window.
In the GL frontend code, we use the fractional scale to
scale the damage region and surface in begin/end_frame.
And in the GL renderer, we replace gdk_surface_get_scale_factor()
with gdk_surface_get_scale().
Instead of tracking a single scale, track x and y scales separately.
Factor out gsk_vulkan_render_pass_new() into a private function that
receives both scales, and pass 'scale_factor' for both.
This is mostly a cosmetic change, and the goal is twofold:
1. Make it easier to spot unimplemented render node types; and
2. Prepare for a small rework
The implementation for each node now lives in specific functions,
like the GL renderer; unlike the GL renderer, however, we use a
node type vtable to map GskRenderNodeType → implementation. Render
node without an implementation map to NULL, and use the fallback
implementation. Render nodes that fail any check and return FALSE
also use fallback implementation.
If we encounter a node or texture the 1st time and they are going
to be used again, give them a name.
Then, when encountering them again, print them by name instead
of duplicating them.
We extend the syntax for nodes from:
<node-type> { ... }
to
<node-type> { ... }
<node-type> <string> { ... }
<string>;
where the first is the same as before, the 2nd defines a named node and
the last references a previously defined node.
Or to give an example:
color "node" {
bounds: 0 0 10 10;
color: red;
}
transform {
bounds: 20 0 10 10;
child: "node";
}
This will draw the red box twice, once at (0,0) and once at
(20,0).
The intended use for this is both shortening generated node files as
well as allowing to write tests that reuse nodes, in particular when
dealing with caches.
We extend the syntax for textures from just:
<url>
to
[<string>] <url>
<string>
where the first defines a named texture while the second references a
texture.
Or to give an example:
texture {
bounds: 0 0 10 10;
texture: "foo" url("foo.png");
}
texture {
bounds: 20 0 10 10;
texture: "foo";
}
This will draw the texture "foo.png" twice, once at (0,0) and once at
(20,0).
The intended use for this is both shortening generated node files as
well as allowing to write tests that reuse textures, in particular when
mixing them in texture and texture-scale nodes.
When the GL texture already has a mipmap, we don't
have to download and reupload it to generate one.
We differentiate the handling for texture scale nodes,
where we do want to force the mipmap creation even if
it requires us to reupload the GL texture, and plain
texture nodes, where we just take advantage of a
preexisting mipmap to allow trilinear filtering for
downscaling, or create one if we have to upload the
texture anyway.
Store texture coordinates for each slice
instead of assuming 0,0,1,1, and generate
overlapping slices to allow for proper mipmaps.
This almost fixes trilinear filtering with
sliced textures.
We cheat and just set the texture parameters instead and hope nothing
explodes.
So far it didn't.
This is only needed to support GLES 2.0 so it's quite a limited set of
hardware these days.
Instead of uploading a texture once per filter, ensure textures are
uploaded as little as possible and use samplers instead to switch
different filters.
Sometimes we have to reupload a texture unfortunately, when it is an
external one and we want to create mipmaps.
When filtering changes for an already-cached
texture, we need to clear the render data
before setting the new one, otherwise it
does not take and we end up reuploading
the texture every frame.
Allow to set max texture size using the
GSK_MAX_TEXTURE_SIZE environment variable.
We only allow to lower the max (for obvious
reasons), and we don't allow values smaller
than 512 (since our atlases use that size).
This allows dropping or copy/pasting rendernodes into apps that accept
SVGs.
Not sure how useful this is because we advertise text/plain from
rendernodes already and we prefer that.
In certain scenarios, address the issue where gnome.compile_resources
fails to transmit the present source directory. This is most notably
visible with MSBuild.
Use the same approach and only create an offscreen
that is big enough for the clipped part of the scaled
texture.
If the clipped part is still too large for a single
texture, we give up and just render the texture without
filters (using the regular texture rendering code path
which supports slicing).
The following commit will add the texture-scale-magnify-10000x
test which fails without this fix.
Scale nodes can use large scale factors and we don't want to create
insanely huge Cairo surfaces.
A subsequent commit will add the texture-scale-magnify-10000x
test which fails without this fix.
Cairo surfaces are created transparent.
And even if they weren't, overdrawing with transparency wouldn't erase
what's in the surface because it's a no-op.
It would require CAIRO_OPERATOR_CLEAR or CAIRO_OPERATOR_SOURCE.
The API docs outline why quite well.
This should make it possible to do saving of textures to image files
without any private API with the same featureset that GTK uses.
Also remove the gsktextureprivate.h include where
gdk_texture_get_format() was the only reason for it.
When we truncate the command queue because it
is too big, we were messing up our state accounting
and running into criticals as a consequence.
This can be reproduced by opening a well-populated
fishbowl demo in the inspectors recorder.
Fixes: #5188
Add GskMaskNode, and support it in the render node
parser, in the inspector and in GtkSnapshot.
The rendering is just fallback for now.
Based on old work by Timm Bäder.
By dividing the blur radius to obtain the clip radius, we may end up
with halved values that result in an overshunk clip mask. Extend this
so that we ensure to cover the last pixel.
Fixes artifacts seen with the cairo renderer in X11 when resizing
windows horizontally, a black 1px high line would be seen in the
top of the window due to these outset bounds being used in clipping.
More mysteriously, also seems to fix resize lag in the GL renderer
(also X11), if e.g. the bottom-right corner of a window is resized
diagonally in bottom-left -> top-right direction, or
bottom-right -> top-left.
Related: https://gitlab.gnome.org/GNOME/mutter/-/merge_requests/2175#note_1599335
Instead of asserting only in debug builds (which are generally not
shipped in distributions) we should deliver a critical log-level message
so that these can be found sooner when not developing with jhbuild,
Flatpak, etc.
Also assert that we've setup the state correctly when realizing the
GskGLRenderer object.
Fixes#4625
Those property features don't seem to be in use anywhere.
They are redundant since the docs cover the same information
and more. They also created unnecessary translation work.
Closes#4904
Having the initial layout set to VK_IMAGE_LAYOUT_GENERAL causes issues
when going from the final layout to the initial layout since the image
layout is expected to be the general layout. Setting the initial layout
to undefined doesn't have this restriction.
We now collect this information during node
construction, so use it here.
The concrete change here is that we now avoid
offscreens for container nodes with multiple children,
as long as they don't overlap. In particular, this
avoid offscreens for ellipsized dim labels.
This fixes two issues with the offscreen rendering code for nodes with
bounds not aligned with the pixel grid:
1.) When drawing to an offscreen buffer the size of the offscreen buffer
was rounded up, but then later when used as texture the vertices
correspond to the original bounds with the unrounded size. This could
then result in the offscreen texture being drawn onscreen at a slightly
smaller size, which then lead to it being visually shifted and blurry.
This is fixed by adjusting the u/v coordinates to ignore the padding
region in the offscreen texture that got added by the size increase from
rounding.
2.) The viewport used when rendering to the offscreen buffer was not
aligned with the pixel grid for nodes at coordinates not aligned with
the pixel grid. Then because the content of the offscreen buffer is not
aligned with the pixel grid and later when used as textures sampling
from it will result in interpolated values for an onscreen pixel. This
could also result in shifting and blurriness, especially for nested
offscreen rendering at different offsets.
This is fixed by adding similar padding at the beginning of the
texture and also adjusting the u/v coordinates to ignore this region.
Fixes: https://gitlab.gnome.org/GNOME/gtk/-/issues/3833
This moves a lot of the texture atlas control out of the driver and into
the various texture libraries through their base GskGLTextureLibrary class.
Additionally, this gives more control to libraries on allocating which can
be necessary for some tooling such as future Glyphy integration.
As part of this, the 1x1 pixel initialization is moved to the Glyph library
which is the only place where it is actually needed.
The compact vfunc now is responsible for compaction and it allows for us
to iterate the atlas hashtable a single time instead of twice as we were
doing previously.
The init_atlas vfunc is used to do per-library initialization such as
adding a 1x1 pixel in the Glyph cache used for coloring lines.
The allocate vfunc purely allocates but does no upload. This can be useful
for situations where a library wants to reuse the allocator from the
base class but does not want to actually insert a key/value entry. The
glyph library uses this for it's 1x1 pixel.
In the future, we will also likely want to decouple the rectangle packing
implementation from the atlas structure, or at least move it into a union
so that we do not allocate unused memory for alternate allocators.
This removes the sharing of atlases across various texture libraries. Doing
so is necessary so that atlases can have different semantics for how they
allocate within the texture as well as potentially allowing for different
formats of texture data.
For example, in the future we might store non-pixel data in the textures
such as Glyphy or even keep glyphs with color content separate from glyphs
which do not and can use alpha channel only.
This allows the gskglprograms.defs a bit more control over how a shader
will get generated and if it needs to combine sources. Currently, none of
the built-in shaders do that, but upcoming shaders which come from external
libraries will need the ability to inject additional sources in-between
layers.
If the max_entry_size is zero, then assume we can add anything to the
atlas. This allows for situations where we might be uploading an arc list
to the atlas instead of pixel data for GPU font rendering.
When large viewports are passed to gsk_renderer_render_texture(), don't
fail (or even return NULL).
Instead, draw multiple tiles and assemble them into a memory texture.
Tests added to the testsuite for this.
There are situations where our "default framebuffer" is not actually
zero, yet we still want to apply a scissor rect.
Generally, 0 is the default framebuffer. But on platforms where we need
to bind a platform-specific feature to a GL_FRAMEBUFFER, we might have a
default that is not 0. For example, on macOS we bind an IOSurfaceRef to
a GL_TEXTURE_RECTANGLE which then is assigned as the backing store for a
framebuffer. This is different than using gsk_gl_renderer_render_texture()
in that we don't want to incur an extra copy to the destination surface
nor do we even have a way to pass a texture_id into render_texture().
If the rendering operation is over an opaque region, we can potentially
avoid clearing a large section of the framebuffer destination. Some cases
you do want to clear, such as when clearing the whole contents as some
drivers have fast paths for that to avoid bringing data back into the
framebuffer.
Since now we have the shaders working on Windows under GLES with libANGLE using
a 3.0+ context, drop the check to fall back to the Cairo renderer when GLES is
being used.
Various transforms are normalized with their next transform, and if they
end up being the identity transform will return NULL.
For example a translation by (x,y,z) and followed by (-x,-y,-z) will
result in NULL.
Document the return value and more importantly, specify that a call to
`gsk_renderer_realize()` needs to be matched with a call
`gsk_renderer_unrealize()`.
Prevents issues like https://gitlab.gnome.org/GNOME/gtk/-/issues/4625
Instead of just passing major/minor, pass them twice, once for GL and
once for GLES. This way, we don't need to check for GL and GLES
separately.
If something is supported unconditionally, passing 0/0 works fine.
That said, I'd like to group the arguments somehow, because otherwise
it's just a confusing list of numbers - but I have no idea how to do
that.
Limit the diff region to 30 rectangles (randomly chosen because it
looked big enough to not trigger by accident and small enough to not
cause performance issues).
If the diff region gets more complicated, we abort to the parent node
and use its bounds as the diff region instead and then continue diffing
the rest of the node tree.
Fixes: #4560Fixes: #2396
For libANGLE to work with our shaders, we must use "300 es" for
the #version directive in our shaders, as well as using the non-legacy/
non-GLES codepath in the shaders. In order to check whether we are
using the GLSL 300 es shaders, we check whether we are using a GLES 3.0+
context. As a result, make ->glsl_version a const char* and make sure
the existing shader version macros are defined apprpriately, and add a
new macro for the "300 es" shader version string.
This will allow the gtk4 programs to run under Windows using EGL via
libANGLE. Some of the GL demos won't work for now, but at least this
makes things a lot better for using GL-accelerated graphics under Windows
for those that want to or need to use libANGLE (such as those with
graphics drivers that aren't capable of our Desktop (W)GL requirements in
GTK.
On Windows with nVidia drivers at least, when we create a legacy context
via wglCreateContext(), we may still get a (W)GL 4.x context. Allow
such contexts to also use GLSL version 130 instead of 110, so that
things do continue to work.
But don't call it too early, we only want to call it once we have
prepared the target.
This way, we guarantee that a GL context is always available and that it
is bound to the correct target.
Don't pass texture + rect, but instead have
gdk_memory_texture_new_subtexture()
and use it to generate subtextures and pass them.
This has the advantage of downloading the a too large texture only once
instead of N times.
It does not belong in GdkGLContext, it's a renderer thing.
It's also the only user of that API.
Introduce gdk_gl_context_check_version() private API to make version
checks simpler.
It turns out glReadPixels() cannot convert pixels and you are only
allowed to pass a single value into the function arguments. You need to
know which ones or things will explode.
GL is great.
Pass a format do GdkTextureClass::download(). That way we can download
data in any format.
Also replace gdk_texture_download_texture() with
gdk_memory_texture_from_texture() which again takes a format.
The old functionality is still there for code that wants it: Just pass
gdk_texture_get_format (texture) as the format argument.
Make a deep texture, if the render nodes have
high depth content.
For now, we use 32F here for the deep format,
since using 16F causes small rounding errors
that break the memorytexture roundtrip tests.
Look at the framebuffer and the rendernode to
determine what format to use for intermediate
textures.
Our preference here is to use fp16, if we have it
and it makes sense for the framebuffer we're given.
Add private api to find out if the content
of a render node should be considered 'deep'.
The information is collected at creation time,
so there is no tree-walking involved when we
are using this information in the renderer.
Currently, this comes down to whether there are
any texture nodes with high depth textures in the subtree.
In the future, we may want to allow marking gradient
nodes in this way as well.
For MemoryTexture, this is a simple change.
For GLTexture, we need to query the format at texture creation. This
sounds like a bad idea and extra work until one realizes that we'd
need to do that anyway when using the texure the first time - either
when downloading, or when trying to use it in a rendernode, where we
will soon need that information to determine if the texture prefers high
depth.
Also, now make gdk_memory_convert() the only conversion functions
and allow conversions between any 2 formats by going via a float[4].
This could be optimized via fast-paths, but so far it isn't.
We never put large icons into the icon cache,
so all its items are always atlased, but we do
put large glyphs in to the glyph cache, and we
were never freeing those items, even when they
go unused. Fix that.
This reverts commit 87af45403a.
I've found that this change is needed to ensure that the
bounding boxes of text nodes encompass all the glyphd drawing.
Without it, we overdraw the widget boundaries and cut off
glyphs.
We are rendering the glyphs on a larger surface,
and we should avoid introducing unnecessary
rounding errors here. Also, I've found that
we always need to enlarge the surface by one
pixels in each direction to avoid cutting off
the tops of large glyphs.
For 2D transforms, we can read the scale
factors more directly off the matrix.
This should eventually be moved out into a
function to decompose a 2D transform into
scale + rotation + skew + translation.
We avoid an offscreen if we know the child node
can 'handle' the transform. Shadow nodes can if their
child node does - either the child node is a text node
in which case the shortcuts we take for shadow nodes
will work fine with the transform (we just render the
text node offset), or the child is not a text node,
in which case we render the shadow to an offscreen
anyway.
This change makes the label-shadows reftest pass with
the GL renderer, not by fixing the issue but by avoiding
it.
For shadow nodes, we try pretty hard to avoid
rendering shadows, and and we have a shortcut
that just renders text offset, but we can try
harder to do nothing - if the text is offset
by zero, we don't need to draw it at all.
We need to use an offscreen whenever there is overlapping
children somewhere in the tree below, just checking the
direct child of the opacity node is not enough.
Fixes: #4261
The cairo_t that we create to render glyphs for
the glyph cache needs to match the font options
that are supposedly governing how glyphs are
drawn.
Since we allow font options to be different per
widget in gtk, we need to have them at least at
the level of individual render nodes. Adding them
to the lookup key for the glyph cache has the
side effect of solving another problem: We are
not flushing the cache when font options change.
Since font options affect how the glyphs get rendered,
we need to pass the font options down from the gtk level
to where the glyph cache is populated.
Add a new gsk_text_node_new_full api that takes a
cairo_font_options_t in addition to the other parameters.
This is needed as GskRenderNode is its own fundamental type and has its
own GValue infrastructure. And I want to put render nodes into the
clipboard which uses GValues.
Long time ago, Cairo shadows in both GTK3 and 4 were drawn at a size about
twice their radius. Eventually this was fixed but the shadow extents are
still calculated for the previous size and appear unreasonably large: for
example, 141px for a 50px radius shadow. This can get very noticeable in
places such as invisible window frame which gets included into screenshots.
https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/3419 just divides the
radius by 2 when drawing a shadow with Cairo, do the same when calculating
extents.
See https://gitlab.gnome.org/GNOME/gtk/-/issues/3841
harfbuzz has all the information we need, so we
can avoid poking directly at freetype apis. Also
drop the caching of color glyph information until
it turns out to be a problem.
The vfunc is called to initialize GL and it returns a "base" context
that GDK then uses as the context all others are shared with. So the GL
context share tree now looks like:
+ context from init_gl
- context1
- context2
...
So this is a flat tree now, the complexity is gone.
The only caveat is that backends now need to create a GL context when
initializing GL so some refactoring was needed.
Two new functions have been added:
* gdk_display_prepare_gl()
This is public API and can be used to ensure that GL has been
initialized or if not, retrieve an error to display (or debug-print).
* gdk_display_get_gl_context()
This is a private function to retrieve the base context from
init_gl(). It replaces gdk_surface_get_shared_data_context().
Scale factors can be negative, but we were not
looking out for that, triggering an assertion when
trying to create a render target with negative
width of height. Avoid that.
Fixes: #4096
We are pretty good at batching commands now, and we can easily
produce batches that exceed the maximum number of elements per
draw call that the hw can handle. Query that number, and respect
it when merging batches.
This fixes the rendering of the overview map in GtkSourceView.
Remove a boatload of "or %NULL" from nullable parameters
and return values. gi-docgen generates suitable text from
the annotation that we don't need to duplicate.
This adds a few missing nullable annotations too.
Make gsk_ngl_texture_library_pack always return
the position including the padding. And compute
texture coordinates accurately in all cases (we
were fudging the padding for standalone textures.
We can't use this flag for any code that may get run
outside the __builtin_cpu_supports() check, and meson
doesn't allow per-file cflags. So we have to split this
code off into its own static library.
When we clean up the uniform allocations after a frame,
it can happen that our space requirements actually increase,
due to padding that depends on the order of allocations.
Instead of asserting that it doesn't happen, just make
it work by growing our allocation.
Fixes: #3853
We need to use __cpuid() to check for the presence of F16C instructions on
Visual Studio builds, and call the half_to_float4() or float_to_half4()
implementation accordingly, as the __builtin_cpu...() functions are strictly
for GCC or CLang only.
Also, since __m128i_u is not a standard intrisics type across the board, just
use __m128i on Visual Studio as it is safe to do so there for use for
_mm_loadl_epi64().
Like running on Darwin, we cannot use the alias __attribute__ as __attribute__
is also for GCC and CLang only.
When 9-slicing shadows, omit the center tile when it is
entirely contained in the outline (that is not always
the case, depending on corners and offsets).
gsk_rounded_rect_contains_rect was calling
gsk_rounded_rect_contains_point, which potentially
checks all four corners, for a total of up to 16
corner/point checks. But there is no need to do
more than 4 such checks to answer the question.
Opportunistically use the coloring program for
drawing underlines instead of the color program.
This avoids program changes in the middle of
text.
For the Emoji text scrolling benchmark, this reduces
the program changes per frame from > 1000 to around 100.
Use an IFUNC resolver to determine whether we can use
intrinsics for FP16 conversion. This requires the functions
to be no longer inline.
Sadly, it turns out that __builtin_cpu_supports ("f16c")
doesn't compile on the systems where we want it to prevent
us from getting a SIGILL at runtime.
We only have one shader that uses the color2 attribute,
and it doesn't use the uv attribute, so save vertex
memory by putting those in the same space.
This reduce the per vertex space from 32 to 24 bytes.
This reduces the size of our Vertex struct from
48 to 32 bytes. It would be nicer if we could store
the colors in fp16 format in the rendernodes, and
avoid conversion here. But this is still good.
Move the resources of each renderer to its subdirectory.
We've previously done that for the ngl renderer, but it
is better to be consistent and do it for all the renderers.
Arrange things so that non-child parameters
are always printed before the children. This
greatly helps with readability, which really
suffers when there's hundreds of lines of indented
children between the node start and its parameters.
Update all affected tests.
Instead of rendering the unclipped child to a texture
(and risking blowing the texture size limit, and bad
downscaling), just render the clipped region, and live
with the fact that we can't cache the rendered texture.
This avoid bad artifacts when scrolling long textviews
in rounded clips.
There was confusion here about the handling of the
modelview transform. The modelview transform we are
getting is already set up for rendering the node
we are given, so keep it - except for possible adding
an extra scale on top when the texture would otherwise
be too big.
Move some work out of the loop in visit_text_node.
This takes advantage of the fact that the yoffset
of most glyphs is zero, so yphase generally does
not change in a line of text.
Allow comparing container nodes to any other
node, by pretending the other node is a single
child container (if it isn't one already).
This fixes a glitch where we redraw the full
entry text when the blinking cursor goes to
opacity 0, since GskSnapshot then optimizes
away first the opacity node, and then the
single-child container.
Previously, we translated the uniform key (an enum) into a location within
the shader program in GskNglProgram. A number of performance improvements
were focused around having low nubers for the uniform locations. Generally
this is the case, but some drivers such as old Intel drivers on Windows
may use rather large numbers for those.
To combat this, we can push the translation of uniform keys into locations
at the GskNglUniformState level so that we work with unranslated keys
through the process until applying them.
Fixes#3780
The effectiveness of the front cache is limited by
subpixel positioning making it very likely that we
will meet the same glyph in different x phases inside
a single line of text.
Factoring the xphase into the front cache key makes things
better. For the string eeeeeeeeeeeeeeeeeee
before: 0% front cache hits
after: >90% front cache hits
We don't want to be responsible for duplicating the effort of the hash
table, we just want to speed up subsequent lookups. Otherwise, we risk
not marking glyph usage when tracking usage for compaction.
This required finishing up the begin_frame/end_frame semantics for
GskNglTextureLibraryw which was apparently overlooked.
The driver was changed to provide more information to the library when
beginning frames. We do not need to use end_frame so that was removed.
The frame age is the same as GL (60) but I do wonder if that is based
on seconds if we should be using something longer for situations where
we have higher frame rates.
Fixes#3771
If cairo is a subproject, it's not necessarily installed when gtk
is built. In the build tree, libcairo-script-interpreter is not stored
in the same directory as other cairo libraries.
Recognize a common pattern: A rounded clip with
a color node, followed by a border node, with the
same outline. This is what CSS backgrounds frequently
produce, and we can render it more efficiently with
a combined shader.
Now that colors aren't uniforms anymore, we don't
win much by using the inset_shadow shader. The fragment
shaders of inset_shadow and border are identical. And
the regular border setup does nine-slicing.
Colors are not state that we carry across draw ops,
so setting the color on the render job doesn't make
much sense. Instead, pass the color to the various
draw calls. Add a few new ones for that purpose.
Also, shorten the names of some by going from
'load_vertices_from_offscreen' to 'draw_offscreen'.
This reduces how many changes we make when recording uniform state, which
increases the chances that the data offset will be the same when applying
uniforms.
Since we make full snapshots when recording uniform state of batches, we
need to perform some deduplication to avoid so many repeated uniform calls.
This uses a closed hashtable to determine if we are likely changing the
value to something new.
This does not currently compare values, it instead only compares that we
are going to point at a new offset into the uniform buffer. We could go
further if we compare upon updating values (we did that early on in the
prototype) so that offsets are less likely to be changed.
When the color passed is transparent black, use
the color from the texture as source, instead of
as mask. This lets use use the coloring program
both for regular and color glyphs, avoiding
program changes in text with Emoji.
Instead of using uniforms for color used in multiple
programs, pass it as vertex attributes. This will let
us batch more draw calls, since we don't have to change
uniforms so often. In particular, for syntax-highlighted
text.
We may want to change the interface between the
shaders and the renderer for ngl, and therefore,
sharing the shaders between gl and ngl will not
be practical, going forward.