Make a deep texture, if the render nodes have
high depth content.
For now, we use 32F here for the deep format,
since using 16F causes small rounding errors
that break the memorytexture roundtrip tests.
Look at the framebuffer and the rendernode to
determine what format to use for intermediate
textures.
Our preference here is to use fp16, if we have it
and it makes sense for the framebuffer we're given.
Add private api to find out if the content
of a render node should be considered 'deep'.
The information is collected at creation time,
so there is no tree-walking involved when we
are using this information in the renderer.
Currently, this comes down to whether there are
any texture nodes with high depth textures in the subtree.
In the future, we may want to allow marking gradient
nodes in this way as well.
For MemoryTexture, this is a simple change.
For GLTexture, we need to query the format at texture creation. This
sounds like a bad idea and extra work until one realizes that we'd
need to do that anyway when using the texure the first time - either
when downloading, or when trying to use it in a rendernode, where we
will soon need that information to determine if the texture prefers high
depth.
Also, now make gdk_memory_convert() the only conversion functions
and allow conversions between any 2 formats by going via a float[4].
This could be optimized via fast-paths, but so far it isn't.
We never put large icons into the icon cache,
so all its items are always atlased, but we do
put large glyphs in to the glyph cache, and we
were never freeing those items, even when they
go unused. Fix that.
This reverts commit 87af45403a.
I've found that this change is needed to ensure that the
bounding boxes of text nodes encompass all the glyphd drawing.
Without it, we overdraw the widget boundaries and cut off
glyphs.
We are rendering the glyphs on a larger surface,
and we should avoid introducing unnecessary
rounding errors here. Also, I've found that
we always need to enlarge the surface by one
pixels in each direction to avoid cutting off
the tops of large glyphs.
For 2D transforms, we can read the scale
factors more directly off the matrix.
This should eventually be moved out into a
function to decompose a 2D transform into
scale + rotation + skew + translation.
We avoid an offscreen if we know the child node
can 'handle' the transform. Shadow nodes can if their
child node does - either the child node is a text node
in which case the shortcuts we take for shadow nodes
will work fine with the transform (we just render the
text node offset), or the child is not a text node,
in which case we render the shadow to an offscreen
anyway.
This change makes the label-shadows reftest pass with
the GL renderer, not by fixing the issue but by avoiding
it.
For shadow nodes, we try pretty hard to avoid
rendering shadows, and and we have a shortcut
that just renders text offset, but we can try
harder to do nothing - if the text is offset
by zero, we don't need to draw it at all.
We need to use an offscreen whenever there is overlapping
children somewhere in the tree below, just checking the
direct child of the opacity node is not enough.
Fixes: #4261
The cairo_t that we create to render glyphs for
the glyph cache needs to match the font options
that are supposedly governing how glyphs are
drawn.
Since we allow font options to be different per
widget in gtk, we need to have them at least at
the level of individual render nodes. Adding them
to the lookup key for the glyph cache has the
side effect of solving another problem: We are
not flushing the cache when font options change.
Since font options affect how the glyphs get rendered,
we need to pass the font options down from the gtk level
to where the glyph cache is populated.
Add a new gsk_text_node_new_full api that takes a
cairo_font_options_t in addition to the other parameters.
This is needed as GskRenderNode is its own fundamental type and has its
own GValue infrastructure. And I want to put render nodes into the
clipboard which uses GValues.
Long time ago, Cairo shadows in both GTK3 and 4 were drawn at a size about
twice their radius. Eventually this was fixed but the shadow extents are
still calculated for the previous size and appear unreasonably large: for
example, 141px for a 50px radius shadow. This can get very noticeable in
places such as invisible window frame which gets included into screenshots.
https://gitlab.gnome.org/GNOME/gtk/-/merge_requests/3419 just divides the
radius by 2 when drawing a shadow with Cairo, do the same when calculating
extents.
See https://gitlab.gnome.org/GNOME/gtk/-/issues/3841
harfbuzz has all the information we need, so we
can avoid poking directly at freetype apis. Also
drop the caching of color glyph information until
it turns out to be a problem.
The vfunc is called to initialize GL and it returns a "base" context
that GDK then uses as the context all others are shared with. So the GL
context share tree now looks like:
+ context from init_gl
- context1
- context2
...
So this is a flat tree now, the complexity is gone.
The only caveat is that backends now need to create a GL context when
initializing GL so some refactoring was needed.
Two new functions have been added:
* gdk_display_prepare_gl()
This is public API and can be used to ensure that GL has been
initialized or if not, retrieve an error to display (or debug-print).
* gdk_display_get_gl_context()
This is a private function to retrieve the base context from
init_gl(). It replaces gdk_surface_get_shared_data_context().
Scale factors can be negative, but we were not
looking out for that, triggering an assertion when
trying to create a render target with negative
width of height. Avoid that.
Fixes: #4096
We are pretty good at batching commands now, and we can easily
produce batches that exceed the maximum number of elements per
draw call that the hw can handle. Query that number, and respect
it when merging batches.
This fixes the rendering of the overview map in GtkSourceView.
Remove a boatload of "or %NULL" from nullable parameters
and return values. gi-docgen generates suitable text from
the annotation that we don't need to duplicate.
This adds a few missing nullable annotations too.
Make gsk_ngl_texture_library_pack always return
the position including the padding. And compute
texture coordinates accurately in all cases (we
were fudging the padding for standalone textures.
We can't use this flag for any code that may get run
outside the __builtin_cpu_supports() check, and meson
doesn't allow per-file cflags. So we have to split this
code off into its own static library.
When we clean up the uniform allocations after a frame,
it can happen that our space requirements actually increase,
due to padding that depends on the order of allocations.
Instead of asserting that it doesn't happen, just make
it work by growing our allocation.
Fixes: #3853
We need to use __cpuid() to check for the presence of F16C instructions on
Visual Studio builds, and call the half_to_float4() or float_to_half4()
implementation accordingly, as the __builtin_cpu...() functions are strictly
for GCC or CLang only.
Also, since __m128i_u is not a standard intrisics type across the board, just
use __m128i on Visual Studio as it is safe to do so there for use for
_mm_loadl_epi64().
Like running on Darwin, we cannot use the alias __attribute__ as __attribute__
is also for GCC and CLang only.
When 9-slicing shadows, omit the center tile when it is
entirely contained in the outline (that is not always
the case, depending on corners and offsets).
gsk_rounded_rect_contains_rect was calling
gsk_rounded_rect_contains_point, which potentially
checks all four corners, for a total of up to 16
corner/point checks. But there is no need to do
more than 4 such checks to answer the question.
Opportunistically use the coloring program for
drawing underlines instead of the color program.
This avoids program changes in the middle of
text.
For the Emoji text scrolling benchmark, this reduces
the program changes per frame from > 1000 to around 100.
Use an IFUNC resolver to determine whether we can use
intrinsics for FP16 conversion. This requires the functions
to be no longer inline.
Sadly, it turns out that __builtin_cpu_supports ("f16c")
doesn't compile on the systems where we want it to prevent
us from getting a SIGILL at runtime.
We only have one shader that uses the color2 attribute,
and it doesn't use the uv attribute, so save vertex
memory by putting those in the same space.
This reduce the per vertex space from 32 to 24 bytes.
This reduces the size of our Vertex struct from
48 to 32 bytes. It would be nicer if we could store
the colors in fp16 format in the rendernodes, and
avoid conversion here. But this is still good.
Move the resources of each renderer to its subdirectory.
We've previously done that for the ngl renderer, but it
is better to be consistent and do it for all the renderers.
Arrange things so that non-child parameters
are always printed before the children. This
greatly helps with readability, which really
suffers when there's hundreds of lines of indented
children between the node start and its parameters.
Update all affected tests.
Instead of rendering the unclipped child to a texture
(and risking blowing the texture size limit, and bad
downscaling), just render the clipped region, and live
with the fact that we can't cache the rendered texture.
This avoid bad artifacts when scrolling long textviews
in rounded clips.
There was confusion here about the handling of the
modelview transform. The modelview transform we are
getting is already set up for rendering the node
we are given, so keep it - except for possible adding
an extra scale on top when the texture would otherwise
be too big.
Move some work out of the loop in visit_text_node.
This takes advantage of the fact that the yoffset
of most glyphs is zero, so yphase generally does
not change in a line of text.
Allow comparing container nodes to any other
node, by pretending the other node is a single
child container (if it isn't one already).
This fixes a glitch where we redraw the full
entry text when the blinking cursor goes to
opacity 0, since GskSnapshot then optimizes
away first the opacity node, and then the
single-child container.
Previously, we translated the uniform key (an enum) into a location within
the shader program in GskNglProgram. A number of performance improvements
were focused around having low nubers for the uniform locations. Generally
this is the case, but some drivers such as old Intel drivers on Windows
may use rather large numbers for those.
To combat this, we can push the translation of uniform keys into locations
at the GskNglUniformState level so that we work with unranslated keys
through the process until applying them.
Fixes#3780
The effectiveness of the front cache is limited by
subpixel positioning making it very likely that we
will meet the same glyph in different x phases inside
a single line of text.
Factoring the xphase into the front cache key makes things
better. For the string eeeeeeeeeeeeeeeeeee
before: 0% front cache hits
after: >90% front cache hits
We don't want to be responsible for duplicating the effort of the hash
table, we just want to speed up subsequent lookups. Otherwise, we risk
not marking glyph usage when tracking usage for compaction.
This required finishing up the begin_frame/end_frame semantics for
GskNglTextureLibraryw which was apparently overlooked.
The driver was changed to provide more information to the library when
beginning frames. We do not need to use end_frame so that was removed.
The frame age is the same as GL (60) but I do wonder if that is based
on seconds if we should be using something longer for situations where
we have higher frame rates.
Fixes#3771
If cairo is a subproject, it's not necessarily installed when gtk
is built. In the build tree, libcairo-script-interpreter is not stored
in the same directory as other cairo libraries.
Recognize a common pattern: A rounded clip with
a color node, followed by a border node, with the
same outline. This is what CSS backgrounds frequently
produce, and we can render it more efficiently with
a combined shader.
Now that colors aren't uniforms anymore, we don't
win much by using the inset_shadow shader. The fragment
shaders of inset_shadow and border are identical. And
the regular border setup does nine-slicing.
Colors are not state that we carry across draw ops,
so setting the color on the render job doesn't make
much sense. Instead, pass the color to the various
draw calls. Add a few new ones for that purpose.
Also, shorten the names of some by going from
'load_vertices_from_offscreen' to 'draw_offscreen'.
This reduces how many changes we make when recording uniform state, which
increases the chances that the data offset will be the same when applying
uniforms.
Since we make full snapshots when recording uniform state of batches, we
need to perform some deduplication to avoid so many repeated uniform calls.
This uses a closed hashtable to determine if we are likely changing the
value to something new.
This does not currently compare values, it instead only compares that we
are going to point at a new offset into the uniform buffer. We could go
further if we compare upon updating values (we did that early on in the
prototype) so that offsets are less likely to be changed.
When the color passed is transparent black, use
the color from the texture as source, instead of
as mask. This lets use use the coloring program
both for regular and color glyphs, avoiding
program changes in text with Emoji.
Instead of using uniforms for color used in multiple
programs, pass it as vertex attributes. This will let
us batch more draw calls, since we don't have to change
uniforms so often. In particular, for syntax-highlighted
text.
We may want to change the interface between the
shaders and the renderer for ngl, and therefore,
sharing the shaders between gl and ngl will not
be practical, going forward.
Hook up the "Show fallback rendering" switch for Vulkan.
This brings home the sobering truth that the Vulkan renderer
is doing *all* fallback, since we switched from offset nodes
to transform nodes.
The primary goal here was to cleanup the current GL renderer to make
maintenance easier going forward. Furthermore, it tracks state to allow
us to implement more advanced renderer features going forward.
Reordering
This renderer will reorder batches by render target to reduce the number
of times render targets are changed.
In the future, we could also reorder by program within the render target
if we can determine that vertices do not overlap.
Uniform Snapshots
To allow for reordering of batches all uniforms need to be tracked for
the programs. This allows us to create the full uniform state when the
batch has been moved into a new position.
Some care was taken as it can be performance sensitive.
Attachment Snapshots
Similar to uniform snapshots, we need to know all of the texture
attachments so that we can rebind them when necessary.
Render Jobs
To help isolate the process of creating GL commands from the renderer
abstraction a render job abstraction was added. This could be extended
in the future if we decided to do tiling.
Command Queue
Render jobs create batches using the command queue. The command queue
will snapshot uniform and attachment state so that it can reorder
batches right before executing them.
Currently, the only reordering done is to ensure that we only visit
each render target once. We could extend this by tracking vertices,
attachments, and others.
This code currently uses an inline array helper to reduce overhead
from GArray which was showing up on profiles. It could be changed to
use GdkArray without too much work, but had roughly double the
instructions. Cycle counts have not yet been determined.
GLSL Programs
This was simplified to use XMACROS so that we can just extend one file
(gskglprograms.defs) instead of multiple places. The programs are added
as fields in the driver for easy access.
Driver
The driver manages textures, render targets, access to atlases,
programs, and more. There is one driver per display, by using the
shared GL context.
Some work could be done here to batch uploads so that we make fewer
calls to upload when sending icon theme data to the GPU. We'd need
to keep a copy of the atlas data for such purposes.
When we are rendering a texture node to an offscreen,
and we have a clip, we must force the offscreen rendering.
Otherwise, the code will notice: Hey, it already is a texture
node, so no need to render it to a texture again. But when
clipping is involved, that is exactly what we want to do.
Testcase included.
Fixes: #3651
There isn't any state to modify in the type so we can use const here.
Doing so allows some of the renderer code to use const across a
number of functions so that repeated calls are elided if inlined.
These do not do modify anything so they can be marked as pure to
potentially ellide calls. Since they do dereference, I do not believe
we can make them const although that is unclear since we could technically
just return a pointer + offset. Therefore it *might* be possible to also
make these G_GNUC_CONST.
This also removes the return if fail macros from these as a good portion
of them didn't have them anyway. I think it's fair to say that access to
these incorrectly is a programmer error.
It significantly reduces the amount of code generated into generally a
movss,ret.
According to OpenGL spec, a shader object will only be flagged
for deletion unless it has been detached; when a program object
is deleted, those shader objects attached to it will be detached
but not deleted unless they have already been flagged for deletion.
So we shall detach a shader object before it is deleted, and delete
it before the program object is deleted best.
Here is a command to reproduce this testcase:
GDK_DEBUG=gl-gles gtk4-demo --run gears
Without this patch, Mesa throws this compile error:
0:130(13): error: no matching function for call to `mod(error, float)'; candidates are:
This is caused by `u_rotation - 90` being of type error since
`u_rotation` is a float and it’s illegal to subtract it with an integer.
The special case for ASCII glyphs is unfortunately not
working very well, because of an oversight in pango:
When I added subpixel positioning, I made pango_shape()
default to not rounding, and make PangoLayout call
pango_shape_with_flags() and pass the rounding information
down. The upshot is that we need to use the _with_flags
variant here and tell it to round position, so it matches
what the text node contains.
This way we can render the first frame of tests/testoutsetshadowdrawing
in 153 ops instead of 183.
And the first frame of gtk4-demo in 260 instead of 300.
These positions are not guaranteed to be in a specific order when linked
into the final GPU program. They need to be specified so that our code
in gskglrenderer.c can use known positions for them to match up with
our GskQuadVertex.
This fixes the GL renderer on macOS's OpenGL shader compiler.
Fixes#3420
Catch the error when it happens, so that we can emit a specific and more
helpful error message.
Also verify that all branches in the code now do indeed set a proper
GError when they fail, so that the final catch-all is no longer needed.
Instead, assert that the error is set so that we catch future code
additions early that do not set the GError.
The use of volatile was incorrect in GLib and has been that way for
a long time. Recently however that has changed, and this makes GTK
follow suit to avoid using volatile in the type registration.
See also: https://gitlab.gnome.org/GNOME/glib/-/merge_requests/1719
Combined with the above merge request for GLib, this fixes a large
number of compilation warnings when using Clang.