We are pretty good at batching commands now, and we can easily
produce batches that exceed the maximum number of elements per
draw call that the hardware can handle. Query that number, and respect
it when merging batches.
This fixes the rendering of the overview map in GtkSourceView.
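
As a rough illustration of the merge rule (not the actual renderer code), the sketch below refuses to merge two batches when the combined element count would exceed the limit. The Batch struct and batch_try_merge() are hypothetical names, and max_elements is assumed to have been queried from the driver beforehand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical batch record: a contiguous run of vertices. */
typedef struct {
  size_t vbo_offset;   /* index of the first vertex */
  size_t vbo_count;    /* number of vertices in the batch */
} Batch;

/* Merge `next` into `batch` only if the two are contiguous and the
 * merged batch stays within `max_elements`. Returns true on merge. */
static bool
batch_try_merge (Batch *batch, const Batch *next, size_t max_elements)
{
  if (batch->vbo_offset + batch->vbo_count != next->vbo_offset)
    return false;
  if (batch->vbo_count + next->vbo_count > max_elements)
    return false;

  batch->vbo_count += next->vbo_count;
  return true;
}
```

When the merge is refused, the caller simply keeps the second batch as a separate draw call.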
Remove a boatload of "or %NULL" from nullable parameters
and return values. gi-docgen generates suitable text from
the annotation that we don't need to duplicate.
This adds a few missing nullable annotations too.
Make gsk_ngl_texture_library_pack always return
the position including the padding. And compute
texture coordinates accurately in all cases (we
were fudging the padding for standalone textures).
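
A minimal sketch of the coordinate computation, assuming the packer returns the top-left corner of the padded area and that the padding is the same on all sides. TexCoords and tex_coords_for_item() are illustrative names, not the library's API.

```c
/* Normalized texture coordinates of an item inside an atlas. */
typedef struct {
  float x0, y0, x1, y1;
} TexCoords;

/* packed_x/packed_y: position returned by the packer, including padding.
 * width/height: size of the item itself, without padding. */
static TexCoords
tex_coords_for_item (int packed_x, int packed_y,
                     int width, int height,
                     int padding,
                     int atlas_width, int atlas_height)
{
  TexCoords t;

  /* Skip the padding to reach the first texel of the item. */
  t.x0 = (float) (packed_x + padding) / (float) atlas_width;
  t.y0 = (float) (packed_y + padding) / (float) atlas_height;
  t.x1 = (float) (packed_x + padding + width) / (float) atlas_width;
  t.y1 = (float) (packed_y + padding + height) / (float) atlas_height;

  return t;
}
```

With this shape, a standalone texture is just the case where the "atlas" is the item plus its padding, so no special casing is needed.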
We can't use this flag for any code that may get run
outside the __builtin_cpu_supports() check, and meson
doesn't allow per-file cflags. So we have to split this
code off into its own static library.
When we clean up the uniform allocations after a frame,
it can happen that our space requirements actually increase,
due to padding that depends on the order of allocations.
Instead of asserting that it doesn't happen, just make
it work by growing our allocation.
Fixes: #3853
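
To see why the requirement can grow, here is a small sketch that packs the same items in two different orders. It assumes each item is aligned to its own size, which is a simplification; align_up() and packed_size() are hypothetical helpers.

```c
#include <stddef.h>

/* Round `offset` up to a multiple of `align` (a power of two). */
static size_t
align_up (size_t offset, size_t align)
{
  return (offset + align - 1) & ~(align - 1);
}

/* Total bytes used when packing `n` items in the given order.
 * Assumption: every item is aligned to its own size. */
static size_t
packed_size (const size_t *sizes, size_t n)
{
  size_t offset = 0;

  for (size_t i = 0; i < n; i++)
    offset = align_up (offset, sizes[i]) + sizes[i];

  return offset;
}
```

Packing the sizes 16, 4, 4 needs no padding, while 4, 16, 4 leaves a 12 byte hole before the second item, so a buffer sized for the first order is too small for the second.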
Visual Studio builds need to use __cpuid() to check for the presence of F16C
instructions and call the half_to_float4() or float_to_half4() implementation
accordingly, as the __builtin_cpu...() functions are for GCC and Clang only.
Also, since __m128i_u is not a standard intrinsics type across the board, just
use __m128i on Visual Studio, where it is safe to pass to _mm_loadl_epi64().
As on Darwin, we cannot use the alias __attribute__ there, as __attribute__
is also for GCC and Clang only.
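
A sketch of such a check, assuming the F16C feature flag is read from CPUID leaf 1 (ECX bit 29). have_f16c() is a hypothetical name, and the GCC/Clang branch using <cpuid.h> is only there so the example builds elsewhere. This tests the CPUID bit only; a complete check would probably also confirm that the OS has enabled AVX state.

```c
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

/* Returns 1 if the CPU reports F16C support, 0 otherwise
 * (including on non-x86 targets). */
static int
have_f16c (void)
{
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  int info[4];

  __cpuid (info, 1);             /* leaf 1: feature bits */
  return (info[2] >> 29) & 1;    /* info[2] is ECX, bit 29 is F16C */
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
  unsigned int eax, ebx, ecx, edx;

  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx))
    return 0;
  return (ecx >> 29) & 1;
#else
  return 0;
#endif
}
```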
When 9-slicing shadows, omit the center tile when it is
entirely contained in the outline (that is not always
the case, depending on corners and offsets).
gsk_rounded_rect_contains_rect was calling
gsk_rounded_rect_contains_point, which potentially
checks all four corners, for a total of up to 16
corner/point checks. But there is no need to do
more than 4 such checks to answer the question.
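
The idea can be sketched as follows, with each corner of the rectangle tested only against the rounded corner it is nearest to. The types are simplified stand-ins for the graphene and GSK ones, and the corner order is an assumption.

```c
#include <stdbool.h>

typedef struct { float x, y, width, height; } Rect;
typedef struct { float width, height; } Corner;   /* elliptical radii */
typedef struct {
  Rect   bounds;
  Corner corner[4];   /* top-left, top-right, bottom-right, bottom-left */
} RoundedRect;

/* dx/dy: distance of a point from the two edges that meet at the corner.
 * True if the point lies in the cut-off area outside the corner's ellipse. */
static bool
outside_corner (const Corner *c, float dx, float dy)
{
  if (dx >= c->width || dy >= c->height)
    return false;   /* not inside this corner's box at all */

  float ex = (c->width - dx) / c->width;
  float ey = (c->height - dy) / c->height;

  return ex * ex + ey * ey > 1.0f;
}

static bool
rounded_rect_contains_rect (const RoundedRect *self, const Rect *rect)
{
  float left   = rect->x - self->bounds.x;
  float top    = rect->y - self->bounds.y;
  float right  = self->bounds.x + self->bounds.width - (rect->x + rect->width);
  float bottom = self->bounds.y + self->bounds.height - (rect->y + rect->height);

  if (left < 0 || top < 0 || right < 0 || bottom < 0)
    return false;   /* sticks out of the bounding box */

  /* One check per corner, four in total. */
  return !outside_corner (&self->corner[0], left, top) &&
         !outside_corner (&self->corner[1], right, top) &&
         !outside_corner (&self->corner[2], right, bottom) &&
         !outside_corner (&self->corner[3], left, bottom);
}
```

Four checks suffice because a rectangle corner that falls outside some other rounded corner implies that the rectangle corner nearest to that rounded corner is outside it as well.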
Opportunistically use the coloring program for
drawing underlines instead of the color program.
This avoids program changes in the middle of
text.
For the Emoji text scrolling benchmark, this reduces
the program changes per frame from > 1000 to around 100.
Use an IFUNC resolver to determine whether we can use
intrinsics for FP16 conversion. This requires that the functions
no longer be inline.
Sadly, it turns out that __builtin_cpu_supports ("f16c")
doesn't compile on the systems where we want it to prevent
us from getting a SIGILL at runtime.
We only have one shader that uses the color2 attribute,
and it doesn't use the uv attribute, so save vertex
memory by putting those in the same space.
This reduces the per-vertex space from 32 to 24 bytes.
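
One way to picture the saving is a union for the two attributes that are never used together. The field names and their order below are assumptions made for illustration; only the 32 and 24 byte totals come from the text above.

```c
#include <stdint.h>

/* Assumed layout with four separate attributes: 32 bytes. */
typedef struct {
  float    position[2];
  float    uv[2];
  uint16_t color[4];     /* fp16 RGBA */
  uint16_t color2[4];    /* fp16 RGBA */
} VertexSeparate;

/* uv and color2 share storage, since no shader reads both: 24 bytes. */
typedef struct {
  float position[2];
  union {
    float    uv[2];
    uint16_t color2[4];
  } shared;
  uint16_t color[4];
} VertexShared;
```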
This reduces the size of our Vertex struct from
48 to 32 bytes. It would be nicer if we could store
the colors in fp16 format in the rendernodes, and
avoid conversion here. But this is still good.
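
For reference, a scalar float to half conversion can be sketched like this. It is a simplified stand-in, not the renderer's implementation: it truncates instead of rounding, flushes tiny values to zero, and maps overflow and NaN to infinity, which is tolerable for colors in the 0..1 range.

```c
#include <stdint.h>
#include <string.h>

/* Convert a 32-bit float to IEEE 754 half precision bits. */
static uint16_t
float_to_half (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);

  uint16_t sign = (uint16_t) ((bits >> 16) & 0x8000);
  int32_t exponent = (int32_t) ((bits >> 23) & 0xff) - 127 + 15;
  uint32_t mantissa = bits & 0x7fffff;

  if (exponent <= 0)
    return sign;                          /* too small: flush to zero */
  if (exponent >= 31)
    return (uint16_t) (sign | 0x7c00);    /* too large: infinity */

  /* Keep the top 10 mantissa bits (truncation, no rounding). */
  return (uint16_t) (sign | ((uint32_t) exponent << 10) | (mantissa >> 13));
}

/* Convert an RGBA color; 16 bytes of floats become 8 bytes of halves. */
static void
color_to_half4 (const float rgba[4], uint16_t out[4])
{
  for (int i = 0; i < 4; i++)
    out[i] = float_to_half (rgba[i]);
}
```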
Move the resources of each renderer to its subdirectory.
We've previously done that for the ngl renderer, but it
is better to be consistent and do it for all the renderers.
Arrange things so that non-child parameters
are always printed before the children. This
greatly helps with readability, which really
suffers when there are hundreds of lines of indented
children between the node start and its parameters.
Update all affected tests.
Instead of rendering the unclipped child to a texture
(and risking blowing the texture size limit, and bad
downscaling), just render the clipped region, and live
with the fact that we can't cache the rendered texture.
This avoids bad artifacts when scrolling long textviews
in rounded clips.
There was confusion here about the handling of the
modelview transform. The modelview transform we are
getting is already set up for rendering the node
we are given, so keep it, except for possibly adding
an extra scale on top when the texture would otherwise
be too big.
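
The extra scale could be computed along these lines. This is a sketch under the assumption that a single uniform factor is applied to both axes; extra_scale_for_limit() is a hypothetical helper and max_size stands for the texture size limit.

```c
/* Returns a factor <= 1 that makes a width x height texture fit
 * into max_size x max_size, or 1 if it already fits. */
static float
extra_scale_for_limit (float width, float height, float max_size)
{
  float scale = 1.0f;

  if (width > max_size)
    scale = max_size / width;

  /* If the height is still too big after scaling, it is the
   * limiting dimension, so derive the factor from it instead. */
  if (height * scale > max_size)
    scale = max_size / height;

  return scale;
}
```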
Move some work out of the loop in visit_text_node.
This takes advantage of the fact that the yoffset
of most glyphs is zero, so yphase generally does
not change in a line of text.
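
The shape of that optimization, sketched with made-up types: the phase is computed once before the loop and only recomputed when a glyph's y offset differs from the previous one. The 1/64 pixel units and the quarter_phase() formula are assumptions for the example, not the renderer's actual math.

```c
/* Hypothetical glyph record; offset in 1/64 pixel units. */
typedef struct {
  int y_offset;
} Glyph;

/* Counts calls so the saving is observable. */
static int n_phase_computations;

/* Quarter-pixel phase (0..3) of a non-negative position. */
static int
quarter_phase (int pos)
{
  n_phase_computations++;
  return ((pos + 8) >> 4) & 3;
}

static void
compute_yphases (const Glyph *glyphs, int n, int baseline_y, int *yphases)
{
  int last_y_offset = 0;
  int yphase = quarter_phase (baseline_y);   /* once, outside the loop */

  for (int i = 0; i < n; i++)
    {
      if (glyphs[i].y_offset != last_y_offset)
        {
          /* Rare case: the glyph is shifted vertically. */
          last_y_offset = glyphs[i].y_offset;
          yphase = quarter_phase (baseline_y + last_y_offset);
        }
      yphases[i] = yphase;
    }
}
```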
Allow comparing container nodes to any other
node, by pretending the other node is a single
child container (if it isn't one already).
This fixes a glitch where we redraw the full
entry text when the blinking cursor goes to
opacity 0, since GskSnapshot then optimizes
away first the opacity node, and then the
single-child container.
Previously, we translated the uniform key (an enum) into a location within
the shader program in GskNglProgram. A number of performance improvements
were focused around having low numbers for the uniform locations. Generally
this is the case, but some drivers such as old Intel drivers on Windows
may use rather large numbers for those.
To combat this, we can push the translation of uniform keys into locations
at the GskNglUniformState level so that we work with untranslated keys
through the process until applying them.
Fixes: #3780
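
A simplified picture of the approach: state is indexed by the small enum key everywhere, and the driver-assigned location is only looked up when changes are applied. All names here are hypothetical, values are reduced to a single float, and the Upload record stands in for the actual GL uniform call.

```c
/* Hypothetical uniform keys; small and dense by construction. */
enum { UNIFORM_ALPHA, UNIFORM_COLOR, UNIFORM_TRANSFORM, N_UNIFORMS };

typedef struct {
  int      location[N_UNIFORMS];   /* driver-assigned, may be large or -1 */
  float    value[N_UNIFORMS];      /* state indexed by key */
  unsigned changed;                /* bitmask of keys with pending changes */
} ProgramUniforms;

typedef struct {
  int   location;
  float value;
} Upload;

/* Record a new value under its key; no location is involved here. */
static void
uniforms_set (ProgramUniforms *u, int key, float value)
{
  if (u->value[key] != value)
    {
      u->value[key] = value;
      u->changed |= 1u << key;
    }
}

/* Translate keys to locations only now. Fills `out` with the uploads
 * that real code would issue and returns how many there are. */
static int
uniforms_apply (ProgramUniforms *u, Upload *out)
{
  int n = 0;

  for (int key = 0; key < N_UNIFORMS; key++)
    {
      if ((u->changed & (1u << key)) == 0 || u->location[key] < 0)
        continue;   /* unchanged, or not present in this program */
      out[n].location = u->location[key];
      out[n].value = u->value[key];
      n++;
    }
  u->changed = 0;

  return n;
}
```

Array sizes now depend on N_UNIFORMS rather than on whatever locations the driver hands out.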
The effectiveness of the front cache is limited by
subpixel positioning making it very likely that we
will meet the same glyph in different x phases inside
a single line of text.
Factoring the xphase into the front cache key makes things
better. For the string eeeeeeeeeeeeeeeeeee:
before: 0% front cache hits
after: >90% front cache hits
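
The effect can be reproduced with a toy direct-mapped cache. This is a sketch only: the cache size, the slot computation and the entry layout are assumptions, and the hit counts it produces depend on the phase sequence fed in, so they are not the numbers quoted above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRONT_CACHE_SIZE 256

typedef struct {
  uint32_t glyph;
  uint8_t  xphase;
  bool     valid;
} FrontEntry;

/* Run the lookups through a fresh front cache and count the hits.
 * phase_in_key selects whether xphase takes part in choosing the slot. */
static int
count_front_hits (const uint32_t *glyphs, const uint8_t *xphases,
                  size_t n, bool phase_in_key)
{
  FrontEntry cache[FRONT_CACHE_SIZE] = { 0 };
  int hits = 0;

  for (size_t i = 0; i < n; i++)
    {
      uint32_t slot = phase_in_key ? ((glyphs[i] << 2) | xphases[i])
                                   : glyphs[i];
      FrontEntry *e = &cache[slot % FRONT_CACHE_SIZE];

      if (e->valid && e->glyph == glyphs[i] && e->xphase == xphases[i])
        {
          hits++;
        }
      else
        {
          /* Miss: the slow lookup would happen here; remember the result. */
          e->glyph = glyphs[i];
          e->xphase = xphases[i];
          e->valid = true;
        }
    }

  return hits;
}
```

Without the phase in the slot, the same glyph in a different phase keeps evicting itself; with it, each phase gets its own slot.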
We don't want to be responsible for duplicating the effort of the hash
table, we just want to speed up subsequent lookups. Otherwise, we risk
not marking glyph usage when tracking usage for compaction.
This required finishing up the begin_frame/end_frame semantics for
GskNglTextureLibrary, which was apparently overlooked.
The driver was changed to provide more information to the library when
beginning frames. We do not need to use end_frame so that was removed.
The frame age is the same as in GL (60), but if that number was chosen
with seconds in mind, I wonder whether we should use something longer
for situations where we have higher frame rates.
Fixes: #3771
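
Roughly, age-based expiry at the start of a frame could look like the sketch below. It assumes the age is counted in frames; CacheEntry and library_begin_frame() are illustrative names, not the real GskNglTextureLibrary interface.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
  int64_t last_used_frame;   /* frame in which the entry was last marked */
  bool    in_use;
} CacheEntry;

/* Called on every lookup that returns the entry, cached or not. */
static void
entry_mark_used (CacheEntry *entry, int64_t frame_id)
{
  entry->last_used_frame = frame_id;
}

/* Called when a frame begins: drop entries that have not been used
 * for more than max_frame_age frames. Returns the number dropped. */
static size_t
library_begin_frame (CacheEntry *entries, size_t n,
                     int64_t frame_id, int64_t max_frame_age)
{
  size_t dropped = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (entries[i].in_use &&
          frame_id - entries[i].last_used_frame > max_frame_age)
        {
          entries[i].in_use = false;
          dropped++;
        }
    }

  return dropped;
}
```

This only works if every lookup path marks the entry, which is why a front cache must not swallow the marking.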
If cairo is a subproject, it's not necessarily installed when gtk
is built. In the build tree, libcairo-script-interpreter is not stored
in the same directory as other cairo libraries.
Recognize a common pattern: A rounded clip with
a color node, followed by a border node, with the
same outline. This is what CSS backgrounds frequently
produce, and we can render it more efficiently with
a combined shader.
Now that colors aren't uniforms anymore, we don't
win much by using the inset_shadow shader. The fragment
shaders of inset_shadow and border are identical. And
the regular border setup does nine-slicing.
Colors are not state that we carry across draw ops,
so setting the color on the render job doesn't make
much sense. Instead, pass the color to the various
draw calls. Add a few new ones for that purpose.
Also, shorten the names of some by going from
'load_vertices_from_offscreen' to 'draw_offscreen'.