When 9-slicing shadows, omit the center tile when it is
entirely contained in the outline (that is not always
the case, depending on corners and offsets).
Opportunistically use the coloring program for
drawing underlines instead of the color program.
This avoids program changes in the middle of
text.
For the Emoji text scrolling benchmark, this reduces
the program changes per frame from > 1000 to around 100.
We only have one shader that uses the color2 attribute,
and it doesn't use the uv attribute, so save vertex
memory by putting those in the same space.
This reduce the per vertex space from 32 to 24 bytes.
This reduces the size of our Vertex struct from
48 to 32 bytes. It would be nicer if we could store
the colors in fp16 format in the rendernodes, and
avoid conversion here. But this is still good.
Instead of rendering the unclipped child to a texture
(and risking blowing the texture size limit, and bad
downscaling), just render the clipped region, and live
with the fact that we can't cache the rendered texture.
This avoid bad artifacts when scrolling long textviews
in rounded clips.
There was confusion here about the handling of the
modelview transform. The modelview transform we are
getting is already set up for rendering the node
we are given, so keep it - except for possible adding
an extra scale on top when the texture would otherwise
be too big.
Move some work out of the loop in visit_text_node.
This takes advantage of the fact that the yoffset
of most glyphs is zero, so yphase generally does
not change in a line of text.
Previously, we translated the uniform key (an enum) into a location within
the shader program in GskNglProgram. A number of performance improvements
were focused around having low nubers for the uniform locations. Generally
this is the case, but some drivers such as old Intel drivers on Windows
may use rather large numbers for those.
To combat this, we can push the translation of uniform keys into locations
at the GskNglUniformState level so that we work with unranslated keys
through the process until applying them.
Fixes#3780
Recognize a common pattern: A rounded clip with
a color node, followed by a border node, with the
same outline. This is what CSS backgrounds frequently
produce, and we can render it more efficiently with
a combined shader.
Now that colors aren't uniforms anymore, we don't
win much by using the inset_shadow shader. The fragment
shaders of inset_shadow and border are identical. And
the regular border setup does nine-slicing.
Colors are not state that we carry across draw ops,
so setting the color on the render job doesn't make
much sense. Instead, pass the color to the various
draw calls. Add a few new ones for that purpose.
Also, shorten the names of some by going from
'load_vertices_from_offscreen' to 'draw_offscreen'.
When the color passed is transparent black, use
the color from the texture as source, instead of
as mask. This lets use use the coloring program
both for regular and color glyphs, avoiding
program changes in text with Emoji.
Instead of using uniforms for color used in multiple
programs, pass it as vertex attributes. This will let
us batch more draw calls, since we don't have to change
uniforms so often. In particular, for syntax-highlighted
text.
The primary goal here was to cleanup the current GL renderer to make
maintenance easier going forward. Furthermore, it tracks state to allow
us to implement more advanced renderer features going forward.
Reordering
This renderer will reorder batches by render target to reduce the number
of times render targets are changed.
In the future, we could also reorder by program within the render target
if we can determine that vertices do not overlap.
Uniform Snapshots
To allow for reordering of batches all uniforms need to be tracked for
the programs. This allows us to create the full uniform state when the
batch has been moved into a new position.
Some care was taken as it can be performance sensitive.
Attachment Snapshots
Similar to uniform snapshots, we need to know all of the texture
attachments so that we can rebind them when necessary.
Render Jobs
To help isolate the process of creating GL commands from the renderer
abstraction a render job abstraction was added. This could be extended
in the future if we decided to do tiling.
Command Queue
Render jobs create batches using the command queue. The command queue
will snapshot uniform and attachment state so that it can reorder
batches right before executing them.
Currently, the only reordering done is to ensure that we only visit
each render target once. We could extend this by tracking vertices,
attachments, and others.
This code currently uses an inline array helper to reduce overhead
from GArray which was showing up on profiles. It could be changed to
use GdkArray without too much work, but had roughly double the
instructions. Cycle counts have not yet been determined.
GLSL Programs
This was simplified to use XMACROS so that we can just extend one file
(gskglprograms.defs) instead of multiple places. The programs are added
as fields in the driver for easy access.
Driver
The driver manages textures, render targets, access to atlases,
programs, and more. There is one driver per display, by using the
shared GL context.
Some work could be done here to batch uploads so that we make fewer
calls to upload when sending icon theme data to the GPU. We'd need
to keep a copy of the atlas data for such purposes.