When the GL renderer cannot upload a given format, print a FALLBACK
debug message with the failed format and the alternative that was
picked, for example:
Unsupported format b8g8r8a8, converting on CPU to b8g8r8a8-premultiplied
Makes it easier to figure out what's happening, especially when using
old GLES versions that don't support all formats.
Track fallback formats to use in the memoryformat directly instead of
using in the GL uploading code.
First of all, this allows sharing the code and ensuring all our
renderers use the same fallback mechanism.
But also, this allows tracking fallbacks per-format which is useful
because the fallback formats aren't really a tree. We want to make
FLOAT16 fall back to FLOAT32 when not available, but we also want
FLOAT32 fall back to FLOAT16.
By tracking the fallbacks per-format, we can achieve that.
Add gdk_memory_format_get_premultiplied() and
gdk_memory_format_get_straight() which return the matching
premultiplied/straight format.
Use this to pick the premultiplied format when uploading GL textures.
And remove the duplication in the dmabuf code, where we can now use
these functions instead of tracking both the premultiplied and straight
alpha versions.
Add an "RGBA" format that just maps to the swizzled version of the
default format.
This way, BGR gets mapped to RGB + swizzling first before trying to map
it to the default format for the depth.
The benefit here is that this format has the same memory width, so
uploading/downloading code can treat it equivalent to the original
format and there's no conversion neccessary later.
Now that we have gdk_gl_context_get_memory_flags() and code can use that
function, make the code do that.
Remove support checks from gdk_memory_format_gl_format().
This is an initial naive port that doesn't try to make use of the finer-grained
flags yet.
This is the result of experimenting with corner cases when blurring.
The result is a test that tests when the child of a blur node is
clipped out but the blurred child is not, the blurred parts are still
visible.
This immediately broke the cairo renderer, so the fix is included.
We need to make sure that all our textures have the same memory
format, or we'll run into trouble in the upload code, at least
on GLES, which isn't as forgiving about format mismatches.
Related: #6238
Instead, do it all in attach(), which becomes more and more like
ConfigureWindow. This is good, because it will let us take the
above-ness into account when making decisions about attaching.
This flag must be set when creating the class or offloading
will be disabled for this renderer.
Set that flag for the GL renderer.
Fixes the Cairo and Vulkan renderer not showing Video.
During rendering, restack offloaded subsurfaces below the main
surface, and clear the area so they peek through. After rendering,
raise the last subsurface if we haven't drawn over it.
Add a blend mode to the draw command, so it can draw transparent
black. This will be used to erase the area on top of a subsurface
when we do passthrough.
Add an extra argument to pass offload info to the diffing code.
This is then used for diffing subsurface nodes differently,
depending on their offloading status.
Make sure all our dmabuf debug messages are display-scoped so the
inspector doesn't trigger them, use the same formatting throughout,
and improve consistency of wording here and there.
It started out as busywork, but it does many separate things. If I could
start over, I'd take them apart into multiple commits:
1. Remove G_ENABLE_DEBUG around GDK_DEBUG_*() calls
This is not needed at all, the calls themselves take care of it.
2. Remove G_ENABLE_DEBUG around profiling code
This now enables profiling support in release builds.
3. Stop poking _gdk_debug_flags and use GDK_DEBUG_CHECK()
This was old code that was never updated.
4. Make !G_ENABLE_DEBUG turn off GDK_DEBUG_CHECK()
The code used to
#define GDK_DEBUG_CHECK(...) false
#define GDK_DEBUG(...)
which would compile away all the code inside those macros. This
means a lot of variable definitions and debug utility functions
would suddenly no longer be used and cause compiler errors.
Drawing a texture-scale node like a texture node when the filter is set
to "linear" doesn't work, because the texture node switches to
trilinear when mipmaps are available.
There is no reason not check the alpha swizzle for being different
from its default value. I am thinking about implementing RGBx
upload with a swizzle of rgb1, and that would break here.
We just poking at display members here, there is no guarantee that
dmabuf formats have been initialized. So do it explicitly.
This prevents a crash in the inspector when viewing a recorded frame
containing a dmabuf texture, since the inspector uses a separate
display connection.
When we are running under GLES, we can use GL_TEXTURE_EXTERNAL_OES
to support YUV formats.
Since we don't want to deal with the combinatorial explosion of
compiling all our shaders with all combinations of sampler2D vs
samplerExternalOES for all their textures, we copy the external
textures to a regular texture before using them.
This shader uses samplerExternalOES to sample an external texture
and blit it into a 'normal' texture. It only works in GLES, but
we won't use it outside of GLES.
Allow our shaders to use samplerExternalOES, by declaring
that we use the relevant extension. Unfortunately, this
only works for gles, and requires different extensions for
gles2 and gles3. Yay
Add a GSK_GL_DEFINE_PROGRAM_NO_CLIP, which is like
GSK_GL_DEFINE_PROGRAM but compiles the shader just once,
with NO_CLIP defined.
This will be used in the future for shaders that do
texture conversion.
Prepare the plumbing in the GL renderer for textures that use
target GL_TEXTURE_EXTERNAL_OES. These need to use a special sampler,
so make sure our sampler machinery does not run over it.
This is a simple helper that feed a GdkTexture
through a renderer and returns the resulting
texture. This will be used to convert dmabuf
textures to 'native' textures.
Restore the bigendian support that was lost in b0e26873f6,
by just not using GL_BGRA with GLES on bigendian. Should be a
very rare combination, but still.
We did have 4 ordering variations of ARGB straight,
but only 3 premultiplied. Add the missing one.
Update all the places where we switch over memory formats.
The glyph and icon libaries were also checking for GLES to
decide if data needs to be transformed from BGRA to RGBA.
Use the new has_bgra getter instead.
This will probably break on bigendian, because the
GL_BGRA + GL_UNSIGNED_BYTE combination is not equivalent
to the cairo format on bigendian, but this was already
broken for the gl format information that we get from
gdk_memory_format_gl_format.
Make gdk_memory_format_gl_format take the GdkGLContext,
instead of just a gles boolean. This will let us
check for extensions that may be needed for certain
formats.
Update all callers.
This is useful for colorizing in the same fashion we do for the glyph
texture atlas. In fact, for small GdkTexture, you will end up in something
like the icon texture atlas.
The primary motivator for this optimization is to draw various glyph-like
features from VTE such as many forms of boxes, lines, arrows, etc.
We don't need to be calling type node conformity checking from the tight
loop of the renderjob. Hoist that into the private header and use that
intead through via the Class pointer.
Anything that includes gskrendernodeprivate.h will get an alternate form
of ref/unref for render nodes which does not need to do type checking on
the parameter. We can expect that things are correct within GTK itself and
this saves excessive amounts of TypeNode conformities checking.
Like the previous change, this uses GdkArrayImpl instead of GArray for
tracking modelview changes. This is less important than clip tracking
simple due to being used less, but it keeps the implementation synchronous
with the Clip tracking code.
We can end up spending a lot of time in g_array_maybe_expand() through the
use of g_array_set_size() for clip tracking. That is somewhat due to the
simple nature of GArray being size-dynamic. Instead, we can use
GdkArrayImpl and let the compiler do what it does best to elide some
work and hoist other work into the calling function.
This also fixes a potential UAF in gsk_gl_render_job_push_contained_clip().
When getting a colorized texture we're downloading the texture as a
Cairo surface, and then feeding it to another texture, but we never drop
the reference of the new surface.
When shadows were offset - in particular when offset so the original
source was out of bounds of the result - the drawing code would create a
pattern for it that didn't include enough of it to compose a shadow.
Fix that by not creating those patterns anymore, but instead drawing the
source (potentially multiple times) at the required offsets.
While that does more drawing, it simplifies the shadow node draw code,
and that's the primary goal of the Cairo rendering.
Test included.
Make circle contours use 'foreach coordinates' for
its points. This works here, but not for general
conics. As with the other custom contours, avoid
emitting collapsed conics.
The code now follows gsk_rounded_rect_shrink() and with it the behavior
of the Cairo renderer and Webkit.
The old code did what the GL renderer and Cairo do, but I consider that
wrong.
I did not test Chrome.
Test attached
The source uniform may or may not point
to a glyph atlas. The optimization we do
for color nodes is only possible if it does,
so check this.
Fixes: #6094
Cairo and the GL renderer have a different idea of how to handle
transitioning of colors outside the defined range.
Consider these stops:
black 50%, white 50%
What color is at 0%?
Cairo would transition between the last and first stop, ie it'd do a
white-to-black transition and end up at rgb(0.5,0.5,0.5) at 0%.
GL would behave as it would for non-repeating gradients and use black
for the range [0%..50%] and white for [50%..100%].
The web would rescale the range so the first stop would be at 0% and
the last stop would be at 100%, so this gradient would be illegal.
Considering that it's possible for code to transition between the
different behaviors by adding explicit stops at 0%/100%, I could choose
any method.
So I chose the simplest one, which is what the GL renderer does and
which treats repeating and non-repeating gradients the same.
Tests attached.
We require folks to include gskglrenderer.h in order
to create a GL renderer. So we be careful to only
include header in gskglrenderer.h that won't trigger
ugly warnings.
See !6363
There is no decomposition going on for any contours,
and the tolerance argument is entirely unused.
Decomposition and tolerance is handled entirely
in gskpath.c by its trampoline.
Make gsk_path_builder_add_rect always
produce a clockwise rectangle. This matches
what we do for circles and rounded rects,
which also go clockwise. Note that we
still need to allow negative widths in
the contour code, to implement reverse().
Add a contour that optimizes some things for
rectangles. Also add rectangle detection to the
path parser, and add tests similar to what we
have for the other special contours.
This special contour takes advantage of its
rounded-rect-ness for speeding up bounding
boxes and winding numbers. It falls back
to the standard contour code for everything
else.
Add a private gsk_path_point_to_string that
can be called in the debugger if you want
to see the contents of a GskPathPoint and
are too lazy to cast it to GskRealPathPoint
yourself.
Only do the work for a curve the first time
we need it. This should greatly speed up
use cases where you only create a measure
to get the length of the path.
In order to compute path lengths efficiently, we need
to cache lookup tables. This commit adds API to let
contours allocate and free such measure data, as well
as API to use the data to go length -> point and
vice versa.
...and not around the center of the render node, as one could expect
given that the render node syntax for rotation, transform: rotate(90);,
happens to match the CSS syntax for the same thing, and CSS does rotate
around the center by default.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
We don't need to have the derivative as a curve,
it is enough for us to compute values of the
derivative at a given t, which we can also do
for conics.
Arcs were appealing, but they have a fatal flaw: we can't
split our arcs without changing the ellipse they trace.
That could be fixed by adding an extra parameter, but then
it is no longer any better than conics.
So switch back to conics, which have the advantage that they
are used elsewhere.
Add a new curve type for elliptical arcs
and use it for rounded rectangles and circles.
We use the 'E' command to represent elliptical
arcs in serialized paths.
The magical term to know about (because the GLSL compiler or the
validation layers sure as hell don't) is:
"dynamically uniform expression"
because if you don't have that when indexing a texture or buffer array,
you need to add nonuniformEXT() around the index variable.
Fixes the close icon on AMD having glitches of the previous icon visible
in some pixels.
We must be careful with single-point contours
that contain just a move. These never occur in
practice, but our randomized tests produce them
regularly.
Based on reverse engineering the color node and contrary to my
expectations, the matrix/offset is expressed in, and applied to,
unpremultiplied colors. The colors are being explicitly
unpremultiplied, transformed according to the matrix/offset, and
premultiplied back (see color_matrix.glsl). The matrix is getting
transposed.
Also, copy the same blurb to the corresponding GtkSnapshot function.
Signed-off-by: Sergey Bugaev <bugaevc@gmail.com>
The (out caller-allocates) and (out callee-allocates) annotations are
meant for structured or pointer types. Plain old data types are just
regular out parameters and don't need the annotation about who allocates
them.
See glib!2005, gjs#570
Take a rendernode as source and a GskPath and GskStroke,
and fill the area that is covered when stroking the path
with the given stroke parameters, like cairo_stroke() would.
This commit adds the basic infrastructure for paths.
The public APIs consists of GskPath, GskPathPoint and
GskPathBuilder.
GskPath is a data structure for paths that consists
of contours, which in turn might contain Bézier curves.
The Bezier data structure is inspired by Skia, with separate
arrays for points and operations. One advantage of this
arrangement is that start and end points are shared
between adjacent curves.
A GskPathPoint represents a point on a path, which can
be queried for various properties.
GskPathBuilder is an auxiliary builder object for paths.
graphene_rect_t is not well-suited for this purpose,
since you end up with floating-point precision problems
at the upper bound (x + width, y + height).
Instead of scale and whatnot, pass:
1. The image size
2. The viewport to map to that image size
and compute everything else from there.
In particular, we set the Vulkan viewport to the image dimensions
instead of the viewport size.
All of this makes things a lot simpler while keeping the required
functionality.
We need them for mask-only textures.
For tiffs, we convert the formats to RGBA (the idea that tiff can save
everything needs to be buried I guess) as tiffs can't do alpha-only.
Add a bunch of inline functions for graphene_rectangle_t.
We use those quite extensively in tight loops so making them as fast as
possible via inlining has massive benefits.
The current render-heavy benchmark I am playing (th paris-30k in node-editor)
went from 49fps to 85fps on my AMD.
When a GdkMemoryFormat is not supported natively and there's
postprocessing required, add a way to mark a VulkanImage as such via the
new postprocess flags.
Also allow texting such iamges only with new_for_upload() and detect
when that is the case and then run a postprocessing step that converts
that image to a suitable format.
This is done with a new "convert" shader/op.
This now supports all formats natively, no conversions happen on the CPU
anymore (unless the GPU is old).
Add an explicit begin() and an end() op. For now, this looks like
overkill, but it allows doing renderpasses with custom ops that are not
meant to render a rendernode.
Examples for this are pre/postprocessing passes or 2-pass blur.
The API was using regions because it always had. But all the code ever
did was get the extents of the region.
So simplify everything by using rectangles everywhere.
These days, we can query it with gsk_vulkan_render_get_context().
Makes quite a few functions require one less argument.
And it also makes the GskVulkanRenderPass empty. Gotta figure out what
to do with it.
Instead, build-depnd on glslc to build them.
glslc is available in all important distros for a while:
Fedora >= 28
Ubuntu >= 23.04
Debian >= 12
Arch
Opensuse >= 15.2
msys2
are the ones I checked.
So we can depend on it and avoid having to deal with keeping spirv files
up-to-date in all commits.
It's also 700kB of data, and not updating it helps.
We now store all the relevant state of the image inside the VulkanImage
struct, so we can delay barriers for as long as possible.
Whenever we want to use an image, we call the new
gsk_vulkan_image_transition() and it will add a barrier to the desired
state if one is necessary.
... and all the remaining functions still using it.
It's all unused and has been replaced by upload and download ops.
With this change, all GPU operations now go via GskVulkanOp.command()
and no more side channels exist.
This op queues a download of an image. The image will only be available
once the commands finished executing, so it requires waiting for the
render to finish, which makes the API a bit awkward.
Included is also a download_png_op() useful for debugging.
The render pass ops were not updating the image's layout to the final
layout when a render pass ends.
Fix that.
Also make the layouts explicit arguments to the render pass op.
Split out the function that uploads using a buffer, so that it can be
used with an area to only update parts of the image.
That feature is not used yet, but will be in future commits.
If a command takes too long to execute, Vulkan drivers will think they
are inflooping and abort what they were doing.
For the simple color shader with smallish nodes, this happens around
10M instances, as tested with the output of
./tests/rendernode-create-tests 10000000 colors.node
So just limit it to way lower, so that we barely never hit it, ut still
pick a big number so this optimization stays noticable.
For small regions, the optimization doesn't matter that much, so we
don't need to do lots of work on the CPU.
In particular, this should catch icons and their backgrounds (32x32),
but I was generous in selecting the number.
Gets my discrete AMD on widget-factory back to the 1900fps it had before
this optimization while making the driver clock the GPU's shader at
1.7GHz instead of the 2.1GHz it used before.
Using clear avoids the shader engine (see last commit), so if we can get
pixels out of it, we should.
So we detect the overlap with the rounded corners of the clip region and
emit shaders for those, but then use Clear() for the rest.
With this in place, widget-factory on my integrated Intel TigerLake gets
a 60% performance boost.
The op emits a vkCmdClearAttachments() with a given color. That can be
used with color nodes that are pixel-aligned and opaque to significantly
speed up rendering when the window background is a solid color.
However, currently this fails a bit outside of fullscreen when rounded
clip rectangles are in use to draw rounded corners.
Instead of using the upload vfunc and going via the code in
GskVulkanImage, copy/paste the relevant code into the command() vfunc.
This is meant to achieve multiple things:
1. Get rid of GskVulkanUploader and its own command buffer and general
non-integration with operations.
2. Get rid of GskVulkanOp:upload()
3. Get the upload/download code machinery for GskVulkanImage and put it
with the actual operations.
The current code can't do direct upload/download, that will follow in a
future commit.
... instead of doing the equivalent things manually by creating a
RenderPass and calling the relevant functions.
Now all renderpass operations are indeed stored in ops.
Also reshuffle the command emission code, because we no longer need to
emit the ops for the base renderpass.
As a result we only submit a single command buffer containing all the
render passes instead of once per render pass.
We also bind vertex buffers and descriptor sets only once now at the
start instead of once per renderpass.