Reserve 3 texture units per immutable sampler (because that's the
maximum per YUV sampler).
Ensure that the max-sampler calculations always include the immutable
samplers, too.
We now handle the case where memory is not HOST_CACHED.
We also track the memory type now so we can avoid mapping image memory
that is not HOST_CACHED and use buffer transfers instead.
Shader compilers struggle with compiling code that indexes texture
arrays by indexes, so keep the fallback shaders simple and don't do that
there.
There's not much of a performance difference anyway between those two
methods.
In the case where descriptor indexing is not enabled and the number of
max images is small (or we use extensive amounts of immutable samplers),
we need to be able to switch descriptors.
This patch makes that possible.
We compile custom shaders for Vulkan 1.0 that don't require the
extension.
We also ensure that our accesses are uniform by only executing one
shader at a time.
Let the objects track the number of samplers or buffers needed.
This is a required step for making Vulkan work with less featureful
(read: mobile) implementations.
This is relevant went encountering repeat nodes, where the repeat cutoff
will make the fwidth of the position go wild otherwise.
Gradients require more work now, because we need to compute offsets
twice - once for the pixel, once for the offst.
Carry an n_external_textures variable around when selecting programs and
compile different programs for different amounts of external textures.
For now, this code is unused, but dmabufs will need it.
This adds GSK_GPU_IMAGE_CAN_MIPMAP and GSK_GPU_IMAGE_MIPMAP flags and
support to ensure_image() and image creation functions for creating a
mipmapped image.
Mipmaps are created using the new mipmap op that uses
glGenerateMipmap() on GL and equivalent blit ops on Vulkan.
This is then used to ensure the image is mipmapped when rendering it
with a texture-scale node.
Add a GSK_GPU_IMAGE_STRAIGHT_ALPHA and use it for images that have
straight alpha.
Make sure those images get passed through a premultiplying pass with
the new straight alpha shader.
Also remove the old Postprocess flags from the Vulkan image that were a
leftover from copying that code from the old Vulkan renderer.
There's a well hidden line in the spec that says in
https://registry.khronos.org/vulkan/specs/1.3/html/chap15.html#interfaces-resources-descset
If the combined image sampler enables sampler Y′CBCR conversion,
it **must** be indexed only by constant integral expressions when
aggregated into arrays in shader code, irrespective of the
shaderSampledImageArrayDynamicIndexing feature.
So we'll use the same trick that we use for old GL here and do an
if dance that gives us dynamically uniform expressions.
This now uses all the previously added features to allow displaying YUV
images.
Also add a utility function that turns an image into a toggle ref for a
texture. This makes sure that reffing the image also refs the texture
and that ensures that textures stay alive as long as the image is in
use.
For now, the flags are just there because, and nobody uses them yet.
The only flag is EXTERNAL, which for now I'm using for YUV buffers,
though it's a bit undefined what that means.
Images can now have samplers - meaning they must be rendered with that
sampler. It also means that sampler must be handled as an immutable
sampler in descriptorsets.
These samplers can be created with a samplerYcbcrConversion, so code has
been added to pass that conversion when creating the imageview.
Also add code to GskVulkanFrame to track immutable samplers.
Nobody is making use of this yet.
Define an array with a compile-time-constant variable size for the
immutable samplers.
A bunch of work is necessary to ensure that at least one element is in
the sampler array, because the GLSL code
sampler2D immutable_textures[0];
is invalid.
This allows having different layouts sothat we can support immutable
samplers, whcih are required for multiplane and YUV formats.
We don't use them yet.
use it to collect the optional features we are interested in and turn
them on only if available.
For now we add the dmabuf features, but we don't use them yet.
The main reason here is that we want to not fail when the texture size
is larger than the supported GpuImage size.
When that happens, for now we just fallback slowly - ulitmately to
drawing with Cairo, which is going to be clipped.
There's multiple uses I want it for:
1. Generating the box-shadow area for blurring
2. Generating masks for rounded-rect masking
3. Optimizing the common use case of rounded-clip + color
Only the last one is implemented in this commit.
Don't try to use all those fancy GL features like glMapBuffer() and
such. Just malloc() some buffer memory and glBufferSubData() it later.
That works everywhere and is faster than (almost?) any combination of
fancy new buffer APIs. And yes I'm frustrated because I played with
those flags and none of them were better than this.
Doubles the framerate on my discrete AMD GPU.
Introduce a new GskGpuImageDescriptors object that tracks descriptors
for a set of images that can be managed by the GPU.
Then have each GskGpuShaderOp just reference the descriptors object they are
using, so that the coe can set things up properly.
To reference an image, the ops now just reference their descriptor -
which is the uint32 we've been sending to the shaders since forever.
Use glDrawArraysInstancedBaseInstance() to draw. (Yay for GL naming.)
That allows setting up the offset in the vertex array without having to
glVertexAttribPointer() everything again.
However, this is only supported since GL 4.2 and not at all in stock GLES,
so we need to have code that can work without it.
Fortunately, it is mandatory in Vulkan, so every recent GPU supports it.
And if that GPU has a proper driver, it will also expose the GL extension
for it.
(Hint: You can check https://opengles.gpuinfo.org/listextensions.php for
how many proper drivers exist outside of Mesa.)
The env var allows skipping various optimizations in the GPU shader.
This is useful for testing during development when trying to figure
out how to make a renderer as fast as possible.
We could also use it to enable/disable optimizations depending on GL
version or so, but I didn't think about that too much yet.
When drawing opaque color regions that are large enough, use
vkCmdClearAttachments()/glClear() instead of a shader. This speeds up
background rendering on particular on older GPUs.
See the commit messages of
bb2cd7225ece042f7ba10edd7547c1
for a further discussion of performance impacts.
The previous algorithm would reverse the order of subpasses, whcih leads
to unexpected behavior if dependent subpasses are not added as children
of a subpass, but just as a previous subpass - like when a subpass is
used multiple times later.
An example for this is a shadow node with multiple shadows - the source
of the shadow is used by the multiple shadows.
So ensure that adjacent subpasses stay in the same order.
The code generated by glslc -O is optimized worse by Mesa than
code generated unoptimized.
So generate unoptimized code until somebody figures out what's going
wrong here.
They're done using the pattern shader.
The pattern shader now gained a stack where vec4's can be pushed and
popped back later, which allows storing the position before computing
the new position inside the repeat node's child.
Due to GLES and old GL not allowing non-constant texture array
lookups,we need to turn the array lookup into a big switch statementin
those versions, and that requires putting the texture() call into that
switch.
But with that trick, we can use texture IDs in GLSL.
... and use it for glyphs.
The name is a slight variation of the "coloring" name from the GL
renderer.
The functionality is exactly what the "glyph" shader from the Vulkan
renderer does.
1. Compute the fwidth() twice with offset offsets
That way, we avoid glitches at the boundary between 0.0 and 1.0,
because by offsetting it by 0.5, that boundary goes away.
Then we take the min() of both which gives us the one we care about.
2. Set the gradient to repeating
By doing that, we don't get values at the 0.0/1.0 boundary clamped,
but things smoothly transition.
This smoothes the line at that boundary and makes it look just like
every other line.
Instead of strictly rounding to the given clip rectangle, increase the
rectangle to the next pixel boundary.
Also add docs that the clip_bounds do not influence the actual size of
the returned image.
It's just an object that encapsulates everything needed to create (the
data for) a pattern op.
It also clarifies which code does what, because now the NodeProcessor
and the PatternWriter are 2 different things.
Pretty much a copy of the Vulkan border shader.
A notable change is that the input arguments are changed, because GL
gets confused if you put a mat4 at the end.
when doing get_node_as_image(), that may spawn a new buffer writer that
writes into the samme buffer when rendering an offscreen with patterns.
So as a more or less hacky workaround, we now abort the current buffer
write and restart it once we've created the image.
If creation fails, create an offscreen image instead and draw that as a
texture.
Because offscreens basically always succeed, we can pretty much assume
success everywhere - apart from pattern creation functions that also
create images, because they can run out of shader space.
Frames now carry a timestamp for when they are used.
This is mainly intended to attach timestamps to cached items (textures
or glyphs), but it could in theory also be used when profiling.
We use wallclock time here, not server time, because it's cheaper and
because we're more intereseted in the local machine we're rendering on.
Now we can extend the pattern creation easily - and we can add new
patterns quickly later.
Plus, we need to keep this file in sync with pattern.glsl and it's neat
when those 2 files reference only each other.
Because GL flips its shit sometimes (ie when it's the framebuffer),
pass the height of the target as the flip variable, so commands
that need to operate on the pixels can flip the y axis around this value.
This is again mostly a copy of the Vulkan renderer.
It's a bit awkward codewise with the new invalidation framework,
because we need to cache the previous values individually now,
but it's a lot more finegrained, and we don't emit globals multiple
times when clips are nested.
... and use it to initialize the "proper" projection matrix to use in
shaders.
The resulting viewport will go from top left (0,0) to bottom right
(width, height) and the z clipping plane will go from -10000 to 10000.
This heaves over an inital chunk of code from the Vulkan renderer to
execute shaders.
The only shader that exists for now is a shader that draws a single
texture.
We use that to replace the blit op we were doing before.
For now, it just renders using cairo, uploads the result to the GPU,
blits it onto the framebuffer and then is happy.
But it can do that using Vulkan and using GL (no idea which version).
The most important thing still missing is shaders.
It also has a bunch of copy/paste from the Vulkan renderer that isn't
used yet.
But I didn't want to rip it out and then try to copy it back later
We want to introduce a new one next.
Technically, this breaks API, because gsk_vulkan_renderer_new() is going
away, but practically, we're gonna bring it back once we introduce that
renderer in a few commits.
These are not usable outside of GTK, so lets not burden bindings
with them.
I'll keep the get_child() function exposed, since it is needed to
iterate over node trees containing subsurface nodes.
asan randomly failed when this almost correct code wasn't quite correct.
Hopefully this is the correct incantation to compute the size.
Related: glib#205
This is mostly untested and a result of reading the code.
The main effect here happens when a node was drawn that didn't start on
an integer boundary, which is very rare.
However, with specially crafted tests and when using fractional scaling,
this can happen.
This happened most often when clipping by the node bounds to restrict a
push_group() call. Enlarge that rectangle to fall on a pixel boundary.
Testcase included
The code was writing invalid memory, so this might not have always
crashed, but I did my best to write the test so it causes a SEGV.
Also included is a fix for the testsuite where the expected result was
wrong.
These 2 rectangles used to intersect fine:
0 0 50 50 / 50 0
0 0 50 50 / 0 50
But the computed result was:
0 0 50 50 / 50
which is not a valid rectangle, because the corners overlap.
Make sure such rectangles return NOT_REPRESENTABLE.
The above rectangle has been added to the testsuite.
After discussion on IRC about debug messages:
- FALLBACK is meant to be used for printing stuff about fallbacks
(Cairo, offscreens, conversion when uploading, etc)
- CAIRO is for overdrawing everything drawn with Cairo
When hilighting Cairo nodes, use a different hilight color than when
hilighting other nodes.
This allows differentiating application use of Cairo (via nodes) from
renderer use of Cairo (via fallback).
Use it to overlay an error pattern over all Cairo drawing done by
renderers.
This has 2 purposes:
1. It allows detecting fallbacks in GPU renderers.
2. Application code can use it to detect where it is using Cairo
drawing.
As such, it is meant to trigger both with cairo nodes as well as when
renderers fallback for regular nodes.
The old use of the debug flag - which were 2 not very useful print
statements - was removed.
Mask nodes are transparent outside of the intersection of source and
mask, unless the mask ode is inverted alpha.
Set the bounds accordingly.
Tests have been updated accordingly.
Doing this in a way that is picked up by gobject-introspection
requires splitting off new enum members into separate doc
comments, which is a bit unfortunate.
The convert_texture() path only works for the GL renderer, the new
renderers potentially use dmabuf textures as result of render_texture(),
so they need to be smarter here.
This omission was noticed by Benjamin Otte. Add a premultiply
uniform to the external shader, and add a separate premultiply
shader for the non-external case.
When the GL renderer cannot upload a given format, print a FALLBACK
debug message with the failed format and the alternative that was
picked, for example:
Unsupported format b8g8r8a8, converting on CPU to b8g8r8a8-premultiplied
Makes it easier to figure out what's happening, especially when using
old GLES versions that don't support all formats.
Track fallback formats to use in the memoryformat directly instead of
using in the GL uploading code.
First of all, this allows sharing the code and ensuring all our
renderers use the same fallback mechanism.
But also, this allows tracking fallbacks per-format which is useful
because the fallback formats aren't really a tree. We want to make
FLOAT16 fall back to FLOAT32 when not available, but we also want
FLOAT32 fall back to FLOAT16.
By tracking the fallbacks per-format, we can achieve that.
Add gdk_memory_format_get_premultiplied() and
gdk_memory_format_get_straight() which return the matching
premultiplied/straight format.
Use this to pick the premultiplied format when uploading GL textures.
And remove the duplication in the dmabuf code, where we can now use
these functions instead of tracking both the premultiplied and straight
alpha versions.
Add an "RGBA" format that just maps to the swizzled version of the
default format.
This way, BGR gets mapped to RGB + swizzling first before trying to map
it to the default format for the depth.
The benefit here is that this format has the same memory width, so
uploading/downloading code can treat it equivalent to the original
format and there's no conversion neccessary later.
Now that we have gdk_gl_context_get_memory_flags() and code can use that
function, make the code do that.
Remove support checks from gdk_memory_format_gl_format().
This is an initial naive port that doesn't try to make use of the finer-grained
flags yet.
This is the result of experimenting with corner cases when blurring.
The result is a test that tests when the child of a blur node is
clipped out but the blurred child is not, the blurred parts are still
visible.
This immediately broke the cairo renderer, so the fix is included.
We need to make sure that all our textures have the same memory
format, or we'll run into trouble in the upload code, at least
on GLES, which isn't as forgiving about format mismatches.
Related: #6238
Instead, do it all in attach(), which becomes more and more like
ConfigureWindow. This is good, because it will let us take the
above-ness into account when making decisions about attaching.
This flag must be set when creating the class or offloading
will be disabled for this renderer.
Set that flag for the GL renderer.
Fixes the Cairo and Vulkan renderer not showing Video.
During rendering, restack offloaded subsurfaces below the main
surface, and clear the area so they peek through. After rendering,
raise the last subsurface if we haven't drawn over it.
Add a blend mode to the draw command, so it can draw transparent
black. This will be used to erase the area on top of a subsurface
when we do passthrough.
Add an extra argument to pass offload info to the diffing code.
This is then used for diffing subsurface nodes differently,
depending on their offloading status.
Make sure all our dmabuf debug messages are display-scoped so the
inspector doesn't trigger them, use the same formatting throughout,
and improve consistency of wording here and there.
It started out as busywork, but it does many separate things. If I could
start over, I'd take them apart into multiple commits:
1. Remove G_ENABLE_DEBUG around GDK_DEBUG_*() calls
This is not needed at all, the calls themselves take care of it.
2. Remove G_ENABLE_DEBUG around profiling code
This now enables profiling support in release builds.
3. Stop poking _gdk_debug_flags and use GDK_DEBUG_CHECK()
This was old code that was never updated.
4. Make !G_ENABLE_DEBUG turn off GDK_DEBUG_CHECK()
The code used to
#define GDK_DEBUG_CHECK(...) false
#define GDK_DEBUG(...)
which would compile away all the code inside those macros. This
means a lot of variable definitions and debug utility functions
would suddenly no longer be used and cause compiler errors.
Drawing a texture-scale node like a texture node when the filter is set
to "linear" doesn't work, because the texture node switches to
trilinear when mipmaps are available.
There is no reason not check the alpha swizzle for being different
from its default value. I am thinking about implementing RGBx
upload with a swizzle of rgb1, and that would break here.
We just poking at display members here, there is no guarantee that
dmabuf formats have been initialized. So do it explicitly.
This prevents a crash in the inspector when viewing a recorded frame
containing a dmabuf texture, since the inspector uses a separate
display connection.
When we are running under GLES, we can use GL_TEXTURE_EXTERNAL_OES
to support YUV formats.
Since we don't want to deal with the combinatorial explosion of
compiling all our shaders with all combinations of sampler2D vs
samplerExternalOES for all their textures, we copy the external
textures to a regular texture before using them.
This shader uses samplerExternalOES to sample an external texture
and blit it into a 'normal' texture. It only works in GLES, but
we won't use it outside of GLES.
Allow our shaders to use samplerExternalOES, by declaring
that we use the relevant extension. Unfortunately, this
only works for gles, and requires different extensions for
gles2 and gles3. Yay
Add a GSK_GL_DEFINE_PROGRAM_NO_CLIP, which is like
GSK_GL_DEFINE_PROGRAM but compiles the shader just once,
with NO_CLIP defined.
This will be used in the future for shaders that do
texture conversion.
Prepare the plumbing in the GL renderer for textures that use
target GL_TEXTURE_EXTERNAL_OES. These need to use a special sampler,
so make sure our sampler machinery does not run over it.
This is a simple helper that feed a GdkTexture
through a renderer and returns the resulting
texture. This will be used to convert dmabuf
textures to 'native' textures.
Restore the bigendian support that was lost in b0e26873f6,
by just not using GL_BGRA with GLES on bigendian. Should be a
very rare combination, but still.
We did have 4 ordering variations of ARGB straight,
but only 3 premultiplied. Add the missing one.
Update all the places where we switch over memory formats.
The glyph and icon libaries were also checking for GLES to
decide if data needs to be transformed from BGRA to RGBA.
Use the new has_bgra getter instead.
This will probably break on bigendian, because the
GL_BGRA + GL_UNSIGNED_BYTE combination is not equivalent
to the cairo format on bigendian, but this was already
broken for the gl format information that we get from
gdk_memory_format_gl_format.
Make gdk_memory_format_gl_format take the GdkGLContext,
instead of just a gles boolean. This will let us
check for extensions that may be needed for certain
formats.
Update all callers.
This is useful for colorizing in the same fashion we do for the glyph
texture atlas. In fact, for small GdkTexture, you will end up in something
like the icon texture atlas.
The primary motivator for this optimization is to draw various glyph-like
features from VTE such as many forms of boxes, lines, arrows, etc.
We don't need to be calling type node conformity checking from the tight
loop of the renderjob. Hoist that into the private header and use that
intead through via the Class pointer.
Anything that includes gskrendernodeprivate.h will get an alternate form
of ref/unref for render nodes which does not need to do type checking on
the parameter. We can expect that things are correct within GTK itself and
this saves excessive amounts of TypeNode conformities checking.
Like the previous change, this uses GdkArrayImpl instead of GArray for
tracking modelview changes. This is less important than clip tracking
simple due to being used less, but it keeps the implementation synchronous
with the Clip tracking code.
We can end up spending a lot of time in g_array_maybe_expand() through the
use of g_array_set_size() for clip tracking. That is somewhat due to the
simple nature of GArray being size-dynamic. Instead, we can use
GdkArrayImpl and let the compiler do what it does best to elide some
work and hoist other work into the calling function.
This also fixes a potential UAF in gsk_gl_render_job_push_contained_clip().
When getting a colorized texture we're downloading the texture as a
Cairo surface, and then feeding it to another texture, but we never drop
the reference of the new surface.
When shadows were offset - in particular when offset so the original
source was out of bounds of the result - the drawing code would create a
pattern for it that didn't include enough of it to compose a shadow.
Fix that by not creating those patterns anymore, but instead drawing the
source (potentially multiple times) at the required offsets.
While that does more drawing, it simplifies the shadow node draw code,
and that's the primary goal of the Cairo rendering.
Test included.