Use an offscreen and mask it if the clips get too complicated.
Technically, the code could be improved to set the rounded clip on the
offscreen instead of rendering it as a mask, but that would require more
sophisticated tracking of clip regions by respecting the scissor, and
the current clip handling can't do that yet.
This removes one of the last places where the GPU renderer was still
using Cairo fallbacks.
This is for generating descriptors for more than 1 image. The arguments
for this function are very awkward, but I couldn't come up with better
ones and the function isn't that important.
And the call sites still look a lot nicer now.
For now this uses Cairo to generate a mask and then runs a mask op.
This is different from just using fallback in that the child is rendered
with the GPU and not via fallback.
The shader is split into a generic part that can be shared by all
gradient shaders and does the color stop handling, and a
gradient-specific part that needs to be implemented individually by
each gradient implementation.
If there are more than 7 color stops, we can split the gradient into
multiple gradients with color stops like so:
0, 1, 2, 3, 4, 5, transparent
transparent, 6, 7, 8, 9, 10, transparent
...
transparent, n-2, n-1, n
and use the new BLEND_ADD to draw them on top of each other.
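A minimal sketch of that chunking, assuming a hypothetical transparent()
helper and emit() callback (the real code differs):

    #define MAX_STOPS 7

    /* A transparent stop at the given offset; in premultiplied alpha,
     * fading out one chunk while fading in the next adds up to the
     * original interpolation. */
    static GskColorStop
    transparent (float offset)
    {
      return (GskColorStop) { offset, (GdkRGBA) { 0, 0, 0, 0 } };
    }

    static void
    split_gradient (const GskColorStop *stops,
                    gsize               n_stops,
                    void (* emit) (const GskColorStop *chunk, gsize n_chunk))
    {
      GskColorStop chunk[MAX_STOPS];
      gsize i = 0, n;

      while (i < n_stops)
        {
          n = 0;
          if (i > 0)                            /* fade in from transparent */
            chunk[n++] = transparent (stops[i - 1].offset);
          while (n < MAX_STOPS - 1 && i < n_stops)
            chunk[n++] = stops[i++];            /* the real color stops */
          if (i < n_stops)                      /* fade out to transparent */
            chunk[n++] = transparent (stops[i].offset);
          emit (chunk, n);                      /* drawn with BLEND_ADD */
        }
    }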
Adapt the testcase that tests this to use colors that work with the fancy
algorithm we use now, so that BLEND_ADD and transitions to transparent
do not cause issues.
Instead of scaled coordinates, use the unscaled ones.
This ensures that gradients get computed correctly, as they are not safe
against non-orthogonal transforms - like scales with different scale
factors.
The shader can only deal with up to 7 color stops - but that's good
enough for the real world.
Plus, we have the uber shader.
And if that fails, we can still fall back to Cairo.
The code also doesn't handle repeating linear gradients yet.
This shader can take over from the ubershader. And it can be used
instead of launching the ubershader when no offscreens are necessary.
Also includes an optimization that uses the colorize shader when
appropriate.
The ubershader has some corner cases where it can't be used, in
particular when the child is massively larger than the repeat node and
the repeat node is used to clip lots of the source.
It's better than the Cairo renderer, so use it instead.
It's still only picked once GL fails, so it will probably only ever be
picked when people use GDK_DEBUG=gl-disable, but at least it will be
picked.
Per-backend renderers and GL renderers are different things, so treat
them as such.
Also, try the GL renderer unconditionally. The renderer initialization
code will take care of GL not being available.
This is using the Vulkan renderer.
It also allows claiming support for all the formats that only Vulkan
supports, but that neither GL nor native mmap can handle.
Add GSK_GPU_IMAGE_RENDERABLE and GSK_GPU_IMAGE_FILTERABLE and make sure
to check formats for this feature.
This requires reorganizing code to actually do this work instead of just
pretending formats are supported.
This fixes GLES upload tests with NGL.
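An illustrative sketch of such a check; the accessor and the conversion
helper here are assumptions, not the real API:

    GskGpuImageFlags flags = gsk_gpu_image_get_flags (image);
    GskGpuImageFlags needed = GSK_GPU_IMAGE_RENDERABLE | GSK_GPU_IMAGE_FILTERABLE;

    /* If the format can't be rendered to or sampled with linear
     * filtering, copy the data into a format that supports both. */
    if ((flags & needed) != needed)
      image = copy_image_to_supported_format (frame, image);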
This ensures both that we signal a semaphore for a dmabuf when we export
an image and that we import semaphores for dmabufs and wait on them.
Fixes Vulkan node-editor displaying the Vulkan renderer in the sidebar.
Make gsk_renderer_render_texture() create a dmabuf texture if that is
possible.
If it isn't (i.e. if we're not on Linux, or if dmabufs are otherwise not
working), fall back to the previous code of creating a memory texture.
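Sketched, with the helper names being made up:

    static GdkTexture *
    render_texture (GskRenderer           *renderer,
                    GskRenderNode         *root,
                    const graphene_rect_t *viewport)
    {
      GdkTexture *texture = NULL;

    #ifdef __linux__
      /* try to export the frame as a dmabuf texture first */
      texture = render_dmabuf_texture (renderer, root, viewport);
    #endif

      /* not on Linux, or dmabufs not working: use a memory texture */
      if (texture == NULL)
        texture = render_memory_texture (renderer, root, viewport);

      return texture;
    }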
When using the uber shader a lot, we may overflow the (only 16kB large)
storage buffer.
Stop crashing when that happens and instead just allocate a new one.
This makes the (currently single) storage buffer handled by
GskGpuDescriptors.
A side effect is that we now have support for multiple buffers in place.
We just have to use it.
Mixed into this commit is a complete rework of the pattern writer.
Instead of writing straight into the buffer (complete with repeatedly
backtracking when we have to do offscreens), write into a temporary
buffer and copy into the storage buffer on committing.
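A condensed sketch of that scheme; the buffer API shown is an
assumption:

    /* Writes go into a malloc()ed scratch buffer; the GPU storage buffer
     * is only touched on commit, so aborting and restarting is cheap. */
    typedef struct {
      guchar *data;
      gsize   size;
    } PatternWriter;

    static guint32
    pattern_writer_commit (PatternWriter *self,
                           GskGpuBuffer  *storage)
    {
      gsize offset = gsk_gpu_buffer_allocate (storage, self->size);

      memcpy (gsk_gpu_buffer_map (storage) + offset, self->data, self->size);
      gsk_gpu_buffer_unmap (storage);

      return offset / sizeof (float);   /* the id handed to the shader */
    }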
The GL branch should eventually call into gdk_gl_context_get_scale(),
which is what checks for GDK_DEBUG=gl-fractional; whereas the Vulkan
branch needs no change.
If we have the choice between running the ubershader or a normal shader
with offscreens, make the choice depend on if the ubershader would
offscreen anyway.
If so, just run the normal shader.
This really gets rid of all ubershader invocations in Adwaita
widget-factory.
Instead of using an enum, use the usual custom class struct, like we use
for GskGpuOp.
As a side effect of that refactoring, the display gained a hash table
for textures where we can't use the render data because the texture is
used in multiple renderers.
The goal here is that a texture is always cached and we can ensure that
there is a 1:1 relation between textures and their GskGpuImage. This is
important in particular for external textures - like dmabufs - where we
absolutely don't want 2 images with 2 device memories, and where we use
toggle references to keep them alive.
Reserve 3 texture units per immutable sampler (because that's the
maximum per YUV sampler).
Ensure that the max-sampler calculations always include the immutable
samplers, too.
We now handle the case where memory is not HOST_CACHED.
We also track the memory type now so we can avoid mapping image memory
that is not HOST_CACHED and use buffer transfers instead.
Shader compilers struggle with compiling code that indexes texture
arrays with non-constant indices, so keep the fallback shaders simple
and don't do that there.
There's not much of a performance difference anyway between those two
methods.
In the case where descriptor indexing is not enabled and the number of
max images is small (or we use extensive amounts of immutable samplers),
we need to be able to switch descriptors.
This patch makes that possible.
We compile custom shaders for Vulkan 1.0 that don't require the
extension.
We also ensure that our accesses are uniform by only executing one
shader at a time.
Let the objects track the number of samplers or buffers needed.
This is a required step for making Vulkan work with less featureful
(read: mobile) implementations.
This is relevant when encountering repeat nodes, where the repeat cutoff
will otherwise make the fwidth of the position go wild.
Gradients require more work now, because we need to compute offsets
twice - once for the pixel, once for the offset.
Carry an n_external_textures variable around when selecting programs and
compile different programs for different amounts of external textures.
For now, this code is unused, but dmabufs will need it.
This adds GSK_GPU_IMAGE_CAN_MIPMAP and GSK_GPU_IMAGE_MIPMAP flags and
support to ensure_image() and image creation functions for creating a
mipmapped image.
Mipmaps are created using the new mipmap op that uses
glGenerateMipmap() on GL and equivalent blit ops on Vulkan.
This is then used to ensure the image is mipmapped when rendering it
with a texture-scale node.
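In sketch form (the flag names are as in this commit, the op call is
illustrative):

    GskGpuImageFlags flags = gsk_gpu_image_get_flags (image);

    /* Mipmap the image before sampling it with a mipmap filter. */
    if ((flags & GSK_GPU_IMAGE_CAN_MIPMAP) &&
        !(flags & GSK_GPU_IMAGE_MIPMAP))
      gsk_gpu_mipmap_op (frame, image);  /* glGenerateMipmap() on GL,
                                            blits on Vulkan */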
Add a GSK_GPU_IMAGE_STRAIGHT_ALPHA and use it for images that have
straight alpha.
Make sure those images get passed through a premultiplying pass with
the new straight alpha shader.
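What the pass computes, shown as a CPU-side equivalent:

    /* Premultiplying means scaling the color channels by alpha; the
     * straight alpha shader does this per fragment on the GPU. */
    static void
    premultiply_rgba8 (guchar pixel[4])
    {
      pixel[0] = pixel[0] * pixel[3] / 255;   /* red   */
      pixel[1] = pixel[1] * pixel[3] / 255;   /* green */
      pixel[2] = pixel[2] * pixel[3] / 255;   /* blue  */
    }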
Also remove the old Postprocess flags from the Vulkan image that were a
leftover from copying that code from the old Vulkan renderer.
There's a well-hidden line in the spec that says, in
https://registry.khronos.org/vulkan/specs/1.3/html/chap15.html#interfaces-resources-descset:
If the combined image sampler enables sampler Y′CBCR conversion,
it **must** be indexed only by constant integral expressions when
aggregated into arrays in shader code, irrespective of the
shaderSampledImageArrayDynamicIndexing feature.
So we'll use the same trick that we use for old GL here and do an
if dance that gives us dynamically uniform expressions.
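A sketch of how such a chain might get emitted into the shader source
(the emitter function is hypothetical):

    /* Expand the lookup into an if/else chain so the sampler array is
     * only indexed by constant expressions, and the branch condition
     * stays dynamically uniform. */
    static void
    append_texture_lookup (GString *glsl,
                           guint    n_samplers)
    {
      guint i;

      for (i = 0; i < n_samplers; i++)
        g_string_append_printf (glsl,
                                "  %sif (tex_id == %uu)\n"
                                "    color = texture (textures[%u], uv);\n",
                                i == 0 ? "" : "else ", i, i);
    }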
This now uses all the previously added features to allow displaying YUV
images.
Also add a utility function that turns an image into a toggle ref for a
texture. This makes sure that reffing the image also refs the texture
and that ensures that textures stay alive as long as the image is in
use.
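The mechanism, sketched with GObject toggle references (the setup
details differ in the real code):

    /* While anything but the cache references the image, hold a strong
     * reference on the texture; release it when the image goes idle. */
    static void
    image_toggle_notify (gpointer  texture,
                         GObject  *image,
                         gboolean  is_last_ref)
    {
      if (is_last_ref)
        g_object_unref (texture);
      else
        g_object_ref (texture);
    }

    static void
    attach_image_to_texture (GskGpuImage *image,
                             GdkTexture  *texture)
    {
      g_object_add_toggle_ref (G_OBJECT (image), image_toggle_notify, texture);
    }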
For now, the flags are just there; nobody uses them yet.
The only flag is EXTERNAL, which for now I'm using for YUV buffers,
though it's a bit undefined what that means.
Images can now have a sampler - meaning they must be rendered with that
sampler. It also means the sampler must be handled as an immutable
sampler in descriptor sets.
These samplers can be created with a samplerYcbcrConversion, so code has
been added to pass that conversion when creating the imageview.
Also add code to GskVulkanFrame to track immutable samplers.
Nobody is making use of this yet.
Define an array for the immutable samplers whose size is set via a
compile-time constant.
A bunch of work is necessary to ensure that at least one element is in
the sampler array, because the GLSL code
    sampler2D immutable_textures[0];
is invalid.
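For example, when emitting the declaration (the layout qualifiers here
are illustrative):

    static void
    declare_immutable_samplers (GString *glsl,
                                guint    n_immutable_samplers)
    {
      /* never declare a zero-sized array; one dummy element is fine */
      g_string_append_printf (glsl,
                              "layout(set = 1, binding = 0) "
                              "uniform sampler2D immutable_textures[%u];\n",
                              MAX (1u, n_immutable_samplers));
    }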
This allows having different layouts so that we can support immutable
samplers, which are required for multiplane and YUV formats.
We don't use them yet.
Use it to collect the optional features we are interested in and turn
them on only if available.
For now we add the dmabuf features, but we don't use them yet.
The main reason here is that we want to not fail when the texture size
is larger than the supported GpuImage size.
When that happens, for now we just fall back slowly - ultimately to
drawing with Cairo, which is going to be clipped.
There's multiple uses I want it for:
1. Generating the box-shadow area for blurring
2. Generating masks for rounded-rect masking
3. Optimizing the common use case of rounded-clip + color
Only the last one is implemented in this commit.
Don't try to use all those fancy GL features like glMapBuffer() and
such. Just malloc() some buffer memory and glBufferSubData() it later.
That works everywhere and is faster than (almost?) any combination of
fancy new buffer APIs. And yes I'm frustrated because I played with
those flags and none of them were better than this.
Doubles the framerate on my discrete AMD GPU.
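The boring path, as a sketch:

    static void
    upload_vertices (GLuint        vbo,
                     const guchar *data,
                     gsize         size)
    {
      glBindBuffer (GL_ARRAY_BUFFER, vbo);
      /* orphan the old storage so we don't stall on the previous frame,
       * then hand over the malloc()ed data in one go */
      glBufferData (GL_ARRAY_BUFFER, size, NULL, GL_STREAM_DRAW);
      glBufferSubData (GL_ARRAY_BUFFER, 0, size, data);
    }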
Introduce a new GskGpuImageDescriptors object that tracks descriptors
for a set of images that can be managed by the GPU.
Then have each GskGpuShaderOp just reference the descriptors object it
is using, so that the code can set things up properly.
To reference an image, the ops now just reference their descriptor -
which is the uint32 we've been sending to the shaders since forever.
Use glDrawArraysInstancedBaseInstance() to draw. (Yay for GL naming.)
That allows setting up the offset in the vertex array without having to
glVertexAttribPointer() everything again.
However, this is only supported since GL 4.2 and not at all in stock GLES,
so we need to have code that can work without it.
Fortunately, it is mandatory in Vulkan, so every recent GPU supports it.
And if that GPU has a proper driver, it will also expose the GL extension
for it.
(Hint: You can check https://opengles.gpuinfo.org/listextensions.php for
how many proper drivers exist outside of Mesa.)
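The two paths, sketched; the feature flag and the attribute setup helper
are assumptions:

    static void
    draw_instances (gboolean have_base_instance,
                    guint    first_instance,
                    guint    n_instances)
    {
      if (have_base_instance)
        {
          glDrawArraysInstancedBaseInstance (GL_TRIANGLES, 0, 6,
                                             n_instances, first_instance);
        }
      else
        {
          /* re-point the per-instance attributes at the right offset
           * with glVertexAttribPointer(), then draw without the base */
          setup_instance_attrib_pointers (first_instance);
          glDrawArraysInstanced (GL_TRIANGLES, 0, 6, n_instances);
        }
    }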
The env var allows skipping various optimizations in the GPU shader.
This is useful for testing during development when trying to figure
out how to make a renderer as fast as possible.
We could also use it to enable/disable optimizations depending on GL
version or so, but I didn't think about that too much yet.
When drawing opaque color regions that are large enough, use
vkCmdClearAttachments()/glClear() instead of a shader. This speeds up
background rendering, in particular on older GPUs.
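The GL side of that, sketched:

    /* For a large enough opaque color region, a scissored clear is
     * cheaper than rasterizing a quad through a shader. */
    static void
    clear_color_region (int x, int y, int width, int height,
                        const GdkRGBA *color)
    {
      glEnable (GL_SCISSOR_TEST);
      glScissor (x, y, width, height);
      glClearColor (color->red, color->green, color->blue, color->alpha);
      glClear (GL_COLOR_BUFFER_BIT);
    }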
See the commit messages of
bb2cd7225ece042f7ba10edd7547c1
for a further discussion of performance impacts.
The previous algorithm would reverse the order of subpasses, which leads
to unexpected behavior if dependent subpasses are not added as children
of a subpass, but just as a previous subpass - like when a subpass is
used multiple times later.
An example for this is a shadow node with multiple shadows - the source
of the shadow is used by the multiple shadows.
So ensure that adjacent subpasses stay in the same order.
The code generated by glslc -O is optimized worse by Mesa than
code generated unoptimized.
So generate unoptimized code until somebody figures out what's going
wrong here.
They're done using the pattern shader.
The pattern shader now gained a stack where vec4's can be pushed and
popped back later, which allows storing the position before computing
the new position inside the repeat node's child.
Due to GLES and old GL not allowing non-constant texture array
lookups, we need to turn the array lookup into a big switch statement
in those versions, and that requires putting the texture() call into
that switch.
But with that trick, we can use texture IDs in GLSL.
... and use it for glyphs.
The name is a slight variation of the "coloring" name from the GL
renderer.
The functionality is exactly what the "glyph" shader from the Vulkan
renderer does.
1. Compute the fwidth() twice, with shifted offsets
That way, we avoid glitches at the boundary between 0.0 and 1.0,
because by offsetting it by 0.5, that boundary goes away.
Then we take the min() of both, which gives us the one we care about.
2. Set the gradient to repeating
By doing that, we don't get values at the 0.0/1.0 boundary clamped,
but things smoothly transition.
This smoothes the line at that boundary and makes it look just like
every other line.
Instead of strictly rounding to the given clip rectangle, increase the
rectangle to the next pixel boundary.
Also add docs that the clip_bounds do not influence the actual size of
the returned image.
It's just an object that encapsulates everything needed to create (the
data for) a pattern op.
It also clarifies which code does what, because now the NodeProcessor
and the PatternWriter are 2 different things.
Pretty much a copy of the Vulkan border shader.
A notable change is that the input arguments are changed, because GL
gets confused if you put a mat4 at the end.
When doing get_node_as_image(), a new buffer writer may be spawned that
writes into the same buffer when rendering an offscreen with patterns.
So, as a more or less hacky workaround, we now abort the current buffer
write and restart it once we've created the image.
If creation fails, create an offscreen image instead and draw that as a
texture.
Because offscreens basically always succeed, we can pretty much assume
success everywhere - apart from pattern creation functions that also
create images, because they can run out of shader space.
Frames now carry a timestamp for when they are used.
This is mainly intended to attach timestamps to cached items (textures
or glyphs), but it could in theory also be used when profiling.
We use wallclock time here, not server time, because it's cheaper and
because we're more interested in the local machine we're rendering on.
Now we can extend the pattern creation easily - and we can add new
patterns quickly later.
Plus, we need to keep this file in sync with pattern.glsl and it's neat
when those 2 files reference only each other.
Because GL flips its shit sometimes (i.e. when it's the framebuffer),
pass the height of the target as the flip variable, so commands
that need to operate on the pixels can flip the y axis around this value.
This is again mostly a copy of the Vulkan renderer.
It's a bit awkward codewise with the new invalidation framework,
because we need to cache the previous values individually now,
but it's a lot more fine-grained, and we don't emit globals multiple
times when clips are nested.
... and use it to initialize the "proper" projection matrix to use in
shaders.
The resulting viewport will go from top left (0,0) to bottom right
(width, height) and the z clipping plane will go from -10000 to 10000.
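With graphene, that is roughly:

    static void
    init_projection (graphene_matrix_t *projection,
                     float              width,
                     float              height)
    {
      /* top left at (0,0), bottom right at (width, height),
       * z clipping plane from -10000 to 10000 */
      graphene_matrix_init_ortho (projection,
                                  0, width,    /* left, right */
                                  0, height,   /* top, bottom */
                                  -10000, 10000);
    }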
This heaves over an initial chunk of code from the Vulkan renderer to
execute shaders.
The only shader that exists for now is a shader that draws a single
texture.
We use that to replace the blit op we were doing before.