linear will average all the pixels for the lod, nearest will just pick
one (using the same method as OpenGL/Vulkan, picking bottom right
center).
This doesn't really make linear/nearest filtering work as it should
(because it's still a form of mipmaps), but it has 2 advantages:
1. it gets closer to the desired effect
2. it is a lot faster
Because only 1 pixel is chosen from the original image, instead of
averaging all pixels, a lot less memory needs to be accessed, and
because memory access is the bottleneck for large images, the speedup is
almost linear with the number of pixels not accessed.
And that means that even for lot level 3, aka 1/8th scale, only 1/64 of
the pixels need to be accessed, and everything is 50x faster.
Switching gtk4-demo --run=image_scaling to linear/nearest makes all the
lag go away for me, even with a 64k x 64k image.
Why do we need this? Because RGB images are provided in RGB format but
GPUs can't handle RGB, only RGBA, so we need to convert.
And we need to do that without allocating too much memory, because
allocating memory is slow. Which means in aprticular we need to do the
conversion after mipmapping, not before (like we were doing).
This allows uploading less memory but requires computing lod levels on
the CPU which is slow because it reads through all of the memory and so
far entirely not optimized.
However, it uses significantly less VRAM.
This is done by adding a gdk_memory_mipmap() function that does this
task.
The texture upload op now accepts a lod level and if that is >0 it uses
gdk_memory_mipmap() on the source texture.
Depth of a rendernode should be determined by the textures used and the
compositing colorstate requirements.
Colors influence the colorstate choice, so they indirectly influence the
depth, but they should not influence the depth directly.
Otherwise a single color in a border being rec2100-pq would make us
switch to 16bit float.
Also remove gdk_color_get_depth(), because it was only used here and
because again: Colors should not influence depth decisions.
We want to use the display's context on the resulting texture,
but we do not want to use it for the stufff we need to do while
exporting - most importantly the GLsync.
Fixes#6976
The texture ID is not deleted on dmabuf export; a copy is made, the
GskGpuImage retains ownership.
However when doing GL export, the texture *does* take ownership, so we
need the stealing semantics for that case.
We write a debug message and then handle things using fallback.
Fixes error messages when trying to import incompatible dmabufs.
(in my case: llvmpipe dmabufs into radv)
The non-shared context's surface must survive the lifetime of the
GL texture, and when the renderer gets unrealized the surface goes away,
but we cannot guarantee that all GL textures have been destroyed by
then.
So better use a context we know will survive becuase it isn't bound to a
surface.
This is the same fix for NGL as f3ac0535f8
was for GL.
Instead of running one renderpass per clip region, run one renderpass for
the whole clip extents, and just set the scissor to the individual clip
rects.
This means that we need to use LOAD_OP_LOAD in cases where we don't
redraw the full extents, but nonetheless, the eprformance wins of
avoiding renderpasses are worth it, in particualr on tilers like the
Raspberry Pi or other mobile chips and the Apple M1/2.
We want to differentiate between CLEAR, DONT_CARE and LOAD in the
future, and the current boolean doesn't allow that.
Also implement support for the the different ops in the Vulkan
renderpass code.
This starts the renderpass at the given scissor rect.
It just splits out the gsk_gpu_render_pass_begin_op() call into a
simpler function, so it's harder to mess up.
Add gsk_gpu_node_processor_set_scissor() that allows resetting the
nodeprocessor's scissor and clip rectangle.
That in turn allows using the same nodeprocessor instance for all the
rects we draw for the clip region.
So they must not copy the fully_opaque flag from the child.
Adapted the testcase that accidentally caught it do now always catch it
by setting a proper background.
When we encounter many dead textures, we want to GC. Textures can take
up fds (potentially even more than 1) and when we are not collecting
them quickly enough, we may run out of fds.
So GC when more then 50 dead textures exist.
An example for this happening is recent Mesa with llvmpipe udmabuf
support, which takes 2 fds per texture and the test in
testsuite/gdk/memorytexture that creates 800 textures of ~1 pixel each,
which it diligently releases but doesn't GC.
Related: #6917
the non-shared context's surface must survive the lifetime of the
GL texture, and when the renderer gets unrealized the surface goes away,
but we cannot guarantee that all GL textures have been destroyed by
then.
So better use a context we know will survive becuase it isn't bound to a
surface.
Use the clamp() API from the previous commit to:
1. Clamp values into range
2. Emit an error if values were out of range
Unlike CSS, which just clamps and doesn't emit an error, we do want to
emit one because we care about colors being correct in our node files.
Make sure the radii are strictly positive.
Also handle the case where start >= end.
We can't really underline that error, because we don't track the
locations of the start/end properties until we know that there's an
error.
So just underline the whole radial gradient declaration.
Test included
When transforming back from a complex transform to a simpler transform,
the resulting clip might turn out to clip everything, because clips can
grow while transforming, but the scissor rect won't. So when this
process happens, we can end up with an empty clip by transforming:
1. Set a clip that also sets the scissor
2. transform in a way that grows the clip, say rotate(45)
3. modify the clip to shrink it
4. transform in a way that simplifies the transform, say another
rotate(45)
5. Figure out that this clip and the scissor rect do no longer overlap
Catch this case and avoid drawing anything.
Add a new GskShadow2 struct that has a GdkColor instead of
a GdkRGBA, and a new constructor and getter to go along with
it.
With this commit, my_color_get_depth is no longer used and
has been dropped.
We know it at begin_frame() time, so if we pass it there instead of
end_frame(), we can use it then to make decisions about opacity.
For example, we could notice that the whole surface is opaque and choose
an RGBx format.
We don't do that yet, but now we could.
This is poking into the surface directly, so not a good idea.
And I want to hide that struct in the priv member.
Technically, this code should look at the opaque region, but I am lazy
and the GL renderer is on its way out, so I think it's not worth doing.
Make sure both GL renderers don't leave their contexts alive via the
current context, but ensure they dispose of them properly.
Fixes issues when the corresponding GL resources in the surfaces they
were attached to go away.
GLContexts marked as surface_attached are always attached to the surface
in make_current().
Other contexts continue to only get attached to their surface between
begin_frame() and end_frame().
All our renderer use surface-attached contexts now.
Public API only gives out non-surface-attached contexts.
The benefit here is that we can now choose whenever we want to
call make_current() because it will not cause a re-make_current() if we
call it outside vs inside the begin/end_frame() region.
Or in other words: I want to call make_current() before begin_frame()
without a performance penalty, and now I can.
... and pass the opaque region of the node.
We don't do anything with it yet, this is just the plumbing.
The original function still exists, it passes NULL which is the value
for no opaque region at all.
If we apply a rounded clip, we might change the clip in a way that makes
it intersectable with the scissor again, if both had diverged before.
So try and intersect with the clip.
In the case of no offloading, we want to pass through to the child
(which is likely a big texture doing occlusion).
In the case of punching a hole, we want to punch the hole and not draw
anything behind it, so we start an occlusion pass with transparency.
And in the final case with offloading active, we don't draw anything,
so we don't draw anything.
This should fix concerns about drawing the background behind the video
as mentioned for example in
https://github.com/Rafostar/clapper/issues/343#issuecomment-1445425004
Container nodes save their opaque region, so it's quick to access. Use
that to check if the largest opaque region even qualifies for culling -
and if not, just exit.
Speeds up walking node trees by a lot.
Now that we can specify the min size for an occlusion pass, we can
specify that we want the full clip rect to be occluded for occlusion to
trigger.
The benefit of this is that for partial redraws we almost
always get the background color to cover the redrawn rectangle, so
occlusion will kick in.
When trying to cull, try culling from the largest rectangle of the
remaining draw region first. That region has the biggest chance of
containing a large area to skip.
As a side effect, we can stop trying to cull once the largest rectangle
isn't big enough anymore to contain anything worth culling.
When querying clip bounds, also check the scissor rect, because
sometimes that one is tighter than the clip bounds, because the clip
bounds need to track some larger rounded corners.
Makes a few tests harder to break.
Instead of requiring an occlusion pass to cover the whole given scissor
rect, allow using a smaller rect to start the pass.
When starting such a pass, we adjust the scissor rect to the size of
that pass and do not grow it again until the pass is done.
The rectangle subtraction at the end will then take care of subtraction
that rectangle from the remaining pixels.
To not end up with lots of tiny occlusion passes, add a limit for how
small such a pass may be.
For now that limit is arbitrarily chosen at 100k pixels.
gsk_gpu_node_processor_rect_to_device() is a useful function to have,
even if it has to return FALSE sometimes when there is no simple 1:1
mapping - ie when the modelview contains a rotation.
Instead of just iterating over all the rectangles of the region,
always draw the first rectangle of the region and subtract it when done.
This sounds more complicated, but it will allow us to modify the
rectangle in future commits.
Add a function that tracks whether a render node's content is
in a wide gamut color state (in practice, that means non-sRGB).
This will be used in render_texture to determine the color
state to use when creating a texture.
When creating images for use with different colorstates, ensure that
they have the depth of that colorstate. Otherwise we might lose accuracy
due to quantization.
Fixes mipmaps in rec2100 being rendered as RGBA8.
Pass the ccs, opacity and GdkColors to the op to let it make
decisions about color conversion. Also, reorder the offset to
follow the same order as the color ops.
Update the callers.
We can't use gsk_gpu_node_processor_color_states_self() for ops which
apply alt to colors that don't come from textures, since those are
always unpremultiplied.
This fixes the + and - in disabled spin buttons appearing completely
white.
Allow defining cicp color states with an @-rule:
@cicp "jpeg" {
primaries: 1;
transfer: 13;
matrix: 6;
range: full;
}
And allow using them in color() like this:
color("jpeg" 50% 0.5 1 / 75%)
Note that custom color states use a string, unlike default color
states which use an ident.
Test included.
And allow using color states for colors with a syntax similar
to modern css color syntax.
color(srgb 50% 0.5 1 / 75%)
Both floating point numbers and percentages can be used.
Currently, this is only supported for color nodes.
Test included.
Use the color state returned by this function instead of assuming
the color of a color node is always sRGB.
Node colors are converted to the css on the cpu. That is necessary
since we don't know if they are in one of the default color states,
and our shaders can't deal with non-default color states.
Make color-related ops take the ccs and a GdkColor, and make
decisions about color conversion on the cpu vs the gpu.
This makes the node processor code simpler, and lets use convert
the color directly into the op instance without extra copying.
We also pass opacity to the op, so it can be applied when we
write the color into the instance.
Lastly, rorder the offset to come right after the opacity argument.
Treat the color and rounded color ops the same way.
Update all callers.
With this, the prepare_color apis in gskgpunodeprocessor.c are
no longer used and have been dropped.
We want to reuse gsk_gpu_color_to_float() for use with GdkColor and this
function will be replaced. But until that's fully done, we need 2
different names.
So rename this one to something else
It turns out the "step" variable could up as 0 when p.y ~= 3.0 ||
p.y ~= r.y - 3.0
That was not enough to trigger it though because if "start" and "end"
were the same value, the "y <= end" check in the loop would immediately
terminate it.
However, if start + epsilon == end so that end != start but (end - start)
/ 7 == 0, then step would end up as 0 and the loop would never
terminate.
And if that happened, it would bring down GPUs.
So recode this whole machinery to make it impossible to infloop.
Fixes#6896
The fix in commit 5e7f227d broke shadows while trying to make them
faster.
So use a better way to make them faster.
With the normalized blur radius, we can now conclude that all the values
too far from p.y will cause the gauss() call to return close to 0, so we
can skip any y value that is too far from p.y.
And that allows us to put an upper limit on the loop iterations.
Tests included
Fixes#6888
Instead of doing complicated math, normalize the values to a sigma
of 1.0, and then use that.
This should also be beneficial for shader performance, because 1.0 is a
constant and constant-elimination can kick in on the inlined functions.
When we allocate a graphene_point_t on the stack, there's no guarantee
that it will be aligned at an 8-byte boundary, which is an assumption
made by gsk_pathop_encode() (which wants to use the lowest 3 bits to
encode the operation). In the places where it matters, force the
points on the stack and embedded in structs to be nicely aligned.
By using a distinct type for this (a union with a suitable size and
alignment), we ensure that the compiler will warn or error whenever we
can't prove that a particular point is, in fact, suitably aligned.
We can go from a `GskAlignedPoint *` to a `graphene_point_t *`
(which is always valid, because the `GskAlignedPoint` is aligned)
via &aligned_points[0].pt, but we cannot go back the other way
(which is not always valid, because the `graphene_point_t` is not
necessarily aligned nicely) without a cast.
In practice, it seems that a graphene_point_t on x86_64 *is* usually
placed at an 8-byte boundary, but this is not the case on 32-bit
architectures or on s390x.
In many cases we can avoid needing an explicit reference to the more
complicated type by making use of a transparent union. There's already
at least one transparent union in GSK's public API, so it's presumably
portable enough to match GTK's requirements.
Increasing the alignment of GskAlignedPoint also requires adjusting how
a GskStandardContour is allocated and initialized. This data structure
allocates extra memory to hold an array of GskAlignedPoint outside the
bounds of the struct itself, and that array now needs to be aligned
suitably. Previously the array started with at next byte after the
flexible array of gskpathop, but the alignment of a gskpathop is only
4 bytes on 32-bit architectures, so depending on the number of gskpathop
in the trailing flexible array, that pointer might be an unsuitable
location to allocate a GskAlignedPoint.
Resolves: https://gitlab.gnome.org/GNOME/gtk/-/issues/6395
Signed-off-by: Simon McVittie <smcv@debian.org>
Similar to the previous commit, to avoid undefined behaviour we need
to avoid evaluating out-of-bounds shifts, even if their result is going
to ignored by being multiplied by 0 later.
Detected by running a subset of the test suite with
-Dsanitize=address,undefined on x86_64.
Signed-off-by: Simon McVittie <smcv@debian.org>
If, for example, e == 0, it is undefined behaviour to compute an
expression involving an out-of-range shift by (125 - e), even if the
result is in fact irrelevant because it's going to be multiplied by 0.
This was already fixed for the memorytexture test in
commit 5d1b839 "testsuite: Fix another ubsan warning", so use the
implementation from that test everywhere. It's in the header as an
inline function to keep the linking of the relevant tests simple:
its only caller in production code is fp16.c, so there will be no
duplication outside the test suite.
Detected by running a subset of the test suite with
-Dsanitize=address,undefined on x86_64.
Signed-off-by: Simon McVittie <smcv@debian.org>
With the changes in !7473 we now use sampler2D arguments in functions.
However, when there's a function we call with a samplerExternalOES -
which means we need to overload it with that shader variant.
We were using slightly different numbers here, which isn't good.
The matrices in gdkcolordefs.h are tested in the colorstate-internal
tests, so they are at least properly inverse, and the products match.
It would be better to generate the glsl definitions, somehow.
When we the image color state is not a default one, use the cicp
convert op to convert it to the ccs. And when the target color
state is a non-default one, use the shader in the reverse direction.
This shader receives cicp parameters via uniforms, and converts
the texture data from or to the output colorstate. It computes
the matrix in the vertex shader, and then picks the eotf/oetf
according to the cicp parameters in the fragment shader.
We were passing the wrong rect to the clip mode computation, resulting
in a rounded rect every time, even though it should pretty much always
be unclipped.
The visual results are unaffected, because the clip sent to the shader
was still correct.
Instead of allocating one large descriptor pool and hoping we never run
out of descriptors, allocate small ones dynamically, so we know we never
run out.
Test incldued, though the test doesn't fail in CI, because llvmpipe
doesn't care about pool size limits. It does fail on my AMD though.
A fun side note about that test is that the GL renderer handles it best
in normal operationbecause it caches offscreens per node and we draw the
same node repeatedly.
But, the replay test expands them to duplicated unique nodes, and then
the GL renderer runs out of command queue length, so I had to disable
the test on it.
There is now a GskGpuYcbcr struct that maintains all the Vulkan
machinery related to YCbCrConversions.
It's a GskGpuCached, so it will make itself go away when it is no longer
used, ie a video stopped playing.