Make color-related ops take the ccs and a GdkColor, and make
decisions about color conversion on the cpu vs the gpu.
This makes the node processor code simpler, and lets use convert
the color directly into the op instance without extra copying.
We also pass opacity to the op, so it can be applied when we
write the color into the instance.
Lastly, rorder the offset to come right after the opacity argument.
Treat the color and rounded color ops the same way.
Update all callers.
With this, the prepare_color apis in gskgpunodeprocessor.c are
no longer used and have been dropped.
We want to reuse gsk_gpu_color_to_float() for use with GdkColor and this
function will be replaced. But until that's fully done, we need 2
different names.
So rename this one to something else
It turns out the "step" variable could up as 0 when p.y ~= 3.0 ||
p.y ~= r.y - 3.0
That was not enough to trigger it though because if "start" and "end"
were the same value, the "y <= end" check in the loop would immediately
terminate it.
However, if start + epsilon == end so that end != start but (end - start)
/ 7 == 0, then step would end up as 0 and the loop would never
terminate.
And if that happened, it would bring down GPUs.
So recode this whole machinery to make it impossible to infloop.
Fixes#6896
The fix in commit 5e7f227d broke shadows while trying to make them
faster.
So use a better way to make them faster.
With the normalized blur radius, we can now conclude that all the values
too far from p.y will cause the gauss() call to return close to 0, so we
can skip any y value that is too far from p.y.
And that allows us to put an upper limit on the loop iterations.
Tests included
Fixes#6888
Instead of doing complicated math, normalize the values to a sigma
of 1.0, and then use that.
This should also be beneficial for shader performance, because 1.0 is a
constant and constant-elimination can kick in on the inlined functions.
When we allocate a graphene_point_t on the stack, there's no guarantee
that it will be aligned at an 8-byte boundary, which is an assumption
made by gsk_pathop_encode() (which wants to use the lowest 3 bits to
encode the operation). In the places where it matters, force the
points on the stack and embedded in structs to be nicely aligned.
By using a distinct type for this (a union with a suitable size and
alignment), we ensure that the compiler will warn or error whenever we
can't prove that a particular point is, in fact, suitably aligned.
We can go from a `GskAlignedPoint *` to a `graphene_point_t *`
(which is always valid, because the `GskAlignedPoint` is aligned)
via &aligned_points[0].pt, but we cannot go back the other way
(which is not always valid, because the `graphene_point_t` is not
necessarily aligned nicely) without a cast.
In practice, it seems that a graphene_point_t on x86_64 *is* usually
placed at an 8-byte boundary, but this is not the case on 32-bit
architectures or on s390x.
In many cases we can avoid needing an explicit reference to the more
complicated type by making use of a transparent union. There's already
at least one transparent union in GSK's public API, so it's presumably
portable enough to match GTK's requirements.
Increasing the alignment of GskAlignedPoint also requires adjusting how
a GskStandardContour is allocated and initialized. This data structure
allocates extra memory to hold an array of GskAlignedPoint outside the
bounds of the struct itself, and that array now needs to be aligned
suitably. Previously the array started with at next byte after the
flexible array of gskpathop, but the alignment of a gskpathop is only
4 bytes on 32-bit architectures, so depending on the number of gskpathop
in the trailing flexible array, that pointer might be an unsuitable
location to allocate a GskAlignedPoint.
Resolves: https://gitlab.gnome.org/GNOME/gtk/-/issues/6395
Signed-off-by: Simon McVittie <smcv@debian.org>
Similar to the previous commit, to avoid undefined behaviour we need
to avoid evaluating out-of-bounds shifts, even if their result is going
to ignored by being multiplied by 0 later.
Detected by running a subset of the test suite with
-Dsanitize=address,undefined on x86_64.
Signed-off-by: Simon McVittie <smcv@debian.org>
If, for example, e == 0, it is undefined behaviour to compute an
expression involving an out-of-range shift by (125 - e), even if the
result is in fact irrelevant because it's going to be multiplied by 0.
This was already fixed for the memorytexture test in
commit 5d1b839 "testsuite: Fix another ubsan warning", so use the
implementation from that test everywhere. It's in the header as an
inline function to keep the linking of the relevant tests simple:
its only caller in production code is fp16.c, so there will be no
duplication outside the test suite.
Detected by running a subset of the test suite with
-Dsanitize=address,undefined on x86_64.
Signed-off-by: Simon McVittie <smcv@debian.org>
With the changes in !7473 we now use sampler2D arguments in functions.
However, when there's a function we call with a samplerExternalOES -
which means we need to overload it with that shader variant.
We were using slightly different numbers here, which isn't good.
The matrices in gdkcolordefs.h are tested in the colorstate-internal
tests, so they are at least properly inverse, and the products match.
It would be better to generate the glsl definitions, somehow.
When we the image color state is not a default one, use the cicp
convert op to convert it to the ccs. And when the target color
state is a non-default one, use the shader in the reverse direction.
This shader receives cicp parameters via uniforms, and converts
the texture data from or to the output colorstate. It computes
the matrix in the vertex shader, and then picks the eotf/oetf
according to the cicp parameters in the fragment shader.
We were passing the wrong rect to the clip mode computation, resulting
in a rounded rect every time, even though it should pretty much always
be unclipped.
The visual results are unaffected, because the clip sent to the shader
was still correct.
Instead of allocating one large descriptor pool and hoping we never run
out of descriptors, allocate small ones dynamically, so we know we never
run out.
Test incldued, though the test doesn't fail in CI, because llvmpipe
doesn't care about pool size limits. It does fail on my AMD though.
A fun side note about that test is that the GL renderer handles it best
in normal operationbecause it caches offscreens per node and we draw the
same node repeatedly.
But, the replay test expands them to duplicated unique nodes, and then
the GL renderer runs out of command queue length, so I had to disable
the test on it.
There is now a GskGpuYcbcr struct that maintains all the Vulkan
machinery related to YCbCrConversions.
It's a GskGpuCached, so it will make itself go away when it is no longer
used, ie a video stopped playing.
Now that we don't use the fancy features anymore, we don't need to
enable them.
And that also means we don't need an env var to disable it for testing.
Now that we don't do fancy texture stuff anymore, we don't need fancy
shaders either, so we can just compile against Vulkan 1.0 again.
And that means we need no fallback shaders for Vulkan 1.0 anymore.
Instead of trying to cram all descriptors into one large array and only
binding it at the start, we now keep 1 descriptor set per image+sampler
combo and just rebind it every time we switch textures.
This is the very dumb solution that essentially maps to what GL does,
but the performance impact is negligible compared to the complicated
dance we were attempting before.
Rewrite all shaders to use 2 predefined samplers called GSK_TEXTURE0 and
GSK_TEXTURE1 instead of wrapper functions.
On GL and Vulkan compat mode, these map directly to samplers.
On Vulkan proper, they map to 2 indices into the texture array, like
before.
From now on, the old nvidia GPUs - ie the 3xx drivers - should start
working again.
Fixes: #6564Fixes: #6574Fixes: #6654
This allows GskGpuFrame implementations to store data per vertex
attribute.
This is just the plumbing, no actual implementation is done in this
commit.
This guarantees that the images get ID 0 and 1 (on GL), which is going
to be quite important for the next steps.
Just for funsies, here's fps numbers on my desktop for this change:
NGL 1500 => 1400
Vulkan 2650 => 2250
This by itself is just more work refcounting all those images, but
there's actually a goal here, that will become visible in future
commits.
But this is split out for correctness and benchmarking purposes (the
overhead from refcounting seems to be negligible on my computer).