Pretty much copy what GL does and just use the default display to create
GPU-related resources without the need for a display.
This also adds gdk_display_create_vulkan_context() but I've
kept it private because the Vulkan API is generally considered in flux,
in particular with our pending attempts to redo how renderers work.
Fixes a bug introduced in d1135f9e3c.
Luckily the buffer was large enough that all my testing didn't catch it
because it took a few minutes to overflow.
Replace gdk_memory_format_prefers_high_depth with the more generic
gdk_memory_format_get_depth() that returns the depth of the individual
channels.
Also make the GL renderer use that to pick the generic F16 format
instead of immediately going for F32 when uploading textures.
Now that we don't use the old environment variables anymore to force
staging buffer/image uploads, we don't need them.
However, we do autodetect the fast path for avoiding a staging buffer
now, and we might want to be able to turn that off for testing.
So add GSK_DEBUG=staging that does exactly that.
This is unused now that all the code uses map/unmap.
The only thing that map/unmap doesn't do that the old code did, was use
a staging image instead as alternative to a staging buffer for image
uploads.
However, that code is not necessary for anything, so I'm sure we can do
without.
If the memory heap that the GPU uses allows CPU access
(which is the case on basically every integrated GPU, including phones),
we can avoid a staging buffer and write directly into the image memory.
Check for this case and do that automatically.
Unfortunately we need to change the image format we use from
VK_IMAGE_TILING_OPTIMAL to VK_IMAGE_TILING_LINEAR, I haven't found a way
around that yet.
Use the new map/unmap image upload method for Cairo node drawing:
1. map() the memory
2. create an image surface or that memory
3. draw to that image surface
4. success
There's no longer a need for Cairo to allocate image memory.
As an alternative to gsk_vulkan_image_new_from_data() that
takes a given data and creates an image from it, add a 3 step process:
gsk_vulkan_image_new_for_upload()
gsk_vulkan_image_map_memory()
/* put data into memory */
gsk_vulkan_image_unmap_memory()
The benefit of this approach is that it potentially avoids a copy;
instead of creating a buffer to pass and writing the data into it before
then memcpy()ing it into the image, the data can be written straight
into image memory.
So far, only the staging buffer upload is implemented.
There are also no users, those come in the next commit(s).
When nodes are added, nothing was warning us that we need to bump
N_RENDER_NODES.
Make sure that that's no longer necessary by refactoring the code to
remove the define.
This is more expensive, but it finds more cases, and in particular it
catches corner cases like empty nodes or fully clipped nodes that might
otherwise make the kernel throw signals in our direction.
If one of the descriptor sets doesn't have any items, don't include it
in the sets passed to vkUpdateDescriptorSets().
This has no effect right now, because we either have both images and
samplers or neither, but it will become relevant once we also support
buffers.
Respect the matrix in use at time of encountering a repeat node so that
the offscreen uses roughly the same device pixel density as the target.
Fixes the handling of the clipped-repeat test.
Sometimes the GPU is still busy when the next frame starts (like when
no-vsync benchmarking), so we need to keep all those resources alone and
create new ones.
That's what the render object is for, so we just create another one.
However, when we create too many, we'll starve the CPU. So we'll limit
it. Currently, that limit is at 4, but I've never reached it (I've also
not starved the GPU yet), so that number may want to be set lower/higher
in the future.
Note that this is different from the number of outstanding buffers, as
those are not busy on the GPU but on the compositor, and as such a
buffer may have not finished rendering but have been returend from the
compositor (very busy GPU) or have finished rendering but not been
returned from the compositor (very idle GPU).
The idea here is that we can do more complex combinations and use that
to support texture-scale nodes or use fancy texture formats (suc as
YUV).
I'm not sure this is actually necessary, but for now it gives more
flexibility.
For blend and crossfade nodes, one of the children may exist and
influence the rendering, while the other does not.
Previously, we would skip the node, which would cause the required
rendering to not happen. We now send a valid texture id for the
invalid offscreen, thereby actually rendering the required parts.
Fixes the blend-invisible-child compare test
Current state for compare tests:
Ok: 397
Expected Fail: 0
Fail: 26
Unexpected Pass: 0
Skipped: 2
Timeout: 0
Instead of having a descriptor set per operation, we just have one
descriptor set and bind all our images into it.
Then the shaders get to use an index into the large texture array
instead.
Getting this to work - because it's a Vulkan extension that needs to be
manually enabled, even though it's officially part of Vulkan 1.2 - is
insane.
If we have a rectangular clip without transforms, we can use
scissoring. This works particularly well because it allows intersecting
rounded rectangles with regular rectangles in all cases:
Use the scissor rect for the rectangle and the normal clipping code for
the rounded rectangle.
The idea is to use it for clip nodes when they are integer-aligned.
To do that, we need to track the scissor rect in the parse state, so we
do that, too.
Also move the viewport offset out of the projection matrix, as it is
part of the transform between clip and scissor, so it needs to live in
the offset.
We align the data to a multiple of vertex stride, that way we use more
memory, but we could compute an offset into the vertex buffer without
changing the offset.
We can set the vertex offset while counting the data, this gets rid of
the need of passing all the counting machinery into the actual data
collection code.
When attempting a complex transform, check if the clip can be ignored
and do that if possible.
That way we don't cause fallbacks when transforming the clip is too
complex.
The idea is that for a rectangle intersection, each corner of the
result is either entirely part of one original rectangle or it is
an intersection point.
By detecting those 2 cases and treating them differently, we can
simplify the code to compare rounded rectangles.
Instead of emitting the render commands once per rectangle of the clip
region, just emit them once with the region's extents.
This is generally faster because it emits fewer commands to the GPU,
even though it may touch significantly more pixels.
For a proper method, we'd need to record the commands per clip rectangle
instead of emitting all of them all the time.
The border and color shaders - the ones that do AA - now multiply their
coordinates by the scale factor, which gives them better rounding
capabilities.
This in particular improves the case where they are used in fractional
scaling situations, where the scale is defined at the root element.
Previously, we just used the defaultscale factor, but now that we're
having it available in push constants, we can read it back for creating
offscreens and rendering fallbacks.
So do that.
It's a 1:1 replacement for GskVulkanPushConstants, just without the
indirection through a different file.
GskVulkanPushConstants as a struct is gone now.
The file still exists to handle the push_constants operation.
1. Use a graphene_vec2_t
2. Ensure it's always positive
3. Don't break with fallback
The scale value is nothing more than an indication of how many pixels to
assume per unit of a node.
We don't want to render the offscreen trnsformed, we want to render it
as-is.
We lose the correct scale factor, but that requires some separate work,
so for now it gets a bit blurry on hidpi.
This introduces the rect object and adds a rect_distance() and
rect_coverage() function.
_distance() returns the signed distance tp the rectangle.
_coverage() returns the coverage of a pixel centered at that position.
Note that the pixel size is computed using dFdx/dFdy.
When the node bounds were a non-integer size, the texture would get
ceil()ed pixels, but various viewport or scissor computations might
floor() instead, leaving the right/bottom row of pixels untouched.
Make sure those functions ceil(), too.
This is not the optimal way of doing it: we're
reuploading the texture with client-side conversion.
But it fits nicely into our current handling of mipmaps.
We can do better once we use shaders for colorspace
conversions.
Make the callers of this function check for
straight alpha themselves, and only do the
version compatibility check here. This makes
the function usable in contexts where straight
alpha is acceptable.
When the command queue is out of batches, there is
no point in doing further work like allocating uniforms.
This helps us avoid assertions in the uniform code
that we would hit when we run out of uniform space
too.
When we start ignoring batches, we must do it everywhere,
or we may run into assertions. This was triggered by an
enormous text node tree produced by tests/rendernode-create.
Vulkan has a different initial coordinate system to GL.
GL:
(-1, 1, -1) +------+.
|`. | `.
| `·--|---·
| : | :
+------+. :
`. : `.:
`·------· (1, -1, 1)
Vulkan:
(-1, -1, 0) +------+.
|`. | `.
| `·--|---·
| : | :
+------+. :
`. : `.:
`·------· (1, 1, 1)
so adjust the near and far plane we pass to
graphene_matrix_init_ortho() to make it end up with the same
projection as the GL renderer.
This was the intention, but the object data by itself
does not achieve that: We do run dispose on the display
when it is closed, but object data is only cleared in
finalize. So listen to the ::closed signal and remove
the driver ourselves.
Fix up the drivers dispose implementation enough for
that to actually work.
Most of the time we want to compute them based on the child node we
render to the offscreen, but not always.
For blend and cross-fade nodes, they need to be computed based on the
node's bounds.
Fixes widget-factory page fade animation weirdly resizing the fading
pages.
We weren't looking in the build dir for generated files.
Actually make sure that we look in the build dir *first*, otherwise
glib-compile-resources will still use the wrong files.
... and use it in rendernodes.
Setting up textures for diffing is done via gdk_texture_set_diff() which
should only be used during texture construction.
Note that the pointers to next/previous are allowed to dangle if one of
the textures is finalized, but that's fine because we always check both
textures' links to each other before we consider the pointer valid.
When slicing the texture, the GL renderer was
forgetting to apply the viewport origin. This
shows up when rendering things with negative
scales, leading to negative origins.
Our coverage computation only works for well-behaved
rects and rounded rects. But our modelview transform
might flip x or y around, causing things to fail.
Add functions to normalize rects and rounded rects,
and use it whenever we transform a rounded rect in GLSL.
Pass the GLsync object from texture into our
command queue, and when executing the queue,
wait on the sync object the first time we
use its associated texture.
When adding mask nodes, I overlooked that
we have two separate functions for determining
what transforms a node supports without offlines.
Since we claim that mask nodes support general
transform, they must certainly support 2d transforms
as well.
They're not needed and GLES doesn't technically support them, even
though GTK had been using them via epoxy sneakily using the
GL_OES_vertex_array_object extension behind our back.
gsk_vulkan_render_download_target() currently resets the uploader
objects before downloading the image that it produces. This is
problematic because there might be unreleased buffers and images
in the command queue.
In particular, this can make validation layers complain about the
glyph atlas - of all things! - upload buffer being released while
still being used by the command queue.
Fix that by resetting the uploader after downloading the image.
The current implementation of the glyph cache deals with atlases by
padding them with 1 pixel at the beginning, at the end, and between
each glyph.
That's cool and all, however, there's a very subtle problem with
this approach: the contents of the atlas are garbage, so this padding
is filled with garbage memory!
Rework the Vulkan glyph cache to draw each and every glyph in a
surface that has 1 pixel border of padding around it. Ensure the
surface is completely black by drawing a rectangle before handing
it to Pango to draw the glyph. Update tx and ty to pick the texture
position adjusted to the 1 pixel padding. The atlas now starts at
position (0, 0), since each glyph individually contains its own padding.
To improve legibility, add a PADDING define and use it everywhere.
Vulkan renders text using VK_BLEND_FACTOR_ONE_MINUS_SRC_ALPHA and
VK_BLEND_FACTOR_SRC_ALPHA, but that implies per-channel alpha
blending, which currently produces the wrong results when blending
glyphs with the images beneath them.
Use the default pipeline constructors, which implies using the
ONE and ONE_MINUS_SRC_ALPHA.
Basically what GL does, but without any debug or feature flag
to gatekeep it, since the Vulkan backend itself is experimental
already.
Ceil surface sizes, and floor coordinates, to the fractional scale
value.
The rects passed to the clip region are in buffer coordinates, and
must not be scaled. Consider the following scenario: Wayland, with
a 1024x768@2 window. That gives us a 2048x1536 raw image. To setup
the Vulkan render pass code, we'd scale 2048x1536 *again*, to an
unreasonable 4196x3072, which is (1) incorrect and (2) really
incorrect and (3) can lead to crashes at best, full GPU resets
at worst - and a GPU reset is incredibly not fun!
Now that we pass the right clip regions at the right coordinates
at all times, remove the extra scaling from the render pass.
This part of the Vulkan renderer is almost exactly equal to the GL
renderer, and the GL renderer already does that since at least
2a38cecd33. Copy that into the Vulkan renderer.
A nice side effect from this commit is that resizing a window now
actually works again.
Sneak in a trivial cleanup by using a variable to hold the draw
index.
This was a tricky one to figure out, but it's pretty simple to
understand (I hope!).
So, this AMD card I'm using requires buffer memory sizes to be
aligned to 16 bytes. Intel is aligned to 4 bytes I think, but
AMD - or at least this AMD model in particular - uses 16 bytes
for alignment.
When creating a a particular texture (I did not determin which one
specifically!) a buffer of size 1276 bytes is requested.
1276 / 16 = 79.75, which is clearly not aligned to the required
16 bytes.
We request Vulkan to create a buffer of 1276 bytes for us, it
figures out that it's not aligned, and creates a buffer of 1280
bytes, which is aligned. The extra 4 bytes are wasted, but that's
okay. We immediately query this buffer for this exact information,
using vkGetBufferMemoryRequirements(), and proceed to create actual
memory to back this buffer up.
The buffer tells us we must use 1280 bytes, so we pass 1280 bytes
and everyone is happy, right? Of course not. We pass 1276 bytes,
and Vulkan is subtly unhappy at us.
Fix that by passing the value that Vulkan asks us to use, i.e.,
the size returned by vkGetBufferMemoryRequirements().
This is what GL does, and for a reason: it can lead to width or
height for very small glyphs. Also, switch to dividing by a float
(1024.0) instead of an integer (1024).
This doesn't make any difference now, but will allow us to copy
subregions more easily. This is not obvious, but here's a quick
explanation:
Leaving 'bufferRowLength' and 'bufferImageHeight' implies that
Vulkan will assume the size passed in the 'imageExtent' field.
Right now, this assumption is correct - the only user of this
function is the glyph cache, and it only copies and uploads
exact rects. Next commits will change that assumption, so we
must pass 'buffer*' fields, and tell Vulkan, "this part of the
buffer represents an image of width x height, and I want the
subregion (x, y, smallerWidth, smallerHeight) of this image".
When creating an image using gsk_vulkan_image_new_for_framebuffer(),
it passes VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL.
However, this is a mistake. The spec demands that the initial
layout must be either VK_IMAGE_LAYOUT_UNDEFINED or
VK_IMAGE_LAYOUT_PREINITIALIZED.
Apparently this was an oversight from commit b97fb75146, since the
commit message even documents that, and all other calls pass either
VK_IMAGE_LAYOUT_UNDEFINED or VK_IMAGE_LAYOUT_PREINITIALIZED.
Create framebuffer images using VK_IMAGE_LAYOUT_UNDEFINED, which is
what was originally expected.
Fractional scaling with the GL renderer is
experimental for now, so we disable it unless
GDK_DEBUG=gl-fractional is set.
This will give us time to work out the kinks.
This commit combines changes in the Wayland backend,
the GL context frontend, and the GL renderer to switch
them all to use the fractional scale.
In the Wayland backend, we now use the fractional scale
to size the EGL window.
In the GL frontend code, we use the fractional scale to
scale the damage region and surface in begin/end_frame.
And in the GL renderer, we replace gdk_surface_get_scale_factor()
with gdk_surface_get_scale().
Instead of tracking a single scale, track x and y scales separately.
Factor out gsk_vulkan_render_pass_new() into a private function that
receives both scales, and pass 'scale_factor' for both.
This is mostly a cosmetic change, and the goal is twofold:
1. Make it easier to spot unimplemented render node types; and
2. Prepare for a small rework
The implementation for each node now lives in specific functions,
like the GL renderer; unlike the GL renderer, however, we use a
node type vtable to map GskRenderNodeType → implementation. Render
node without an implementation map to NULL, and use the fallback
implementation. Render nodes that fail any check and return FALSE
also use fallback implementation.
If we encounter a node or texture the 1st time and they are going
to be used again, give them a name.
Then, when encountering them again, print them by name instead
of duplicating them.
We extend the syntax for nodes from:
<node-type> { ... }
to
<node-type> { ... }
<node-type> <string> { ... }
<string>;
where the first is the same as before, the 2nd defines a named node and
the last references a previously defined node.
Or to give an example:
color "node" {
bounds: 0 0 10 10;
color: red;
}
transform {
bounds: 20 0 10 10;
child: "node";
}
This will draw the red box twice, once at (0,0) and once at
(20,0).
The intended use for this is both shortening generated node files as
well as allowing to write tests that reuse nodes, in particular when
dealing with caches.
We extend the syntax for textures from just:
<url>
to
[<string>] <url>
<string>
where the first defines a named texture while the second references a
texture.
Or to give an example:
texture {
bounds: 0 0 10 10;
texture: "foo" url("foo.png");
}
texture {
bounds: 20 0 10 10;
texture: "foo";
}
This will draw the texture "foo.png" twice, once at (0,0) and once at
(20,0).
The intended use for this is both shortening generated node files as
well as allowing to write tests that reuse textures, in particular when
mixing them in texture and texture-scale nodes.
When the GL texture already has a mipmap, we don't
have to download and reupload it to generate one.
We differentiate the handling for texture scale nodes,
where we do want to force the mipmap creation even if
it requires us to reupload the GL texture, and plain
texture nodes, where we just take advantage of a
preexisting mipmap to allow trilinear filtering for
downscaling, or create one if we have to upload the
texture anyway.