Basically, memcpy() asap if possible.
This happens a lot in Vulkan, where we gdk_memory_conert() image
data from memory textures straight into the VulkanBuffer.
And usually we support the format.
When a GdkMemoryFormat is not supported natively and there's
postprocessing required, add a way to mark a VulkanImage as such via the
new postprocess flags.
Also allow texting such iamges only with new_for_upload() and detect
when that is the case and then run a postprocessing step that converts
that image to a suitable format.
This is done with a new "convert" shader/op.
This now supports all formats natively, no conversions happen on the CPU
anymore (unless the GPU is old).
Add an explicit begin() and an end() op. For now, this looks like
overkill, but it allows doing renderpasses with custom ops that are not
meant to render a rendernode.
Examples for this are pre/postprocessing passes or 2-pass blur.
The API was using regions because it always had. But all the code ever
did was get the extents of the region.
So simplify everything by using rectangles everywhere.
These days, we can query it with gsk_vulkan_render_get_context().
Makes quite a few functions require one less argument.
And it also makes the GskVulkanRenderPass empty. Gotta figure out what
to do with it.
Instead, build-depnd on glslc to build them.
glslc is available in all important distros for a while:
Fedora >= 28
Ubuntu >= 23.04
Debian >= 12
Arch
Opensuse >= 15.2
msys2
are the ones I checked.
So we can depend on it and avoid having to deal with keeping spirv files
up-to-date in all commits.
It's also 700kB of data, and not updating it helps.
We now store all the relevant state of the image inside the VulkanImage
struct, so we can delay barriers for as long as possible.
Whenever we want to use an image, we call the new
gsk_vulkan_image_transition() and it will add a barrier to the desired
state if one is necessary.
... and all the remaining functions still using it.
It's all unused and has been replaced by upload and download ops.
With this change, all GPU operations now go via GskVulkanOp.command()
and no more side channels exist.
This op queues a download of an image. The image will only be available
once the commands finished executing, so it requires waiting for the
render to finish, which makes the API a bit awkward.
Included is also a download_png_op() useful for debugging.
The render pass ops were not updating the image's layout to the final
layout when a render pass ends.
Fix that.
Also make the layouts explicit arguments to the render pass op.
Split out the function that uploads using a buffer, so that it can be
used with an area to only update parts of the image.
That feature is not used yet, but will be in future commits.
We were clowing through all the Pango caches for no benefit.
It made the test generation stuck in fontconfig loops instead of
quickly generating tests.
So don't do that and limit the different fonts to some reasonable list
of options.
If a command takes too long to execute, Vulkan drivers will think they
are inflooping and abort what they were doing.
For the simple color shader with smallish nodes, this happens around
10M instances, as tested with the output of
./tests/rendernode-create-tests 10000000 colors.node
So just limit it to way lower, so that we barely never hit it, ut still
pick a big number so this optimization stays noticable.
For small regions, the optimization doesn't matter that much, so we
don't need to do lots of work on the CPU.
In particular, this should catch icons and their backgrounds (32x32),
but I was generous in selecting the number.
Gets my discrete AMD on widget-factory back to the 1900fps it had before
this optimization while making the driver clock the GPU's shader at
1.7GHz instead of the 2.1GHz it used before.
Using clear avoids the shader engine (see last commit), so if we can get
pixels out of it, we should.
So we detect the overlap with the rounded corners of the clip region and
emit shaders for those, but then use Clear() for the rest.
With this in place, widget-factory on my integrated Intel TigerLake gets
a 60% performance boost.
The op emits a vkCmdClearAttachments() with a given color. That can be
used with color nodes that are pixel-aligned and opaque to significantly
speed up rendering when the window background is a solid color.
However, currently this fails a bit outside of fullscreen when rounded
clip rectangles are in use to draw rounded corners.