This fixes two issues with the offscreen rendering code for nodes with
bounds not aligned with the pixel grid:
1.) When drawing to an offscreen buffer the size of the offscreen buffer
was rounded up, but then later when used as texture the vertices
correspond to the original bounds with the unrounded size. This could
then result in the offscreen texture being drawn onscreen at a slightly
smaller size, which then lead to it being visually shifted and blurry.
This is fixed by adjusting the u/v coordinates to ignore the padding
region in the offscreen texture that got added by the size increase from
rounding.
2.) The viewport used when rendering to the offscreen buffer was not
aligned with the pixel grid for nodes at coordinates not aligned with
the pixel grid. Then because the content of the offscreen buffer is not
aligned with the pixel grid and later when used as textures sampling
from it will result in interpolated values for an onscreen pixel. This
could also result in shifting and blurriness, especially for nested
offscreen rendering at different offsets.
This is fixed by adding similar padding at the beginning of the
texture and also adjusting the u/v coordinates to ignore this region.
Fixes: https://gitlab.gnome.org/GNOME/gtk/-/issues/3833
This moves a lot of the texture atlas control out of the driver and into
the various texture libraries through their base GskGLTextureLibrary class.
Additionally, this gives more control to libraries on allocating which can
be necessary for some tooling such as future Glyphy integration.
As part of this, the 1x1 pixel initialization is moved to the Glyph library
which is the only place where it is actually needed.
The compact vfunc now is responsible for compaction and it allows for us
to iterate the atlas hashtable a single time instead of twice as we were
doing previously.
The init_atlas vfunc is used to do per-library initialization such as
adding a 1x1 pixel in the Glyph cache used for coloring lines.
The allocate vfunc purely allocates but does no upload. This can be useful
for situations where a library wants to reuse the allocator from the
base class but does not want to actually insert a key/value entry. The
glyph library uses this for it's 1x1 pixel.
In the future, we will also likely want to decouple the rectangle packing
implementation from the atlas structure, or at least move it into a union
so that we do not allocate unused memory for alternate allocators.
This removes the sharing of atlases across various texture libraries. Doing
so is necessary so that atlases can have different semantics for how they
allocate within the texture as well as potentially allowing for different
formats of texture data.
For example, in the future we might store non-pixel data in the textures
such as Glyphy or even keep glyphs with color content separate from glyphs
which do not and can use alpha channel only.
This allows the gskglprograms.defs a bit more control over how a shader
will get generated and if it needs to combine sources. Currently, none of
the built-in shaders do that, but upcoming shaders which come from external
libraries will need the ability to inject additional sources in-between
layers.
If the max_entry_size is zero, then assume we can add anything to the
atlas. This allows for situations where we might be uploading an arc list
to the atlas instead of pixel data for GPU font rendering.
When large viewports are passed to gsk_renderer_render_texture(), don't
fail (or even return NULL).
Instead, draw multiple tiles and assemble them into a memory texture.
Tests added to the testsuite for this.
There are situations where our "default framebuffer" is not actually
zero, yet we still want to apply a scissor rect.
Generally, 0 is the default framebuffer. But on platforms where we need
to bind a platform-specific feature to a GL_FRAMEBUFFER, we might have a
default that is not 0. For example, on macOS we bind an IOSurfaceRef to
a GL_TEXTURE_RECTANGLE which then is assigned as the backing store for a
framebuffer. This is different than using gsk_gl_renderer_render_texture()
in that we don't want to incur an extra copy to the destination surface
nor do we even have a way to pass a texture_id into render_texture().
If the rendering operation is over an opaque region, we can potentially
avoid clearing a large section of the framebuffer destination. Some cases
you do want to clear, such as when clearing the whole contents as some
drivers have fast paths for that to avoid bringing data back into the
framebuffer.
Instead of just passing major/minor, pass them twice, once for GL and
once for GLES. This way, we don't need to check for GL and GLES
separately.
If something is supported unconditionally, passing 0/0 works fine.
That said, I'd like to group the arguments somehow, because otherwise
it's just a confusing list of numbers - but I have no idea how to do
that.
For libANGLE to work with our shaders, we must use "300 es" for
the #version directive in our shaders, as well as using the non-legacy/
non-GLES codepath in the shaders. In order to check whether we are
using the GLSL 300 es shaders, we check whether we are using a GLES 3.0+
context. As a result, make ->glsl_version a const char* and make sure
the existing shader version macros are defined apprpriately, and add a
new macro for the "300 es" shader version string.
This will allow the gtk4 programs to run under Windows using EGL via
libANGLE. Some of the GL demos won't work for now, but at least this
makes things a lot better for using GL-accelerated graphics under Windows
for those that want to or need to use libANGLE (such as those with
graphics drivers that aren't capable of our Desktop (W)GL requirements in
GTK.
On Windows with nVidia drivers at least, when we create a legacy context
via wglCreateContext(), we may still get a (W)GL 4.x context. Allow
such contexts to also use GLSL version 130 instead of 110, so that
things do continue to work.
But don't call it too early, we only want to call it once we have
prepared the target.
This way, we guarantee that a GL context is always available and that it
is bound to the correct target.
Don't pass texture + rect, but instead have
gdk_memory_texture_new_subtexture()
and use it to generate subtextures and pass them.
This has the advantage of downloading the a too large texture only once
instead of N times.
It does not belong in GdkGLContext, it's a renderer thing.
It's also the only user of that API.
Introduce gdk_gl_context_check_version() private API to make version
checks simpler.
It turns out glReadPixels() cannot convert pixels and you are only
allowed to pass a single value into the function arguments. You need to
know which ones or things will explode.
GL is great.
Pass a format do GdkTextureClass::download(). That way we can download
data in any format.
Also replace gdk_texture_download_texture() with
gdk_memory_texture_from_texture() which again takes a format.
The old functionality is still there for code that wants it: Just pass
gdk_texture_get_format (texture) as the format argument.
Move the resources of each renderer to its subdirectory.
We've previously done that for the ngl renderer, but it
is better to be consistent and do it for all the renderers.
When we are rendering a texture node to an offscreen,
and we have a clip, we must force the offscreen rendering.
Otherwise, the code will notice: Hey, it already is a texture
node, so no need to render it to a texture again. But when
clipping is involved, that is exactly what we want to do.
Testcase included.
Fixes: #3651
According to OpenGL spec, a shader object will only be flagged
for deletion unless it has been detached; when a program object
is deleted, those shader objects attached to it will be detached
but not deleted unless they have already been flagged for deletion.
So we shall detach a shader object before it is deleted, and delete
it before the program object is deleted best.
This way we can render the first frame of tests/testoutsetshadowdrawing
in 153 ops instead of 183.
And the first frame of gtk4-demo in 260 instead of 300.
These positions are not guaranteed to be in a specific order when linked
into the final GPU program. They need to be specified so that our code
in gskglrenderer.c can use known positions for them to match up with
our GskQuadVertex.
This fixes the GL renderer on macOS's OpenGL shader compiler.
Fixes#3420
Catch the error when it happens, so that we can emit a specific and more
helpful error message.
Also verify that all branches in the code now do indeed set a proper
GError when they fail, so that the final catch-all is no longer needed.
Instead, assert that the error is set so that we catch future code
additions early that do not set the GError.
glFrameBufferTexture maps to all faces of a cube and that is not needed
here. Additionally, texture_id is not deleted after we use the additional
flipped texture, but should be.
On desktop GL, GL 1.5 or GL_ARB_occlusion_query is required to get the
glGenQueries() etc. symbols. This isn’t the case on GLES, where they
are provided by GL_EXT_occlusion_query_boolean, and more importantly
have never been made core.
This patch allows gtk4-demo to start when GDK_DEBUG=gl-gles is set, on
my Mali 400 MP running the Lima driver from Mesa.
We're caching two things, either a node itself being rendered, or a
parent storing a cached version of a child as rendered to an offscreen
the size and location of the parent.
If both the parent and child uses the cache this will cause a conflict in
the cache as it is currently use keying of a node pointer which will have
the same value for the node-as-itself and the child-node-of-the-parent.
We fix this by adding another part to the key "pointer_is_child" which means
we can have the same node pointer twice in the cache.
Additionally, in the child-is-rendered-offscreen case the offscreen
result actually depends on the position and size of the parent viewport,
so we need to store the parent bounds in that case.
This allows us to avoid updating uniforms if that is not necessary. This
in turn allows us to sometimes reuse the same draw op by just extending the
vertex array size we draw rather than doing a separate glDraw call.
For example, in the fishbowl demo, all the icons added at the same
time will have the same time and size, so we emit single draw calls
with 100s of triangles instead of 100s of draw calls with 2 triangles.
For vulkan/broadway this just means to ignore it, but for the gl
backend we support (with up to 4 texture inputs, which is similar to
what shadertoy does, so should be widely supported).
Print out the full assembled shader sources when
GSK_DEBUG=shaders is given. This is very verbose,
but may be useful to see what we actually pass
to the compiler.
Almost always the source is created by combining various sources, which
means the line numbers in the error messages are hard to use. Adding
the line numbers to the source in the error message helps with this.
There is no real reason to have this on the side indexed via the
index, as it is stored next to each other anyway. Plus, storing them
together lets use use `Program` structures not in the array.
I found that the gears demo was spending 40% cpu
downloading a GL texture every frame, only to
upload it again to another context.
While the GSK rendering and the GtkGLArea use different
GL contexts, they are (usually) connected by sharing data
with the same global context, so we can just use the
texture without the download/upload dance. This brings
gears down to < 10% cpu.
Do custom uploads rather than using gdk_cairo_surface_upload_to_gl(),
because this way we avoids a roundtrip (memcpy and possibly conversion)
to the cairo image surface format.
GLES doesn't support the GL_BGRA + GL_UNSIGNED_INT_24_8 hack that
we use on desktop OpenGL to upload textures directly in the cairo
pixel format. This adds the required conversions to all the places
that currently need it.
We also add a data_format to the internal gdk_gl_context_upload_texture()
function to make it clearer what the format are. Currently it is always
the cairo image surface format, but eventually we want to support other
formats so that we can avoid some of the unnecessary conversions we do.
Also, the current gdk_gl_context_upload_texture() code always converts
to a cairo format and uploads that like we did before. Later commits
will allow this to use other upload formats that gl supports to avoid
conversions.
We need to include both the scale and the filtering
in the key for the texture cache, since those affect
the texture.
This fixes misrendering in the recorder in the inspector
whenever transforms are involved. An example where this
was showing up is testrevealer's swing transition.
When rendering to an offscreen because of transforms,
check if transforming the bounds of the node results
in a non-axis-aligned quad. If it doesn't, we want
GL_NEAREST interpolation to get sharp edges. Otherwise,
we use GL_LINEAR to get better results for things
that are actually transformed.
Track what we really need to send for inset shadows, which are used
as a border replacement in many cases.
Fishbowl says I can draw around 200-300 more switches per frame like
this too.
This fixes the widget factory rendering too much.
In the widget-factory, we generally have a pretty small update area (two
spinners and a progressbar). We take the extents of that as a update
area and inital clip.
However, the first clip node we see is from the toplevel window, which
essentially increases the clip again to almost the entire window.
Fix that by ignoring such cases.
If the inner clip intersects with the corners of the outer clip, we
potentially need a texture. We should add more fine-grained checks for
this in the future though.
Test case included.
Language bindings—especially ones based on introspection—cannot deal
with custom type hiearchies. Luckily for us, GType has a derivable type
with low overhead: GTypeInstance.
By turning GskRenderNode into a GTypeInstance, and creating derived
types for each class of node, we can provide an introspectable API to
our non-C API consumers, with no functional change to the C API itself.
Sprinkle various g_assert() around the code where gcc cannot figure out
on its own that a variable is not NULL and too much refactoring would be
needed to make it do that.
Also fix usage of g_assert_nonnull(x) to use g_assert(x) because the
first is not marked as G_GNUC_NORETURN because of course GTester
supports not aborting on aborts.
Some systems (notably macOS) will not allow enumeration of an extension that has been promoted to core OpenGL for context in use. This change assumes that GL_ARB_timer_query is available on OpenGL 3.3+.
I could not find definitive information on whether GL_ARB_debug_output or GL_KHR_debug have been added to core. Other extensions in use were addressed by https://gitlab.gnome.org/GNOME/gtk/merge_requests/1422 .
Commit 47c44644b1 was a bit overzealous in fixing
compiler warnings. We still need to call collect_textures,
even if we don't need the number that it returns.
These don't take a duration, instead they call g_get_monotonic_time() to
and subtract the start time for it.
Almost all our calls are like this, and this makes the callsites clearer
and avoids inlining the clock call into the call site.
When we use if (GDK_PROFILER_IS_RUNNING) this means we get an
inlined if (FALSE) when the compiler support is not compiled in, which
gets rid of all the related code completely.
We also expand to G_UNLIKELY(gdk_profiler_is_running ()) in the supported
case which might cause somewhat better code generation.
usec is the scale of the monotonic timer which is where we get almost
all the times from. The only actual source of nsec is the opengl
GPU time (but who knows what the actual resulution of that is).
Changing this to usec allows us to get rid of " * 1000" in a *lot* of
places all over the codebase, which are ugly and confusing.
This is similar to how we share texture atlases. Some added complexity
in that the program state also needed to be shared, so it had to move to
the shared Programs object.
With this change realization of additional GskRenderers when opening
popups went from ~60msec to ~35 msec on average.
When rendering ops to an offscreen texture we take max-texture-size
in consideration and modify the scale we use such that the required
texture does not exceed the limit.
This means some rendering will be blocky/fuzzy, but that is better
than it being clipped.
It would probably be better to not do this and always render the outline
in plain white, then later recolor it but do this for no, just for
correctness.