We're caching two things, either a node itself being rendered, or a
parent storing a cached version of a child as rendered to an offscreen
the size and location of the parent.
If both the parent and child uses the cache this will cause a conflict in
the cache as it is currently use keying of a node pointer which will have
the same value for the node-as-itself and the child-node-of-the-parent.
We fix this by adding another part to the key "pointer_is_child" which means
we can have the same node pointer twice in the cache.
Additionally, in the child-is-rendered-offscreen case the offscreen
result actually depends on the position and size of the parent viewport,
so we need to store the parent bounds in that case.
This allows us to avoid updating uniforms if that is not necessary. This
in turn allows us to sometimes reuse the same draw op by just extending the
vertex array size we draw rather than doing a separate glDraw call.
For example, in the fishbowl demo, all the icons added at the same
time will have the same time and size, so we emit single draw calls
with 100s of triangles instead of 100s of draw calls with 2 triangles.
For vulkan/broadway this just means to ignore it, but for the gl
backend we support (with up to 4 texture inputs, which is similar to
what shadertoy does, so should be widely supported).
Print out the full assembled shader sources when
GSK_DEBUG=shaders is given. This is very verbose,
but may be useful to see what we actually pass
to the compiler.
Almost always the source is created by combining various sources, which
means the line numbers in the error messages are hard to use. Adding
the line numbers to the source in the error message helps with this.
There is no real reason to have this on the side indexed via the
index, as it is stored next to each other anyway. Plus, storing them
together lets use use `Program` structures not in the array.
I found that the gears demo was spending 40% cpu
downloading a GL texture every frame, only to
upload it again to another context.
While the GSK rendering and the GtkGLArea use different
GL contexts, they are (usually) connected by sharing data
with the same global context, so we can just use the
texture without the download/upload dance. This brings
gears down to < 10% cpu.
Do custom uploads rather than using gdk_cairo_surface_upload_to_gl(),
because this way we avoids a roundtrip (memcpy and possibly conversion)
to the cairo image surface format.
GLES doesn't support the GL_BGRA + GL_UNSIGNED_INT_24_8 hack that
we use on desktop OpenGL to upload textures directly in the cairo
pixel format. This adds the required conversions to all the places
that currently need it.
We also add a data_format to the internal gdk_gl_context_upload_texture()
function to make it clearer what the format are. Currently it is always
the cairo image surface format, but eventually we want to support other
formats so that we can avoid some of the unnecessary conversions we do.
Also, the current gdk_gl_context_upload_texture() code always converts
to a cairo format and uploads that like we did before. Later commits
will allow this to use other upload formats that gl supports to avoid
conversions.
We need to include both the scale and the filtering
in the key for the texture cache, since those affect
the texture.
This fixes misrendering in the recorder in the inspector
whenever transforms are involved. An example where this
was showing up is testrevealer's swing transition.
When rendering to an offscreen because of transforms,
check if transforming the bounds of the node results
in a non-axis-aligned quad. If it doesn't, we want
GL_NEAREST interpolation to get sharp edges. Otherwise,
we use GL_LINEAR to get better results for things
that are actually transformed.
Track what we really need to send for inset shadows, which are used
as a border replacement in many cases.
Fishbowl says I can draw around 200-300 more switches per frame like
this too.
This fixes the widget factory rendering too much.
In the widget-factory, we generally have a pretty small update area (two
spinners and a progressbar). We take the extents of that as a update
area and inital clip.
However, the first clip node we see is from the toplevel window, which
essentially increases the clip again to almost the entire window.
Fix that by ignoring such cases.
If the inner clip intersects with the corners of the outer clip, we
potentially need a texture. We should add more fine-grained checks for
this in the future though.
Test case included.
Language bindings—especially ones based on introspection—cannot deal
with custom type hiearchies. Luckily for us, GType has a derivable type
with low overhead: GTypeInstance.
By turning GskRenderNode into a GTypeInstance, and creating derived
types for each class of node, we can provide an introspectable API to
our non-C API consumers, with no functional change to the C API itself.
Sprinkle various g_assert() around the code where gcc cannot figure out
on its own that a variable is not NULL and too much refactoring would be
needed to make it do that.
Also fix usage of g_assert_nonnull(x) to use g_assert(x) because the
first is not marked as G_GNUC_NORETURN because of course GTester
supports not aborting on aborts.
Some systems (notably macOS) will not allow enumeration of an extension that has been promoted to core OpenGL for context in use. This change assumes that GL_ARB_timer_query is available on OpenGL 3.3+.
I could not find definitive information on whether GL_ARB_debug_output or GL_KHR_debug have been added to core. Other extensions in use were addressed by https://gitlab.gnome.org/GNOME/gtk/merge_requests/1422 .
Commit 47c44644b1 was a bit overzealous in fixing
compiler warnings. We still need to call collect_textures,
even if we don't need the number that it returns.
These don't take a duration, instead they call g_get_monotonic_time() to
and subtract the start time for it.
Almost all our calls are like this, and this makes the callsites clearer
and avoids inlining the clock call into the call site.
When we use if (GDK_PROFILER_IS_RUNNING) this means we get an
inlined if (FALSE) when the compiler support is not compiled in, which
gets rid of all the related code completely.
We also expand to G_UNLIKELY(gdk_profiler_is_running ()) in the supported
case which might cause somewhat better code generation.
usec is the scale of the monotonic timer which is where we get almost
all the times from. The only actual source of nsec is the opengl
GPU time (but who knows what the actual resulution of that is).
Changing this to usec allows us to get rid of " * 1000" in a *lot* of
places all over the codebase, which are ugly and confusing.
This is similar to how we share texture atlases. Some added complexity
in that the program state also needed to be shared, so it had to move to
the shared Programs object.
With this change realization of additional GskRenderers when opening
popups went from ~60msec to ~35 msec on average.
When rendering ops to an offscreen texture we take max-texture-size
in consideration and modify the scale we use such that the required
texture does not exceed the limit.
This means some rendering will be blocky/fuzzy, but that is better
than it being clipped.
It would probably be better to not do this and always render the outline
in plain white, then later recolor it but do this for no, just for
correctness.
Instead of loading the unflipped version first and then flipping it.
Don't do it in add_render_ops either but only in the function actually
adding the render ops for the nodes, since those frequently have
early-out conditions that don't need the vertex data at all.